Unicode Transformation Issue
Description
The Unicode Standard represents a very significant advance over all previous methods of encoding characters. For the first time, all of the world’s characters can be represented in a uniform manner, making it feasible for the vast majority of programs to be globalized: built to handle any language in the world. In many ways, the use of Unicode makes programs much more robust and secure. When systems used a hodge-podge of different charsets for representing characters, security and corruption problems resulted from differences between those charsets, or from the way in which programs converted to and from them. However, because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to security attacks in areas such as the following:
- Visual Security Issues
- UTF-8 Exploits
- Text Comparison (Sorting, Searching, Matching)
- Buffer Overflows
- Deletion of Code Points
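Attackers can use these techniques to bypass web application firewalls (WAFs) and other input filters, and then exploit cross-site scripting (XSS) and SQL injection vulnerabilities. This typically works when a filter inspects the input before a later component transforms it (for example through normalization or best-fit character mapping) into characters that the filter would have blocked.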
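As an illustration of how a Unicode transformation can defeat a filter, the following minimal Python sketch passes a payload written with fullwidth angle brackets (U+FF1C and U+FF1E) through a deny-list check and then through NFKC normalization. The functions `naive_filter` and `render` are hypothetical stand-ins for a WAF rule and a downstream component; they are assumptions for this example, not part of any real product.

```python
import unicodedata


def naive_filter(text: str) -> bool:
    """Hypothetical filter: accept input only if it contains no ASCII angle brackets."""
    return "<" not in text and ">" not in text


def render(text: str) -> str:
    """Hypothetical downstream step that applies NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


# Fullwidth '<' (U+FF1C) and '>' (U+FF1E) are different code points from ASCII '<' and '>'.
payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"

passed_filter = naive_filter(payload)  # the filter sees no ASCII brackets
rendered = render(payload)             # normalization maps them to ASCII brackets

print(passed_filter)
print(rendered)
```

The filter accepts the payload because it only looks for the ASCII characters, while the later normalization step turns the fullwidth forms into the very characters the filter was meant to block.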
Recommendation
Review every function that user input passes through and make sure the applicable recommendations from Unicode Security Considerations (see References) are applied. If you rely on a library for Unicode handling, make sure it is up to date.
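One way to apply this advice is to decode strictly, normalize once, and only then validate, so that the check runs on the same form of the text that later components will see. The sketch below is an assumption-laden example rather than a complete defense: `FORBIDDEN`, `is_safe`, and `accept` are hypothetical names, and the deny-list stands in for whatever validation the application really needs (context-aware output encoding and parameterized queries remain necessary).

```python
import unicodedata
from typing import Optional

# Illustrative deny-list only; a real application needs context-specific validation.
FORBIDDEN = set("<>'\"")


def is_safe(text: str) -> bool:
    """Hypothetical check that runs on already-normalized text."""
    return not any(ch in FORBIDDEN for ch in text)


def accept(raw: bytes) -> Optional[str]:
    """Decode strictly, normalize, then validate. Returns None when input is rejected."""
    try:
        # Strict decoding rejects ill-formed UTF-8, including overlong sequences.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
    # Normalize before validating so fullwidth and other compatibility forms are caught.
    text = unicodedata.normalize("NFKC", text)
    return text if is_safe(text) else None


overlong_slash = accept(b"\xc0\xaf")                        # overlong encoding of '/'
fullwidth_tag = accept("\uFF1Cb\uFF1E".encode("utf-8"))     # fullwidth '<b>'
plain = accept(b"hello")
```

Here both the ill-formed byte sequence and the fullwidth markup are rejected, while ordinary text passes through unchanged. Whether NFKC is the right normalization form depends on the application, since it also folds characters that some systems need to keep distinct.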
References
- Blackhat: Unicode Security
- Unicode Security Considerations
- CWE-176
- OWASP 2013-A1
- OWASP 2007-A2
- OWASP 2021-A3
- CWE-20