Just for my reference
http://csharpindepth.com/Articles/General/Unicode.aspx
http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32
http://stackoverflow.com/questions/643694/utf-8-vs-unicode
In a nutshell:
1. .NET strings use UTF-16 by default, and Encoding.Unicode means UTF-16 (little-endian).
2. UTF-8: Variable-width encoding, backwards compatible with ASCII.
ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to
U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code
points U+10000 to U+10FFFF take 4 bytes. Compact for English text, less
so for Asian text, where most characters take 3 bytes.
3. UTF-16: Variable-width encoding. Code points U+0000 to U+FFFF take 2
bytes, code points U+10000 to U+10FFFF take 4 bytes (a surrogate pair).
Doubles the size of English text compared with UTF-8, but most Asian
characters take 2 bytes instead of 3.
4. UTF-32: Fixed-width encoding. All code points take 4 bytes. An enormous memory hog, but fast to operate on because the Nth code point is always at byte offset 4 × N. Rarely used.
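To check the byte counts above, here is a small sketch. It is written in Python rather than C# purely for illustration, and the helper name encoded_sizes and the sample characters are my own choices, not anything from the linked articles. It uses the "-le" codecs so that no byte order mark is added to the count.

```python
# Sample code points, one from each size class listed above.
SAMPLES = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, in the U+0080..U+07FF range
    "\u4e2d",      # U+4E2D, a CJK ideograph in the U+0800..U+FFFF range
    "\U0001F600",  # U+1F600, above U+FFFF
]

def encoded_sizes(ch):
    """Return (UTF-8, UTF-16, UTF-32) byte counts for a string.

    Hypothetical helper for this note. The little-endian codecs are used
    so that Python does not prepend a byte order mark to the output.
    """
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(codec)) for codec in codecs)

for ch in SAMPLES:
    utf8, utf16, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8={utf8} UTF-16={utf16} UTF-32={utf32}")

# Prints:
# U+0041: UTF-8=1 UTF-16=2 UTF-32=4
# U+00E9: UTF-8=2 UTF-16=2 UTF-32=4
# U+4E2D: UTF-8=3 UTF-16=2 UTF-32=4
# U+1F600: UTF-8=4 UTF-16=4 UTF-32=4
```

The third row is the trade-off in points 2 and 3: a typical CJK character costs 3 bytes in UTF-8 but 2 in UTF-16, while the first row shows the opposite for ASCII.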