Wednesday, 29 February 2012

Console.OutputEncoding

Sometimes we need to output non-English characters to the console. This is possible (really?) using the Console.OutputEncoding property. Below is sample code that outputs Latin and Chinese characters.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;

namespace Test
{
    class Program
    {
        // Sample text: accented Latin-1 characters and Traditional Chinese characters.
        static string latinString = "café, señor, naïve";
        static string chineseString = "蘇雷什:密碼";

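        // Win32 call that sets the output code page of the attached console.
        [DllImport("kernel32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        static extern bool SetConsoleOutputCP(uint wCodePageID);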

        static void Main(string[] args)
        {
            var latinEncoding = Encoding.GetEncoding("ISO-8859-1");
            var utf8Encoding = Encoding.GetEncoding("UTF-8");

            //-------Start Chinese Characters------------

            SetConsoleOutputCP((uint)utf8Encoding.CodePage);

            Console.OutputEncoding = utf8Encoding;
            Console.WriteLine(chineseString);
          
            //-------End Chinese Characters------------

            //-------Start Latin Characters------------

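            // CodePage identifies ISO-8859-1 itself (28591), matching the UTF-8 call above;
            // WindowsCodePage would give the closest Windows code page instead.
            SetConsoleOutputCP((uint)latinEncoding.CodePage);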

            Console.OutputEncoding = latinEncoding;
            Console.WriteLine(latinString);

            //-------End Latin Characters------------
        }
    }
}

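But wait, this usually isn't going to display correctly on an English-language Windows install. Setting the encoding is not enough: the console font also needs glyphs for the characters, and the default console font has none for Chinese, so they show up as ? or boxes.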
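
Separate from the font problem, the encoding itself has to be able to represent the text. Here is a minimal sketch of a check for that, assuming a simple round trip (encode, decode, compare) is good enough. CanRepresent and EncodingCheck are names made up for this example, not part of .NET.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // Hypothetical helper: true if the text survives an encode/decode round trip.
    // Characters the encoding cannot represent come back as '?' (or a best-fit
    // substitute), so the round-tripped string no longer equals the input.
    public static bool CanRepresent(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string chinese = "\u8607\u96F7\u4EC0";   // 蘇雷什
        string accented = "caf\u00E9";           // café

        Console.WriteLine(CanRepresent(latin1, chinese));        // False
        Console.WriteLine(CanRepresent(latin1, accented));       // True
        Console.WriteLine(CanRepresent(Encoding.UTF8, chinese)); // True
    }
}
```

So Chinese text pushed through ISO-8859-1 is lost before it ever reaches the console, whatever font is installed.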

http://stackoverflow.com/questions/6405428/can-i-get-console-to-show-chinese
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/fe549b29-3e09-4176-b317-1e0094abf71a/
http://www.csharp-examples.net/culture-names/
http://stackoverflow.com/questions/2213541/vietnamese-character-in-net-console-application-utf-8

Wednesday, 15 February 2012

Useful functions - contd.....

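public static void PrintHexString(byte data)
{
    // BitConverter.GetBytes has no byte overload (the value would widen to short
    // and print two bytes), so format the byte directly as two hex digits.
    Console.WriteLine("Hex of {0} is {1}", data, data.ToString("X2"));
}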
-----------------------
private static void DemoNegativeNumberStorageInBinary()
{
    // Leading zeros are not printed for positive values.
    Console.WriteLine("Binary Representation of {0} - {1}", 1000, Convert.ToString(1000, 2));

    /* How a negative number is stored (two's complement), using short value = -7:
     *
     *   7          = 0000 0000 0000 0111
     *   NOT 7      = 1111 1111 1111 1000
     *   NOT 7 + 1  = 1111 1111 1111 1001   <- this is -7
     *
     * The leftmost (sign) bit is 1, so the value is negative.
     *
     * Check: -7 + 7 = 0
     *     1111 1111 1111 1001
     *   + 0000 0000 0000 0111
     *   = 0000 0000 0000 0000   (the carry out of the top bit is discarded)
     */
    // -1000 is an int here, so all 32 bits of its two's complement form are printed.
    Console.WriteLine("Binary Representation of {0} - {1}", -1000, Convert.ToString(-1000, 2));
}
-----------------------
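
The invert-and-add-one steps from the comment above can also be checked in code. This is just a sketch: Negate and ToBinary16 are helper names I made up for the example, and it sticks to 16-bit short values like the comment does.

```csharp
using System;

static class TwosComplementDemo
{
    // Negate a 16-bit value by hand: invert every bit, then add 1.
    // The arithmetic happens in int, so the result is cast back to short.
    public static short Negate(short value)
    {
        return unchecked((short)(~value + 1));
    }

    // 16-bit binary string, zero-padded on the left for positive values.
    public static string ToBinary16(short value)
    {
        return Convert.ToString(value, 2).PadLeft(16, '0');
    }

    static void Main()
    {
        short seven = 7;
        short minusSeven = Negate(seven);

        Console.WriteLine(ToBinary16(seven));       // 0000000000000111
        Console.WriteLine(ToBinary16(minusSeven));  // 1111111111111001
        Console.WriteLine(minusSeven);              // -7
    }
}
```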


Unicode and Encoding

Just for my reference
http://csharpindepth.com/Articles/General/Unicode.aspx
http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32
http://stackoverflow.com/questions/643694/utf-8-vs-unicode

In a nutshell
1. .NET strings are UTF-16 internally, and Encoding.Unicode means UTF-16 (little-endian).
2. UTF-8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text (3 bytes per character for most of it).
3. UTF-16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes, code points U+10000 to U+10FFFF take 4 bytes (a surrogate pair). Twice the size of UTF-8 for English text, but smaller than UTF-8 for most Asian text.
4. UTF-32: Fixed-width encoding. All code points take 4 bytes. An enormous memory hog, but fast to index by code point. Rarely used.
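
The byte counts in the list above are easy to verify with Encoding.GetByteCount. A small sketch follows; the sample characters and the ByteCount wrapper are my own picks for illustration.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Number of bytes the text occupies in the given encoding (no BOM included).
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    static void Main()
    {
        // One sample per UTF-8 length class: U+0041, U+00E9, U+8607, U+1F600.
        string[] samples = { "A", "\u00E9", "\u8607", "\U0001F600" };

        foreach (string s in samples)
        {
            Console.WriteLine("U+{0:X4}: UTF-8={1}, UTF-16={2}, UTF-32={3}",
                char.ConvertToUtf32(s, 0),
                ByteCount(Encoding.UTF8, s),
                ByteCount(Encoding.Unicode, s),
                ByteCount(Encoding.UTF32, s));
        }
        // U+0041:  UTF-8=1, UTF-16=2, UTF-32=4
        // U+00E9:  UTF-8=2, UTF-16=2, UTF-32=4
        // U+8607:  UTF-8=3, UTF-16=2, UTF-32=4
        // U+1F600: UTF-8=4, UTF-16=4, UTF-32=4
    }
}
```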