Tuesday, 6 December 2011

GetHashCode

We basically use hash functions for Data integrity, faster look up etc.
Imagine doing a 'foreach' to look up an item in a hundred of thousands of List, it will certainly hamper the performance. One alternative is to use hashes to create something called _buckets_ of List. Essentially, it means List[] instead of List. Eric's article explains this, and he lists down all the guidelines for using GetHashCode, and i try to summarize below

1. Equal items should have equal hash.

Class Employee
{
      Public int Name { get; set; }
}

Employee suresh1 = new Employee { Name = "Suresh" };
Employee suresh2 = new Employee { Name = "Suresh" };

Ideally suresh1.GetHashCode() == suresh2.GetHashCode()
[Normally this is not the case, because String.GetHashCode will return different hash for same value, but the point here is we need to modify the GetHashCode()]

if this didn't return same value, ContainsKey will fail due to look up on different bucket than it is originally stored.

2. GetHashCode on a type should be based on its immutable fields or it's value should never change, if possible. At least it should not change while it is contained on a data structure based on hashing like Dictionary/HashSet/HashTable etc.

3. Should never throw exception.

4. Should be really fast.

5. Logically related records are physically stored together, but randomly. If we are going to store clustered data in a same bucket, it will affect performance as other buckets will never be used. So this needs to be random.

6. It is often suggested to use Big prime numbers while calculating Prime numbers instead of XOR.

int hash = 13;
hash = (hash * 7) + field1.GetHashCode();
hash = (hash * 7) + field2.GethashCode();
...
return hash;
 

7. Hash Collision - If two items have same hash code, it means they are equal(Rule 1), but not always. Hence we need to override equals() or  IEqualityComparer

Some useful stuffs

http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx
http://msdn.microsoft.com/en-us/library/ms379571%28VS.80%29.aspx#datastructures20_2_topic5
http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overriden-in-c
http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode#263416
http://stackoverflow.com/questions/638761/c-gethashcode-override-of-object-containing-generic-array/639098#639098

No comments: