Hash tables are one of those things that I definitely understand and can implement but I have always struggled at recognizing when to use them. If you have been following my blog for a while you have probably noticed that I could have used a Hash Table to solve a problem at least once.
What is a Hash Table?
A hash table, also known as a hash map, is a data structure that “implements an associative array abstract data type, a structure that can map keys to values.”
Okay so what does that mean in English? Well let’s first look at a Hash Function. Hash functions take an input, for example: “Arya”. It then breaks down each letter, number, symbol into numbers that they are equal to in unicode, UTF-8, or ASCii. Most hash functions will then add those integers together and return of the sum modulo the limit that they set for their Hash Table or array.
For example: Let’s say I have an array (aka Hash Table) and it has 8 empty slots. I want to store “Arya” in that array and be able to refer back to that string at any given time at constant time. Let’s pretend A=1, r=2,=y=3, a=4. If we add up those values we get 10. 10 % 8 is 2. The string “Arya” would be stored at the 2nd index of that array (aka Hash Table).
Now you might be thinking, okay but what if two words equal the same hash? That is more likely to happen if you have a smaller array. If you have a larger array like 142 open slots then maybe this wouldn’t be too much of an issue. The issue I am referring to is “Collisions”. That’s the term for two inputs hashing to the same integer. You can solve this in a couple of different ways. One option is you create a bucket. For example, I also want to put “Oryx” into my array but lo and behold, it hashed to 2 as well. You would then create an array at index 2 that holds both “Arya” and “Oryx”. Then on look up you look inside the bucket and find the one you want. This could take up extra time though.
Pit-Falls of Hash Tables
Worst case scenario, your look up takes linear time. Meaning, the longer your array, the longer it is going to take. My array in my example only had 8 empty slots so that should not be an issue but if my array had 142 elements inside, that would take forever.
It’s not ordered. Keys are not stored in a special order. If you are looking for the smallest key, the largest key or all the keys in a range, you’ll need to look through every key to find it. This is mostly in reference to if you decide to use an object as your hash table. Or if you have sets inside of your array.
Single-directional look ups. While you can look up the value for a given key in constant time, looking up the keys for a given value requires looping through the entire dataset – which is linear time. Again not too great for large arrays.
It is not cache-friendly. Many hash implementations use linked lists which don’t put data next to each other in memory.
Again if you have anything to add, leave it in the comments!