What is the basic use of a Hash Table?
A hash table organizes data so you can quickly look up values for a given key.
Think of a hash map as a "hack" on top of an array to let us use flexible keys instead of being stuck with sequential integer "indices."
What are the basic strengths and weaknesses of a Hash Table?
Strengths:
Fast lookups. Lookups take time on average.
Flexible keys. Most data types can be used for keys, as long as they're hashable.
Weaknesses:
Slow worst-case lookups. Lookups take time in the worst case.
Unordered. Keys aren't stored in a special order. If you're looking for the smallest key, the largest key, or all the keys in a range, you'll need to look through every key to find it.
Single-directional lookups. While you can look up the value for a given key in time, looking up the keys for a given value requires looping through the whole dataset— time.
Not cache-friendly. Many hash table implementations use linked lists, which don't put data next to each other in memory.
What are some of the common uses for Hash Tables?
Associative arrays: Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects).
Database indexing: Hash tables may also be used as disk-based data structures and database indices (such as in dbm).
Caches: Hash tables can be used to implement caches i.e. auxiliary data tables that are used to speed up the access to data, which is primarily stored in slower media.
Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby use hash tables to implement objects.
Hash Functions are used in various algorithms to make their computing faster
What are basic components of a Hash Map/Table?
1) A zero-initialized array of stored keys, where the indices will represent the keys calculated by a hashing function
2) A hashing function that converts the arbitrary key data type to an integer index (if the user wants a key to be a string for example, we could convert the characters of the string into individual ASCII integers and add them together, then modulo by some large enough prime number to produce a likely-unique key.
3) Collision reduction/proofing - some extra handling in case it is still possible for collisions to occur. This could be done by having a linked list to store the actual keys at the collision index (to iterate through as a secondary step) or by incrementing the index by some set amount repeatedly until we find an empty slot.
What is the difference between a Hash Map (Dictionary) and a Hash Table?
Essentially similar to a Hash Table with two main differences:
A HashTable doesn't allow null keys or values; a HashMap does.
A HashTable is synchronized to prevent multiple threads from accessing it at once; a HashMap isn't.
What is another common name for a Hash Map?
Dictionary
What is another common name for a Dictionary?
Hash Map
(not Hash Table!)
What is the average time complexity for searching a Hash Map/Table?
What is the WCS time complexity for searching a Hash Map/Table?
O(1)
O(n)
(usually due to implementation issue)
What is the average time complexity for inserting into a Hash Map/Table?
What is the WCS time complexity for inserting into a Hash Map/Table?
O(log n)
What is the average time complexity for removing from a Hash Map/Table?
What is the WCS time complexity for removing from a Hash Map/Table?
What is the average time complexity to traverse a Hash Map/Table in order?
O(n log n)
This is not a data structure optimized for this use. It is best for optimizing add, remove, and lookup.
What is the space consumption of a Hash Map/Table?
Because it is a constant size that is then proportional to the number of unique stored key/value pairs.
What is a simple hash function for keys that have an int data type?
int
key % capacity
Where capacity is the currently allotted size of the array used to store the key/value pairs.
To handle hash collisions, use another structure (such as a Linked List of key/value/next nodes) at each index of the Hash Map/Table’s array.
What is a simple hash function for keys that have a char data type?
char
ord(key) % capacity
Convert the char to an int and do the basic hashing by capacity.
capacity
What is a simple hash function for keys that have a string data type?
string
char_sum = 0
for c in key:
char_sum += ord(c)
return char_sum % capacity
Sum the integer values for all of the characters in the string to get an integer value, then hash that.
What are two of the most common ways to handle hash collisions?
If we hash a key and find that the resulting index is already occupied, we just iterate through the list by some amount (sometimes simple index++, sometimes index², etc.) until we find an empty position.
index++
index²
This also impacts search - if we get the wrong key at the expected hash, we iterate until we find it or an empty index - we can stop at empty because we know that’s where the key would have existed if it was in the table at all.
Store values as data structures that can chain to iterate through (e.g. a Linked List nodes with key/value/next), and store them in that single index.
What is the basic optimization for Hash Map/Table capacity sizing and resizing?
Prime numbers for initial size, then resize to next prime number that is at least twice as much as the current capacity.
Actually implementing “find next 2x -> prime” is not practical to do in an interview. If asked, simply say it’s some stubbed helper method.
Last changeda month ago