megamau wrote:It is also possible to positively avoid collisions, by using a large enough key.
The size of the key is not nearly as important as:
1. The number of entries in the hash table if it is not a "perfect hashing" scheme
2. The randomness of the process that generates the key. More random = better hashing and fewer collisions
As a counter-example to demonstrate the point, imagine a hash table with a single entry. No matter how good the key you select, you are guaranteed that every entry, except for one, is a collision. Everything will map to the one location. The first entry is the only one that is not a collision.
There are Perfect Hashing Functions, which will always map an entry into the hash table without duplicating the index, and, therefore, produces 0 collisions.
There are also Minimal Perfect Hashing Functions which are perfect hash functions that also have no unused entries.
In a sense, an Indexing Function is a Minimal Perfect Hashing Function without the overhead of creating a hash number.
If you have enough RAM, then the number of entries in your hash table should be a prime number that is twice as large as the number of entries you need to store. That way, your hash table will be less than 50% full, greatly reducing the chance of collisions, and allowing for faster collision resolution when collisions do occur.