Hash Tables part 2: Separate Chaining.

mgcadmin 03-09-2022

Part 2: expanding the collision resolution repertoire

Welcome back for part two of my articles on hash tables. In my previous article i covered the basics of linear probing. In this article i will discuss another popular collision resolution technique: separate chaining. While open address tables contain just one item per bucket, choosing instead a different slot when a key collision occurs, separate chaining takes the concept of "bucket" to its logical conclusion. By marrying a dynamically sized container, most commonly a linked list to each table slot we can store each key that maps to that slot in said bucket. Other data structures besides linked lists have been explored as well, in languages where dynamically sized arrays are available they can provide better cache locality than a linked list will. In implementations where the number of buckets must be kept small in turn leading to buckets of higher item counts, binary search trees cab be employed to cut down on search times. In this article i will be using singly linked lists to keep the concept its self the focus. Despite their similarities there are significant enough differences between open addressing tables and separate chained tables to warrant significant differences in the organization of the data structures. In that sense, this article may seem like its own stand alone piece than a part two. But without further delay lets dig in to this most useful data structure.

Many pieces makes a whole

As previously discussed a hash table is a marriage of concepts requiring many pieces working in concert. These pieces are:

The hash function - for converting keys to table indexes
The table data structure - an array of "buckets" that key/value pairs will be mapped to, with associated satellite info such as item count and table size.
The collision resolution strategy - a way to handle multiple keys that map to the same table slot.

I will be using the same cyclic shift hash function that i used in the open address implementation as that piece of the puzzle does not change: The main table data structure is remarkably similar to our previous open addressing iteration, the difference being that instead of an array of items, it will now be an array of linked list headers. I will also b making use of a special "list header" structure to carry additional satellite info, mainly the number of items in the bucket being queried. And the final change being the addition of 'next' pointer to our key/value structure to enable its use in a linked list. These last two points are strictly a personal design choice though, and a simple linked list with a pointer to the key value pair could be used instead.

Handling Table Insertion

Insertion into a table with separate chaining is a breeze, and is really where this collision resolution shines. Once we have computed the index from our key, we proceed exactly the same as we would for insertion into a singly list. Pushing the new item to the front of the bucket's linked list will yield an any case O(1) insertion algorithm. It is important to note that this is a worst case performance that open addressing implementations cannot match!

Retrieving data from the hash table

Once our table is populated, querying it becomes a very simple matter as well. We proceed by computing the hash value of the key for the item in question and then perform a simple sequential sort of the bucket, if the item is in the table, we return its associated value:

Deleting Items from the hash table

The deletion items also proceeds exactly as it would from a linked list. The one caveat being that we are using special list head structures, so an additional check is required to see if they item we are searching for is the first item on the list, as this case must be handled differently:

But wait! There's more....

With that we have a full implementation of our separate chaining hash table. But that does not conclude my series on hash tables. In the next article I will cover yet more strategies, such as double hashing, cuckoo hashing, and hashing values that are not based on strings.

MaxGCoding.com

For the love of programming

Hash Tables part 2: Separate Chaining.

Part 2: expanding the collision resolution repertoire

Many pieces makes a whole

Handling Table Insertion

Retrieving data from the hash table

Deleting Items from the hash table

But wait! There's more....

Leave A Comment

Top-Down Deletion for Red/Black Trees

Implementing Closures in Bytecode VMs: Heap Allocated Activation Records & Access Links

Pascal & Bernoulli & Floyd: Triangles

A Quick tour of MGCLex

Compiling Regular Expressions for "The VM Approach"

Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction

Improving the Space Efficiency of Suffix Arrays

Augmenting B+ Trees For Order Statistics

Top-Down AST Construction of Regular Expressions with Recursive Descent

Balanced Deletion for in-memory B+ Trees