Deterministic Skip Lists

The Skip List, introduced in 1990 by Pugh[1] is an interesting data structure. Skip lists came in to being as an evolution of the humble linked list. Indeed, skip lists have been described as "linked lists with a fast lane", which is certainly true, but they are also so, so much more than that. They have found use along side the normal ordered data structures as ordered sets and maps, priority queues, and for range searching, but they have also been utilized for such exotic tasks as flash memory file systems, block chains, database indexes, and both as concurrent and distributed data structures.

Skip Lists are almost always presented as randomized data structures in the literature, sometimes accompanied by vague mentions of their deterministic counterparts[3]. A quick survey of production implementations tells a different story about the real world use of skip lists: randomization isn't required for good performance, and may in fact be a hinderance.

A Bit of Theory

Munro and Sedgewick showed that skip lists can be implemented deterministically by altering the definition of a skip list to include the following two properties:

  1. Two nodes are considered "linked" if there exists at least one direct link from one node to the other.
  2. The "gap size" between two nodes linked at height h is equal to the number of nodes at height h-1 between them.

These definitions ensure the property that a constant number of forward links are traversed at each level before dropping to the next - bounding the worst case search operation to O(log N).

A skip list that conforms to the above definition, who's gap sizes are of either 1, 2 or 3 happen to be an isometric mapping of 2-3-4 tree's, the very same tree's Red/Black trees are modeled after. It also turns out that this class of skip list happens to be incredibly easy to implement.

A Left Child/Right Sibling Skip List

The "traditional" Skip List had a node consisting of an array of forward pointers, with each level represented explicitly in the same way as they are commonly depicted.

struct SkipNode {
      int info;
      SkipNode* forward[MAX_LEVELS];
};

Deterministic Skip Lists(DSL) on the other hand use just two pointers: down and right, doing away with the arrays, but not the towers they represent. The "towers" are not made of individual nodes explicitly linked together.

struct SkipNode {
      int info;
      SkipNode *down, *right;
};

That node looks an awful lot like a binary tree node, doesn't it? It turns out that searching in a DSL is much more closely related to searching in a binary search tree than searching in randomized skip list. In fact, the algorithm to perform a search is exactly the same as for binary search trees, which is phenomenal when one considers that we are now performing binary search on a linked list!

K find(K key) {
     bottom->element = key;
     link h = header;
     while (key != h->element)
           h = (key < h->element) ? h->down:h->right;
    return (h == bottom) ? std::numeric_limits<K>::max():h->element;
}  

And with good reason: we have done the same with a skip list as when you convert m-ary trees to binary trees by using the left child right sibling representation:

Now that I've whet your appetite a bit, lets get down to the nitty gritty and put some code together.

Initializing an Empty DSL

DSLs are built in a top down fashion, making ample use of sentinel nodes. Sentinel nodes are used the same way they are for top down red black trees in order to allow dereferencing pointers without having to worry about null pointers. This is because some operations on DSLs requiring dereferencing pointer chains and if explicit null checks were required the code would quickly become messy.

Before we can add elements to a skip list, we first have to initialize the sentinel nodes. The sentinel nodes are named header, tail, and bottom, and from there names I'm sure you can deduce what each of their responsibilities are. All three nodes are given then sentinel value of "infinity".

const int INFINITY = std::numeric_limits<K>::max();
class SkipList {
    private:
      using node = SkipNode;
      using link = node*;
      link header, tail, bottom;
    public:
      SkipList();
      int size() const { return count; }
      bool empty() const { return count == 0; }
      void insert(int key); 
      void find(int key);
      void show();
};

SkipList::SkipList() {
     header = new node(INFINITY);
     tail = new node(INFINITY);
     bottom = new node(INFINITY);
     header->down = bottom;
     header->right = tail;
     tail->right = tail;
     tail->down = tail;
     bottom->right = bottom;
     bottom->down = bottom;
     count = 0;
}

The result of this initialization leaves us with an empty Skip List that looks like this:

It may seem strange to set three sentinel nodes to all have the same value. At first it seems all we've done is create a cycle of nodes with no way out, But as you will see momentarily, it is an ingenious way of dictating branch results during insertion.

Insertion in Deterministic Skip List

The code for insertion really is delightfully simple. Inserting a new value into the skip list requires three steps. Two of these steps happen at each "level" of the skip list, and the third taking place as the final step of each insertion.

Insertion begins by pointing an iterator to the header node and setting the key of the "bottom" sentinel node to the value we are inserting. this ensures our insertion doesn't devolve into an endless loop, as well as placing the key in preparation for insertion. Insertion proceeds from the header, top down.

void insert(K key) {
      bottom->key = key;
      link curr = header; 
      while (curr != bottom) {
          curr = traverseRight(curr, key);
          curr = promoteOrSink(curr, key);
      }
      growIfNeeded();
} 

Upon entering a level of the skip list, the first thing we do is to traverse the current level as far right as we can in sorted order.

link traverseRight(link curr, K key) {
     link x = curr;
     while (x->key < key)
         x = x->right;
     return x;
} 

Once we have traversed laterally as far as possible, we must decide if we should simply drop to the next level, or if we need to promote the height of the current node upwards in order to preserve the gap height invariant. The decision of whether to promote the current node is a simple one. We need to ensure that when we add the new value on level h that it doesn't create a "gap" any larger than 3 nodes wide for the nodes on level h + 1.

link promoteOrSink(link curr, K key) {
     if (curr->down->right->right->key < curr->key) {
         curr = insertAndPromote(curr);
     } else {
         curr = curr->down;
     }
     return curr;
} 

We compare the key of the current node against the key of the node located one level down and two places over, as this is the position where an insertion would violate the specified invariant. If the current key is less than the node being compared against we simply drop down to the next level in the skip list and go back to the first step.

If however, this happens to be the position where we need to an insert a node, then we must do a little rearranging to ensure our gap size doesn't exceed 3. You can think of this as the skip-list analog of performing a rotation in a red/black tree. This is accomplished by pre-emptively splitting any gap of size 3 into two gaps of size one.

 
 link insertAndPromote(link curr) {
     curr->right = new node(curr->key, curr->right, curr->down->right->right);
     curr->key = curr->down->right->key; 
} 

Once we have reached the bottom level and installed our new value in the base list, the only thing left to do is go back and check that none of the promotions reached the level of the header node, and if it did we must promote the header node one level higher.

void growIfNeeded() {
     if (header->right != tail) {
          header = new node(numeric_limits<K>::max(), tail, header);
     }
} 

To relate back to red/black tree's one more time, this is analogous to "color flips" pushing a red node up and out of the root during rebalancing which in its self is analogous to a B Tree splitting at the root.

Wrapping Up

When you pause to think that we just implemented a data structure with identical functionality and comparable performance to red/black tree's in about 1/3rd the code, deterministic skip lists become even more impressive.

But it's not just red/black tree's or even specifically 2-3-4 trees that these trees share an isomorphism to: it's the entire family of B-Trees! And this revelation has opened the door to some truly amazing discoveries in the world of data structures.

Anyway, that's what I've got for today. Until next time, Happy Hacking.

Further Reading

[1] Pugh, William, "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990 https://15721.courses.cs.cmu.edu/spring2018/papers/08-oltpindexes1/pugh-skiplists-cacm1990.pdf

[2] Munro, Popadakis, and Sedgewick, "Deterministic Skip Lists"
Proceedings of the ACM-SIAM Third Annual Symposium on Discrete Algorithms, January 1992. https://www.ic.unicamp.br/~celio/peer2peer/skip-net-graph/deterministic-skip-lists-munro.pdf

[3] Sedgewick, R. "Algorithms in C++" (3rd. ed) Prentice Hall 1999

 


Leave A Comment