Deterministic Skip Lists

mgcadmin 24-02-2024

The Skip List, introduced in 1990 by Pugh[1] is an interesting data structure. Skip lists came in to being as an evolution of the humble linked list. Indeed, skip lists have been described as "linked lists with a fast lane", which is certainly true, but they are also so, so much more than that. They have found use along side the normal ordered data structures as ordered sets and maps, priority queues, and for range searching, but they have also been utilized for such exotic tasks as flash memory file systems, block chains, database indexes, and both as concurrent and distributed data structures.

1 Skip Lists

A "traditional" Skip List

Skip Lists are almost always presented as randomized data structures in the literature, sometimes accompanied by vague mentions of their deterministic counterparts[3]. A quick survey of production implementations tells a different story about the real world use of skip lists: randomization isn't required for good performance, and may in fact be a hinderance.

A Bit of Theory

Munro and Sedgewick showed that skip lists can be implemented deterministically by altering the definition of a skip list to include the following two properties:

Two nodes are considered "linked" if there exists at least one direct link from one node to the other.
The "gap size" between two nodes linked at height h is equal to the number of nodes at height h-1 between them.

These definitions ensure the property that a constant number of forward links are traversed at each level before dropping to the next - bounding the worst case search operation to O(log N).

A skip list that conforms to the above definition, who's gap sizes are of either 1, 2 or 3 happen to be an isometric mapping of 2-3-4 tree's, the very same tree's Red/Black trees are modeled after. It also turns out that this class of skip list happens to be incredibly easy to implement.

A Left Child/Right Sibling Skip List

The "traditional" Skip List had a node consisting of an array of forward pointers, with each level represented explicitly in the same way as they are commonly depicted.

struct SkipNode {
      int info;
      SkipNode* forward[MAX_LEVELS];
};

Deterministic Skip Lists(DSL) on the other hand use just two pointers: down and right, doing away with the arrays, but not the towers they represent. The "towers" are now made of individual nodes explicitly linked together.

template <typename K>
struct SkipNode {
    K key;
    SkipNode *down, *right;
    SkipNode(K k, SkipNode* r, SkipNode* d) : key(k), down(d), right(r) { }
    SkipNode(K k) : key(k) { }
    SkipNode() { }
};

That node looks an awful lot like a binary tree node, doesn't it? It turns out that searching in a DSL is much more closely related to searching in a binary search tree than searching in randomized skip list. In fact, the algorithm to perform a search is exactly the same as for binary search trees! This is truly phenomenal when one considers that the essence of what we are now doing, is performing binary search on a linked list!

template <typename K>
K SkipList<K>::find(K key) {
     bottom->element = key;
     link h = header;
     while (key != h->key)
           h = (key < h->key) ? h->down:h->right;
    return (h == bottom) ? std::numeric_limits<K>::max():h->key;
}

And with good reason: we have done the same with a skip list as when you convert m-ary trees to binary trees by using the left child right sibling representation.

Now that I've whet your appetite a bit, lets get down to the nitty gritty and put some more code together.

Initializing an Empty DSL

DSLs are built in a top down fashion, making ample use of sentinel nodes. Sentinel nodes are used the same way they are for top down red black trees. allowing the dereferencing of pointers without having to worry about null pointers. This is especially important because some operations on DSLs requiring dereferencing pointer chains and if explicit null checks were required the code would quickly become messy.

Before we can add elements to a skip list, we first have to initialize the sentinel nodes. The sentinel nodes are named header, tail, and bottom, and from the names I'm sure you can deduce what each of their responsibilities entails. All three nodes are given then sentinel value of "infinity" for ordering purposes.

template <typename K>
class SkipList {
    private:
        static constexpr int INFINITY = std::numeric_limits<K>::max();
        using node = SkipNode<K>;
        using link = node*;
        link header, tail, bottom;
        int count;
        link traverseRight(link curr, K key);
        link promoteOrSink(link curr, K key);
        link promoteCurrentNode(link curr);
        void growIfNeeded();
    public:
        SkipList();
        int size() const { return count; }
        bool empty() const { return count == 0; }
        void insert(K key); 
        K find(K key);
        void show();
};

template <typename K>
SkipList<K>::SkipList() {
     header = new node(INFINITY);
     tail = new node(INFINITY);
     bottom = new node(INFINITY);
     header->down = bottom;
     header->right = tail;
     tail->right = tail;
     tail->down = tail;
     bottom->right = bottom;
     bottom->down = bottom;
     count = 0;
}

The result of this initialization leaves us with an empty Skip List that looks like this:

PDF] Deterministic skip lists | Semantic Scholar

It may seem strange to set three sentinel nodes to all have the same value. At first it seems all we've done is create a cycle of nodes with no way out, But as you will see momentarily, it is an ingenious way of dictating branch results during insertion.

Insertion in Deterministic Skip List

The code for insertion really is delightfully simple. Inserting a new value into the skip list requires three steps. Two of these steps happen at each "level" of the skip list, and the third taking place as the final step of each insertion.

Insertion begins by pointing an iterator to the header node and setting the key of the "bottom" sentinel node to the value we are inserting. this ensures our insertion doesn't devolve into an endless loop, as well as placing the key in preparation for insertion. Insertion proceeds from the header, top down.

void insert(K key) {
      bottom->key = key;
      link curr = header; 
      while (curr != bottom) {
          curr = traverseRight(curr, key);
          curr = promoteOrSink(curr, key);
      }
      growIfNeeded();
}

Upon entering a level of the skip list, the first thing we do is to traverse the current level as far right as we can in sorted order.

link traverseRight(link curr, K key) {
     link x = curr;
     while (x->key < key)
         x = x->right;
     return x;
}

Once we have traversed laterally as far as possible, we must decide if we should simply drop to the next level, or if we need to promote the height of the current node upwards in order to preserve the gap height invariant. The decision of whether to promote the current node is a simple one. We need to ensure that when we add the new value on level h that it doesn't create a "gap" any larger than 3 nodes wide for the nodes on level h + 1.

        link promoteOrSink(link curr, K key) {
            if (curr->down->right->right->key < curr->key) {
                curr = promoteCurrentNode(curr);
            } else {
                curr = curr->down;
            }
            return curr;
        }

We compare the key of the current node against the key of the node located one level down and two places over, as this is the position where an insertion would violate the specified invariant. If the current key is less than the node being compared against we simply drop down to the next level in the skip list and go back to the first step.

If however, this happens to be the position where we need to an insert a node, then we must do a little rearranging to ensure our gap size doesn't exceed 3. You can think of this as the skip-list analog of performing a rotation in a red/black tree. This is accomplished by pre-emptively splitting any gap of size 3 into two gaps of size one.

        link promoteCurrentNode(link curr) {
            //move current key one space over, while raising the key down one and over one to the current position.
            //if we are on the row above bottom, this will insert the value into the list.
            curr->right = new node(curr->key, curr->right, curr->down->right->right);
            curr->key = curr->down->right->key; 
            return curr;
        }

Once we have reached the bottom level and installed our new value in the base list, the only thing left to do is go back and check that none of the promotions reached the level of the header node, and if it did we must promote the header node one level higher.

void growIfNeeded() {
     if (header->right != tail) {
          header = new node(numeric_limits<K>::max(), tail, header);
     }
}

To relate back to red/black tree's one more time, this is analogous to "color flips" pushing a red node up and out of the root during rebalancing which in its self is analogous to a B Tree splitting at the root.

Displaying Skip Lists

When it comes to viewing the skip list, things get a little interesting actually. We can view both the abstract view, or we can view it's actual tree form, both are interesting in their own right. Because the abstract-view is probably what most people are looking for, and indeed, is actually the easier of the two to implement, it makes a good place to start.

To produce the abstract view of the skip list, we want to make our way down the left most branch of the tree, and at each node, follow its right branch as far right as possible, before dropping down to the next left-most-node. This might at first sound like a breadth first traversal, and that is indeed the intention, but it's a breadth frist traversal of the abtract view.

template <typename K>
void SkipList<K>::show() {
    for (link t = header; t != bottom; t = t->down) {
        for (link p = t; p != tail; p = p->right) {
            cout<<p->key<<" ";
        }
        cout<<endl;
    }
}

int main() {
    string sed = "asearchingexample";
    SkipList<char> sl;
    for (char c : sed)
        sl.insert(c);
    sl.show();
    return 0;
}


i
e i n r
a c e g h i l m n p r s x

The ACTUAL breadth first traversal of the tree is subtantially different. The following code prints the LCRS representation of the skip list in a level order fashion, starting from the root (header node). Once each level has been printed, we print the next level on a new line, etc.

Despite having a similar description, the output is substantially different:

void SkipList<K>::bfs() {
    queue<link> fq;
    fq.push(header);
    while (!fq.empty()) {
        int nc = fq.size();
        while (nc > 0) {
            link curr = fq.front(); 
            fq.pop(); nc--;
            if (curr != bottom && curr != tail) {
                cout<<curr->key<<' ';
                fq.push(curr->down);
                fq.push(curr->right);
            }
        }
        cout<<endl;
    }
}

int main() {
    string sed = "asearchingexample";
    SkipList<char> sl;
    for (char c : sed)
        sl.insert(c);
    sl.bfs();
    return 0;
}

i
e
a i n
c g n l r
e h l r m p
g i m p  n r s
h l n r s p s x
i m p s x r x
l n r x  s
m p s  x
n r x
p s
r x
s
x

Wrapping Up

When you pause to think that we just implemented a data structure with identical functionality and comparable performance to red/black tree's in about 1/3rd the code, deterministic skip lists become even more impressive.

But it's not just red/black tree's or even specifically 2-3-4 trees that these trees share an isomorphism to: it's the entire family of B-Trees! And this revelation has opened the door to some truly amazing discoveries in the world of data structures.

Anyway, that's what I've got for today. Until next time, Happy Hacking.

MaxGCoding.com

For the love of programming

Deterministic Skip Lists

A Bit of Theory

A Left Child/Right Sibling Skip List

Initializing an Empty DSL

Insertion in Deterministic Skip List

Displaying Skip Lists

Wrapping Up

Further Reading

Leave A Comment

Other Readers Have Said:

The visitor pattern: OOP takes on the Expression Problem

Improving mgcLisp's define syntax

Separating the Men from the Boys, Knuth Style

Reducing rotations during deletion from AVL trees

Parsing Lisp: From Data To Code

Swiz Heaps: A deterministic modification of Randomized Meldable Heaps

Let's talk Eval/Apply

BST Deletion: Removal By Merge

Dictionary Based Compression: The LZW Algorithm

Taking Action: Compiling Procedures to P-Code