Weight Balanced Binary Search Trees

When it comes to self-balancing binary search trees, it is almost always assumed that the balance in question is that of access path length. The "traditional" definition of a balanced binary search tree is a tree where, for any given node, the heights of its left and right subtrees differ by no more than 1. By adding this constraint to the tree, we can offer a worst-case bound of O(log N) for insertion, search, and delete operations. AVL trees accomplish this through the careful application of subtree rotations.

There is another class of balanced binary search trees whose balance is determined not by the height of a node's subtrees, but instead by their size. Originally termed "trees of bounded balance", Knuth gave them the moniker "weight-balanced BSTs". One somewhat more well-known variety of weight-balanced BST is the scapegoat tree. While less popular than the better-known red-black tree or AVL tree, various types of weight-balanced BSTs have garnered some popularity amongst the functional programming folks, particularly amongst Scheme users.

They also have a pretty simple implementation: if you've ever implemented an order statistic tree, you've already covered most of the groundwork. In today's post I'm going to cover insertion while maintaining balance in a variant of weight-balanced binary search tree that can be thought of as the weight-based analog of an AVL tree.

Node Structure Of WB BSTs

Being a type of binary search tree closely related to the order statistic tree, the nodes of the weight-balanced BST are for all intents and purposes identical, though in this particular instance we can dispense with the height field if we wish to save space, as it is not integral to any of the essential functionality.

struct node {
    K key;
    V value;
    int n;       // size of the subtree rooted at this node
    node* left;
    node* right;
    node(K k, V v) : key(k), value(v), n(1), left(nullptr), right(nullptr) { }
};
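For context, here is a minimal sketch of the kind of class template these snippets are assumed to live in. The `wbtree` name and member layout are my own illustration, not from the original:

```cpp
// Hypothetical wrapper class for the snippets in this post; the name
// and layout are illustrative assumptions, not a fixed API.
template <typename K, typename V>
class wbtree {
    struct node {
        K key;
        V value;
        int n;      // size of the subtree rooted at this node
        node* left;
        node* right;
        node(K k, V v) : key(k), value(v), n(1), left(nullptr), right(nullptr) { }
    };
    node* root = nullptr;
    // put(), weightBalance(), partition(), and the rotations shown in
    // this post would live here as private member functions.
public:
    int size() const { return root == nullptr ? 0 : root->n; }
};
```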

Searching in weight-balanced BSTs is identical to searching in any other BST (except for splay trees, I suppose), and so I'll skip straight to implementing insertion.

Weight Balanced BST Insertion

I first came across this approach as an exercise in the 3rd edition of Sedgewick's "Algorithms in C++". The basic strategy is to modify a standard BST insertion so that whenever we encounter a node with 25% or fewer of its descendants in one of its subtrees, we partition that subtree around its median node. There's a bit to unpack there, so we'll start with the easy parts first: performing a regular BST insertion and calculating the size of the subtree rooted at a given node.

node* put(node* h, K key, V value) {
    if (h == nullptr)
        return new node(key, value);
    if (key < h->key) {
        h->left = put(h->left, key, value);
    } else {
        h->right = put(h->right, key, value);
    }
    h->n = setsize(h);
    return weightBalance(h);
}

The call to setsize() is where we update the node's size information. Recomputing the size of a subtree from scratch would be rather expensive every time we needed it, so instead we cache it in each node's n field: setsize() rebuilds a node's count from its children's cached counts, and size() reads the cached value without recalculating anything.

int setsize(node* h) {
    if (h == nullptr)
        return 0;
    return 1 + size(h->left) + size(h->right);
}

int size(node* h) {
    return h == nullptr ? 0 : h->n;
}
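As a quick sanity check of the caching scheme, here's a hand-wired example. The stripped-down, keys-only node struct is just for this sketch:

```cpp
// Stripped-down node for this example: int keys only, plus the cached size.
struct node {
    int key;
    int n;          // cached size of the subtree rooted here
    node* left;
    node* right;
    node(int k) : key(k), n(1), left(nullptr), right(nullptr) { }
};

// O(1) read of the cached size; an empty subtree has size 0.
int size(node* h) {
    return h == nullptr ? 0 : h->n;
}

// O(1) recompute from the children's (already correct) cached counts.
int setsize(node* h) {
    if (h == nullptr)
        return 0;
    return 1 + size(h->left) + size(h->right);
}
```

Because setsize() trusts the children's cached counts, it only works when counts are maintained bottom-up, which is exactly what the recursive put() does.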

With the size calculation taken care of, we can move on to the balancing step. You'll recall from above that we want to compare the sizes of a node's respective subtrees and perform our balancing step only if we encounter a node with 25% or fewer of its descendants on one side.

node* weightBalance(node* h) {
    if (h != nullptr) {
        int sl = size(h->left);
        int sr = size(h->right);
        if (sl <= size(h)/4 || sr <= size(h)/4) {
            h = partition(h, (sl + sr)/2);
        }
    }
    return h;
}

The actual balancing is done by partitioning the subtree about its median, so I suppose now would be as good a time as any to talk about partitioning.

Partitioning a BST: Rank and Roll

Partitioning a binary search tree is an interesting operation, and it's one that exemplifies the connection between binary search trees and quicksort quite well. The algorithm itself is almost identical to how one selects a node by rank. The difference is, every time we branch left or right in the search, we perform a rotation.

node* partition(node* x, int rank) {
    int leftSize = size(x->left);
    if (leftSize > rank) {
        x->left = partition(x->left, rank);
        x = rotateRight(x);
    } else if (leftSize < rank) {
        x->right = partition(x->right, rank - leftSize - 1);
        x = rotateLeft(x);
    }
    return x;
}

The rotations have the end result that the node selected by rank ends up as the new root of the subtree being partitioned. Following that logic, if we partition about the median node, then it should follow that the tree rooted at that node now has roughly half of its nodes on either side, leaving it weight balanced.
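To see this in action, here's a self-contained sketch (plain int keys, with the rotation helpers pulled forward for self-containment) that partitions a three-node right-leaning chain about its median. If the logic holds, the rank-1 node, key 2, should surface as the new root:

```cpp
struct node {
    int key;
    int n;          // cached subtree size
    node* left;
    node* right;
    node(int k) : key(k), n(1), left(nullptr), right(nullptr) { }
};

int size(node* h) { return h == nullptr ? 0 : h->n; }
int setsize(node* h) { return h == nullptr ? 0 : 1 + size(h->left) + size(h->right); }

node* rotateLeft(node* h) {
    node* x = h->right; h->right = x->left; x->left = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}

node* rotateRight(node* h) {
    node* x = h->left; h->left = x->right; x->right = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}

// Select the node of the given zero-based rank and rotate it to the root.
node* partition(node* x, int rank) {
    int leftSize = size(x->left);
    if (leftSize > rank) {
        x->left = partition(x->left, rank);
        x = rotateRight(x);
    } else if (leftSize < rank) {
        x->right = partition(x->right, rank - leftSize - 1);
        x = rotateLeft(x);
    }
    return x;
}

// Build the chain 1 -> 2 -> 3, each node the right child of the previous.
node* build_chain() {
    node* c = new node(3);
    node* b = new node(2); b->right = c; b->n = setsize(b);
    node* a = new node(1); a->right = b; a->n = setsize(a);
    return a;
}
```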

The rotations themselves are no different than for any other balancing algorithm, but we need to remember to update the subtree counts after performing the rotations.

node* rotateLeft(node* h) {
    node* x = h->right; h->right = x->left; x->left = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}

node* rotateRight(node* h) {
    node* x = h->left; h->left = x->right; x->right = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}
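A quick check that the counts survive a rotation (again with a keys-only node struct for brevity):

```cpp
struct node {
    int key;
    int n;          // cached subtree size
    node* left;
    node* right;
    node(int k) : key(k), n(1), left(nullptr), right(nullptr) { }
};

int size(node* h) { return h == nullptr ? 0 : h->n; }
int setsize(node* h) { return h == nullptr ? 0 : 1 + size(h->left) + size(h->right); }

// Left rotation: h's right child x becomes the subtree root. h's count
// is refreshed first, since h is now a child of x and x's count depends on it.
node* rotateLeft(node* h) {
    node* x = h->right; h->right = x->left; x->left = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}
```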

Didn't I say it was simple? With insertion implemented, let's see how it stacks up.

The Test Drive

It can be said that the Achilles' heel of the standard binary search tree is its tendency to become unbalanced when a monotonically increasing or decreasing sequence of values is inserted. Our O(log N) BST suddenly starts behaving like an O(N) linked list. Inserting the characters 'a' to 'z' into an initially empty BST exemplifies this.

a
   b
       c
           d
              e
                 f
                    g
                       h
                         ...
                            z

However, when we apply our balance-by-partitioning scheme from above, the same insertion sequence leaves us with the following, far more balanced BST:

                                j
               f                                    r
       c            i                      o               v
   a    d     g                    l          q        t      x
     b    e    h               k   m   p    s  u    w      y
                                       n                            z
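A result along these lines can be reproduced with the self-contained sketch below (char keys, no values, everything inlined; the exact shape you get may differ slightly from the figure depending on implementation details, but the tree stays ordered and the root is pulled well away from the minimum):

```cpp
#include <vector>

struct node {
    char key;
    int n;          // cached subtree size
    node* left;
    node* right;
    node(char k) : key(k), n(1), left(nullptr), right(nullptr) { }
};

int size(node* h) { return h == nullptr ? 0 : h->n; }
int setsize(node* h) { return h == nullptr ? 0 : 1 + size(h->left) + size(h->right); }

node* rotateLeft(node* h) {
    node* x = h->right; h->right = x->left; x->left = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}

node* rotateRight(node* h) {
    node* x = h->left; h->left = x->right; x->right = h;
    h->n = setsize(h);
    x->n = setsize(x);
    return x;
}

// Rotate the node of the given zero-based rank up to the subtree root.
node* partition(node* x, int rank) {
    int leftSize = size(x->left);
    if (leftSize > rank) {
        x->left = partition(x->left, rank);
        x = rotateRight(x);
    } else if (leftSize < rank) {
        x->right = partition(x->right, rank - leftSize - 1);
        x = rotateLeft(x);
    }
    return x;
}

// Re-partition about the median whenever one side holds <= 25% of the nodes.
node* weightBalance(node* h) {
    if (h != nullptr) {
        int sl = size(h->left);
        int sr = size(h->right);
        if (sl <= size(h)/4 || sr <= size(h)/4)
            h = partition(h, (sl + sr)/2);
    }
    return h;
}

node* put(node* h, char key) {
    if (h == nullptr)
        return new node(key);
    if (key < h->key)
        h->left = put(h->left, key);
    else
        h->right = put(h->right, key);
    h->n = setsize(h);
    return weightBalance(h);
}

int height(node* h) {
    if (h == nullptr) return 0;
    int hl = height(h->left), hr = height(h->right);
    return 1 + (hl > hr ? hl : hr);
}

// In-order traversal: collects the keys in sorted order if the BST is valid.
void inorder(node* h, std::vector<char>& out) {
    if (h == nullptr) return;
    inorder(h->left, out);
    out.push_back(h->key);
    inorder(h->right, out);
}
```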

It is certainly a balanced binary search tree, and the implementation is simple enough, but what's the cost? After all, wouldn't this be a much more popular data structure if it were truly competitive with other self-balancing binary search trees? Well actually, not necessarily. I ran quite a number of (not so scientific) tests, subjecting both WB BSTs and AVL trees to the same sequences of insertions and comparing the results. In almost every case the WB BST performed fewer overall rotations for the same sequence than the AVL tree did, and while the AVL trees were shorter, the path lengths differed by only one.

And that's not to say that weight-balanced binary search trees aren't in common use: they are, just in some not-so-common places. For whatever reason, be it history or convention, the imperative world has settled on the red-black tree as its sorted map structure, and that's unlikely to change any time soon. But venture into the world of Scheme or Haskell, and a whole world of interesting data structures opens up to you. Until next time, Happy Hacking!

 

