Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
Since their inception, binary search trees have become one of the most fundamental data structures in programming. Shortly after their initial development it was realized that, with a bit more work, we could guarantee worst-case logarithmic time for all BST operations: all that is required is to keep the tree balanced. For all of their merits, a binary search tree's performance suffers significantly if it becomes unbalanced. As such, much research has gone into efficient methods of maintaining balance in binary search trees. The first to find an efficient way of doing this were two Soviet computer scientists from whom the tree gets its name: Georgy Adelson-Velsky and Evgenii Landis. Together, in 1962, they invented what would come to be known as AVL trees. Pictured below is an AVL tree with its balance information appearing below each node:
In previous posts I've gone over iterative algorithms for insertion and deletion in AVL trees. In today's post I'm going to cover implementing bottom up AVL trees using simple recursive algorithms, as well as how to augment them with additional information and operations so they can also function as order statistic trees.
AVL Node Structure
AVL trees can be built using plain binary search tree nodes with no extra satellite data. The reason you never see this done is that we would find ourselves calculating and recalculating a tremendous amount of information during every operation. Instead, AVL nodes cache either the height of the subtree rooted at the node or the node's balance factor, which is computed from the aforementioned heights.
In addition to the height information used for balancing, we will also cache the size of the subtree rooted at each node to support the order statistic operations. The size is the number of elements in the subtree, and the height is the number of edges on the longest path from the node down to a leaf.
template <class T>
struct avlnode {
    T key;
    int ht;
    int sz;
    avlnode* left;
    avlnode* right;
    avlnode(T k) : key(k), ht(0), sz(1), left(nullptr), right(nullptr) { }
};
Calculating AVL Node Attributes
The algorithms for determining the height and size of a given tree are very similar, so it's important to understand the difference in how they function. To calculate the size of a tree, we add 1 to the sum of the sizes of its left and right subtrees. An empty tree has 0 nodes, making for a convenient base case in a very simple recursive algorithm:
template <class T>
int AVL<T>::setsize(avlnode<T>* h) {
    if (h == nullptr)
        return 0;
    return 1 + setsize(h->left) + setsize(h->right);
}
Calculating the height proceeds very similarly, except instead of taking the sum of the two subtrees we take the maximum of their heights. Note that the base case is -1, not 0: because we count edges, an empty tree has height -1, so a single leaf computes to 1 + max(-1, -1) = 0, matching the value set in the node's constructor:
template <class T>
int AVL<T>::setheight(avlnode<T>* h) {
    if (h == nullptr) {
        return -1;
    }
    return 1 + max(setheight(h->left), setheight(h->right));
}
These functions recompute their results from scratch, which is useful for building or verifying a tree. During normal operation, however, we read the cached values with the following constant-time accessors:
template <class T>
int AVL<T>::height(avlnode<T>* h) {
    return h == nullptr ? -1 : h->ht;
}
template <class T>
int AVL<T>::size(avlnode<T>* h) {
    return h == nullptr ? 0 : h->sz;
}
The balance factor of a given node is obtained by subtracting the height of the right subtree from the height of the left subtree. Using the accessors just presented, we write this function like so:
template <class T>
int AVL<T>::balanceFactor(avlnode<T>* h) {
    return height(h->left) - height(h->right);
}
With those functions in place we have everything we need to determine when a rotation is required, so let's take a moment to go over the different rotations used in AVL trees.
Rotations In AVL Trees
To maintain balance in an AVL tree we use four (but really two) types of rotations. I say we only really use two, because the other two cases are handled by combining the original two. If that sounds confusing, it really isn't, and you'll see what I mean in just a moment.
template <class T>
avlnode<T>* AVL<T>::rotR(avlnode<T>* h) {
    avlnode<T>* x = h->left;
    h->left = x->right;
    x->right = h;
    h = updateRankAndBalance(h);
    x = updateRankAndBalance(x);
    return x;
}
template <class T>
avlnode<T>* AVL<T>::rotL(avlnode<T>* h) {
    avlnode<T>* x = h->right;
    h->right = x->left;
    x->left = h;
    h = updateRankAndBalance(h);
    x = updateRankAndBalance(x);
    return x;
}
We have two types of single rotations to consider and two types of double rotations. The single rotations are the same as those used in splay trees, red-black trees, and other balanced search trees. We must, however, also update the height and size values of the nodes after we perform the rotations. To keep rotations constant time, this update reads the children's cached fields rather than recomputing over the whole subtree with setsize and setheight:
template <class T>
avlnode<T>* AVL<T>::updateRankAndBalance(avlnode<T>* h) {
    h->sz = 1 + size(h->left) + size(h->right);
    h->ht = 1 + max(height(h->left), height(h->right));
    return h;
}
We use two types of double rotations in AVL trees: Right/Left rotations and Left/Right rotations. As you may imagine, these are mirror images of each other. We begin by performing a rotation on the node's child, then perform the opposite rotation on the node itself. For a Left/Right rotation of node x, we first perform a left rotation on x's left child, followed by a right rotation on x. The opposite holds for a Right/Left rotation:
template <class T>
avlnode<T>* AVL<T>::rotRL(avlnode<T>* h) {
    h->right = rotR(h->right);
    return rotL(h);
}
template <class T>
avlnode<T>* AVL<T>::rotLR(avlnode<T>* h) {
    h->left = rotL(h->left);
    return rotR(h);
}
With the algorithms for rotations in combination with those for calculating heights and balance factors, we can implement the actual balancing logic.
Maintaining Balance In AVL Trees
One thing that sets AVL trees apart from the red-black family of balanced search trees is that the code to maintain balance during insertion and deletion is exactly the same. That's because the rule for maintaining the AVL property is very simple: a node is considered balanced if the heights of its subtrees differ by no more than one. That's it. So long as that holds for every node in the tree, we have a valid AVL tree. With that in mind, we know when to rotate; the only other thing we need to determine is which rotation to apply.
A node can be in one of three states: left heavy, right heavy, or balanced. We can transform a node that has become too far out of balance using the following rules:
- When a node is right heavy, we perform a left rotation.
- When a node is left heavy, we perform a right rotation.
In addition:
- When a node is right heavy but its right child is left heavy, we perform a Right/Left rotation.
- When a node is left heavy but its left child is right heavy, we perform a Left/Right rotation.
These rules translate directly into the following procedures:
template <class T>
avlnode<T>* AVL<T>::rightHeavy(avlnode<T>* x) {
    if (balanceFactor(x->right) > 0) {
        x = rotRL(x);
    } else {
        x = rotL(x);
    }
    return x;
}
template <class T>
avlnode<T>* AVL<T>::leftHeavy(avlnode<T>* x) {
    if (balanceFactor(x->left) < 0) {
        x = rotLR(x);
    } else {
        x = rotR(x);
    }
    return x;
}
template <class T>
avlnode<T>* AVL<T>::balance(avlnode<T>* x) {
    if (x == nullptr)
        return x;
    switch (balanceFactor(x)) {
        case -2: return rightHeavy(x);
        case 2: return leftHeavy(x);
        default: break;
    }
    return x;
}
Now that we've put the framework in place to maintain the balance of our tree, we are ready to get down to business. We can implement insertion, deletion, and search safe in the knowledge that they will all complete in O(log n) time.
Bottom Up Insertion In AVL Trees
To insert a value into an AVL tree, we begin by performing a normal binary search tree insertion. Once the new node has been added to the tree, we update the height and rank information of the affected nodes on the path back up the tree, performing rotations whenever the adjusted balance factors call for it. Using the procedures laid out above, we can implement insertion in the following way:
template <class T>
avlnode<T>* AVL<T>::put(avlnode<T>* h, T v) {
    if (h == nullptr) {
        count++;
        return new avlnode<T>(v);
    }
    if (v < h->key) h->left = put(h->left, v);
    else h->right = put(h->right, v);
    h = updateRankAndBalance(h);
    return balance(h);
}
template <class T>
void AVL<T>::insert(T info) {
    root = put(root, info);
}
Seeing as deletion is the only other operation that can affect the balance of the tree, it makes sense to cover that next.
Deleting Entries From AVL Trees
As it was with insertion, bottom up deletion in AVL trees proceeds very similarly to the algorithm for unbalanced binary search trees. We will use a slightly modified Hibbard deletion algorithm, in combination with the balance procedures from above, to remove items from the tree while maintaining the AVL property through rotations. An essential operation in Hibbard deletion is finding the minimum node of a tree, which is done the normal way:
template <class T>
avlnode<T>* AVL<T>::min(avlnode<T>* h) {
    avlnode<T>* x = h;
    while (x->left != nullptr) {
        x = x->left;
    }
    return x;
}
The changes begin with the eraseMin portion of the Hibbard deletion algorithm. Because this procedure can change the balance factors of the nodes along its path, our rebalancing must begin here.
template <class T>
avlnode<T>* AVL<T>::eraseMin(avlnode<T>* h) {
    if (h == nullptr)
        return nullptr;
    if (h->left == nullptr)
        return h->right;
    h->left = eraseMin(h->left);
    h = updateRankAndBalance(h);
    return balance(h);
}
Once the node to be removed has been replaced with its in-order successor, we continue updating and balancing as the stack unwinds, ultimately leaving us with a valid AVL tree.
template <class T>
avlnode<T>* AVL<T>::erase(avlnode<T>* h, T v) {
    if (h == nullptr)
        return h;
    if (v < h->key) {
        h->left = erase(h->left, v);
    } else if (v > h->key) {
        h->right = erase(h->right, v);
    } else {
        avlnode<T>* t = h;
        if (h->right == nullptr) {
            h = h->left;
        } else {
            h = min(t->right);
            h->right = eraseMin(t->right);
            h->left = t->left;
        }
        delete t;
        count--;
        if (h == nullptr)   // we just removed a leaf
            return h;
    }
    h = updateRankAndBalance(h);
    return balance(h);
}
Searching In AVL Trees
AVL trees are binary search trees. As such, any of the operations that do not mutate the tree, such as searching for a value by key or finding the minimum node, require no changes to their code. Search is performed by comparing the current node's key to the key we are searching for: if the key we want is less than the current node's key, we recur on the left subtree; if it is greater, we recur on the right.
template <class T>
avlnode<T>* AVL<T>::search(avlnode<T>* h, T key) {
    if (h == nullptr)
        return h;
    if (key == h->key)
        return h;
    else if (key < h->key)
        return search(h->left, key);
    else
        return search(h->right, key);
}
template <class T>
T& AVL<T>::search(T info) {
    avlnode<T>* resultnode = search(root, info);
    return resultnode == nullptr ? nilValue : resultnode->key;
}
With insertion, search, and deletion completed, we now have the basic search tree functionality in place for our AVL tree. We continue by augmenting the tree to support order statistics.
Order Statistic Operations
The added utility binary search trees gain from being augmented with order statistic operations cannot be overstated. Just being able to select items by rank would make the additional effort worth it, but order statistic trees (OSTs) allow us to do much more than that. I've laid out in a previous post how you can use selection by rank to implement the iterator pattern in binary search trees.
Selection is done by traversing the tree from the root, comparing against the cached size field instead of the node's key. More specifically, we're concerned with the size of the current node's left subtree. If the size of the left subtree is greater than the rank we are searching for, we search the left subtree recursively for that rank. When the size of the left subtree is less than the rank, we have to get a little creative to find the correct path in the right subtree: specifically, we subtract the size of the current node's left subtree plus one from the rank for the recursive call, to account for the nodes we are skipping over. If neither of those two cases applies, the current node holds the rank we are looking for, and we return its value.
template <class T>
T& AVL<T>::select(avlnode<T>* h, int rank) {
    if (h == nullptr)
        return nilValue;
    int curr = size(h->left);
    if (curr > rank) return select(h->left, rank);
    if (curr < rank) return select(h->right, rank - curr - 1);
    return h->key;
}
The complementary operation to selecting by rank is determining rank by key, which functions in much the same way. This time, as we traverse the tree, we compare the node's key to find the correct node:
template <class T>
int AVL<T>::rank(avlnode<T>* h, T info) {
    if (h == nullptr)
        return -1;
    if (info < h->key) return rank(h->left, info);
    else if (info > h->key) {
        int r = rank(h->right, info);
        return r < 0 ? r : 1 + r + size(h->left);  // propagate "not found"
    }
    else return size(h->left);
}