Deleting Arbitrary Values from Binary Search Trees
Binary Search Tree's are great. A couple dozen lines of code yields you with an efficient ordered collection suitable for all kinds of stuff from sets to symbol tables. A couple dozen more lines, and that efficiency can be guaranteed through self balancing, either as an AVL tree or Red Black tree. When it comes to their implementation, the majority of the operations have algorithms that are generally easy to understand, nor are they particularly lengthy as the code for insert, get, and in-order printing shown below. Many times however, especially with their self balancing variety, the algorithms for deletion are either convoluted, not well explained, or as is often the case - left out entirely.
struct node {
int key;
node* left;
node* right;
node(int k) : key(k), left(nullptr), right(nullptr) { }
};
typedef node* link;
void print(link h) {
if (h != nullptr) {
print(h->left);
cout<<h->key<<" ";
print(h->right);
}
}
link insert(link h, int key) {
if (h == nullptr)
return new node(key);
if (key < h->key) h->left = insert(h->left, key);
else h->right = insert(h->right, key);
return h;
}
link get(link h, int key) {
if (h == nullptr)
return h;
if (key < h->key) get(h->left, key);
else if (key > h->key) get(h->right, key);
else return h;
}
There are several different approaches (and sub approaches!) to removing an arbitrary value from a BST, some less performant than others. One of the earliest algorithms for removing a node from a BST was "deletion by merging" which has the undesirable potential of leaving the tree quite unbalanced. The method which I prefer is a variant of the "deletion by copying" strategy developed by Hibbard, where a node is replaced by its in order predecessor or successor. Hibbard deletion, as this method is sometimes referred is a straightforward bottom up algorithm that lends itself nicely to a recursive implementation.
The General Idea
Hibbard Deletion from begins by searching for the node we want to remove. Once we have found the node we want to remove there are several scenarios we may encounter that can effect how we proceed. If the node is a leaf, we can simply remove it from the tree. This is the most desirable scenario. Likewise, we may find that our node only has a left child node, or only has right child node. In this scenario, we simply replace the node with its sole child. So far so good. That last scenario we may encounter is the trickiest, and it is deleting a node from the tree that has both a left and right child node. When this happens, we replace the node's key/value with that of it's in order successor, which we then move to delete. In this way, we maintain the ordered property of our binary search tree.
Find the min, Erase the min
This seems like a lot of work, and no doubt if we were to choose to address all of the above mentioned cases explicitly, it certainly would be. Thankfully, the situation isn't that bad. The insight that makes Hibbard deletion so simple, is the observation that there are really only two cases we need to handle explicitly: the case when the node to be deleted only has a left child, and the case when a node has both children. All of the other cases can be reduced to one of these two scenarios. That is, we can transform removing an arbitrary node from the tree, into removing the minimum node from a tree. And removing the minimum node from a tree is by definition removing a leaf node, the simplest case! So how do we do that? Well, Check me out..
link min(link h) {
if (h == nullptr || h->left == nullptr)
return h;
return min(h->left);
}
link eraseMin(link h) {
if (h == nullptr)
return h;
if (h->left == nullptr)
return h->right;
h->left = eraseMin(h->left);
return h;
}
The two methods pictured above function very similarly, as they perform similar tasks. The first method retrieves the node with the lowest value in the tree rooted at h. It does this by traversing the left most branch until it finds a node which doesn't have a left child. This node is the min node, and is returned when found. The second method does the same traversal down the left most branch, however this time when we encounter the minimum node, we replaces it with its right child, effectively removing the min node from the tree.
If you're still unsure of how this fits into deleting any node from a tree, allow me to introduce you to the in order successor.
The in-order successor: a min by any other name...
As mentioned above, we remove a node with two children from a tree by replacing it with it's in-order successor, or the node which would be visited after the current node in an in-order traversal of the tree. And what is so important about the in-order successor of a node? It is found by obtaining the minimum node from three tree rooted at the current nodes right child. 🤯
link inorderSuccessor(link h) {
return min(h->right);
}
Remember when I said the only explicit cases we must handle are when a node has two children, or if the node doesn't have a right child? This is the reason we care specifically about the right child, as without one this trick doesn't work. With this bit of knowledge in mind, we're now armed with everything needed to remove any node from a binary search tree. So lets piece it all together.
Find the node, Erase the node
We begin by searching for the node we want to remove, which is done in the normal way: If the current node is null, the value we are searching for isn't in the tree and we are done. Otherwise we proceed with the search by comparing the value being searched for to that of the current node. If the value is less than the current nodes, we continue the search recursively on the current nodes left subtree, and if the value we are searching for is greater than that of the current node we do the same on the current nodes right subtree. If however the current node is the one we want to remove, than we have a bit of work to do.
link findAndRemove(link h, int key) {
if (h == nullptr)
return h;
if (key < h->key)
h->left = findAndRemove(h->left, key);
else if (key > h->key)
h->right = findAndRemove(h->right, key);
else
h = removeNode(h);
return h;
}
Upon locating the node to remove the first think to be done is to save a reference to the current node in a temporary variable. Next, we determine which of our two explicit cases from above it is that we have encountered. If the current node doesn't have a right child, than we simply assign the current nodes left child to replace the current node. Otherwise, we replace the current node with its in order successor while removing the node of the in order successor node from it's old position in the right subtree of tmp using the eraseMin function detailed above. The left subtree is restored by assigning it from our tmp node to the current node's left child.
The only remaining task to do is free the memory pointed to by tmp, which is now occupied by the node we wanted to erase and which we replaced with its in order successor.
link removeNode(link h) {
link tmp = h;
if (h->right == nullptr) {
h = h->left;
} else {
h = inorderSuccessor(tmp);
h->right = eraseMin(tmp->right);
h->left = tmp->left;
}
delete tmp;
return h;
}
Take careful note of which node we are assigning to, and which node we are operating on. We are operating on our tmp reference to the current node, and assign the result of which to the pointer to our current node. Take a minute to digest that, you don't want to get that backwards.
So there you have it, simple bottom up deletion in binary search trees. That's all I've got for today, so until next time Happy Hacking.
Leave A Comment