Removing an entry from a B+ Tree without Rebalancing: A viable approach?
When it comes to data structures - especially self balancing data structures - it is no secret that the algorithms for removing an entry are often many, many times more complex than the algorithms for adding a value. Anyone who's tried to wrap their noggin around balanced deletion in Red/Black tree knows exactly what I'm talking about. The situation for B trees - the family of data structures which Red/Black trees abstract - isn't much different.
This dispartiy in complexity has let to all sorts of... shall we say, "creative" solutions to the problem of easing the implementation of these difficult operations. One common method, adopted from use in hashtables is the so called "Lazy" or "tombstone" deletion method, where an entry is marked as removed, despite its place in the structure being preserved.
In todays post I'm going to cover an implementation of a different approach: the "naive" deletion method for B+ trees. With this method, underflow is allowed to occur during deletion in leaf nodes without performing any redistribution of keys from sibling nodes. As it turns out this isnt nearly as impactful on performance as one might initially assume, especially when compared to the extensive amount of repair operations it would take to restore balance to the tree after a deletion - balance which may possibly be restored by the next insertion anyway.
I'm going to be building off the bottom up B+ tree code presented in my post on Memory Resident B+ Trees so if you haven't read that post, you might want to give it a read first to get up to speed. With that said, lets do it.
A Unique Challenge
Traditional B Trees which are structured in a fashion akin to a binarys earch tree with data stored in the same node as the key that references it. A B+ tree however, is an indexing data structure: Keys and Values stored separately. In this case, it can help to reason about B+ Trees by thinking of them as a combination of two different data structures instead of as one unified data structure. Using this model, we imagine the upper portion of the tree to serve as an index to guide us to the correct node of a linked list which serves as the leaf level of the tree.
A B+ trees upper levels index into the linked list at the leaf level
We can now think of removing a value from a B+ tree being performed in two steps. The first step is the actual removal of the referenced value from the linked list. Once that is accomplished, we then move to the second step of performing any rebuilding of the index which may have become invalid by removing the leaf value.
Removing an entry from the leaf level is easy enough, Once we find the correct leaf node, removing a value reduces to the case of deleting an entry from a sorted array:
template <class K, class V>
void BPNode<K,V>::eraseAt(int idx) {
for (int j = idx; j < n-1; j++) {
data[j] = data[j+1];
}
n--;
}
But what do we do about the upper levels of the tree? After all, being an indexed structure, other values may be depending on that value to drive a search to them, or the value that indexed to the item being erased is a related albeit different key. We can't orphan the other values in a node when we erase one value from it.
Erasing Values While Preserving the Integrity of the Index
Removing a value from the index starts at the root, and works towards the leaf where if the value exists it will be removed as was depicted in the previous section. Regardless of if the current node is a leaf or not, the first thing we do is check the node for the key we want to remove.
If we are not at a leaf not, we first need to check if the result of the search for the key was successful. If the key we are searching for is not in the node, then we need to find the node with a the largest key still less than the key we are searching for. This is the purpose of the floor() method.
void remove(BPNode<K,V>* node, K key) {
int idx = node->find(key);
if (node->isLeaf()) {
if (idx != -1) {
node->eraseAt(idx);
}
} else {
idx = (idx == -1) ? node->floor(key):idx;
remove(node->at(idx).next, key);
if (node->at(idx).next->isEmpty()) {
node->eraseAt(idx);
}
}
}
After traversing the tree to the leaf (and presumably removing a value from the leaf level) we check each node as the stack unwinds for the case where we may have erased the only entry in a node, which would leave an empty node with nothing pointing at it. In this case, we delete both the empty node and the key in the index which pointed to the empty node. This process cascades upward towards the root. In this way, the viability of the Index is preserved, despite letting nodes underflow as we erase values.
max@MaxGorenLaptop:~/btrees$ ./bpt
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a a a ] [ b e ] [ e e e ] [ l l l ] [ m m ] [ p r ] [ s t x ]
Erase: a
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a a ] [ b e ] [ e e e ] [ l l l ] [ m m ] [ p r ] [ s t x ]
--------------
Erase: s
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a a ] [ b e ] [ e e e ] [ l l l ] [ m m ] [ p r ] [ t x ]
--------------
Erase: m
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a a ] [ b e ] [ e e e ] [ l l l ] [ m ] [ p r ] [ t x ]
--------------
Erase: a
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ b e ] [ e e e ] [ l l l ] [ m ] [ p r ] [ t x ]
--------------
Erase: l
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ b e ] [ e e e ] [ l l ] [ m ] [ p r ] [ t x ]
--------------
Erase: l
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ b e ] [ e e e ] [ l ] [ m ] [ p r ] [ t x ]
--------------
Erase: b
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ e ] [ e e e ] [ l ] [ m ] [ p r ] [ t x ]
--------------
Erase: t
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ e ] [ e e e ] [ l ] [ m ] [ p r ] [ x ]
--------------
Erase: r
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ e ] [ e e e ] [ l ] [ m ] [ p ] [ x ]
--------------
Erase: e
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ e ] [ e e ] [ l ] [ m ] [ p ] [ x ]
--------------
Erase: e
[ a e m ]
[ a b ] [ e l ] [ m p s ]
[ a ] [ e ] [ e ] [ l ] [ m ] [ p ] [ x ]
--------------
Erase: e
[ a e m ]
[ a b ] [ l ] [ m p s ]
[ a ] [ e ] [ l ] [ m ] [ p ] [ x ]
--------------
Erase: x
[ a e m ]
[ a b ] [ l ] [ m p ]
[ a ] [ e ] [ l ] [ m ] [ p ]
--------------
Erase: a
[ a e m ]
[ b ] [ l ] [ m p ]
[ e ] [ l ] [ m ] [ p ]
--------------
Erase: m
[ a e m ]
[ b ] [ l ] [ p ]
[ e ] [ l ] [ p ]
--------------
Erase: p
[ a e ]
[ b ] [ l ]
[ e ] [ l ]
--------------
Erase: l
[ a ]
[ b ]
[ e ]
--------------
Erase: e
--------------
A funny side effect of how deletion was implemented is, as you can see, the tree's height does not shrink as we remove items. I suppose there's always a price to pay for taking a shortcut.
That's all I've got for today, until next time Happy Hacking!
-
Removing an entry from a B+ Tree without Rebalancing: A viable approach?
-
Implementing An Iterator for In-Memory B-Trees
-
Weight Balanced Binary Search Trees
-
Parsing Array Subscript Operators with Recursive Descent
-
Implementing Map & Filter in Scheme & C
-
Persistent Symbol Tables for Lexical Scoping
-
The Festival of 1 + n + f(n-1) Lights
-
The Heart of Pratt Parsing: Top-Down Operator Precedence
-
Compiling expressions to P-Code by AST Traversal
-
Ternary Search Tries: String Specific Ordered Symbol Tables
Leave A Comment