Implementing An Iterator for In-Memory B-Trees

mgcadmin 27-01-2025

Well, it looks like I'm back at it, implementing iterators for all the things and up this time: In-memory B Trees. Now, It's no secret that that an iterator for an ordered container is simply an In-order traversal performed step-wise over said container. In a traditional B Tree where both keys and values are held both in internal as well as leaf nodes, implementing an iterator presents us with a series of challenges. I've previously covered some of these challenges in my post on iterative B Tree traversal, while others are unique to the step-wise processing required of an iterator.

In todays post I'm going to connect the dots of converting a stack based traversal algorithm into an iterator class for in-memory B Trees. (Not B+ tree's - those are implemented completely differently - but if you're looking for those I've still got you covered in this post here). So grab some coffee, fire up your code editor, and lets get to it,

The Iterator Class

Im going to be using stack based iteration, as it avoids having to add parent pointers to my existing B Tree implementation. Thanks to the high degree of fanout present in BTree's, they tend to stay rather shallow in practice (Just like my ex), so maintaining the path with a stack isn't as memory intensive as it is for say, a binary search tree. In addition to the stack, we also maintain a pointer to the current node, as well as the index of the position in that node that we are currently processing.

template <class K, class V>
class BTreeIterator {
    private:
        using link = BTNode<K,V>*;
        link currNode;
        int currIndex;
        stack<pair<int, link>> sf;
        void handleInternalNode();
        void handleLeafNode();
        void savePathToLeaf(link node);
    public:
        BTreeIterator(link root);
        bool done();
        KVPair<K,V>& get();
        void next();
};

For the public API I've decided to go with a simple get(), done(), next() interface so we can focus on the actual implementation logic of the iterator without the clutter of implementing everything needed for STL style iterator.

Constructing The Iterator

Theres a few things that have to happen in order for our iterator to be ready for use. The constructor for the class is passed a pointer to the root of the tree when it is instantiated and proceeds to call the savePathToLeave() method.

        BTreeIterator(link root) {
            savePathToLeaf(root);
            next();
        }

The aptly named savePathToLeaf() method follows the left most path to a leaf node from which ever node is passed to it. As one can surmise from the name, the path to the leaf is added to the stack as it goes.

        void savePathToLeaf(link node) {
            link x = node;
            while (x != nullptr) {
                sf.push(make_pair(0, x));
                x = x->next[0];
            }
        }

Once our stack has been "primed" with a path to the first value, the next() method is called, resulting in making the first element available for access.

Iterating To the Next Element

The next() method actually serves two purposes: its main purpose, as can be gleened from the name is to iterate to the next element in the tree (in sorted order). It's other major function is retrieving the value of the current node and current index in that node from the top of the stack.

We check the condition of the stack before attempting to access the top item, as uncharacteristically for an iterator, ours makes a recursive call to next. This occurs when the top item of the stack is an index for the last child pointer in the node: we need to follow that link but there is no value to display first.

      void next() {
            if (sf.empty())
                return;
            currIndex = sf.top().first;
            currNode = sf.top().second;
            sf.pop();
            if (currIndex > currNode->n) {
                next();
            } else {
                if (currNode->isleaf)
                    handleLeafNode();
                else 
                    handleInternalNode();
            }
        }

Upon arriving at a valid node and index, we take different actions depending on if the current node is an internal node or leaf node. If the current node is a leaf node, we simply push the next index in the current node on to the stack, assuming that the proposed index would be a legal index, otherwise we have consumed the entire node and need to call the next() method to obtain the next node to be iterated over.

        void handleLeafNode() {
                if (currIndex+1 <= currNode->n) 
                      sf.push(make_pair(currIndex+1, currNode));
                else next();
        }

Internal nodes have a bit more complex logic. Remembering that a B Tree has have N-1 keys for every N children, we take actions depending on the value of currentIndex+1. If the current index+1 is less than (but not equal to) the number of keys in the node, we need to place an entry for the next position in the current node on the stack. Then, so long as currIndex+1 is less than or equal to the number of entries in the node, we save the path from the child at currIndex+1 to a leaf node.

        void handleInternalNode() {
                if (currIndex+1 <= currNode->n) {
                    if (currIndex+1 < currNode->n)
                        sf.push(make_pair(currIndex+1, currNode));
                    savePathToLeaf(sf, currNode->next[currIndex+1]);
                }
        }

Not so bad, right?

Accessing the Current Element And Testing For More

Finally, we come to the easy parts: accessing the current element, and testing if we finished iterating over the entire tree. The call to done() returns whether there aren any items left on the stack, if not we are done.

Do I really need to explain get()?

        bool done() {
            return sf.empty(); 
        }
        KVPair<K,V>& get() {
            return currNode->data[currIndex];
        }

Well, there you have it, a B Tree iterator. Now go forth and iterate. Until next time, Happy Hacking!

template <class K, class V>
void printBTree(BTree<K,V>& bt) {
    for (auto it = bt.iter(); !it.done(); it.next()) {
        cout<<it.get().key()<<" ";
    }
    cout<<endl;
}

int main() {
    BTree<int, int> bt;
    for (int c = 1; c < 17; c++)
        bt.add(c,c);
    cout<<"In Order: "<<endl;
    printBTree(bt);
    return 0;
}

In Order: 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

MaxGCoding.com

For the love of programming

Implementing An Iterator for In-Memory B-Trees

The Iterator Class

Constructing The Iterator

Iterating To the Next Element

Accessing the Current Element And Testing For More

Leave A Comment

Procedural Map Generation with Binary Space Partitioning

Exact Match String Searching: The Knuth-Morris-Pratt Algorithm

The visitor pattern: OOP takes on the Expression Problem

Improving mgcLisp's define syntax

Separating the Men from the Boys, Knuth Style

Reducing rotations during deletion from AVL trees

Parsing Lisp: From Data To Code

Swiz Heaps: A deterministic modification of Randomized Meldable Heaps

Let's talk Eval/Apply

BST Deletion: Removal By Merge