Iterative In-order B Tree Traversal

Efficient sorted collections are extremely important. There are many situations where a developer may choose a sorted collection with logarithmic access times over a hashtable with constant access times because the benefit of maintaining the data in sorted order is either a requirement or highly advantageous. B Trees in particular have held an established role in filesystems and database management systems practically since their inception. More recently, B Tree's have also been seeing something of a rebirth as in-memory data structures. In some instances even rivaling traditional in-memory structures like Red/Black Trees - which themselves are an abstraction of B Trees implemented over binary search trees. This is possible because B Trees maintain "the search tree property" which dictates that values in the subtree to the left of a key are less than that key, and values in the subtree to the right of that key are greater.

It is because of this constraint placed on the ordering of the keys that allow an in-order traversal to yield the keys in ascending order. For both binary search trees and B Tree's  an in order traversal can easily be written recursively. For many applications, such as implementing an iterator, a non-recursive traversal is not only preferable: it's necessary. As we shall see shortly, those easy recursive traversals hide quite a bit of details behind the scenes.

//BST traversal
void traverse(link node) {
     if (node != nullptr) {
          traverse(h->left);
          cout<<h->info.key()<<" ";
          traverse(h->right);
     }
}

//B Tree traversal
void traverse(link node) {
    if (node != nullptr) {
        int i = 0;
        while (i < node->n) {
            traverse(node->next[i]);
            cout<<node->data[i++].key()<<" ";
        }
        traverse(node->next[i]);
    }
}
In todays post I'm going to show you how to replace the recursion in the above algorithm with an explicit stack. Iterative implementations allow us to more easily "pause" the traversal so that it can be performed step-wise. I've covered building in-memory B Tree's in a previous post, which I will use as the basis for this post.

Warm Up: Iterative Binary Search Tree Traversal

As both Binary Search Trees and B Trees maintain the search tree property their traversal algorithms follow the same general strategy. As such, it makes sense to review the iterative traversal of binary search tree's as it will lay a good foundation for us to start from. Starting from the root of the BST, an in order traversal begins by traversing the current nodes left subtree, then processing the current node, and then doing the same to the right subtree. This is sometimes written as (LNR) for Left, Node, Right. In the recursive variant it is the call stack which stores the paths as we iterate over the tree, which makes finding the next node possible. For us to perform the traversal iteratively we must then maintain that stack ourselves.

void traverse(node* h) {
    stack<node*> sf;
    node* x = h;
    while (x != nullptr) {
        sf.push(x);
        x = x->left;
    }
    while (!sf.empty()) {
        node* x = sf.top();
        sf.pop();
        if (x != nullptr) {
            cout<<x->info<<" ";
            x = x->right;
            while (x != nullptr) {
                sf.push(x);
                x = x->left;
            }
        }
    }
    cout<<endl;
}

There are two important pieces of the following code I would like to highlight. The first piece is the "priming of the stack" before we enter the main loop. To prime the stack we store the left most path of the tree in the stack from root to leaf. Now when we pop the first value off of the stack for processing, it is the min node and we can immediately display it. We will perform this "primeing of the stack" for B Tree's as well. The second piece I want to highlight is the jump to the right branch after displaying the nodes contents. In a binary search tree it is simple as we only have one choice of where to go as binary trees only store one value, but for B Tree's this will require a bit of thought, as the way we traverse the internal nodes is a bit more complicated.

We can now adapt this algorithm to work with the m-ary nodes of B Trees. 

The main event: B Tree Traversal

The key to iterartively traversing a B Tree is to remember not only the node we came from, but also the index of the node we were previously processing. To do this, we "tag" the nodes we add to the stack with the index to proceed from when we remove the node from the stack. To do this tagging, we use an std::pair<int, node*> for the stack item. The traversal once again begins with a priming of the left most branch, and then jumping into the main loop. 

const int M = 8;
typedef bnode<int,int,M>* link;

void savePathToLeaf(stack<pair<int,link>>& sf, link node) {
    link x = node;
    while (x != nullptr) {
        sf.push(make_pair(0, x));
        x = x->next[0];
    }
}

When we remove the top item from the stack, we have the current node and the index of the element we in that node which we are processing. We begin by checking if the current node is a leaf, as we process leaf nodes differently than internal nodes. 

void inorder(link root) {
    stack<pair<int, link>> sf;
    savePathToLeaf(sf, root);
    while (!sf.empty()) {
        int idx = sf.top().first;
        link x = sf.top().second;
        sf.pop();
        if (x->isleaf) {
            printLeaf(x);
        } else {
            handleInternalNode(sf, x, idx);
        }
    }
    cout<<endl;
}

As leaf nodes have no children, they have no subtrees to process and as such we simply iterate over the leafs entire array of values without adding any nodes to the stack.

void printLeaf(link x) {
    for (int j = 0; j < x->n; j++)
        cout<<x->data[j].key()<<" ";
}

Internal nodes are a bit trickier. We first need to check if the index we are to process is valid. If so, we display the data at that index, saving the next index to the stack. With that done, we next check if the child node at index+1 is a valid node, and if so we move to that node, marking it as current: this is analagous to moving to the right branch during BST traversal. From here we once again proceed to the next leaf node by following the left most path, saving the path to the stack as we go.

void handleInternalNode(stack<pair<int,link>>& sf, link x, int idx) {
    //print current position in internal node and save where to resume on stack.
    if (idx < x->n) {
        cout<<x->data[idx].key()<<" ";
        if (idx+1 < x->n) {
            sf.push(make_pair(idx+1, x));
        }
    }
    if (idx+1 < M) { //valid index?
        savePathToLeaf(sf, x->next[idx+1]);
    }
}

Et voila, Iterative in-order traversal.

Thats what I've got for you today. Until next time, Happy Hacking!


Leave A Comment