Iterative Quicksort: removing recursion via explicit stack
In my previous article on Quicksort I covered the recursive implementation of the famous sorting algorithm in detail. A downside of Quicksort is its potential worst-case quadratic complexity, and on such inputs the recursive version can also reach a very deep call stack, leading to high memory usage or even stack overflow. One way to avoid this scenario is to replace the call stack with an explicit stack, removing the recursion altogether. Many library implementations of Quicksort, including glibc's qsort(), do exactly this.
void qs(int a[], int l, int r) {
    if (r > l) {
        int i = partition(a, l, r);
        qs(a, l, i - 1);
        qs(a, i + 1, r);
    }
}
Partitioning Procedure
Quicksort falls into the family of sorting algorithms known as partition-exchange sorts. Thus, to perform Quicksort you need a partitioning procedure, which arranges an array around a pivot element, with all elements less than the pivot falling to its left and those greater to its right. I'm partial to Sedgewick's "crossing pointers" algorithm, which is derived from Hoare's original partitioning algorithm:
int partition(int a[], int l, int r) {
    int i = l - 1, j = r, v = a[r];   // pivot is the rightmost element
    for (;;) {
        while (a[++i] < v);           // scan right for an element >= pivot
        while (a[--j] > v)            // scan left for an element <= pivot
            if (j == l) break;        // guard against running off the left end
        if (i >= j) break;            // pointers have crossed: we're done
        exch(a, i, j);
    }
    exch(a, i, r);                    // put the pivot in its final position
    return i;
}
You can also use Lomuto's algorithm or whichever other partitioning procedure you prefer; just keep in mind that some of them treat the pivot element differently and will require altering your code accordingly.
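For comparison, here is a sketch of Lomuto's scheme (lomutoPartition is my naming, not from the article's code). It also takes its pivot from a[r], but instead of crossing pointers it grows a "less than" region with a single left-to-right scan:

```cpp
#include <cassert>
#include <utility>

// Lomuto partition sketch: i marks the end of the "less than pivot" region.
int lomutoPartition(int a[], int l, int r) {
    int v = a[r], i = l;              // pivot is the rightmost element
    for (int j = l; j < r; j++) {
        if (a[j] < v) {
            std::swap(a[i], a[j]);    // grow the "less than" region
            i++;
        }
    }
    std::swap(a[i], a[r]);            // pivot lands at its final position i
    return i;
}
```

Like the crossing-pointers version above, this returns the pivot's final index, so it can be swapped in without changing the surrounding code.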
Removing Recursion
Every recursive algorithm has an iterative analog, and Quicksort is no different. As with mergesort, the recursive version of Quicksort is so simple that a lot of developers are reluctant to tinker with its elegant simplicity. While removing the recursion from mergesort takes only a simple nested loop, with Quicksort it is a bit more involved: because of the way the l and r parameters are determined, we need a way to save them for later calls to the partitioning function. A simple stack data structure accomplishes this.
Both Java and C++ have well-known stack data structures in their standard libraries, but using them here would be overkill and lead to code bloat. There's a lighter solution. glibc uses simple macro definitions to implement a lightweight stack, and you can get even more lightweight than that: a stack can be made from little more than an array and an integer pointing to the top item, yielding a very simple and performant way to keep track of the array boundaries being sorted. The basic technique then follows:
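To see the idea in isolation before the full routine, here is a minimal sketch (the demoStack name is mine) of the ++top / top-- idiom with nothing but an array and a top index:

```cpp
#include <cassert>

// Minimal array-backed stack: an array plus an index to the top item.
// Returns the first value popped, to show the push/pop idiom.
int demoStack() {
    int stack[32];
    int top = -1;          // -1 means the stack is empty
    stack[++top] = 10;     // push 10
    stack[++top] = 20;     // push 20
    return stack[top--];   // pop: returns 20, leaving 10 on the stack
}
```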
void iterQuick(int a[], int l, int r) {
    int *stack = new int[2 * (r - l + 1)];  // room for the worst case
    int top = -1;
    stack[++top] = l;
    stack[++top] = r;
    while (top >= 0) {
        r = stack[top--];
        l = stack[top--];
        int i = partition(a, l, r);
        if (i - 1 > l) {
            stack[++top] = l;
            stack[++top] = i - 1;
        }
        if (i + 1 < r) {
            stack[++top] = i + 1;
            stack[++top] = r;
        }
    }
    delete[] stack;
}
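A quick sanity check of the routine (with the partition and exch helpers repeated so the snippet compiles on its own):

```cpp
#include <cassert>
#include <utility>

static void exch(int a[], int i, int j) { std::swap(a[i], a[j]); }

// Crossing-pointers partition, pivot at a[r], as described above.
static int partition(int a[], int l, int r) {
    int i = l - 1, j = r, v = a[r];
    for (;;) {
        while (a[++i] < v);
        while (a[--j] > v)
            if (j == l) break;
        if (i >= j) break;
        exch(a, i, j);
    }
    exch(a, i, r);
    return i;
}

// Stack-based quicksort: pop a subarray, partition it, push both halves.
void iterQuick(int a[], int l, int r) {
    int *stack = new int[2 * (r - l + 1)];
    int top = -1;
    stack[++top] = l;
    stack[++top] = r;
    while (top >= 0) {
        r = stack[top--];
        l = stack[top--];
        int i = partition(a, l, r);
        if (i - 1 > l) { stack[++top] = l; stack[++top] = i - 1; }
        if (i + 1 < r) { stack[++top] = i + 1; stack[++top] = r; }
    }
    delete[] stack;
}
```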
While not as simple as removing recursion from mergesort, I personally think this is one of the more beautiful examples of manual recursion removal. But we're not done yet!
Optimizations for Iterative Quicksort
With the recursion removed, there are still several optimizations we can use to squeeze a little more horsepower out of our algorithm. One such optimization is pushing the larger of the two subarrays to the stack so that the smaller gets sorted before the larger; deferring the larger side is what keeps the stack shallow. We can also observe that in the above algorithm we push two pairs of values to the stack and then immediately pop one of them back off. We can fix this redundant push/pop with a method similar to tail-call elimination in recursive algorithms, except that instead of replacing a recursive call with a loop, we eliminate pushing one of the pairs to the stack. This optimization is covered in Sedgewick's "Algorithms in C++". In addition, most optimizations that work for the recursive implementation transfer directly to the iterative version: insertion sorting subarrays below a certain threshold is one such optimization. Proper pivot selection also goes a long way toward avoiding worst-case behavior in Quicksort; randomized pivot selection, median-of-three, or median-of-medians can all be dropped right in.
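As an example of how little "dropping in" a pivot rule takes, here is a sketch of randomized pivot selection (randPivot is my naming, not from the article). It assumes a partition routine that takes its pivot from a[r], as the crossing-pointers version above does, so moving a random element to a[r] is all that's required:

```cpp
#include <cstdlib>
#include <utility>

// Randomized pivot selection: swap a uniformly chosen element into the
// pivot slot a[r] before calling partition(a, l, r).
void randPivot(int a[], int l, int r) {
    int p = l + std::rand() % (r - l + 1);   // random index in [l, r]
    std::swap(a[p], a[r]);
}
```

Call it immediately before partition in the loop below, exactly where medOf3 is called.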
The following code is an example of iterative Quicksort with median-of-three pivot selection, small-array insertion sort, and sorting the smaller subarray first with tail-call elimination.
const int CUTOFF = 6;

void insertionsort(int a[], int l, int r) {
    for (int i = l; i <= r; i++) {
        int j = i, v = a[i];
        while (j > l && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}
void exch(int a[], int l, int r) {
    int tmp = a[l];
    a[l] = a[r];
    a[r] = tmp;
}
void medOf3(int a[], int l, int r) {
    int m = (l + r) / 2;
    if (a[l] > a[m]) exch(a, l, m);
    if (a[l] > a[r]) exch(a, l, r);
    if (a[m] > a[r]) exch(a, m, r);   // now a[l] <= a[m] <= a[r]
    exch(a, m, r);                    // move the median into the pivot slot
}
void push(int stack[], int& top, int l, int r) {
    stack[++top] = l;
    stack[++top] = r;
}
void iterQuick(int a[], int l, int r) {
    // Deferring the larger subarray keeps the stack to roughly 2*lg(n) ints,
    // but allocate generously anyway.
    int *stack = new int[2 * (r - l + 1)];
    int top = -1;
    while (true) {
        while (r - l >= CUTOFF) {
            medOf3(a, l, r);              // median-of-three pivot to a[r]
            int i = partition(a, l, r);
            if ((i - 1) - l < r - (i + 1)) {
                // left side is smaller: defer the larger right side
                if (i + 1 < r) push(stack, top, i + 1, r);
                r = i - 1;
            } else {
                // right side is smaller: defer the larger left side
                if (i - 1 > l) push(stack, top, l, i - 1);
                l = i + 1;
            }
        }
        insertionsort(a, l, r);           // finish off the small subarray
        if (top < 0)
            break;
        r = stack[top--];
        l = stack[top--];
    }
    delete[] stack;
}
Introsort, a hybrid of Quicksort and heapsort, is what you'll find behind std::sort in common C++ standard library implementations. Introsort switches to heapsort when the recursion depth indicates that Quicksort is approaching its worst case. Interestingly, the optimization of pushing the larger subarray onto the stack, combined with tail-call elimination, guarantees the number of pairs on the stack will not exceed lg N. As such, it would be interesting to see how much faster, if at all, introsort actually is when compared to a properly tuned iterative Quicksort, since introsort only bails out to heapsort once the depth exceeds a multiple of lg N.
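That stack bound is easy to check empirically. The sketch below (my instrumentation, not the article's code) strips the scheme to its core, with a plain last-element pivot and no cutoff, and records the peak number of ints ever on the stack while sorting:

```cpp
#include <cassert>
#include <cstdlib>

// Crossing-pointers partition with a left-end guard, pivot at a[r].
static int depthPartition(int a[], int l, int r) {
    int i = l - 1, j = r, v = a[r];
    for (;;) {
        while (a[++i] < v);
        while (a[--j] > v)
            if (j == l) break;
        if (i >= j) break;
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
    int t = a[i]; a[i] = a[r]; a[r] = t;
    return i;
}

// Sorts a[l..r] with the larger-subarray-deferred scheme and returns the
// peak number of ints simultaneously on the stack.
int quickMaxStack(int a[], int l, int r) {
    int stack[64];
    int top = -1, maxTop = -1;
    while (true) {
        while (r > l) {
            int i = depthPartition(a, l, r);
            if ((i - 1) - l < r - (i + 1)) {
                // defer the larger right side, continue with the left
                if (i + 1 < r) { stack[++top] = i + 1; stack[++top] = r; }
                r = i - 1;
            } else {
                // defer the larger left side, continue with the right
                if (i - 1 > l) { stack[++top] = l; stack[++top] = i - 1; }
                l = i + 1;
            }
            if (top > maxTop) maxTop = top;
        }
        if (top < 0) break;
        r = stack[top--];
        l = stack[top--];
    }
    return maxTop + 1;
}
```

On a shuffled array of 1024 elements the peak should stay around 2 lg N = 20 ints, orders of magnitude below the array size.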