Quicksort Pivot Selection
I've been playing about with a visual sorting app that I made while messing about with SFML. I was adding quicksort variants, and If you know anything about quicksort's performance, than you know that it is heavily dependent upon it's pivot selection, or the value selected to divide the array around. Seeing as I was fiddling around with an algorithm visualization tool, I wanted to see what kind of impact the different pivots had on data movement in the array during sorting.
There has been a truly astonishing amount of research performed over the years in order to determine the optimal pivot selection method, and while the jury is still out on the absolute best, practical application of the various single pivot methods for quicksort has generally agreed on three "generally good choices" - four if you count "just picking one" (which actually tends to work pretty well, until it doesn't). The three most popular methods of choosing a pivot for quicksort are:
- Using a Randomized Pivot - This works by choosing a random value in the bounds of the array being sorted and use that as the pivot value.
- Choosing the Median of Three - The middle value is chosen from three values taken from the beginning, middle, and end of the array slice being sorted.
- Choosing the 'Ninther' - The median of the result of 3 Median of 3's, this is only applicable to larger arrays, but works on the same principle as Median of Three
While the randomized algorithm does work well in practice, the algorithm is at the mercy of the random number generator. Because of this non-deterministic approach, it is unsuitable for some applications. This turns out to not be much of a big deal because Median of three works just as well, without resorting to randomness.
Median Of Three
The median of three method is very popular, both for its ease of implementation and its simplicity. An array, and its bounds are supplied as input, and a middle value is calculate from the supplied bounds. The low bound, high bound and middle value are then sorted to obtain the median. In most implementations the median value is also placed into the pivot position, though I have left the three values sorted. The algorithm is then easily implemented as described:
int medianOf3(int a[], int l, int r) {
int m = (l+r)/2;
if (a[l] > a[m])
exch(a, l, m);
if (a[l] > a[r])
exch(a, l, r);
if (a[m] > a[r])
exch(a, m, r);
return m;
}
The implementation of introspective sort supplied in Go's standard library implemented the median of three algorithm for it's pivot selection, as well as many other library quicksort implementations.
Ninether
The ninether method is recursive application of median of 3, where the supplied bounds are divided into three sections, and the median of each section is compared to obtain the median of 9. In this method it is important to leave the median of 3 result in its sorted order and not to perform the swap of the median into the pivot position until you have exited the procedure. You also must be careful to choose unique values: remember that the bounds are inclusive.
Unlike the median of 3 method, this method is recursive. The base case should be obvious: returning the median of three.
int medOf9(int a[], int l, int r) {
if (r - l < 9) {
int m = (l+r)/2;
if (a[l] > a[m])
exch(a, l, m);
if (a[l] > a[r])
exch(a, l, r);
if (a[m] > a[r])
exch(a, m, r);
return m;
}
int width = (r-l)/3;
int lo = medOf9(a, l, l+width-1);
int mid = medOf9(a, l+width, r-width-1);
int end = medOf9(a, r-width, r);
if (a[lo] > a[mid])
exch(a, lo, mid);
if (a[lo] > a[end])
exch(a, lo, end);
if (a[mid] > a[end])
exch(a, mid, end);
return mid;
}
The median of 3 approach leaves the array relatively un changed. The algorithm shown above for median of 9 can end up making significant changes to the input array, and as it turns out this is in our favor. If we count the inversions in an array before and after a call to medOf9(), the number goes down - sometimes by quite a bit. Seeing as the number of inversions in an array is a measure of its sorted-ness, anything we can do to reduce the number of inversions speeds up our sort. As an example, the following is a trace of one call to medOf9(0, n-1):
int* copyArr(int a[], int n) {
int *copy = new int[n];
for (int i = 0; i < n; i++)
copy[i] = a[i];
return copy;
}
int countInversions(int a[], int aux[], int l, int r) {
if (r - l <= 1)
return 0;
int m = (l+r) / 2;
int fc = countInversions(a, aux, l, m);
int rc = countInversions(a, aux, m, r);
int ic = 0;
for (int k = l; k < r; k++) aux[k] = a[k];
int i = l, j = m, k = l;
while (i < m && j < r) {
if (aux[i] < aux[j]) {
a[k++] = aux[i++];
} else {
a[k++] = aux[j++];
ic = ic + (m - i); // <- inversion detected.
}
}
while (i < m) a[k++] = aux[i++];
while (j < r) a[k++] = aux[j++];
return fc + rc + ic;
}
int inversionCount(int a[], int n) {
int *aux = new int[n];
for (int i = 0; i < n; i++) {
aux[i] = a[i];
}
int ic = countInversions(a, aux, 0, n);
delete [] aux;
return ic;
}
int main() {
int n = 30;
int a[n];
for (int i = 0; i < n; i++) {
a[i] = rand() % 100;
}
print(a, n);
cout<<"Inversions: "<<inversionCount(copyArr(a, n), n)<<endl;
medOf3(a, 0, n-1);
cout<<"Inversions: "<<inversionCount(copyArr(a, n), n)<<endl;
print(a, n);
}
83 86 77 15 93 35 86 92 49 21 62 27 90 59 63 26 40 26 72 36 11 68 67 29 82 30 62 23 67 35
Inversions: 241
Median of 9 for bounds: 0 - 29, slice size: 9
83 86 77 15 93 35 86 92 49 21 62 27 90 59 63 26 40 26 72 36 11 68 67 29 82 30 62 23 67 35
Median of 9 for bounds: 20 - 29, slice size: 3
11 68 67 29 82 30 62 23 67 35
62 < 23 < 35 ->
Swap: 62 <-> 23
Swap: 62 <-> 35
Result: 23 < 35 < 62
Median of 3: 35
11 < 68 < 67 ->
Swap: 68 <-> 67
Result: 11 < 67 < 68
Median of 3: 67
29 < 82 < 30 ->
Swap: 82 <-> 30
Result: 29 < 30 < 82
Median of 3: 30
Returned Medians: 67 < 30 < 35
Result: 30 < 35 < 67
Final Median: 35
83 < 93 < 49 ->
Swap: 83 <-> 49
Swap: 93 <-> 83
Result: 49 < 83 < 93
Median of 3: 83
Median of 9 for bounds: 9 - 19, slice size: 3
21 62 27 90 59 63 26 40 26 72 36
40 < 26 < 36 ->
Swap: 40 <-> 26
Swap: 40 <-> 36
Result: 26 < 36 < 40
Median of 3: 36
21 < 62 < 27 ->
Swap: 62 <-> 27
Result: 21 < 27 < 62
Median of 3: 27
90 < 59 < 26 ->
Swap: 90 <-> 59
Swap: 59 <-> 26
Swap: 90 <-> 59
Result: 26 < 59 < 90
Median of 3: 59
Returned Medians: 27 < 59 < 36
Result: 27 < 36 < 59
Final Median: 36
Returned Medians: 83 < 36 < 35
Result: 35 < 36 < 83
Final Median: 36
49 86 77 15 35 35 86 92 93 21 27 62 26 36 63 90 26 59 72 40 11 30 68 29 83 82 23 67 67 62
Inversions: 227
Anyway, today's post is a short one, as that's all I've got for today. Until next time, Happy Hacking!
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
-
Lossless Compression Part I: Working with Bits in a Byte Oriented World
-
Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
-
A Different Take on Merge Sort: Binary Search Trees?
-
Deleting Entries From Open-Address Hash tables
-
Transform any Binary Search Tree in to a Sorted Linked List
-
From Regular Expressions To NFA by way of Abstract Syntax Trees: Thompsons Construction
-
Extending Sedgewick's explicit Digraph NFA
-
Iterative In-order B Tree Traversal
Leave A Comment