Comb sort: Bubble sort learns from Donald Shell
Computer Sciences better mouse trap
In the beginning the large majority of sort implementations were the more commonly known quadratic sorts: selection sort, insertion sort, and the omnipresent bubble sort. Radix sort was in use for the physical sorting of punch cards, and while merge sort was known by the late 1940s, the small main memory of computers at the time meant it was relegated to external sorting of tape drives. The quest for better, faster sorting algorithms is almost as old as algorithmic design its self, and certainly older than the analysis of algorithm in any formal way. A flip through the list of some of the more esoteric sorts shows that alot of this energy focused on trying to improve one notoriously bad algorithm: bubble sort. Indeed, cock tail shaker sort, odd/even sort, straight exchange sort(arguably even worse than bubble sort), comb sort and many others can find trace their lineage to bubble sort.Better Than Quadratic (But not by much)
The first sub quadratic algorithm to see wide spread adoption was shell sort. Shell sort was the brain child of Donald Shell who exploited the known flaw in insertion sort that only items close together were compared. By devising a series of dwindling gaps that ultimately ended in a gap size of 1 (plain insertion sort) shell sort was able to break the O(n2) barrier in the average case. It's from this break through that perhaps one of the more interesting, if still inefficient bubble sort varieties took its lead. Where shell sort is "Gapped insertion sort", Comb sort is "Gapped Bubble Sort". And while shell sort managed to break the quadratic barrier, comb sort still falls short. You would think that this would be wear the story ends, leaving comb sort as nothing but an interesting side note in history, and perhaps an academic exercise. Interestingly, though destiny had (slightly) bigger plans for comb sort. By some twist of fate, and of reasoning that is not easy to ascertain, comb sort managed to end up in the linux kernel code as a part of klibc. It could be that the code is meant to be invoked ONLY on very small data sets that are already mostly sorted, and area where believe it or not, bubblesort actually shines. It could be that someone was just bored, who knows. With that being said, what follows is the crux of comb sort.Bubble sort by any other name
Comb sort is an interesting frankestein, though unlike frankensteins real monster, comb sort is ground in some basis of good idea, though good ideas arent always meant to be followed through. Regardless, someone did. Since bubble sort is the basis of comb sort, lets take a quick gander at the basic algorithm. It works like this: iterate through the array comparing each element to the one next to it, if they are out of order, swap them. repeat until the array is sorted. Were not exactly making any ground breaking discoveries but it works, and if someone with no knowledge of sorting algorithms was asked to create one, its probably along the lines of something they would come up with. In C/C++ it looks like this:void bubblesort(int a[], int n) { bool sorted = false; do { sorted = true; for (int i = 0; i < n - 1; i++) { if (a[i+1] < a[i]) { swap(a, i, i+1); sorted = false; } } } while (!sorted); }I'm guessing the birth of comb sort went something like this: "One day in the late 1950's, per haps even the early 1960's, a young budding programmer was discovering the joys and wonders of sorting algorithms, and frustrated with the lack luster performance of his progress in making anything decently quick was elated to discover the work of Donald Shell and his (at the time) new "gapped insertion sort". Thinking to himself: "So the secret is to spread out the comparisons!" he had his eureka moment. Setting off to the computer lab at what for him was a quick pace but to all outside observers, was in fact an awkward limping jog, he made his way to the labs lgp30 console. After furiously clacking away at the flexoriter he had it, bubble sort. but different. bubble sort with gaps. "ITs called comb sort!" he said to no one at all. The room had emptied out, the result of the BO his jog had induced." And what did his code look like? Something like the following, except in Algol 30. Since i don't know what algol 30 is even supposed to look like, heres comb sort in C++:
void combsort(int a[], int n) { bool sorted = false; int gap = n; double shrink = 1.3; do { gap = floor(gap / shrink); if (gap < 1) { gap = 1; sorted = true; } for (int i = 0; i + gap < n; i++) { if (a[i+gap] < a[i]) { swap(a, i, i+gap); sorted = false; } } } while (!sorted); }The end. But it doesn't end there, because like anything, once the ball of research starts rolling, it tends to gain momentum.
A Better, Faster, Frankenstein
In the pages of one of the most seminal texts in Computer Science, Knuth's The Art of Computer Programming there exists an open question regarding comb sort, and some people have discovered a thing or too to turn comb sort into a rather quick sorting algorithm. It could be said that the more one tinkers with comb sort to extract the most potential from it, the less it resembles bubble sort, and the argument could be made any resemblance is in fact superficial. Performant versions of comb sort or actually more closely related to a highly modified shell sort. And with good reason. Knuth in fact never claimed that comb sort was flushed out bubble sort. He describes it as a modified "diminishing gap sort" - another name for shell sort. You see, comb sort can be made incredibly fast by ping-ponging in two directions: sorting from the bottom of the array to the top with one gap sequence, and then sorting from the top to the bottom with another gap sequence. This is akin to a cocktail shaker sort utilizing diminishing gaps, and the trip back down the array looks an awful lot like shell sort. You can also use any method one desires to calculate the gaps - even the 3*h + 1 shell sort sequence a'la Knuth. The only reason i'm using the method below for generating the gap sequence is that their is compelling evidence that the resultant algorithm is in fact an O(n log n) average case sort, based on some rather convincing math that i'm not going to get in to here. So whats the code look like with these newer modifications? Take a look: "How can something so simple perform so well?" don't just take my word for, lets run some tests. How about sorting 250,000 integers? Well compare it to both Shell sort and Merge sort.Shell Sort | 159.8931 ms |
Merge Sort (Bottom up) | 61.7194 ms |
Cocktail Comb Sort | 68.8121 ms |
Search
Recent Posts
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
-
Lossless Compression Part I: Working with Bits in a Byte Oriented World
-
Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
-
A Different Take on Merge Sort: Binary Search Trees?
-
Deleting Entries From Open-Address Hash tables
-
Transform any Binary Search Tree in to a Sorted Linked List
-
From Regular Expressions To NFA by way of Abstract Syntax Trees: Thompsons Construction
-
Extending Sedgewick's explicit Digraph NFA
-
Iterative In-order B Tree Traversal
Meta
Leave A Comment