Comb sort: Bubble sort learns from Donald Shell

mgcadmin 04-14-2022

Computer Sciences better mouse trap

In the beginning the large majority of sort implementations were the more commonly known quadratic sorts: selection sort, insertion sort, and the omnipresent bubble sort. Radix sort was in use for the physical sorting of punch cards, and while merge sort was known by the late 1940s, the small main memory of computers at the time meant it was relegated to external sorting of tape drives. The quest for better, faster sorting algorithms is almost as old as algorithmic design its self, and certainly older than the analysis of algorithm in any formal way. A flip through the list of some of the more esoteric sorts shows that alot of this energy focused on trying to improve one notoriously bad algorithm: bubble sort. Indeed, cock tail shaker sort, odd/even sort, straight exchange sort(arguably even worse than bubble sort), comb sort and many others can find trace their lineage to bubble sort.

Better Than Quadratic (But not by much)

The first sub quadratic algorithm to see wide spread adoption was shell sort. Shell sort was the brain child of Donald Shell who exploited the known flaw in insertion sort that only items close together were compared. By devising a series of dwindling gaps that ultimately ended in a gap size of 1 (plain insertion sort) shell sort was able to break the O(n²) barrier in the average case. It's from this break through that perhaps one of the more interesting, if still inefficient bubble sort varieties took its lead. Where shell sort is "Gapped insertion sort", Comb sort is "Gapped Bubble Sort". And while shell sort managed to break the quadratic barrier, comb sort still falls short. You would think that this would be wear the story ends, leaving comb sort as nothing but an interesting side note in history, and perhaps an academic exercise. Interestingly, though destiny had (slightly) bigger plans for comb sort. By some twist of fate, and of reasoning that is not easy to ascertain, comb sort managed to end up in the linux kernel code as a part of klibc. It could be that the code is meant to be invoked ONLY on very small data sets that are already mostly sorted, and area where believe it or not, bubblesort actually shines. It could be that someone was just bored, who knows. With that being said, what follows is the crux of comb sort.

Bubble sort by any other name

Comb sort is an interesting frankestein, though unlike frankensteins real monster, comb sort is ground in some basis of good idea, though good ideas arent always meant to be followed through. Regardless, someone did. Since bubble sort is the basis of comb sort, lets take a quick gander at the basic algorithm. It works like this: iterate through the array comparing each element to the one next to it, if they are out of order, swap them. repeat until the array is sorted. Were not exactly making any ground breaking discoveries but it works, and if someone with no knowledge of sorting algorithms was asked to create one, its probably along the lines of something they would come up with. In C/C++ it looks like this:

void bubblesort(int a[], int n)
{
   bool sorted = false;
   do {
      sorted = true;
      for (int i = 0; i < n - 1; i++)
      {
         if (a[i+1] < a[i])
         {
            swap(a, i, i+1);
            sorted = false;
         }
       }
    } while (!sorted);
}

I'm guessing the birth of comb sort went something like this: "One day in the late 1950's, per haps even the early 1960's, a young budding programmer was discovering the joys and wonders of sorting algorithms, and frustrated with the lack luster performance of his progress in making anything decently quick was elated to discover the work of Donald Shell and his (at the time) new "gapped insertion sort". Thinking to himself: "So the secret is to spread out the comparisons!" he had his eureka moment. Setting off to the computer lab at what for him was a quick pace but to all outside observers, was in fact an awkward limping jog, he made his way to the labs lgp30 console. After furiously clacking away at the flexoriter he had it, bubble sort. but different. bubble sort with gaps. "ITs called comb sort!" he said to no one at all. The room had emptied out, the result of the BO his jog had induced." And what did his code look like? Something like the following, except in Algol 30. Since i don't know what algol 30 is even supposed to look like, heres comb sort in C++:

void combsort(int a[], int n)

{
   bool sorted = false;
   int gap = n;
   double shrink = 1.3;

   do {
      gap = floor(gap / shrink);
      if (gap < 1)
      {
         gap = 1;
         sorted = true;
      }

      for (int i = 0; i + gap < n; i++)
      {
         if (a[i+gap] < a[i])
         {
           swap(a, i, i+gap);
           sorted = false;
         }
      }
    } while (!sorted);
}

The end. But it doesn't end there, because like anything, once the ball of research starts rolling, it tends to gain momentum.

A Better, Faster, Frankenstein

In the pages of one of the most seminal texts in Computer Science, Knuth's The Art of Computer Programming there exists an open question regarding comb sort, and some people have discovered a thing or too to turn comb sort into a rather quick sorting algorithm. It could be said that the more one tinkers with comb sort to extract the most potential from it, the less it resembles bubble sort, and the argument could be made any resemblance is in fact superficial. Performant versions of comb sort or actually more closely related to a highly modified shell sort. And with good reason. Knuth in fact never claimed that comb sort was flushed out bubble sort. He describes it as a modified "diminishing gap sort" - another name for shell sort. You see, comb sort can be made incredibly fast by ping-ponging in two directions: sorting from the bottom of the array to the top with one gap sequence, and then sorting from the top to the bottom with another gap sequence. This is akin to a cocktail shaker sort utilizing diminishing gaps, and the trip back down the array looks an awful lot like shell sort. You can also use any method one desires to calculate the gaps - even the 3*h + 1 shell sort sequence a'la Knuth. The only reason i'm using the method below for generating the gap sequence is that their is compelling evidence that the resultant algorithm is in fact an O(n log n) average case sort, based on some rather convincing math that i'm not going to get in to here. So whats the code look like with these newer modifications? Take a look: "How can something so simple perform so well?" don't just take my word for, lets run some tests. How about sorting 250,000 integers? Well compare it to both Shell sort and Merge sort.

Sorting 250k Integers
Shell Sort	159.8931 ms
Merge Sort (Bottom up)	61.7194 ms
Cocktail Comb Sort	68.8121 ms

It's pretty clear our juiced up comb sort is performing more in the range of an O(n log n) algorithm than an O(n^4/3) or O(n²) algorithm. The choice of using Bottom up merge sort for the comparison was that both Shell Sort and Cocktail Comb Sort are non-recursive sort algorithms, so i wanted to keep the comparisons to strictly iterative sorts. For those interested, you can see the exact implementations of the 3 sort algorithms tested here:

MaxGCoding.com

For the love of programming

Comb sort: Bubble sort learns from Donald Shell

Computer Sciences better mouse trap

Better Than Quadratic (But not by much)

Bubble sort by any other name

A Better, Faster, Frankenstein

Leave A Comment

Making Sense of LALR Parser Construction

Compiling Set Comprehensions to Bytecode: an exercise in managing abstractions

Fast Multi-Pattern String Searching with the Aho-Corasick Algorithm

Capture Groups: Tracking Regular Expression Sub Matches

Resolving Shift/Reduce Conflicts With Operator Precedence

Squeezing DFAs with Pair Compression

Designing Abstract Syntax Trees: Homogenous vs. Heterogenous Node Structures

LR(0) Closure: From LR Items to LR States

Calculating Follow Sets of a Context Free Grammar

Streaming Images from ESP32-CAM for viewing on a CYD-esp32