Searching: Different Algorithms for Different Data Structures
Different Searches for Different Data Structures
This is another article developed from a question posed on Quora. A user asked how different data structures effect the search operation, and if so how? This is a great question, and indeed a fundamental question as it highlights the importance of choosing the right data structure for the task at hand.
Some of the data structures we will look at include linked lists, unsorted arrays, sorted arrays, and binary search trees. Hash Tables are a special case and we will deal them as well.
The right tool for the job
A perfect example that highlights the importance this kind of comparison is implementing the "Dictionary" abstract data type. A dictionary is a container for holding key.value pairs. It many languages it is a built-in datatype. Perl has %hash's, python, and PHP have their versions, Java has TreeMap and HashMap and the C++ STL has std::map, std::unordered_map, and the multi- versions of both.
Dictionary is a very simple ADT with few necessary features, The fundamental operations being insert, remove, find, and size.
Because of the way the question was posed on Quora, i'll be focusing on find(), and implementing insert() as you cant search an empty data structure (well, you CAN, but it will return a failed search every time...)
The Data Structures
C++, unlike Java does not have pure interfaces, what it DOES have is pure virtual classes, which serve the same purpose. The following virtual class is used to define our dictionary interface:
As stated earlier in the article we will be comparing searching in sorted Arrays, Linked Lists, and Binary Search Tree's, so i'm only going to show the implementation code for the find() function of each. Full code for all of the data structures are available on my github at http://github.com/bottomupmergesort/comparingDS/
A linked list give us just one option for implementing find(), sequential search:
A by using an array we can take advantage of binary search by keeping the array in sorted ordered:
If the name doesn't betray it, Binary Search Trees are a physical representation of the decision tree built using the binary search algorithm, thus, searching in a binary search tree is very similar to running binary search on a sorted array:
The performance showdown
Big Oh analysis is great, it gives a rough idea of how our algorithms will perform, based on algorithmic complexity analysis, we know that sequential search in in a linked list is an O(N) operation, binary search on an array is an O(log N) operation, and searching in a binary search tree is also an O(log N) operation. So how do they stack up against each other?
After implementing the 3 different data structures mentioned for above for our Dict class, i then ran a test where i would do 1000 inserts and 500 searches on each and time how long it takes as well as the search success to search miss performance.
After running the above program i obtained the following results:
Linked list: Inserts: 1000 Successful searches: 330 Time: 1.861 Sorted array: Inserts: 1000 Successful searches: 222 Time: 3.943 Binary Search Tree: Inserts: 0 Successful searches: 302 Time: 0.394
In order of best performance to least we get:
- Binary Search Tree
- Unordered Linked List
- Sorted Array
So what happened? Why did an unordered linked list out perform the sorted array? Believe it or not, we can look to our winner, the Binary Search Tree for some answers.
While a sorted array can exploit binary search, it cant fully exploit this advantage during inserts because the array still needs to move a large number of items for each insert.
For Lists of less than a million items or so, the advantage of binary searching an array over sequential searching a linked list aren't that great.
This is also the key to the performance of the binary search tree, because it only has to do logN compares for each insert and zero move operations its leaps and bounds above the other two options in terms of insertion speed, and it naturally keeps this advantage for searching.
It should be noted that for this example i used a regular binary search tree, if i chose to use a Red/Black Tree or AVL Tree or some other such self balancing search tree we could expect to see even more performance gains!
Lessons
Neither data strucutres, nor algorithms exist in a vacuum, and for that matter, neither do programs. One should always remember the words of Niklaus Wirth:
Data Structures + Algorithms = Programs.
By knowing the different properies of data structures and how we can exploit them with algorithms, we can develop programs that will muster the best performance possible.
Till next time, Keep Learning!
-
Pratt Parsing: Top-Down Operator Precedence
-
Generating P-Code by AST Traversal
-
Ternary Search Tries: String Specific Ordered Symbol Tables
-
Digital Search Trees
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
-
Lossless Compression Part I: Working with Bits in a Byte Oriented World
-
Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
-
A Different Take on Merge Sort: Binary Search Trees?
-
Deleting Entries From Open-Address Hash tables
Leave A Comment