Enforcing uniqueness of objects in std::set<> and std::unordered_set<>
Unique Collections
Knowing that a collection is comprised of unique only items if a very useful thing indeed. It's so useful that this concept is the basis of one of the fundamental data structure: the Set. While this is not the only reason for using a set data structure, it's probably the most common. Many languages offer a built-in set library, and C++ is no different. The C++ Standard Template Library offers two different implementations of Sets: std::set and std::unordered_set, which offer similar functionality but with different performance constraints. std::set is implemented via a Red/Black tree while unordered_set is implemented by way of a hash table. The main difference between the two is that while std::set keeps it's items sorted, unordered_set does not.
If you're only maintaining a set of primitive data types (int, char, float, string, etc) than either implementation will work as intended right out of the bag. But if you want to store Objects in a set, you've got a little bit more work ahead of you.
Storing Objects in std::set
Due to std::set being implemented with a Red/Black tree (most frequently at least, i've never seen an STL implementation that used a different choice) one of the required features is that it needs to have a defined operator<. You can also choose to supply std::set with a custom comparator for a different ordering.
For exampled set, lets say we are implementing a Graph data structure and we want to use a std::set to store our Edges to ensure that no duplicate edges. If we use the following Edge struct, we're going to get the kind of scrolling error messages only the STL is capable of producing...
However, by overloading operator< then everything is smooth sailing:
But what if we want some kind of ordering that the built in operators aren't capable of? Thats when the previously mentioned comparator objects come in! The following example is a comparator to keep a set of pair
Storing Objects in std::unordered_set
std::unordered_set is a bit of a different beast, because operators alone aren't going to cut it. There are three requirements for using an object as the key in an unordered_set:
- The object must support operator==
- The object must support operator!=
- The object must be capable of being hashed.
It's the third requirement that trips a lot of people up. Fear not though, with C++ there is certainly ALWAYS a way! Continuing with the second example from the previous section, i will describe how to implement an unordered_set of an std::pair of integers.
std::pair already supports equality testing (operator== and operator!=), but to be able to store them in an unordered_set we need to implement a hash function for them. In the <functional> header there is the function object std::hash that is used for generating hash codes. We are going to overload operator() of this function object to hash our integer pairs.
This approach will work for any object from which a hash function can be computed.
Closing remarks
It should be noted that the methods shown here for use with std::set and std::unordered_set are directly applicable for use with std::map and std::unorered_map as well, as they use the same underlying data structure, except for the storage of key/value pairs instead of just keys!
-
How Many Hanukkah Candles?
-
Pratt Parsing: Top-Down Operator Precedence
-
Generating P-Code by AST Traversal
-
Ternary Search Tries: String Specific Ordered Symbol Tables
-
Digital Search Trees
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
-
Lossless Compression Part I: Working with Bits in a Byte Oriented World
-
Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
-
A Different Take on Merge Sort: Binary Search Trees?
Leave A Comment