Range Searching
Searching and sorting are the two most common processes performed on a computer, and for that reason a lot of research has gone into the various methods of performing those tasks. Range searching is an expansion of general searching, where instead of searching a collection for a specific element, you are interested in all of the elements whos value falls within a specified upper and lower bound, inclusively.
As an example, lets say you want to buy used a car, something not crazy expensive, but also not a complete piece of junk. This means that you have two constrains for your search, the upper and lower price of the car. You only have a maximum of $18,000 to spend on the car, but at the same time, you want a car with low miles, no accidents, and well maintained, which means for the cars you're interested cost at least $13,000. Searching for all cars less than $18,000 (the general search), would return the list of cars you're looking for, but would be cluttered by car's that don't mean you're other criteria. A ranged search of all cars less than $18,000, but more than $13,000 is used to reduce the size of the result set while increasing the quality of the data that is returned. Adding additional criteria to the search, such as color of the car, engine type, etc. increases the number of dimensions.
The simplest form of range searching is one dimensional, where numbers fall between the provided bounds based on just on criteria. however the technique can be generalized and expanded into N-dimensions. In this post I will be using the example of searching a used car database to implement range searching algorithms for various data structures.
Records And Databases And... Graphics?
The concept of range searching, in particular multi-dimensional range searching as described above is a natural and straight forward way of describing complex searching in databases. In this case, "database" meaning a collection of multi-attribute items that facilitates searching by single or combinations of those attributes. For our example we'll be using the following class as our database record type:
class Record {
private String make;
private String model;
private String year;
private String color;
private Double price;
//Getters/Setters/Constructor etc
}
It may come as somewhat of a surprise then to learn that many of the techniques used were originally developed for use in geometric algorithms. k-D trees in particular.
Warming up with 1-D Range Search
Our database will use a tree based a structure as the underlying implementation for storing and searching. For a 1-D range search, a regular binary search tree will work fine, though you should use some form of self-balancing scheme like AVL or Red/Black trees. At higher dimensions, A K-d tends to stay well balanced of its own accord.
We will warm up by first implementing a binary search tree to demonstrate the idea in 1-dimension, and then turn that BST into a k-D tree, as k-D tree's are also binary trees.
public class CarDB {
class Node {
DBRecord record;
Node left;
Node right;
Node(DBRecord record) {
this.record = record;
}
}
private Node head;
private int recordCount;
CarDB(List<DBRecord> cars, String onKey) {
for (DBRecord record : cars) {
insert(record, onKey);
}
}
public void insert(DBRecord record, String onKey) {
insertR(record, onKey);
recordCount++;
}
public ArrayList<DBRecord> search(String onKey, DBRecord v1, DBRecord v2) {
return rangeSearch(head, onKey, v1, v2);
}
}
You're probably wondering what the "onKey" parameter is. It is used to determine which key in the record to perform the current task on. so if we pass "make" as the onKey value, the binary search tree will compare the make fields while building the tree. It is used to to drive the following comparator function:
private int compareOnKey(DBRecord a, DBRecord b, String onKey) {
switch (onKey) {
case "make" -> { return a.getMake().compareTo(b.getMake()); }
case "model" -> { return a.getModel().compareTo(b.getModel()); }
case "year" -> { return a.getYear().compareTo(b.getYear()); }
case "price" -> { return a.getPrice().compareTo(b.getPrice()); }
case "miles" -> { return a.getMiles().compareTo(b.getMiles()); }
};
return 0;
}
The code for insertion then follows the normal BST insertion:
private void insertR(DBRecord record, String onKey) {
Node x = head;
Node p = x;
while (x != null) {
p = x;
x = (compareOnKey(record, x.record, onKey) < 0) ? x.left:x.right;
}
x = new Node(record);
if (p == null) {
head = x;
} else {
if (compareOnKey(record, p.record, onKey) < 0) {
p.left = x;
} else {
p.right = x;
}
}
}
We will use the following list of cars and their attributes to build our database:
- 2016 Audi A4 black, 42000mi, $15500.0
- 2011 BMW M3 silver, 6000mi, $17500.0
- 2014 Cheverolet Camaro Yellow, 50000mi, $10500.0
- 2017 Subaru WRX STi blue, 65000mi, $14000.0
- 2015 Ford Mustang black, 30000mi, $13500.0
- 2008 Nissan Altima black, 107000mi, $5000.0
- 2001 Honda Civic white, 137590mi, $2300.0
Let's take a moment to talk about algorithmic complexity. We could of course, avoid building a tree all together and perform a 1-d range search on on a unordered list in O(n). Using an ordered search tree however, we can reduce O(N) to O(log N + M) where M is the number of results, so long as we are searching on the same key the tree is ordered on, otherwise it is still O(N) when using a binary search tree.
public ArrayList<DBRecord> search(String onKey, DBRecord v1, DBRecord v2) {
return rangeSearch(head, onKey, v1, v2);
}
ArrayList<DBRecord> rangeSearch(Node curr, String onKey, DBRecord v1, DBRecord v2) {
if (curr == null) {
return new ArrayList<DBRecord>();
}
ArrayList<DBRecord> results = new ArrayList<>();
boolean tx1 = compareOnKey(curr.record, v1, onKey) >= 0;
boolean tx2 = compareOnKey(curr.record, v2, onKey) <= 0;
if (tx1) results.addAll(rangeSearch(curr.left, onKey, v1, v2));
if (tx1 && tx2) { results.add(curr.record); }
if (tx2) results.addAll(rangeSearch(curr.right, onKey, v1, v2));
return results;
}
I use 2 DBRecords with the field we are searching on set to our low and high values are to supply the upper and lower bounds for our search as well as the onKey field to aid the compareOnKey method.
The following code will populate our data base with above list, building the tree on the "price" field of the record as the sort key, and then return a list of records for cars who's price fall in our desired range.
public class RangeEx {
public static void main(String[] args) {
List<DBRecord> cars = List.of(new DBRecord().setMake("Audi").setModel("A4")
.setYear(2016).setColor("black").setMiles(42000).setPrice(15500.00),
new DBRecord().setMake("BMW").setModel("M3")
.setYear(2011).setColor("silver").setMiles(6000).setPrice(17500.00),
new DBRecord().setMake("Cheverolet").setModel("Camaro")
.setYear(20014).setColor("Yellow").setMiles(50000).setPrice(10500.00),
new DBRecord().setMake("Subaru").setModel("WRX STi")
.setYear(2017).setColor("blue").setMiles(65000).setPrice(14000.00),
new DBRecord().setMake("Ford").setModel("Mustang")
.setYear(2015).setColor("black").setMiles(30000).setPrice(13500.00),
new DBRecord().setMake("Nissan").setModel("Altima").setMiles(107000).setPrice(5000.00)
.setYear(2008).setColor("black"),
new DBRecord().setMake("Honda").setModel("Civic").setMiles(137590).setPrice(2300.00)
.setYear(2001).setColor("white"));
CarDB cardb = new CarDB(cars, "price");
DBRecord lowerBound = new DBRecord().setPrice(13000.00);
DBRecord upperBound = new DBRecord().setPrice(18000.00)
for (DBRecord result : cardb.search("price", lowerBound, upperBound)) {
System.out.println(result);
}
}
}
When we run the above query, we are returned the following list:
2015 Ford Mustang
black, 30000mi, $13500.0
2017 Subaru WRX STi
blue, 65000mi, $14000.0
2016 Audi A4
black, 42000mi, $15500.0
2011 BMW M3
silver, 6000mi, $17500.0
As you can see it has trimmed the list of cars to only those falling between $13,000 and $18.000, which is what we were interested in. On a list of a few dozen, or maybe a hundred cars, this may be enough to whittle the list down to a reasonable size. On a larger list though, we would more than likely wish to refine our search further still.
If we wanted to add another parameter while still using a binary search tree, we would either need to settle for O(n) search times, or we could take the result set from our query, build a new tree from that list, and run our additional parameter search on the resulting tree. Obviously, neither of these is an ideal situation, and that's where k-d trees come in. So check back soon for part 2, where I'll be covering the construction and use of k-d search trees. Until next time, Happy Hacking!
Leave A Comment