Counting Islands with Flood Fill
Sometimes when I'm bored I reach for one of the various books of programming challenges, flip to a random page and work through random problems until I've had enough. This past weekend I flipped to a random page and was greeted with the "Counting Islands" problem. The problem goes like this:
"You are a provided a map in the form of an NxM 2d array, with 0's representing water, and 1's representing Islands, If two or more 1's are adjacent to each other, it counts as one Island. Count the number of Islands that appear on the Map." An example map with 5 islands is pictured below.
00000000000000000
01101000000000000
00111000000110010
00011000000000000
00000000001110000
00100000000011000
00110000000000000
Before we start coding lets look at the information we've been provided with and see if there's anything in the wording of the problem that is either an intentional "gotcha" - or - a hint of how to proceed.
- We know that we are being supplied a map in the form of a grid, which is a special case of a graph.
- We know that we are searching this graph for areas that are similar to each other, but different from their surroundings.
- The areas that are similar to each other can be considered the same area.
To begin lets consider a simplification of the problem. How would we solve this problem if adjacent islands were considered separate islands? In that case, we would simply iterate over the array, and count the number of 1's we encounter, and that would be the number of the islands on the map.
int simplifiedAnswer(vector<vector<int>>& map) {
int nrows = map.size(), ncols = map[0].size();
for (int y = 0; y < nrows; y++) {
for (int x = 0; x < ncols; x++) {
if (map[y][x] == 1) {
numIslands++;
}
}
}
return numIslands;
}
Obviously this wont work for our problem, but it gives a starting point, because our search will proceed the same way, the difference will be in how we count '1's.
In the simplified problem above, we treated each x/y position as an individual vertex in a graph. What we need to do, is instead find the vertices that are connected to each other but not connected to other vertices. In graph theory these are called "connected components", and in our case we are counting connected components, not vertices.
Finding Connected Components
For two vertices to be considered connected there needs to be a path from one to the other. A path being a series of 1 or more edges starting from the beginning vertex to the ending vertex. This can be determined with a simple depth first traversal(breadth first will work as well).
Because our grid is an implicit graph, we generate each vertices adjacency list as we encounter them. We will be using Manhattan neighborhoods for this, meaning that a vertex can have an edge to the vertex immediately above, below, to the left, and to the right, but not the diagonals. A vertex that is on a grid boundary will not have out of bounds neighbors, so while the maximum number of neighbors is 4, some have less. The inBounds() method is use to determine if a generated coordinate is valid or not.
class CountIslands {
private:
struct point {
int x,y;
point(int _x = 0, int_y = 0) : x(_x), y(_y) { }
};
bool inBounds(int cx, int cy){
return cx >= 0 && cy >= 0 &&
cx < ncol && cy < nrow;
}
vector<vector<bool>> seen;
vector<point> dirs = { {0,-1},{1,0}, {-1,0}, {0,1}};
int nrow;
int ncol;
void floodDFS(Grid& term, int cx, int cy);
public:
CountIslands() { }
int connectedComponents(Grid& term);
};
The "dirs" vector holds offsets that are added to the position of the current vertex being visited in order to calculate the x/y positions of that vertices neighbors to visit next. The way that the search fans out in these directions to eventually occupy every reachable vertex is called a "flood fill" and is how the flood fill tool in drawing programs work.
void CountIslands::floodDFS(Grid& term, int cx, int cy) {
seen[cy][cx] = true;
for (point p : dirs) {
int nx = cx + p.x;
int ny = cy + p.y;
if (inBounds(nx, ny) && term.vertAt(ny,nx) != 0 && !seen[ny][nx])
floodDFS(term, nx, ny);
}
}
The "seen" 2d vector is used for tracking which x/y positions we have visited to avoid visiting them again. If we didn't track where we have already visited, we risk ending up stuck in cycles. While this is admittedly one of the more inefficient ways of doing this, its the simplest. Alternatively, you can use a hashtable, using x/y position as key.
int countIslands(Grid& term) {
nrow = term.rows();
ncol = term.cols();
seen = vector<vector<bool>>(nrow, vector<bool>(ncol, false));
int numIslands = 0;
for (int y = 0; y < term.rows(); y++) {
for (int x = 0; x < term.cols(); x++) {
if (term.vertAt(y,x) != 0 && !seen[y][x]) {
floodDFS(term, x, y, 5);
numIslands++;
}
}
}
return numIslands;
}
The above method will return the desired solution in O(V+E) time and O(V) space in the worst case. While we can't really do better than that while sticking to a graph based solution, there are still some optimizations that we can make. The first and most obvious improvement is to remove the recursion so as to not risk overflowing the runtime stack on really big inputs.
The two obvious choices would be 1) an explicit stack, to simply switch the algorithm to iterative depth first search, or 2) use a FIFO queue, and process the graph in Breadth First order instead.
void breadthFirstFlood(Grid& term, int sx, int sy) {
queue<point> fq;
seen[sy][sx] = true;
term.scrAt(sy, sx) = 7;
fq.push(point(sx, sy));
while (!fq.empty()) {
point curr = fq.front();
fq.pop();
for (point p : dirs) {
int nx = curr.x + p.x;
int ny = curr.y + p.y;
if (term.inBounds(nx, ny) && term.scrAt(ny,nx) != '0' && !seen[ny][nx]) {
seen[ny][nx] = true;
term.scrAt(ny, nx) = 7;
fq.push({nx, ny});
}
}
}
}
Those of you who were reading closely may have noticed that I said we couldn't the beat the stated worst case complexity with graph searching. We can however, use the union find algorithm to solve this problem as well. I'll leave that as an exercise to the reader - don't forget to use path compression and weighted ranking.
Until Next Time, Happy Hacking!
-
Ternary Search Tries: String Specific Ordered Symbol Tables
-
Digital Search Trees
-
Lossless Compression Part III: Huffman Coding
-
Lossless Compression Part II: The LZ77 Algorithm
-
Lossless Compression Part I: Working with Bits in a Byte Oriented World
-
Bottom Up AVL Tree: The OG Self-Balancing Binary Search Tree
-
A Different Take on Merge Sort: Binary Search Trees?
-
Deleting Entries From Open-Address Hash tables
-
Transform any Binary Search Tree in to a Sorted Linked List
-
From Regular Expressions To NFA by way of Abstract Syntax Trees: Thompsons Construction
Leave A Comment