Counting Islands with Flood Fill

mgcadmin 12-25-2023

Sometimes when I'm bored I reach for one of the various books of programming challenges, flip to a random page and work through random problems until I've had enough. This past weekend I flipped to a random page and was greeted with the "Counting Islands" problem. The problem goes like this:

"You are a provided a map in the form of an NxM 2d array, with 0's representing water, and 1's representing Islands, If two or more 1's are adjacent to each other, it counts as one Island. Count the number of Islands that appear on the Map." An example map with 5 islands is pictured below.

00000000000000000 
01101000000000000 
00111000000110010 
00011000000000000
00000000001110000 
00100000000011000  
00110000000000000

Before we start coding lets look at the information we've been provided with and see if there's anything in the wording of the problem that is either an intentional "gotcha" - or - a hint of how to proceed.

We know that we are being supplied a map in the form of a grid, which is a special case of a graph.
We know that we are searching this graph for areas that are similar to each other, but different from their surroundings.
The areas that are similar to each other can be considered the same area.

To begin lets consider a simplification of the problem. How would we solve this problem if adjacent islands were considered separate islands? In that case, we would simply iterate over the array, and count the number of 1's we encounter, and that would be the number of the islands on the map.

int simplifiedAnswer(vector<vector<int>>& map) {
     int nrows = map.size(), ncols = map[0].size();
     for (int y = 0; y < nrows; y++) {
       for (int x = 0; x < ncols; x++) {
          if (map[y][x] == 1) {
             numIslands++;
          }
       }
    }
    return numIslands; 
}

Obviously this wont work for our problem, but it gives a starting point, because our search will proceed the same way, the difference will be in how we count '1's.

In the simplified problem above, we treated each x/y position as an individual vertex in a graph. What we need to do, is instead find the vertices that are connected to each other but not connected to other vertices. In graph theory these are called "connected components", and in our case we are counting connected components, not vertices.

Finding Connected Components

For two vertices to be considered connected there needs to be a path from one to the other. A path being a series of 1 or more edges starting from the beginning vertex to the ending vertex. This can be determined with a simple depth first traversal(breadth first will work as well).

Because our grid is an implicit graph, we generate each vertices adjacency list as we encounter them. We will be using Manhattan neighborhoods for this, meaning that a vertex can have an edge to the vertex immediately above, below, to the left, and to the right, but not the diagonals. A vertex that is on a grid boundary will not have out of bounds neighbors, so while the maximum number of neighbors is 4, some have less. The inBounds() method is use to determine if a generated coordinate is valid or not.

class CountIslands {
      private:
         struct point {
             int x,y;
             point(int _x = 0, int_y = 0) : x(_x), y(_y) { }
         };
        bool inBounds(int cx, int cy){
            return cx >= 0 && cy >= 0 &&
                   cx < ncol && cy < nrow;
        }            
        vector<vector<bool>> seen;
        vector<point> dirs = { {0,-1},{1,0}, {-1,0}, {0,1}};
        int nrow;
        int ncol;
        void floodDFS(Grid& term, int cx, int cy);
     public:
        CountIslands() { }
        int connectedComponents(Grid& term);
};

The "dirs" vector holds offsets that are added to the position of the current vertex being visited in order to calculate the x/y positions of that vertices neighbors to visit next. The way that the search fans out in these directions to eventually occupy every reachable vertex is called a "flood fill" and is how the flood fill tool in drawing programs work.

void CountIslands::floodDFS(Grid& term, int cx, int cy) {
     seen[cy][cx] = true;
     for (point p : dirs) {
         int nx = cx + p.x;
         int ny = cy + p.y;
         if (inBounds(nx, ny) && term.vertAt(ny,nx) != 0 && !seen[ny][nx])
              floodDFS(term, nx, ny);
     }
}

The "seen" 2d vector is used for tracking which x/y positions we have visited to avoid visiting them again. If we didn't track where we have already visited, we risk ending up stuck in cycles. While this is admittedly one of the more inefficient ways of doing this, its the simplest. Alternatively, you can use a hashtable, using x/y position as key.

int countIslands(Grid& term) {
     nrow = term.rows();
     ncol = term.cols();
     seen = vector<vector<bool>>(nrow, vector<bool>(ncol, false));
     int numIslands = 0;
     for (int y = 0; y < term.rows(); y++) {
         for (int x = 0; x < term.cols(); x++) {
            if (term.vertAt(y,x) != 0 && !seen[y][x]) {
                floodDFS(term, x, y, 5);
                numIslands++;
            }
         }
      }
      return numIslands;
  }

The above method will return the desired solution in O(V+E) time and O(V) space in the worst case. While we can't really do better than that while sticking to a graph based solution, there are still some optimizations that we can make. The first and most obvious improvement is to remove the recursion so as to not risk overflowing the runtime stack on really big inputs.

The two obvious choices would be 1) an explicit stack, to simply switch the algorithm to iterative depth first search, or 2) use a FIFO queue, and process the graph in Breadth First order instead.

void breadthFirstFlood(Grid& term, int sx, int sy) {
    queue<point> fq;
    seen[sy][sx] = true;
    term.scrAt(sy, sx) = 7;
    fq.push(point(sx, sy));
    while (!fq.empty()) {
       point curr = fq.front();
       fq.pop();
       for (point p : dirs) {
         int nx = curr.x + p.x;
         int ny = curr.y + p.y;
         if (term.inBounds(nx, ny) && term.scrAt(ny,nx) != '0' && !seen[ny][nx]) {
                    seen[ny][nx] = true;
                    term.scrAt(ny, nx) = 7;
                    fq.push({nx, ny});
           }
       }
    }
 }

Those of you who were reading closely may have noticed that I said we couldn't the beat the stated worst case complexity with graph searching. We can however, use the union find algorithm to solve this problem as well. I'll leave that as an exercise to the reader - don't forget to use path compression and weighted ranking.

Until Next Time, Happy Hacking!

MaxGCoding.com

For the love of programming

Counting Islands with Flood Fill

Finding Connected Components

Leave A Comment

Top-Down Deletion for Red/Black Trees

Implementing Closures in Bytecode VMs: Heap Allocated Activation Records & Access Links

Pascal & Bernoulli & Floyd: Triangles

A Quick tour of MGCLex

Compiling Regular Expressions for "The VM Approach"

Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction

Improving the Space Efficiency of Suffix Arrays

Augmenting B+ Trees For Order Statistics

Top-Down AST Construction of Regular Expressions with Recursive Descent

Balanced Deletion for in-memory B+ Trees