I was converting a Matlab script to Python and needed to replicate the Matlab imfill function to fill holes/depressions in floating point rasters.

To determine the algorithm that imfill uses, I followed the reference in the imfill documentation:

*Soille, P., Morphological Image Analysis: Principles and Applications, Springer-Verlag, 1999, pp. 173-174*. The 2nd edition, published in 2004 has a small description on hole filling in section 6.3.7 (pg 208). The key point is that the input image (referred to as the mask image) is set to the maximum value of the image except along the border (referred to as the marker image), and then morphologically eroded. The process is explained in more detail in

*Soile, P., Vogt, J., and Colombo, R., 2003. Carving and Adaptive Drainage Enforcement of Grid Digital Elevation Models. Water Resources Research, 39(12), 1366*. The erosion operation outputs the minimum value in the neighboring cells of a point (defined using a structuring element that is typically a 3x3 square or a cross). The filled image is found by taking the maximum of either the erosion of the marker cell (the minimum value of the neighboring marker cells) or the original mask cell value.

The SciPy ndimage module has morphology tools. There is a binary hole filling function, but no floating point hole fill function. There is a grey_erosion function, and the following Python function mimics imfill using the grey_erosion function:

import numpy as np from scipy import ndimage

def flood_fill(test_array, four_way=False):

""""""

input_array = np.copy(test_array)

# Set h_max to a value larger than the array maximum to ensure # that the while loop will terminate h_max = np.max(input_array * 2.0)

# Build mask of cells with data not on the edge of the image # Use 3x3 square structuring element

# Build Structuring element only using NumPy module # Structuring element could also be built using SciPy ndimage module # el = ndimage.generate_binary_structure(2,2).astype(np.int) data_mask = np.isfinite(input_array) inside_mask = ndimage.binary_erosion( data_mask, structure=np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]).astype(np.bool)) edge_mask = (data_mask & ~inside_mask) # Initialize output array as max value test_array except edges output_array = np.copy(input_array) output_array[inside_mask] = h_max # Array for storing previous iteration output_old_array = np.copy(input_array) output_old_array.fill(0) # Cross structuring element if four_way: el = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]).astype(np.bool) # el = ndimage.generate_binary_structure(2, 1).astype(np.int) else: el = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]).astype(np.bool) # el = ndimage.generate_binary_structure(2, 2).astype(np.int) # Iterate until marker array doesn't change while not np.array_equal(output_old_array, output_array): output_old_array = np.copy(output_array) output_array = np.maximum( input_array, ndimage.grey_erosion(output_array, size=(3, 3), footprint=el)) return output_arrayThis function is very slow though, because the calculation starts on the edge of the image and moves in. Only a small number of cells are changing but the entire array is being processed for each iteration. The function was used to fill holes on a 2206x2986 image with numerous large depressions and took 690 seconds to run.

The Soille (2004) text and the Soille et al. (2003) article mentioned that a fast algorithm that uses priority queues is proposed in

*Soille and Gratin, 1994. An Efficient Algorithm for Drainage Networks Extraction on DEMs. Journal of Visual Communication and Image Representation, 5(2), 181-189*. This article was difficult to obtain but it turns out that the article

*Liu et al., 2009. Another Fast and Simple DEM Depression-Filling Algorithm Based on Priority Queue Structure. Atmopsheric and Oceanic Science Letters, 2(4) 214-219*presents an algorithm that is nearly identical (but with no mention or citation of the Soile and Gratin (1994) article!). The article is available with a quick Google search. The basic idea of the method is the same as the previous algorithm, but instead of processing every cell each iteration, the cells with the lowest value are processed first. Conceptually, from the edge the algorithm works upslope until it hits a depression, at which point the depression is filled with the current value (h_crt) until a higher point is reached.

The following python function implements the priority queue based flood-fill algorithm:

import heapq import numpy as np from scipy import ndimage

def flood_fill(test_array, four_way=False):

""""""

input_array = np.copy(test_array) input_rows, input_cols = input_array.shape # Set h_max to a value larger than the array maximum to ensure # that the while loop will terminate h_max = np.max(input_array*2.0) # Build mask of cells with data not on the edge of the image # Use 3x3 square structuring element inside_mask = ndimage.binary_erosion( np.isfinite(input_array),

structure=ndimage.generate_binary_structure(2, 2).astype(np.int)) # Initialize output array as max value test_array except edges output_array = np.copy(input_array) output_array[inside_mask] = h_max # Build priority queue and place edge pixels into priority queue # Last value is flag to indicate if cell is an edge cell put = heapq.heappush get = heapq.heappop fill_heap = [ (output_array[t_row, t_col], int(t_row), int(t_col), True) for t_row, t_col in np.transpose(np.where(edge_mask))] heapq.heapify(fill_heap)

def neighbors(row, col, four_way=False): """Return indices of adjacent cells""" if four_way: return [ (row - 1, col), (row, col + 1), (row + 1, col), (row, col - 1)] else: return [ (row - 1, col), (row - 1, col + 1), (row, col + 1), (row + 1, col + 1), (row + 1, col), (row + 1, col - 1), (row, col - 1), (row - 1, col - 1)]

# Iterate until priority queue is empty while True: try:

h_crt, t_row, t_col, edge_flag = get(fill_heap) except IndexError:

break for n_row, n_col in neighbors(t_row, t_col, four_way): # Skip cell if outside array edges if edge_flag: try: if not inside_mask[n_row, n_col]:

continue except IndexError:

continue if output_array[n_row, n_col] == h_max: output_array[n_row, n_col] = max( h_crt, input_array[n_row, n_col]) put(fill_heap, (output_array[n_row, n_col], n_row, n_col, False)) return output_arrayThis function only took 61 seconds to fill the same test image, which is a considerable improvement, but imfill only takes ~3 seconds, so there is still a long way to go. I will probably try implementing the function in C++ next.

A number of things helped speed up the algorithm. The biggest gain was saving 180 seconds by using Python Ints instead of NumPy Ints in the heap data tuple (cell_value, row, col, edge_flag). Other improvements included using the heapq module instead of Queue.PriorityQueue() (saved 47s), using the python max function instead of the numpy max function (saved 25s), creating local copies of the function references heappush and heappop (saved 2s), and only checking if the neighboring pixels were in the image for edge pixels using the edge_flag (saved 4s). Last, I set the max value of the marker array to be greater than the max value of the input array so that the while loop would terminate without checking the value each time.