Raster Processing Tutorial

A basic introduction to raster processing

Summary

Cropping rasters based on the borders of a polygon. Usage of this function assumes an input raster and a geodataframe from which to extract bounds. There is also an option to crop multiple geometries at once (e.g., crop raster using bounds of each cell in a grid).

How does it work?

Returns a subset of a rasterio dataset. This function assumes that the bounding coordinates for the desired cropped rasters are present in the input geodataframe’s geometry column.

query_window_by_polygon(raster, output_folder, gdf, mask)

For cropping a raster based on boundaries of a single polygon. A technical step-by-step explanation of how query_window_by_polygon works is detailed in the cell blocks below.

type default optional/ required details
raster String none required local filename of raster or open raster dataset
output_folder String none required file path where cropped raster outputs will be saved
gdf GeoDataFrame none required polygon that will become basis of boundaries for cropping
mask Boolean none required True- Assign NULL to areas outside borders of a non-rectangular polygon  False- Retain values outside borders of a non-rectangular polygon
  1. Define the function and its arguments.
def query_window_by_polygon(
    input_raster: Union[str, DatasetReader, PosixPath],
    output_path: str,
    geometry: Polygon,
    mask=False,
) -> None:
  1. Check if raster is specified as PosixPath and convert to String.
    if isinstance(input_raster, PosixPath):
        input_raster = str(input_raster)
  1. Check if raster is path to a file or an open raster dataset.
    if isinstance(input_raster,str):
         input_dset = rio.open(input_raster)
    else:
         input_dset = input_raster
  1. Get the window bounds (left, right, top, bottom coordinates) from polygon geometry and check if it has the correct number of elements.
    window_bounds = geometry.bounds
    assert (
        len(window_bounds) == 4
    ), 
  1. Unroll window bounds.
    left, bottom, right, top = window_bounds
  1. Open raster as input.
    with rio.open(input_raster) as input_dst:
  1. Get profile of input_dst:
        input_profile = input_dst.profile
  1. Specify window and query subset.
        window = rio.windows.from_bounds(left, bottom, right, top, input_dst.transform)
        subset = input_dst.read(window=window)
  1. Get the shape of the output subset.
        number_of_bands, height, width = subset.shape
  1. Get the transformation of the subset based on the window.
        win_transform = input_dst.window_transform(window)
  1. Update metadata for the output.
        output_profile = input_profile.copy()
        update_params = {
            "count": number_of_bands,
            "height": height,
            "width": width,
            "transform": win_transform,
        }
        output_profile.update(update_params)
  1. Write image to output file.
        with rio.open(output_path, "w", **output_profile) as output_dst:
            output_dst.write(subset)
            output_dst.colorinterp = input_dst.colorinterp
  1. Apply mask according to shape of the polygon.
    if mask:
        with rio.open(output_path) as dst:
            masked_image, masked_transform = rio.mask.mask(dst, [geometry], crop=True)
        with rio.open(output_path, "w", **output_profile) as output_dst:
            update_params = {
                "height": masked_image.shape[1],
                "width": masked_image.shape[2],
                "transform": masked_transform,
            }
            output_profile.update(update_params)
            output_dst.write(masked_image)
  1. Return output.
    return 

query_window_by_gdf(raster, output_folder, gdf, name_col, mask)

For cropping a raster based on boundaries of multiple polygons. A step-by-step explanation of how query_window_by_gdf works is detailed in the cell blocks below.

type default optional/ required details
raster String none required local filename of raster or open raster dataset
output_folder String none required file path where cropped raster outputs will be saved
gdf GeoDataFrame none required polygon that will become basis of boundaries for cropping
name_col String none optional column name to base output filepath on. if left blank, outputs will be name sequentially as ‘output_0.tif’
mask Boolean none required True- Assign NULL to areas outside borders of a non-rectangular polygon  False- Retain values outside borders of a non-rectangular polygon
  1. Define the function and create a copy of the geodataframe.
def query_window_by_gdf(
    input_raster: Union[str, DatasetReader, PosixPath],
    output_folder: str,
    gdf,
    name_col=None,
    mask=False,
) -> None:
    gdf = gdf.copy()
  1. Check if the coordinate reference systems of the raster and geodataframe match.
    with rio.open(input_raster) as dst:
        assert dst.meta["crs"] == gdf.crs, "input_raster and gdf CRS must match!"
  1. Name outputs based on values from name_col.
    if name_col is None:
        name_col = "name"
        gdf[name_col] = "output_" + gdf.reset_index().index.astype(str) + ".tif"
    else:
        gdf[name_col] = gdf[name_col] + ".tif"

    for i, row in gdf.iterrows():
        polygon = row.geometry
        output_name = row[name_col]
        output_path = output_folder / output_name
        print(output_path)
        query_window_by_polygon(input_raster, output_path, polygon, mask)

Sample use case 1- Crop raster with single circular polygon

Input: - input_image - raster of 2020 Philippine population - output_folder - filepath to where output raster will be saved - circle_gdf - GeoDataFrame containining single circular polygon - mask - True- to assign NULL values to cells outside the circular polygon

Output: - cropped raster (GeoTiff) containing only values within circular polygon

Step 1: Import packages

import rasterio as rio

import geowrangler.raster_process as raster_process

Step 2: Load raster

input_image
'../data/phl_ppp_2020_constrained.tif'
raster = rio.open(input_image)
show(raster.read(1), cmap="plasma", transform=raster.transform);

Step 3: Load polygon

circle_gdf
lat lon geometry
0 14.599512 120.984222 POLYGON ((121.99932 14.59951, 121.99443 14.503...

Step 4: Check if the polygon is within bounds of the raster.

fig, ax = plt.subplots(1, 1, figsize=(4, 8))
show(raster.read(1), cmap="viridis", ax=ax, transform=raster.transform)
circle_gdf.plot(ax=ax, facecolor="none", edgecolor="yellow")
ax;

Step 5: State filepath of output folder.

output_folder = Path("../data")

Step 6: Crop raster

raster_process.query_window_by_gdf(input_image, output_folder, circle_gdf, mask=True)
../data/output_0.tif
with rio.open(output_folder / "output_0.tif") as dst:
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))

    show(dst.read(1), cmap="viridis", ax=ax, transform=dst.transform)
    circle_gdf.plot(facecolor="none", edgecolor="yellow", ax=ax)
    print(dst.read(1))
ax;
[[-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 ...
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]]

Sample use case 2- Crop raster with multiple cells from a grid

Input: - input_image - raster of 2020 Philippine population - output_folder - filepath to where output rasters will be saved - grid_gdf - GeoDataFrame containining multiple polygons (grid cells) - mask - False- no need to assign NULL values because grid cells occupy full space from image center to boundaries

Output: - multiple cropped rasters (GeoTiff) split according to grid cell polygons

Step 1: Import packages

import rasterio as rio

import geowrangler.raster_process

Step 2: Load raster

input_image
'../data/phl_ppp_2020_constrained.tif'
raster = rio.open(input_image)
show(raster.read(1), cmap="plasma", transform=raster.transform)

Step 3: Load grid

grid_gdf
x y geometry name
0 0 0 POLYGON ((119.96913 13.61504, 120.86744 13.615... gridxy-0-0
1 0 1 POLYGON ((119.96913 14.48647, 120.86744 14.486... gridxy-0-1
2 0 2 POLYGON ((119.96913 15.35449, 120.86744 15.354... gridxy-0-2
3 1 0 POLYGON ((120.86744 13.61504, 121.76576 13.615... gridxy-1-0
4 1 1 POLYGON ((120.86744 14.48647, 121.76576 14.486... gridxy-1-1
5 1 2 POLYGON ((120.86744 15.35449, 121.76576 15.354... gridxy-1-2
6 2 0 POLYGON ((121.76576 13.61504, 122.66407 13.615... gridxy-2-0
7 2 1 POLYGON ((121.76576 14.48647, 122.66407 14.486... gridxy-2-1

Step 4: Check if the grid is within bounds of the raster.

fig, ax = plt.subplots(1, 1, figsize=(4, 8))
show(raster.read(1), cmap="viridis", ax=ax, transform=raster.transform)
grid_gdf.plot(ax=ax, facecolor="none", edgecolor="yellow")
ax

Step 5: State filepath of output folder.

output_folder = Path("../data")

Step 6: Crop raster

raster_process.query_window_by_gdf(
    input_image, output_folder, grid_gdf, name_col="name", mask=False
)
../data/gridxy-0-0.tif
../data/gridxy-0-1.tif
../data/gridxy-0-2.tif
../data/gridxy-1-0.tif
../data/gridxy-1-1.tif
../data/gridxy-1-2.tif
../data/gridxy-2-0.tif
../data/gridxy-2-1.tif
for name in grid_gdf["name"]:
    image_path = output_folder / (name + ".tif")
    with rio.open(image_path) as dst:
        fig, ax = plt.subplots(1, 1, figsize=(4, 4))
        ax.set_title(image_path)
        show(dst.read(1), cmap="viridis", ax=ax, transform=dst.transform)