import rasterio as rio
import geowrangler.raster_process as raster_process
Raster Processing Tutorial
A basic introduction to raster processing
Summary
Cropping rasters based on the borders of a polygon. Usage of this function assumes an input raster and a geodataframe from which to extract bounds. There is also an option to crop multiple geometries at once (e.g., crop raster using bounds of each cell in a grid).
How does it work?
Returns a subset of a rasterio dataset. This function assumes that the bounding coordinates for the desired cropped rasters are present in the input geodataframe’s geometry column.
query_window_by_polygon(raster, output_folder, gdf, mask)
For cropping a raster based on boundaries of a single polygon. A technical step-by-step explanation of how query_window_by_polygon
works is detailed in the cell blocks below.
type | default | optional/ required | details | |
---|---|---|---|---|
raster | String | none | required | local filename of raster or open raster dataset |
output_folder | String | none | required | file path where cropped raster outputs will be saved |
gdf | GeoDataFrame | none | required | polygon that will become basis of boundaries for cropping |
mask | Boolean | none | required | True- Assign NULL to areas outside borders of a non-rectangular polygon False- Retain values outside borders of a non-rectangular polygon |
- Define the function and its arguments.
def query_window_by_polygon(
input_raster: Union[str, DatasetReader, PosixPath],
output_path: str,
geometry: Polygon,
mask=False,
) -> None:
- Check if raster is specified as PosixPath and convert to String.
if isinstance(input_raster, PosixPath):
input_raster = str(input_raster)
- Check if raster is path to a file or an open raster dataset.
if isinstance(input_raster,str):
input_dset = rio.open(input_raster)
else:
input_dset = input_raster
- Get the window bounds (left, right, top, bottom coordinates) from polygon geometry and check if it has the correct number of elements.
window_bounds = geometry.bounds
assert (
len(window_bounds) == 4
),
- Unroll window bounds.
left, bottom, right, top = window_bounds
- Open raster as input.
with rio.open(input_raster) as input_dst:
- Get profile of input_dst:
input_profile = input_dst.profile
- Specify window and query subset.
window = rio.windows.from_bounds(left, bottom, right, top, input_dst.transform)
subset = input_dst.read(window=window)
- Get the shape of the output subset.
number_of_bands, height, width = subset.shape
- Get the transformation of the subset based on the window.
win_transform = input_dst.window_transform(window)
- Update metadata for the output.
output_profile = input_profile.copy()
update_params = {
"count": number_of_bands,
"height": height,
"width": width,
"transform": win_transform,
}
output_profile.update(update_params)
- Write image to output file.
with rio.open(output_path, "w", **output_profile) as output_dst:
output_dst.write(subset)
output_dst.colorinterp = input_dst.colorinterp
- Apply mask according to shape of the polygon.
if mask:
with rio.open(output_path) as dst:
masked_image, masked_transform = rio.mask.mask(dst, [geometry], crop=True)
with rio.open(output_path, "w", **output_profile) as output_dst:
update_params = {
"height": masked_image.shape[1],
"width": masked_image.shape[2],
"transform": masked_transform,
}
output_profile.update(update_params)
output_dst.write(masked_image)
- Return output.
return
query_window_by_gdf(raster, output_folder, gdf, name_col, mask)
For cropping a raster based on boundaries of multiple polygons. A step-by-step explanation of how query_window_by_gdf
works is detailed in the cell blocks below.
type | default | optional/ required | details | |
---|---|---|---|---|
raster | String | none | required | local filename of raster or open raster dataset |
output_folder | String | none | required | file path where cropped raster outputs will be saved |
gdf | GeoDataFrame | none | required | polygon that will become basis of boundaries for cropping |
name_col | String | none | optional | column name to base output filepath on. if left blank, outputs will be name sequentially as ‘output_0.tif’ |
mask | Boolean | none | required | True- Assign NULL to areas outside borders of a non-rectangular polygon False- Retain values outside borders of a non-rectangular polygon |
- Define the function and create a copy of the geodataframe.
def query_window_by_gdf(
input_raster: Union[str, DatasetReader, PosixPath],
output_folder: str,
gdf,
name_col=None,
mask=False,
) -> None:
gdf = gdf.copy()
- Check if the coordinate reference systems of the raster and geodataframe match.
with rio.open(input_raster) as dst:
assert dst.meta["crs"] == gdf.crs, "input_raster and gdf CRS must match!"
- Name outputs based on values from name_col.
if name_col is None:
name_col = "name"
gdf[name_col] = "output_" + gdf.reset_index().index.astype(str) + ".tif"
else:
gdf[name_col] = gdf[name_col] + ".tif"
for i, row in gdf.iterrows():
polygon = row.geometry
output_name = row[name_col]
output_path = output_folder / output_name
print(output_path)
query_window_by_polygon(input_raster, output_path, polygon, mask)
Sample use case 1- Crop raster with single circular polygon
Input: - input_image
- raster of 2020 Philippine population - output_folder
- filepath to where output raster will be saved - circle_gdf
- GeoDataFrame containining single circular polygon - mask
- True- to assign NULL values to cells outside the circular polygon
Output: - cropped raster (GeoTiff) containing only values within circular polygon
Step 1: Import packages
Step 2: Load raster
input_image
'../data/phl_ppp_2020_constrained.tif'
= rio.open(input_image) raster
1), cmap="plasma", transform=raster.transform); show(raster.read(
Step 3: Load polygon
circle_gdf
lat | lon | geometry | |
---|---|---|---|
0 | 14.599512 | 120.984222 | POLYGON ((121.99932 14.59951, 121.99443 14.503... |
Step 4: Check if the polygon is within bounds of the raster.
= plt.subplots(1, 1, figsize=(4, 8))
fig, ax 1), cmap="viridis", ax=ax, transform=raster.transform)
show(raster.read(=ax, facecolor="none", edgecolor="yellow")
circle_gdf.plot(ax; ax
Step 5: State filepath of output folder.
= Path("../data") output_folder
Step 6: Crop raster
=True) raster_process.query_window_by_gdf(input_image, output_folder, circle_gdf, mask
../data/output_0.tif
with rio.open(output_folder / "output_0.tif") as dst:
= plt.subplots(1, 1, figsize=(4, 4))
fig, ax
1), cmap="viridis", ax=ax, transform=dst.transform)
show(dst.read(="none", edgecolor="yellow", ax=ax)
circle_gdf.plot(facecolorprint(dst.read(1))
; ax
[[-99999. -99999. -99999. ... -99999. -99999. -99999.]
[-99999. -99999. -99999. ... -99999. -99999. -99999.]
[-99999. -99999. -99999. ... -99999. -99999. -99999.]
...
[-99999. -99999. -99999. ... -99999. -99999. -99999.]
[-99999. -99999. -99999. ... -99999. -99999. -99999.]
[-99999. -99999. -99999. ... -99999. -99999. -99999.]]
Sample use case 2- Crop raster with multiple cells from a grid
Input: - input_image
- raster of 2020 Philippine population - output_folder
- filepath to where output rasters will be saved - grid_gdf
- GeoDataFrame containining multiple polygons (grid cells) - mask
- False- no need to assign NULL values because grid cells occupy full space from image center to boundaries
Output: - multiple cropped rasters (GeoTiff) split according to grid cell polygons
Step 1: Import packages
import rasterio as rio
import geowrangler.raster_process
Step 2: Load raster
input_image
'../data/phl_ppp_2020_constrained.tif'
= rio.open(input_image)
raster 1), cmap="plasma", transform=raster.transform) show(raster.read(
Step 3: Load grid
grid_gdf
x | y | geometry | name | |
---|---|---|---|---|
0 | 0 | 0 | POLYGON ((119.96913 13.61504, 120.86744 13.615... | gridxy-0-0 |
1 | 0 | 1 | POLYGON ((119.96913 14.48647, 120.86744 14.486... | gridxy-0-1 |
2 | 0 | 2 | POLYGON ((119.96913 15.35449, 120.86744 15.354... | gridxy-0-2 |
3 | 1 | 0 | POLYGON ((120.86744 13.61504, 121.76576 13.615... | gridxy-1-0 |
4 | 1 | 1 | POLYGON ((120.86744 14.48647, 121.76576 14.486... | gridxy-1-1 |
5 | 1 | 2 | POLYGON ((120.86744 15.35449, 121.76576 15.354... | gridxy-1-2 |
6 | 2 | 0 | POLYGON ((121.76576 13.61504, 122.66407 13.615... | gridxy-2-0 |
7 | 2 | 1 | POLYGON ((121.76576 14.48647, 122.66407 14.486... | gridxy-2-1 |
Step 4: Check if the grid is within bounds of the raster.
= plt.subplots(1, 1, figsize=(4, 8))
fig, ax 1), cmap="viridis", ax=ax, transform=raster.transform)
show(raster.read(=ax, facecolor="none", edgecolor="yellow")
grid_gdf.plot(ax ax
Step 5: State filepath of output folder.
= Path("../data") output_folder
Step 6: Crop raster
raster_process.query_window_by_gdf(="name", mask=False
input_image, output_folder, grid_gdf, name_col )
../data/gridxy-0-0.tif
../data/gridxy-0-1.tif
../data/gridxy-0-2.tif
../data/gridxy-1-0.tif
../data/gridxy-1-1.tif
../data/gridxy-1-2.tif
../data/gridxy-2-0.tif
../data/gridxy-2-1.tif
for name in grid_gdf["name"]:
= output_folder / (name + ".tif")
image_path with rio.open(image_path) as dst:
= plt.subplots(1, 1, figsize=(4, 4))
fig, ax
ax.set_title(image_path)1), cmap="viridis", ax=ax, transform=dst.transform) show(dst.read(