import geopandas as gpd
import pandas as pd
from geowrangler import grids
Grid Generation Tutorial
A basic introduction to geospatial grids
Basic Usage
Creating a grid from a geojson file
Import geopandas
and the geowrangler.grids
module
Load a sample geojson file into pandas
In our case, we are loading the Region 3 (Central Luzon Administrative Region) of the Philippines.
= gpd.read_file("../data/region3_admin.geojson") region3_gdf
This geopandas dataframe has the size:
length of region3_gdf: 1 row(s)
region3_gdf
Reg_Code | Reg_Name | Reg_Alt_Name | geometry | |
---|---|---|---|---|
0 | 030000000 | Region III | Central Luzon | MULTIPOLYGON (((120.11687 14.76309, 120.11684 ... |
Show the original plot
import matplotlib.pyplot as plt
= region3_gdf.plot(ax=plt.axes()) ax
# CRS info region3_gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Reprojecting Before Gridding
If you’re going to use a custom boundary for SquareGridGenerator
and FastSquareGridGenerator
, it is good practice to use the buffered boundary of the reprojected AOI instead of using just the boundary of the original AOI. This ensures that we don’t miss out on any grid cells.
We will use EPSG:3857
which is the default projection for both square grid genreator variants.
Note: The 13km buffer here was chosen because it is the minimum buffer size (rounded to the nearest km) where the boundary fully encloses the AOI.
= region3_gdf.to_crs("EPSG:3857").buffer(13_000).total_bounds
aoi_total_bounds aoi_total_bounds
array([13321499.12819502, 1608018.14989316, 13621027.63361816,
1876773.06126797])
Square Grid Generator
Generating Grids
Create a grid generator with a size of 5,000
m. The units of the grid size are dependent on the projection parameter
of the grid generator. In this case, the default is EPSG:3857
.
grids.SquareGridGenerator?
Init signature: grids.SquareGridGenerator( cell_size: float, grid_projection: str = 'EPSG:3857', boundary: Union[geowrangler.grids.SquareGridBoundary, List[float], Tuple[float]] = None, ) Docstring: <no docstring> File: ~/github-repos/geowrangler/geowrangler/grids.py Type: type Subclasses:
= grids.SquareGridGenerator(5_000) # 5 km x 5 km square cells grid_generator5k
Generate square grids > Notice the time taken to grid the multi polygon at 5K resolution
= grid_generator5k.generate_grid(region3_gdf) grid_gdf5k
CPU times: user 340 ms, sys: 35.8 ms, total: 376 ms
Wall time: 376 ms
grid_gdf5k.plot()
Show gridded version of sample geojson file at 5K resolution
length of grid_gdf5k: 1074 row(s)
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = grid_gdf5k.plot(ax=ax, facecolor="none", edgecolor="green") ax
In addition to the grid cells, there are 2 extra columns x
and y
when combined are unique per grid. It can also tell us which grids are adjacent to each other.
grid_gdf5k.head()
x | y | geometry | |
---|---|---|---|
0 | 7 | 8 | POLYGON ((120.10024 14.75528, 120.14516 14.755... |
1 | 6 | 8 | POLYGON ((120.05533 14.75528, 120.10024 14.755... |
2 | 9 | 8 | POLYGON ((120.19008 14.75528, 120.23499 14.755... |
3 | 2 | 24 | POLYGON ((119.87566 15.4491, 119.92058 15.4491... |
4 | 2 | 25 | POLYGON ((119.87566 15.49239, 119.92058 15.492... |
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = grid_gdf5k[grid_gdf5k["x"] == 10].plot(ax=ax, facecolor="none", edgecolor="green") ax
Gridding at a coarser resolution
Let’s try the same process but with a bigger grid (15K)
= grids.SquareGridGenerator(15_000) # 15 km x 15 km grids grid_generator15k
Generate square grids > Notice the time taken to grid the multi polygon at 15K resolution (compared to 5K resolution)
= grid_generator15k.generate_grid(region3_gdf) grid_gdf15k
CPU times: user 306 ms, sys: 33.7 ms, total: 340 ms
Wall time: 339 ms
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax =ax, facecolor="none", edgecolor="green") grid_gdf15k.plot(ax
Show gridded version of sample geojson file at 15K resolution
length of grid_gdf15k: 154 row(s)
Reusing boundaries
If you have 2 polygons that are far from each other but wish to have them follow the reference same reference x
and y
, you can use pass in custom boudaries.
Let’s load some grids that are from each other
= grid_gdf15k.head(1)
cell1 = grid_gdf15k.tail(1) cell2
pd.concat([cell1, cell2])
x | y | geometry | |
---|---|---|---|
0 | 2 | 2 | POLYGON ((120.05533 14.66839, 120.19008 14.668... |
153 | 18 | 16 | POLYGON ((122.21128 16.48548, 122.34603 16.485... |
= region3_gdf.plot(facecolor="none", edgecolor="red")
ax = grid_gdf15k.plot(ax=ax, facecolor="none", edgecolor="green")
ax = pd.concat([cell1, cell2]).plot(ax=ax) ax
= grids.SquareGridGenerator(1_000, boundary=aoi_total_bounds) grid_generator1k
We reproject cell1
and cell2
to EPSG:3857
before passing them in generate_grid
since aoi_total_bounds
is already reprojected. The projection of the AOI and the boundary should be the same.
We reproject back to EPSG:4326
after the grids are generated.
= cell1.to_crs("EPSG:3857")
cell1 = cell2.to_crs("EPSG:3857") cell2
= grid_generator1k.generate_grid(cell1)
gridcell1 = gridcell1.to_crs("EPSG:4326") gridcell1
CPU times: user 10.3 ms, sys: 1.19 ms, total: 11.5 ms
Wall time: 11.9 ms
len(gridcell1)
256
gridcell1.head()
x | y | geometry | |
---|---|---|---|
0 | 42 | 42 | POLYGON ((120.04636 14.65975, 120.05534 14.659... |
1 | 42 | 43 | POLYGON ((120.04636 14.66844, 120.05534 14.668... |
2 | 42 | 44 | POLYGON ((120.04636 14.67713, 120.05534 14.677... |
3 | 42 | 45 | POLYGON ((120.04636 14.68582, 120.05534 14.685... |
4 | 42 | 46 | POLYGON ((120.04636 14.69451, 120.05534 14.694... |
= gridcell1.plot(facecolor="none", edgecolor="green") ax
Note: We pass in the reprojected cell2
in generate_grid
since aoi_total_bounds
is already reprojected
= grid_generator1k.generate_grid(cell2)
gridcell2 = gridcell2.to_crs("EPSG:4326") gridcell2
CPU times: user 11 ms, sys: 1.86 ms, total: 12.8 ms
Wall time: 11.8 ms
len(gridcell2)
256
gridcell2.head()
x | y | geometry | |
---|---|---|---|
0 | 282 | 252 | POLYGON ((122.20231 16.47691, 122.21129 16.476... |
1 | 282 | 253 | POLYGON ((122.20231 16.48552, 122.21129 16.485... |
2 | 282 | 254 | POLYGON ((122.20231 16.49414, 122.21129 16.494... |
3 | 282 | 255 | POLYGON ((122.20231 16.50275, 122.21129 16.502... |
4 | 282 | 256 | POLYGON ((122.20231 16.51136, 122.21129 16.511... |
= gridcell2.plot(facecolor="none", edgecolor="red") ax
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = gridcell1.plot(ax=ax, color="green")
ax = gridcell2.plot(ax=ax, color="red") ax
Generating grids for multiple distant areas
If you are using AOIs that are vary far from each other, Grid Generator already optmizes the implementation for you
Note: We also reproject sparse_aois
to EPSG:3857
since the grid_generator1k
boundary is based on EPSG:3857
= grid_gdf15k.iloc[
sparse_aois 0:1000:3,
# Get areas that far from each other
] = region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = sparse_aois.plot(ax=ax, facecolor="none", edgecolor="green") ax
= sparse_aois.to_crs("EPSG:3857") sparse_aois
= grid_generator1k.generate_grid(sparse_aois)
sparse_grid = sparse_grid.to_crs("EPSG:4326") sparse_grid
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = sparse_grid.plot(ax=ax, facecolor="none", edgecolor="green") ax
It is a bit hard to see the grids so, we get a subset to verify.
= sparse_grid.head(1000).plot(facecolor="none", edgecolor="green") ax
Fast Square Grid Generator
Generating grids
We can use FastSquareGridGenerator
for significantly faster grid generation.
= grids.FastSquareGridGenerator(5_000) # 5 km x 5 km square cells fast_grid_generator5k
= fast_grid_generator5k.generate_grid(region3_gdf)
fast_grid_gdf5k fast_grid_gdf5k.head()
CPU times: user 178 ms, sys: 46 ms, total: 224 ms
Wall time: 180 ms
x | y | geometry | |
---|---|---|---|
0 | 26 | 21 | POLYGON ((120.99856 15.31919, 120.99856 15.362... |
1 | 18 | 17 | POLYGON ((120.63923 15.14583, 120.63923 15.189... |
2 | 15 | 30 | POLYGON ((120.50449 15.7087, 120.50449 15.7519... |
3 | 7 | 26 | POLYGON ((120.14516 15.53567, 120.14516 15.578... |
4 | 26 | 39 | POLYGON ((120.99856 16.09747, 120.99856 16.140... |
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = fast_grid_gdf5k.plot(ax=ax, facecolor="none", edgecolor="green") ax
We check that SquareGridGenerator
and FastSquareGridGenerator
have an exact match in tiles. Since they use different underlying algorithms, it isn’t a 100% guarantee that they will always have the exact same match in tiles due to floating point differences, but they should be extremely close most of the time.
0], fast_grid_gdf5k.shape[0] grid_gdf5k.shape[
(1074, 1074)
"xy_key"] = grid_gdf5k.apply(lambda row: f"{row['x']}-{row['y']}", axis=1)
grid_gdf5k["xy_key"] = fast_grid_gdf5k.apply(lambda row: f"{row['x']}-{row['y']}", axis=1)
fast_grid_gdf5k[
= grid_gdf5k["xy_key"].isin(fast_grid_gdf5k["xy_key"])
bool_mask assert bool_mask.all()
= fast_grid_gdf5k["xy_key"].isin(grid_gdf5k["xy_key"])
bool_mask assert bool_mask.all()
We can even use a more granular cell_size, say 500m, and FastSquareGridGenerator
will still run for around the same time as SquareGridGenerator
at 5000m cell_size despite the former having ~88 times more grid tiles.
We can also pass in a column name to unique_id_col
so that the grids can preserve the ID from the AOI. This makes it convenient so you don’t have to do another spatial intersection between the tiles gdf and the AOI gdf to get the AOI IDs.
= grids.FastSquareGridGenerator(500) # 500 m x 500 m square cells fast_grid_generator500
= fast_grid_generator500.generate_grid(region3_gdf, unique_id_col="Reg_Name")
fast_grid_gdf500 fast_grid_gdf500.head()
CPU times: user 1.06 s, sys: 64.3 ms, total: 1.13 s
Wall time: 1.08 s
x | y | Reg_Name | geometry | |
---|---|---|---|---|
0 | 332 | 102 | Region III | POLYGON ((121.28153 14.85081, 121.28153 14.855... |
1 | 191 | 180 | Region III | POLYGON ((120.64822 15.18919, 120.64822 15.193... |
2 | 208 | 210 | Region III | POLYGON ((120.72457 15.31919, 120.72457 15.323... |
3 | 94 | 246 | Region III | POLYGON ((120.21253 15.47508, 120.21253 15.479... |
4 | 202 | 224 | Region III | POLYGON ((120.69762 15.37982, 120.69762 15.384... |
0] fast_grid_gdf500.shape[
94133
# computes how many more tiles there are in FastSquareGridGenerator 500m vs SquareGridGenerator 5000m
0]/grid_gdf5k.shape[0] fast_grid_gdf500.shape[
87.64711359404097
H3 Grid Generator
Generating grids
Let us generate grids of resolution 5. To learn more about the different resolution, visit: https://h3geo.org/docs/core-library/restable/
= grids.H3GridGenerator(resolution=5) h3_generator
= h3_generator.generate_grid(region3_gdf) h3_5_gdf
CPU times: user 1.17 s, sys: 47.8 ms, total: 1.22 s
Wall time: 1.23 s
= region3_gdf.plot(aspect="equal")
ax = h3_5_gdf.plot(ax=ax, facecolor="none", edgecolor="blue", aspect="equal") ax
Not exporting geometry
If you do not plan to use use geometry, you can pass return_geometry=False
to just get a normal pandas DataFrame with the ids.
= grids.H3GridGenerator(resolution=5, return_geometry=False) h3_generator_no_geom
= h3_generator_no_geom.generate_grid(region3_gdf) h3_region3_no_geom
CPU times: user 1.16 s, sys: 56.5 ms, total: 1.22 s
Wall time: 1.22 s
len(h3_region3_no_geom)
84
h3_region3_no_geom.head()
hex_id | |
---|---|
0 | 85694c5bfffffff |
1 | 8569413bfffffff |
2 | 85694113fffffff |
3 | 85696b7bfffffff |
4 | 856941abfffffff |
Bing Tile Grid Generator
Generating grids
Let us generate grids of zoom_level 12. To learn more about the different resolution, visit: https://docs.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system
= grids.BingTileGridGenerator(12) bing_tile_grid_generator
# slow
= bing_tile_grid_generator.generate_grid(region3_gdf) bing_tile_gdf
CPU times: user 3.18 s, sys: 52.6 ms, total: 3.23 s
Wall time: 3.24 s
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = bing_tile_gdf.plot(ax=ax, facecolor="none", edgecolor="green") ax
Not exporting geometry
If you do not plan to use use geometry, you can pass return_geometry=False to just get a normal pandas DataFrame with the ids.
= grids.BingTileGridGenerator(
bing_tile_grid_generator_no_geom 12, return_geometry=False
)
# slow
= bing_tile_grid_generator_no_geom.generate_grid(region3_gdf) bing_region3_keys
CPU times: user 3.18 s, sys: 45.8 ms, total: 3.22 s
Wall time: 3.22 s
bing_region3_keys.head()
quadkey | |
---|---|
0 | 132303030330 |
1 | 132303030331 |
2 | 132303030313 |
3 | 132303012302 |
4 | 132303012031 |
Converting quadkey to x,y,z format
If you need to convert the quadkey to x,y format, you can pass add_xyz_cols=True to add the x and y columns to the returned dataframe.
= grids.BingTileGridGenerator(12, add_xyz_cols=True) bing_tile_grid_generator_add_xyz
# slow
= bing_tile_grid_generator_add_xyz.generate_grid(region3_gdf) bing_region3_keys
CPU times: user 3.18 s, sys: 40.6 ms, total: 3.22 s
Wall time: 3.22 s
bing_region3_keys.head()
quadkey | x | y | z | geometry | |
---|---|---|---|---|---|
0 | 132303030330 | 3414 | 1878 | 12 | POLYGON ((120.05859 14.68988, 120.05859 14.774... |
1 | 132303030331 | 3415 | 1878 | 12 | POLYGON ((120.14648 14.68988, 120.14648 14.774... |
2 | 132303030313 | 3415 | 1877 | 12 | POLYGON ((120.14648 14.77488, 120.14648 14.859... |
3 | 132303012302 | 3412 | 1869 | 12 | POLYGON ((119.88281 15.45368, 119.88281 15.538... |
4 | 132303012031 | 3411 | 1866 | 12 | POLYGON ((119.79492 15.70766, 119.79492 15.792... |
Fast Bing Tile Grid Generator
Generating grids
Let us generate grids of zoom_level 12 using the FastBingTileGridGenerator
. This should run significantly faster than BingTileGridGenerator
= grids.FastBingTileGridGenerator(12) fast_bing_tile_grid_generator
= fast_bing_tile_grid_generator.generate_grid(region3_gdf) fast_bing_tile_gdf
CPU times: user 126 ms, sys: 41.9 ms, total: 168 ms
Wall time: 128 ms
= region3_gdf.plot(facecolor="none", edgecolor="blue")
ax = fast_bing_tile_gdf.plot(ax=ax, facecolor="none", edgecolor="green") ax
We check that BingTileGridGenerator
and FastBingTileGridGenerator
have an exact match in tiles. Since they use different underlying algorithms, it isn’t a 100% guarantee that they will always have the exact same match in tiles due to floating point differences, but they should be extremely close most of the time.
0], fast_bing_tile_gdf.shape[0] bing_tile_gdf.shape[
(319, 319)
= bing_tile_gdf["quadkey"].isin(fast_bing_tile_gdf["quadkey"])
bool_mask assert bool_mask.all()
= fast_bing_tile_gdf["quadkey"].isin(bing_tile_gdf["quadkey"])
bool_mask assert bool_mask.all()
We can even use a higher zoom_level, say 17, and FastBingTileGridGenerator
will still run significantly faster than BingTileGridGenerator
at zoom_level 12 despite the former having ~783 times more grid tiles.
We can also pass in a column name to unique_id_col
so that the grids can preserve the ID from the AOI. This makes it convenient so you don’t have to do another spatial intersection between the tiles gdf and the AOI gdf to get the AOI IDs.
= grids.FastBingTileGridGenerator(17) fast_bing_tile_grid_generator
= fast_bing_tile_grid_generator.generate_grid(region3_gdf, unique_id_col = "Reg_Name")
fast_bing_tile_gdf fast_bing_tile_gdf.head()
CPU times: user 1.82 s, sys: 109 ms, total: 1.93 s
Wall time: 1.86 s
Reg_Name | quadkey | geometry | |
---|---|---|---|
0 | Region III | 13230301313023130 | POLYGON ((120.80292 15.72088, 120.80292 15.723... |
1 | Region III | 13230301313233111 | POLYGON ((120.84961 15.64155, 120.84961 15.644... |
2 | Region III | 13230301331132213 | POLYGON ((120.90454 15.54367, 120.90454 15.546... |
3 | Region III | 13230301332000003 | POLYGON ((120.59143 15.44839, 120.59143 15.451... |
4 | Region III | 13230310112103320 | POLYGON ((122.11578 16.42818, 122.11578 16.430... |
0] fast_bing_tile_gdf.shape[
249818
# computes how many more tiles there are in FastBingTileGridGenerator zoom 17 vs BingTileGridGenerator zoom 12
0]/bing_region3_keys.shape[0] fast_bing_tile_gdf.shape[
783.128526645768