Vector Zonal Stats

generate vector zonal stat features

Aggregations

In order to generate zonal stats for an area (or areas) of interest (aoi) , we have come up with the concept of an aggregation specification or agg spec, which is a way to specify what aggregation functions (such as count,sum, mean,std etc.) are to be applied to columns in the source dataframe (data).

The method create_zonal_stats can then take in a list of these agg specs and apply them to create zonal stats from the data for the aoi.

Each agg spec consists of a dict with the following keys:

func: (Required) a str or a list [str] of aggregation functions. See the pandas documentation for agg
column: (Optional) an existing column in the data to generate the zonal statistic from. If not specified, the grouping key based on the index of the aoi applied to the data is used as default.
output: (Optional) a str or a list [str] of the name(s) of the output zonal statistic column. If not specified it is concatenated from the column and func i.e. {column}_{func} (e.g. 'func':'mean' on 'column':'population' has a default value 'output':'population_mean')
fillna: (Optional) a bool or a list [bool] of the flag(s) that indicates whether to to a fillna(0) step for the new zonal column, True meaning it will set any NA values in the resulting zonal stat to 0, and False will retain any NA values. The default value of the flag(s) is False.

Examples

The simplest aggregation spec. This will result in an output column named index_count as it will use the aoi index as the default column.

{"func":"count"}

The sum function is applied to the data column population which will create an output column named total_population.

{
 "func:"sum",
 "column": "population",
 "output": "total_population"
}

Compute the zonal stats mean,sum,max on the population column and rename the output columns (by default) to population_mean, population_sum and population_max.

{
 "func": ["mean","sum","max"],
 "column": "population",
}

A full aggregation spec with fillna. fillna == False for std means it will remain an NA if there is no data for the column in the group. The default value for fillna is True which means that 0 is used to replace any NA in the output column.

{
 "func": ["mean", "sum", "std"],
 "column":"population",
 "output": ["avg_pop", "total_pop", "std_dev"],
 "fillna": [True,True,False],
}

The agg spec in the list of aggregations can contain the same columns, but the output columns must be unique since they will added as columns in the results.

Regular and Grid Zonal Stats

Vector zonal stats for user defined areas and grids (e.g. Admin areas)

source

create_zonal_stats

 create_zonal_stats (aoi:geopandas.geodataframe.GeoDataFrame,
                     data:geopandas.geodataframe.GeoDataFrame,
                     aggregations:List[Dict[str,Any]],
                     overlap_method:str='intersects')

Create zonal stats for area of interest from data using aggregration operations on data columns. Returns the same aoi with additional columns containing the computed zonal features.

	Type	Default	Details
aoi	GeoDataFrame		Area of interest for which zonal stats are to be computed for
data	GeoDataFrame		Source gdf containing data to compute zonal stats from
aggregations	List		List of agg specs, with each agg spec applied to a data column
overlap_method	str	intersects	spatial predicate to used in spatial join of aoi and data geopandas.sjoin for more details
Returns	GeoDataFrame		categorical_column_options: str = None,

simple_aoi  # sample aoi

	col1	lat0	lat1	lon1	lat2	lon2	lat3	geometry
0	1	0.0	0.0	1.0	1.0	1.0	1.0	POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
1	2	1.0	1.0	1.0	2.0	1.0	2.0	POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))
2	3	2.0	2.0	1.0	3.0	1.0	3.0	POLYGON ((2 0, 2 1, 3 1, 3 0, 2 0))

simple_data  # sample data

	col1	lat	lon	geometry
0	1	0.50	0.50	POINT (0.5 0.5)
1	2	1.50	0.50	POINT (1.5 0.5)
2	3	2.50	0.50	POINT (2.5 0.5)
3	4	0.45	0.50	POINT (0.45 0.5)
4	5	1.45	0.50	POINT (1.45 0.5)
5	6	2.45	0.50	POINT (2.45 0.5)
6	7	0.45	0.45	POINT (0.45 0.45)
7	8	1.45	0.45	POINT (1.45 0.45)
8	9	2.45	0.45	POINT (2.45 0.45)
9	10	0.45	1.45	POINT (0.45 1.45)
10	11	1.45	1.45	POINT (1.45 1.45)
11	12	2.45	1.45	POINT (2.45 1.45)

ax = simple_aoi.plot(
    ax=plt.axes(), facecolor="none", edgecolor=["red", "blue", "green"]
)
ax = simple_data.plot(ax=ax, color="purple")

results = create_zonal_stats(simple_aoi, simple_data, aggregations=[{"func": "count"}])
results

	col1	lat0	lat1	lon1	lat2	lon2	lat3	geometry	index_count
0	1	0.0	0.0	1.0	1.0	1.0	1.0	POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))	3
1	2	1.0	1.0	1.0	2.0	1.0	2.0	POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))	3
2	3	2.0	2.0	1.0	3.0	1.0	3.0	POLYGON ((2 0, 2 1, 3 1, 3 0, 2 0))	3

Index name is not none

named_index_aoi = simple_aoi.copy()
named_index_aoi.index.name = "myindex"

named_index_aoi

	col1	lat0	lon0	lat1	lon1	lat2	lon2	lat3	lon3	geometry
myindex
0	1	0.0	0.0	0.0	1.0	1.0	1.0	1.0	0.0	POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
1	2	1.0	0.0	1.0	1.0	2.0	1.0	2.0	0.0	POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))
2	3	2.0	0.0	2.0	1.0	3.0	1.0	3.0	0.0	POLYGON ((2 0, 2 1, 3 1, 3 0, 2 0))

named_index_results = create_zonal_stats(
    named_index_aoi, simple_data, aggregations=[{"func": "count"}]
)

named_index_results.head()

	col1	lat0	lon0	lat1	lon1	lat2	lon2	lat3	lon3	geometry	index_count
myindex
0	1	0.0	0.0	0.0	1.0	1.0	1.0	1.0	0.0	POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))	3
1	2	1.0	0.0	1.0	1.0	2.0	1.0	2.0	0.0	POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0))	3
2	3	2.0	0.0	2.0	1.0	3.0	1.0	3.0	0.0	POLYGON ((2 0, 2 1, 3 1, 3 0, 2 0))	3

Bing Map Tile Grid Zonal Stats

Generating zonal stats for Bing tile grid AOIs

source

compute_quadkey

 compute_quadkey (data:geopandas.geodataframe.GeoDataFrame,
                  zoom_level:int, quadkey_column:str='quadkey')

Computes the quadkeys for the geometries of the data. If geometries are not points, the quadkeys are computed from the centroids of the geometries.

	Type	Default	Details
data	GeoDataFrame		The geodataframe
zoom_level	int		The quadkey zoom level (1-23)
quadkey_column	str	quadkey	The name of the quadkey output column
Returns	GeoDataFrame

If our existing data geodataframe doesn’t have quadkeys, we can use the compute_quadkey to generate the quadkeys for the centroid of the data’s geometries.

DATA_ZOOM_LEVEL = 19
AOI_ZOOM_LEVEL = 9
simple_data_quadkey = compute_quadkey(simple_data, DATA_ZOOM_LEVEL)

CPU times: user 4.32 ms, sys: 684 µs, total: 5 ms
Wall time: 4.98 ms

simple_data_quadkey.head()

	col1	lat	lon	geometry	quadkey
0	1	0.50	0.5	POINT (0.5 0.5)	1222222221211211222
1	2	1.50	0.5	POINT (1.5 0.5)	1222222320210201222
2	3	2.50	0.5	POINT (2.5 0.5)	1222222331200311222
3	4	0.45	0.5	POINT (0.45 0.5)	1222222221210201333
4	5	1.45	0.5	POINT (1.45 0.5)	1222222320200311333

source

create_bingtile_zonal_stats

 create_bingtile_zonal_stats (aoi:pandas.core.frame.DataFrame,
                              data:pandas.core.frame.DataFrame,
                              aggregations:List[Dict[str,Any]],
                              aoi_quadkey_column:str='quadkey',
                              data_quadkey_column:str='quadkey')

	Type	Default	Details
aoi	DataFrame		An aoi with quadkey column
data	DataFrame		Data with quadkey column
aggregations	List		List of agg specs, with each agg spec applied to a data column
aoi_quadkey_column	str	quadkey	Column name of aoi quadkey
data_quadkey_column	str	quadkey	Column name of data quadkey
Returns	DataFrame

To create bingtile zonal stats, we need to compute the quadkeys for the areas of interest (AOI) and the data.

The geowrangler.grids module provides a BingTileGridGenerator that will generate the quadkeys for the areas covered by your AOIs.

import geowrangler.grids as gr

bgtile_generator = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL)
simple_aoi_bingtiles = bgtile_generator.generate_grid(simple_aoi)

Using the data with the computed quadkeys, we can generate zonal stats for our bingtile grid aois. This just uses the regular pandas grouping and merging function and skips any geospatial joins which results in faster computation.

bingtile_results = create_bingtile_zonal_stats(
    simple_aoi_bingtiles,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)

CPU times: user 5.02 ms, sys: 438 µs, total: 5.46 ms
Wall time: 6.08 ms

bingtile_results

	quadkey	geometry	index_count
0	122222220	POLYGON ((0 0.70311, 0 1.40611, 0.70312 1.4061...	0.0
1	122222221	POLYGON ((0.70312 0.70311, 0.70312 1.40611, 1....	0.0
2	122222230	POLYGON ((1.40625 0.70311, 1.40625 1.40611, 2....	0.0
3	122222231	POLYGON ((2.10937 0.70311, 2.10937 1.40611, 2....	0.0
4	122222320	POLYGON ((2.8125 0.70311, 2.8125 1.40611, 3.51...	0.0
5	122222222	POLYGON ((0 0, 0 0.70311, 0.70312 0.70311, 0.7...	3.0
6	122222223	POLYGON ((0.70312 0, 0.70312 0.70311, 1.40625 ...	0.0
7	122222232	POLYGON ((1.40625 0, 1.40625 0.70311, 2.10937 ...	3.0
8	122222233	POLYGON ((2.10937 0, 2.10937 0.70311, 2.8125 0...	3.0
9	122222322	POLYGON ((2.8125 0, 2.8125 0.70311, 3.51562 0....	0.0

We can also use any bingtile grid for any zoom level lower than the data’s zoom level

bgtile_generator10 = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL + 1)
simple_aoi_bingtiles10 = bgtile_generator10.generate_grid(simple_aoi)

bingtile_results10 = create_bingtile_zonal_stats(
    simple_aoi_bingtiles10,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)

bingtile_results10[bingtile_results10.index_count > 0]

	quadkey	geometry	index_count
10	1222222221	POLYGON ((0.35156 0.35156, 0.35156 0.70311, 0....	3.0
13	1222222320	POLYGON ((1.40625 0.35156, 1.40625 0.70311, 1....	3.0
15	1222222330	POLYGON ((2.10937 0.35156, 2.10937 0.70311, 2....	2.0
16	1222222331	POLYGON ((2.46094 0.35156, 2.46094 0.70311, 2....	1.0

ax = results.plot(ax=plt.axes(), column="index_count", edgecolor="blue", alpha=0.2)
ax = simple_data.plot(ax=ax, color="purple")
ax = bingtile_results.plot(ax=ax, column="index_count", edgecolor="black", alpha=0.4)
ax = bingtile_results10.plot(ax=ax, column="index_count", edgecolor="red", alpha=0.4)