Aggregations
To generate zonal stats for one or more areas of interest (aoi), we use the concept of an aggregation specification, or agg spec, which specifies the aggregation functions (such as count, sum, mean, std, etc.) to be applied to columns in the source dataframe (data).
The create_zonal_stats method takes a list of these agg specs and applies them to create zonal stats from the data for the aoi.
Each agg spec is a dict with the following keys:
- func: (Required) a str or a list[str] of aggregation functions. See the pandas documentation for agg.
- column: (Optional) an existing column in the data to generate the zonal statistic from. If not specified, the grouping key based on the index of the aoi applied to the data is used by default.
- output: (Optional) a str or a list[str] of the name(s) of the output zonal statistic column(s). If not specified, the name is built from the column and func as {column}_{func} (e.g. 'func': 'mean' on 'column': 'population' has the default value 'output': 'population_mean').
- fillna: (Optional) a bool or a list[bool] of flag(s) indicating whether to apply a fillna(0) step to the new zonal column. True sets any NA values in the resulting zonal stat to 0, and False retains any NA values. The default value of the flag(s) is False.
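The defaulting rules described for these keys can be illustrated with a small sketch. The helper `normalize_agg_spec` and the placeholder column name `"index"` are hypothetical names chosen for this illustration, not part of the library. The sketch only mirrors the behaviour described above and requires `func`, `output` and `fillna` to have matching lengths once expanded.

```python
def normalize_agg_spec(spec, default_column="index"):
    """Return a copy of `spec` with every optional key filled in.

    Hypothetical helper for illustration only; `default_column` is an
    assumed placeholder name for the aoi index grouping key.
    """
    if "func" not in spec:
        raise ValueError("'func' is required in an agg spec")

    def as_list(value):
        # Accept either a single value or a list of values
        return list(value) if isinstance(value, (list, tuple)) else [value]

    funcs = as_list(spec["func"])
    column = spec.get("column", default_column)
    # Default output names follow the {column}_{func} pattern
    outputs = as_list(spec.get("output", [f"{column}_{f}" for f in funcs]))
    # fillna defaults to False for every function
    fillna = as_list(spec.get("fillna", [False] * len(funcs)))

    if not (len(funcs) == len(outputs) == len(fillna)):
        raise ValueError("'func', 'output' and 'fillna' must have matching lengths")

    return {"func": funcs, "column": column, "output": outputs, "fillna": fillna}


simple = normalize_agg_spec({"func": "count"})
multi = normalize_agg_spec({"func": ["mean", "sum", "max"], "column": "population"})
print(simple["output"])  # ['index_count']
print(multi["output"])   # ['population_mean', 'population_sum', 'population_max']
```

Expanding every spec into the same list-based shape makes it straightforward to apply each function, output name and fillna flag position by position.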
Examples
- The simplest aggregation spec. This will result in an output column named index_count, as it will use the aoi index as the default column.
{"func":"count"}
- The sum function is applied to the data column population, which will create an output column named total_population.
{
"func": "sum",
"column": "population",
"output": "total_population"
}
- Compute the zonal stats mean, sum, max on the population column and rename the output columns (by default) to population_mean, population_sum and population_max.
{
"func": ["mean","sum","max"],
"column": "population",
}
- A full aggregation spec with fillna. Setting fillna to True for mean and sum replaces any NA in those output columns with 0, while fillna == False for std means it will remain NA if there is no data for the column in the group. When fillna is omitted, the flag defaults to False, so any NA in the output column is retained.
{
"func": ["mean", "sum", "std"],
"column":"population",
"output": ["avg_pop", "total_pop", "std_dev"],
"fillna": [True,True,False],
}
The agg specs in the list of aggregations can refer to the same column, but the output column names must be unique since they will be added as columns in the results.
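One way to catch clashing output names before computing any statistics is sketched below. The function `find_duplicate_outputs` and the placeholder column name `"index"` are hypothetical and only illustrate the uniqueness rule stated above, assuming default output names follow the {column}_{func} pattern.

```python
from collections import Counter


def find_duplicate_outputs(aggregations, default_column="index"):
    """Return the sorted output column names that appear more than once.

    Hypothetical helper for illustration; default names are assumed to
    follow the {column}_{func} pattern.
    """
    names = []
    for spec in aggregations:
        funcs = spec["func"] if isinstance(spec["func"], list) else [spec["func"]]
        column = spec.get("column", default_column)
        outputs = spec.get("output", [f"{column}_{f}" for f in funcs])
        if isinstance(outputs, str):
            outputs = [outputs]
        names.extend(outputs)
    return sorted(name for name, n in Counter(names).items() if n > 1)


aggregations = [
    {"func": "sum", "column": "population", "output": "total_population"},
    {"func": ["mean", "sum"], "column": "population"},
    {"func": "sum", "column": "population"},  # clashes with population_sum above
]
duplicates = find_duplicate_outputs(aggregations)
print(duplicates)  # ['population_sum']
```

Giving the last spec an explicit output name would resolve the clash while still aggregating the same column.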
import matplotlib.pyplot as plt

simple_aoi  # sample aoi
simple_data  # sample data
# Plot the aoi outlines and overlay the data points
ax = simple_aoi.plot(
    ax=plt.axes(), facecolor="none", edgecolor=["red", "blue", "green"]
)
ax = simple_data.plot(ax=ax, color="purple")
results = create_zonal_stats(simple_aoi, simple_data, aggregations=[{"func": "count"}])
results
Index name is not None
named_index_aoi = simple_aoi.copy()
named_index_aoi.index.name = "myindex"
named_index_aoi
named_index_results = create_zonal_stats(
    named_index_aoi, simple_data, aggregations=[{"func": "count"}]
)
named_index_results.head()
If our existing data geodataframe doesn't have quadkeys, we can use compute_quadkey to generate the quadkeys for the centroids of the data's geometries.
%%time
DATA_ZOOM_LEVEL = 19
AOI_ZOOM_LEVEL = 9
simple_data_quadkey = compute_quadkey(simple_data, DATA_ZOOM_LEVEL)
simple_data_quadkey.head()
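For readers unfamiliar with quadkeys, the following is a minimal sketch of how a quadkey can be derived from a point's latitude and longitude under the Bing Maps tile system. The function names `latlon_to_tile`, `tile_to_quadkey` and `latlon_to_quadkey` are hypothetical; this is not necessarily how compute_quadkey is implemented, only an illustration of the underlying scheme.

```python
import math

MAX_LATITUDE = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat, lon, zoom):
    """Convert a WGS84 point to tile x/y coordinates at the given zoom level."""
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat))
    lon = max(-180.0, min(180.0, lon))
    n = 2 ** zoom  # number of tiles along each axis
    sin_lat = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_x, tile_y


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of tile x/y into a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat, lon, zoom):
    """Quadkey of the tile containing the point at the given zoom level."""
    return tile_to_quadkey(*latlon_to_tile(lat, lon, zoom), zoom)


print(tile_to_quadkey(3, 5, 3))        # 213
print(latlon_to_quadkey(0.0, 0.0, 1))  # 3
```

A quadkey has one digit per zoom level, and the quadkey of a tile is a prefix of the quadkeys of all tiles it contains at higher zoom levels.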
To create bingtile zonal stats, we need to compute the quadkeys for both the areas of interest (AOI) and the data.
The geowrangler.grids module provides a BingTileGridGenerator that will generate the quadkeys for the areas covered by your AOIs.
import geowrangler.grids as gr
bgtile_generator = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL)
simple_aoi_bingtiles = bgtile_generator.generate_grid(simple_aoi)
Using the data with the computed quadkeys, we can generate zonal stats for our bingtile grid aois. This uses only regular pandas grouping and merging functions and skips any geospatial joins, which results in faster computation.
%%time
bingtile_results = create_bingtile_zonal_stats(
    simple_aoi_bingtiles,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)
bingtile_results
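The reason no geospatial join is needed can be sketched with plain dictionaries. Because a tile's quadkey is a prefix of the quadkeys of every tile inside it, truncating each data quadkey to the AOI zoom level yields the grouping key directly. The function `count_points_per_tile` and the sample quadkeys below are hypothetical, and the sketch assumes all AOI quadkeys share one zoom level; it illustrates the idea rather than the library's actual implementation.

```python
from collections import Counter


def count_points_per_tile(aoi_quadkeys, data_quadkeys):
    """Count data points per AOI tile by grouping on a quadkey prefix.

    Hypothetical helper for illustration; assumes every AOI quadkey has
    the same length (zoom level).
    """
    if not aoi_quadkeys:
        return {}
    aoi_zoom = len(aoi_quadkeys[0])
    # Truncate each data quadkey to the AOI zoom level and group on it
    counts = Counter(qk[:aoi_zoom] for qk in data_quadkeys if len(qk) >= aoi_zoom)
    # Left-merge onto the AOI tiles; tiles without data get 0 (like fillna=True)
    return {qk: counts.get(qk, 0) for qk in aoi_quadkeys}


aoi_tiles = ["132", "133", "210"]                    # zoom level 3 tiles
data_points = ["13201", "13223", "13310", "31000"]   # zoom level 5 quadkeys
result = count_points_per_tile(aoi_tiles, data_points)
print(result)  # {'132': 2, '133': 1, '210': 0}
```

The point with quadkey 31000 falls outside every AOI tile and is dropped, just as it would be in a left merge onto the AOI grid.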
We can also use a bingtile grid at any zoom level lower than the data's zoom level.
bgtile_generator10 = gr.BingTileGridGenerator(AOI_ZOOM_LEVEL + 1)
simple_aoi_bingtiles10 = bgtile_generator10.generate_grid(simple_aoi)
bingtile_results10 = create_bingtile_zonal_stats(
    simple_aoi_bingtiles10,
    simple_data_quadkey,
    aggregations=[dict(func="count", fillna=True)],
)
bingtile_results10[bingtile_results10.index_count > 0]
ax = results.plot(ax=plt.axes(), column="index_count", edgecolor="blue", alpha=0.2)
ax = simple_data.plot(ax=ax, color="purple")
ax = bingtile_results.plot(ax=ax, column="index_count", edgecolor="black", alpha=0.4)
ax = bingtile_results10.plot(ax=ax, column="index_count", edgecolor="red", alpha=0.4)