A basic introduction to Vector Zonal Stats
Basic Usage
Generate zonal stats for a GeoDataframe containing areas of interest
Terms:
- aoi - (area of interest) a geodataframe which we are interested in generating zonal statistics for
- data - the source geodataframe containing the features which are interested in collecting zonal stats for our aoi.
Example 1: Count POIs for Regions
- Input:
- aoi - region3,4,ncr regions (Admin Level 1) (Central Luzon) geometry (geom_type - polygon, multipolygon
- data - Philippine pois (geom_type - points)
- overlap_method = 'intersects'
- aggregations:
- count - number of pois within aoi
- Output
- aoi with pois count (default output column:
index_count
)
- aoi with pois count (default output column:
import geopandas as gpd
import matplotlib.pyplot as plt
import geowrangler.vector_zonal_stats as vzs
%%time
# area multipolygons for regions 3,4,ncr of the philippines
aoi = gpd.read_file("../data/region34ncr_admin.geojson")
ax = aoi.plot(
ax=plt.axes(),
facecolor="none",
edgecolor=[
"#C62828",
"#283593",
"#FF9800",
],
)
aoi
%%time
# raw pois from osm data (subset of region3,4, ncr only)
raw_data = gpd.read_file("../data/region34ncr_osm_pois.geojson")
raw_data.columns.values
raw_data.head()
ax = aoi.plot(
ax=plt.axes(),
facecolor="none",
edgecolor=[
"#C62828",
"#283593",
"#FF9800",
],
)
ax = raw_data.plot(ax=ax)
Compute POIs count per region
%%time
aoi = vzs.create_zonal_stats(
aoi,
raw_data,
overlap_method="intersects",
aggregations=[{"func": "count"}],
)
New aoi with pois count in the column index_count
. (The column name can be overridden as shown in the next example)
%%time
# no_test
aoi
Example 2: Count attractions for Regions
- Input:
- aoi - region3,4,ncr regions (Admin Level 1) (Central Luzon, NCR, Calabarzon) geometry (geom_type - polygon, multipolygon)
- data - attractions: filtered Philippine pois (Central Luzon, NCR, Calabarzon only) (geom_type - points) with
fclass
== 'attraction' - overlap_method = 'intersects'
- aggregations:
- count
- number of pois within aoi
- output column name
attractions
- count
- Output
- aoi with attractions count
Filter the raw data
attractions = raw_data[raw_data.fclass == "attraction"]
attractions.head()
Create zonal stats for filtered data. Add output
key to specify output column name for count
%%time
aoi_attr = vzs.create_zonal_stats(
aoi, attractions, aggregations=[{"func": "count", "output": "attractions"}]
)
aoi_attr
Example 3: Grid Tiles over POIs
- Input:
- aoi - gridded tiles for Region 3 (Central Luzon) at 15km x 15km size
- data - region 3 data filtered from philippine pois (geom_type - points)
- overlap_method = 'intersects'
- aggregations:
- count
- number of pois within aoi
- output column name::
pois_count
- count
- Output
- aoi with pois count
%%time
# load gridded tiles
grid_aoi = gpd.read_file("../data/region3_admin_grids.geojson")
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(), facecolor="none", edgecolor="#C62828"
)
ax = grid_aoi.plot(ax=ax, facecolor="none", edgecolor="blue")
%%time
region3_pois = gpd.read_file("../data/region3_osm_pois.geojson")
region3_pois.head()
len(region3_pois)
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(), facecolor="none", edgecolor="#C62828"
)
ax = region3_pois.plot(ax=ax)
Compute pois count per grid
%%time
grid_aoi = vzs.create_zonal_stats(
grid_aoi,
region3_pois,
overlap_method="intersects",
aggregations=[{"func": "count", "output": "pois_count"}],
)
grid_aoi[grid_aoi.pois_count > 0].head()
grid_aoi.pois_count.sum()
# show grids with at least 1 poi
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(),
facecolor="none",
edgecolor=[
"#C62828",
],
)
ax = grid_aoi[grid_aoi.pois_count > 0].plot(ax=ax, facecolor="none", edgecolor="green")
Example 1 Regions over Population per Bgy Level
- Input:
- aoi - region3,4,ncr geometry (geom_type - polygon, multipolygon)
- data - region3,4, ncr population data (geom_type - pois)
- data_type: 'individual_pois'
- overlap_method = 'intersects'
- aggregations:
- metric_columns: 'population','men', etc.
- each row in the data has a column: population, men, women, etc. with numeric value
- aggregation_functions - 'min','max', 'mean', 'sum', etc.
- output_columns - 'pop_min', 'pop_max', for each column
- metric_columns: 'population','men', etc.
- Output:
- aoi with new output columns pop_min, pop_max etc
Load 2020 Region3,4,NCR Population Data at Barangay level (Admin Level 4)
EPSG:3969
). The land area was computed by projecting to EPSG:3123
and getting the geometry.area
. %%time
# load region3,4,ncr population data at barangay level
region34ncr_pop_data = gpd.read_file("../data/region34ncr_population_land.geojson")
region34ncr_pop_data.head()
region34ncr_pop_data.columns.values
Compute zonal stats for regions 3,4,NCR
- barangay count per region (bgy_count)
- sum and mean for each population statistic (population, men, women, etc)
- sum, mean, std, min, max for land area statistic
%%time
aoi = vzs.create_zonal_stats(
aoi,
region34ncr_pop_data,
aggregations=[
{"func": "count", "output": "bgy_count"},
{
"column": "population",
"func": ["sum", "mean"],
"output": ["pop_total", "pop_avg"],
},
{"column": "men", "func": ["sum", "mean"], "output": ["men_total", "men_avg"]},
{
"column": "women",
"func": ["sum", "mean"],
"output": ["women_total", "women_avg"],
},
{
"column": "children_under_five",
"func": ["sum", "mean"],
"output": ["under5_total", "under5_avg"],
},
{
"column": "youth_15_24",
"func": ["sum", "mean"],
"output": ["youth_total", "youth_avg"],
},
{
"column": "women_of_reproductive_age_15_49",
"func": ["sum", "mean"],
"output": ["women_repro_total", "women_repro_avg"],
},
{
"column": "elderly_60_plus",
"func": ["sum", "mean"],
"output": ["elderly_total", "elderly_avg"],
},
{
"column": "land_area",
"func": ["sum", "mean", "min", "max", "std"],
"output": ["land_total", "land_avg", "land_min", "land_max", "land_std"],
},
],
overlap_method="intersects",
)
aoi.columns.values
aoi.head()
Example 2 : Grids over Population per Bgy Level
- Input:
- aoi - region3 grids geometry (geom_type - polygon)
- data - population data (geom_type - pois)
- data_type: 'individual_pois'
- overlap_method = 'intersects'
- aggregations:
- metric_columns: 'population','men', 'land_area'
- each row in the data has a column population, men, women, including land with numeric value
- aggregation_functions - 'min','max', 'mean', 'sum', etc.
- output_columns - 'pop_min', 'pop_max', for each
- metric_columns: 'population','men', 'land_area'
- Output:
- aoi with new columns pop_min, pop_max etc.
Load population and land POIs (Bgy level)
%%time
region3_pop_pois = gpd.read_file("../data/region3_population_pois.geojson")
region3_pop_pois.head()
region3_pop_pois.columns.values
Create zonal stats (same as previous example, but now for a more granular level for region 3 only)
%%time
grid_aoi = vzs.create_zonal_stats(
grid_aoi,
region3_pop_pois,
aggregations=[
{"func": "count", "output": "bgy_count"},
{
"column": "population",
"func": ["sum", "mean"],
"output": ["pop_total", "pop_avg"],
},
{"column": "men", "func": ["sum", "mean"], "output": ["men_total", "men_avg"]},
{
"column": "women",
"func": ["sum", "mean"],
"output": ["women_total", "women_avg"],
},
{
"column": "children_under_five",
"func": ["sum", "mean"],
"output": ["under5_total", "under5_avg"],
},
{
"column": "youth_15_24",
"func": ["sum", "mean"],
"output": ["youth_total", "youth_avg"],
},
{
"column": "women_of_reproductive_age_15_49",
"func": ["sum", "mean"],
"output": ["women_repro_total", "women_repro_avg"],
},
{
"column": "elderly_60_plus",
"func": ["sum", "mean"],
"output": ["elderly_total", "elderly_avg"],
},
{
"column": "land_area",
"func": ["sum", "mean", "min", "max", "std"],
"output": ["land_total", "land_avg", "land_min", "land_max", "land_std"],
},
],
overlap_method="intersects",
)
grid_aoi.head()
Show grids with bgy_count > 0 and bgy_count == 0
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(), facecolor="none", edgecolor="black"
)
ax = grid_aoi[grid_aoi.bgy_count.notna()].plot(
ax=ax, facecolor="none", edgecolor="blue"
)
ax = grid_aoi[grid_aoi.bgy_count.isna()].plot(ax=ax, facecolor="none", edgecolor="red")
Generating Bing Tile Grid Zonal Stats
If our areas of interest use Bing Tile Grids (with the associated quadkeys),
we can use a much faster way of generating zonal stats by pre-computing the quadkeys for our data.
geowrangler.grids
modules BingTileGridGenerator
to convert your AOI into a Bing Tile Grid AOI.Example 1: Count POIs for Region 3 Bing Tile Grid AOI
- Input:
- aoi bing tile grids zoom level 13 - region3 (Admin Level 1) (Central Luzon) geometry (geom_type - polygon, multipolygon)
- data - Region 3 pois (geom_type - points)
- aggregations:
- count - number of pois within aoi
- Output
- aoi with pois count (default output column:
index_count
)
- aoi with pois count (default output column:
Load our bing tile grid aoi
region3_bingtile_grid = gpd.read_file("../data/region3_bingtile_grid13.geojson")
region3_bingtile_grid.head()
ax = plt.axes()
ax = aoi[aoi.Reg_Name == "Region III"].plot(ax=ax, facecolor="none", edgecolor="blue")
ax = region3_bingtile_grid.plot(ax=ax, facecolor="none", edgecolor="red")
Lets compute the quadkeys for our Region3 pois
DATA_ZOOM_LEVEL = 23 # for pois, it can be as high as 24
%%time
region3_pois_quadkeys = vzs.compute_quadkey(region3_pois, DATA_ZOOM_LEVEL)
region3_pois_quadkeys.head()
Lets compute the zonal stats for each grid as the pois_count.
Notice the computation time is pretty fast
%%time
region3_bingtile_pois = vzs.create_bingtile_zonal_stats(
region3_bingtile_grid,
region3_pois_quadkeys,
aggregations=[dict(func="count", output="pois_count", fillna=True)],
)
region3_bingtile_pois[region3_bingtile_pois.pois_count > 0].head()
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(), facecolor="none", edgecolor="black"
)
ax = region3_bingtile_pois[region3_bingtile_pois.pois_count > 0].plot(
column="pois_count", ax=ax, edgecolor="blue", alpha=0.4
)
ax = region3_bingtile_pois[region3_bingtile_pois.pois_count == 0].plot(
ax=ax, facecolor="none", edgecolor="red", alpha=0.4
)
Example 2: Population stats for Region 3 Bing Tile Grid AOI
- Input:
- aoi bing tile grids zoom level 13 - region3 (Admin Level 1) (Central Luzon) geometry (geom_type - polygon, multipolygon)
- data - Region 3 pop data (geom_type - points)
- aggregations:
- pop totals, avg, men total and avg, etc , barangay counts, etc.
- Output
- aoi bing tile grids zoom level 13 with additional stats
We can also reuse the region34ncr population data to add more zonal statistics to our region3 bingtile pois
%%time
# Add quadkeys to region34ncr population data
region34ncr_pop_data = vzs.compute_quadkey(
region34ncr_pop_data, DATA_ZOOM_LEVEL, inplace=True
) # inplace saves copying
region34ncr_pop_data.head()
%%time
region3_bingtile_pois_pop_data = vzs.create_bingtile_zonal_stats(
region3_bingtile_pois, # reuse from previous example
region34ncr_pop_data, # updated with quadkeys
aggregations=[
{"func": "count", "output": "bgy_count"},
{
"column": "population",
"func": ["sum", "mean"],
"output": ["pop_total", "pop_avg"],
},
{"column": "men", "func": ["sum", "mean"], "output": ["men_total", "men_avg"]},
{
"column": "women",
"func": ["sum", "mean"],
"output": ["women_total", "women_avg"],
},
{
"column": "children_under_five",
"func": ["sum", "mean"],
"output": ["under5_total", "under5_avg"],
},
{
"column": "youth_15_24",
"func": ["sum", "mean"],
"output": ["youth_total", "youth_avg"],
},
{
"column": "women_of_reproductive_age_15_49",
"func": ["sum", "mean"],
"output": ["women_repro_total", "women_repro_avg"],
},
{
"column": "elderly_60_plus",
"func": ["sum", "mean"],
"output": ["elderly_total", "elderly_avg"],
},
{
"column": "land_area",
"func": ["sum", "mean", "min", "max", "std"],
"output": ["land_total", "land_avg", "land_min", "land_max", "land_std"],
},
],
)
region3_bingtile_pois_pop_data[region3_bingtile_pois_pop_data.bgy_count.notna()].head()
ax = aoi[aoi.Reg_Name == "Region III"].plot(
ax=plt.axes(), facecolor="none", edgecolor="black"
)
ax = region3_bingtile_pois_pop_data[
region3_bingtile_pois_pop_data.bgy_count.notna()
].plot(column="pop_total", ax=ax, edgecolor="blue", alpha=0.4)
ax = region3_bingtile_pois_pop_data[
region3_bingtile_pois_pop_data.bgy_count.isna()
].plot(ax=ax, facecolor="none", edgecolor="red", alpha=0.4)