A basic introduction to Vector Distance Zonal Stats
Basic Usage
Generate zonal stats for a GeoDataFrame of areas of interest (AOIs) from a vector data source, using the data geometries nearest to each area.
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point
import geowrangler.distance_zonal_stats as dzs
simple_aoi = gpd.read_file("../data/simple_planar_aoi.geojson")
simple_data = gpd.read_file("../data/simple_planar_data.geojson")
simple_point_data = make_point_df(3, 5, offset_x=0.5, offset_y=3.0)
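The make_point_df helper called above is not imported or defined in this section, so the cell fails unless the helper already exists in the session. A minimal stand-in is sketched below. It assumes the helper builds an xsize-by-ysize grid of points shifted by the given offsets, in a planar CRS, with the population and internet_speed columns that the later aggregations use. The attribute values, the default CRS, and the variable name sketch_points are all made up for illustration and are not taken from the original helper.

```python
import geopandas as gpd
from shapely.geometry import Point


def make_point_df(xsize, ysize, offset_x=0.0, offset_y=0.0, crs="EPSG:3857"):
    """Hypothetical stand-in: an xsize-by-ysize grid of points with sample columns."""
    rows = [
        {
            "geometry": Point(x + offset_x, y + offset_y),
            # Made-up attribute values, present only so the aggregations have data.
            "population": 100 * (x + 1),
            "internet_speed": 20.0 * (y + 1),
        }
        for x in range(xsize)
        for y in range(ysize)
    ]
    return gpd.GeoDataFrame(rows, geometry="geometry", crs=crs)


# Same call signature as in the cell above.
sketch_points = make_point_df(3, 5, offset_x=0.5, offset_y=3.0)
print(len(sketch_points), list(sketch_points.columns))
```

Define the helper before running the cell that calls it.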
Given an AOI (simple_aoi), a sample area data source (simple_data), and a sample point data source (simple_point_data):
simple_aoi
simple_data
simple_point_data
To compute distances from the AOI to the data sources correctly, make sure that the AOI, area data, and point data GeoDataFrames all use a planar CRS (i.e. gdf.crs.is_geographic == False).
simple_aoi.crs
simple_data.crs
simple_point_data.crs
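The reason for requiring a planar CRS can be shown with plain arithmetic. In a geographic CRS the coordinates are degrees, and the ground length of one degree of longitude shrinks as latitude increases, so Euclidean distances on raw lon/lat values are not comparable. The sketch below uses the haversine formula on a spherical Earth, which is an approximation chosen for illustration and is not what geowrangler computes.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Both pairs are exactly 1 degree of longitude apart, so a naive Euclidean
# distance on the raw coordinates would be 1.0 in both cases.
at_equator = haversine_m(0.0, 0.0, 1.0, 0.0)
at_60_north = haversine_m(0.0, 60.0, 1.0, 60.0)

print(f"{at_equator / 1000:.1f} km at the equator")
print(f"{at_60_north / 1000:.1f} km at 60 degrees north")
```

The second distance is roughly half the first, even though the coordinate difference is identical.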
We have an AOI (simple_aoi) and a GeoDataFrame containing sample data (simple_data) that overlaps the AOI.
We also have simple point data that does not intersect our AOIs.
ax = plt.axes()
ax = simple_data.plot(
    ax=ax, color=["orange", "brown", "purple"], edgecolor="yellow", alpha=0.4
)
ax = simple_aoi.plot(ax=ax, facecolor="none", edgecolor=["r", "g", "b"])
ax = simple_point_data.plot(ax=ax)
The red, green, and blue outlines are the 3 regions of interest (AOIs), while the orange, brown, and purple areas are the data areas. The blue dots are point data that do not intersect our AOIs.
%%time
results = dzs.create_distance_zonal_stats(
    simple_aoi,
    simple_point_data,
    max_distance=7,
    aggregations=[
        dict(func="count"),
        dict(func="sum", column="population"),
        dict(func="mean", column="internet_speed"),
    ],
)
The zonal stats computed for the point data include only the points nearest to each AOI. Only data geometries within the maximum distance (here, within 7.0 m) are considered.
Setting max_distance to None or to a large value can slow down processing for large datasets. See the GeoPandas reference for more details.
results
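The effect of max_distance can be illustrated without any geospatial library. The sketch below is a simplified pure-Python model and not geowrangler's implementation: the AOI is an axis-aligned box, the data are points, and the function names, the tie handling, and the zero-filled result for an AOI with no match are all assumptions made for this example.

```python
import math


def point_to_box_distance(point, box):
    """Distance from (x, y) to a box (minx, miny, maxx, maxy); 0.0 if inside."""
    x, y = point
    minx, miny, maxx, maxy = box
    dx = max(minx - x, 0.0, x - maxx)
    dy = max(miny - y, 0.0, y - maxy)
    return math.hypot(dx, dy)


def nearest_stats(box, points, max_distance=None):
    """Aggregate the point(s) nearest to `box`, ignoring any beyond max_distance."""
    candidates = [(point_to_box_distance(p["xy"], box), p) for p in points]
    if max_distance is not None:
        candidates = [(d, p) for d, p in candidates if d <= max_distance]
    if not candidates:
        return {"count": 0, "population_sum": 0, "nearest": None}
    nearest = min(d for d, _ in candidates)
    # Keep every point tied for the minimum distance.
    kept = [p for d, p in candidates if math.isclose(d, nearest)]
    return {
        "count": len(kept),
        "population_sum": sum(p["population"] for p in kept),
        "nearest": nearest,
    }


aoi = (0.0, 0.0, 1.0, 1.0)  # made-up unit square
points = [
    {"xy": (0.5, 3.0), "population": 100},  # 2.0 above the box
    {"xy": (0.5, 4.0), "population": 200},  # 3.0 above the box
    {"xy": (9.0, 0.5), "population": 300},  # 8.0 to the right of the box
]

within_7 = nearest_stats(aoi, points, max_distance=7)
within_1 = nearest_stats(aoi, points, max_distance=1)
print(within_7)
print(within_1)
```

With a limit of 7, only the single point at distance 2.0 contributes to the stats. With a limit of 1, no point qualifies and the AOI gets an empty result.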
Data areas and geometries that overlap the AOI areas have a distance of 0.0 and are always the nearest geometries.
%%time
results2 = dzs.create_distance_zonal_stats(
    simple_aoi,
    simple_data,
    max_distance=1,
    aggregations=[
        dict(func="count"),
        dict(func="sum", column="population"),
        dict(func="mean", column="internet_speed"),
    ],
)
results2
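The rule that overlapping geometries have a distance of 0.0 follows from how the shortest distance between two shapes is defined. The sketch below shows it for axis-aligned boxes only. The box coordinates and the box_distance helper are made up for illustration; real geometries would use a geometry library's distance function instead.

```python
import math


def box_distance(a, b):
    """Shortest distance between two axis-aligned boxes (minx, miny, maxx, maxy)."""
    # Gap along each axis; 0.0 when the ranges overlap or touch on that axis.
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


aoi = (0.0, 0.0, 1.0, 1.0)
overlapping = (0.5, 0.5, 1.5, 1.5)  # shares area with the AOI
separate = (3.0, 0.0, 4.0, 1.0)  # 2.0 units to the right of the AOI

d_overlap = box_distance(aoi, overlapping)
d_separate = box_distance(aoi, separate)
print(d_overlap, d_separate)
```

Because no distance can be smaller than 0.0, an overlapping data area is always among the nearest geometries for that AOI.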
%%time
region3_admin_grids = gpd.read_file("../data/region3_admin_grids.geojson")
%%time
region34ncr_osm_pois = gpd.read_file("../data/region34ncr_osm_pois.geojson")
ax = plt.axes()
ax = region34ncr_osm_pois.plot(ax=ax)
ax = region3_admin_grids.plot(ax=ax, facecolor="none", edgecolor="blue")
region3_admin_grids = region3_admin_grids.to_crs("EPSG:3857") # convert to planar
region34ncr_osm_pois = region34ncr_osm_pois.to_crs("EPSG:3857")
region34ncr_osm_pois
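EPSG:3857 (Web Mercator) is planar, so it satisfies the CRS requirement, but its units match ground metres only at the equator. Away from the equator, map distances are stretched by roughly 1 / cos(latitude). The sketch below estimates how much ground distance a given map distance covers. It uses the spherical Mercator scale factor as an approximation, and the latitude of 15 degrees is an arbitrary example value that is not read from the data.

```python
import math


def web_mercator_scale(lat_deg):
    """Approximate ratio of EPSG:3857 map distance to ground distance at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


def ground_distance(map_distance_m, lat_deg):
    """Approximate ground distance covered by a map distance at the given latitude."""
    return map_distance_m / web_mercator_scale(lat_deg)


lat = 15.0  # example latitude chosen for illustration only
approx_ground_m = ground_distance(10_000, lat)
print(f"10,000 map units at {lat} degrees is about {approx_ground_m:.0f} m on the ground")
```

If exact ground distances matter, a local projected CRS for the area of study may be a better choice than EPSG:3857.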
%%time
results3 = dzs.create_distance_zonal_stats(
    region3_admin_grids,
    region34ncr_osm_pois,
    max_distance=10_000,  # within 10km
    aggregations=[dict(func="count", output="pois_count", fillna=[True])],
)
len(results3[results3.pois_count == 0.0])
results3