Geowrangler is a Python package for geodata wrangling. It helps you build data transformation workflows that have no out-of-the-box solutions from other geospatial libraries.
We surveyed our past geospatial projects to extract these solutions for our work and hope that these will be useful for others as well.
Our audience is researchers, analysts, and engineers delivering geospatial projects.
Geowrangler was born out of our efforts to reduce the amount of boilerplate code in wrangling geospatial data. It builds on top of existing geospatial libraries such as geopandas, rasterio, rasterstats, morecantile, and others. Our goals are centered on the following tasks:
Extracting area of interest zonal statistics from vector and raster data
Gridding areas of interest
Validating geospatial datasets
Downloading publicly available geospatial datasets (e.g., OSM, Ookla, and Nightlights)
Other geospatial vector and raster data processing tasks
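To make the first task concrete, here is a minimal, library-free sketch of a vector zonal statistic: counting points and summing a value for each area of interest. It is not Geowrangler's implementation or API. The names `Zone` and `zonal_stats` and the sample data are hypothetical, and the zones are simplified to non-overlapping rectangles, whereas real areas of interest are usually arbitrary polygons.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Zone:
    """A rectangular area of interest (hypothetical stand-in for a polygon)."""

    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open on the upper edges so a point on a shared border
        # is assigned to only one of two adjacent zones.
        return self.min_x <= x < self.max_x and self.min_y <= y < self.max_y


def zonal_stats(
    zones: List[Zone], points: List[Tuple[float, float, float]]
) -> Dict[str, Dict[str, float]]:
    """Count the points and sum their values for each zone."""
    stats = {zone.name: {"count": 0, "sum": 0.0} for zone in zones}
    for x, y, value in points:
        for zone in zones:
            if zone.contains(x, y):
                stats[zone.name]["count"] += 1
                stats[zone.name]["sum"] += value
                break  # assumption: zones do not overlap
    return stats


# Two adjacent zones and four (x, y, value) points; the last point
# falls outside both zones and is ignored.
zones = [Zone("west", 0, 0, 1, 1), Zone("east", 1, 0, 2, 1)]
points = [(0.5, 0.5, 10.0), (0.2, 0.9, 5.0), (1.0, 0.5, 2.0), (3.0, 3.0, 99.0)]
result = zonal_stats(zones, points)
print(result)
```

The point lying exactly on the shared border at x = 1.0 is counted in the east zone only, which is the kind of edge case a zonal statistics routine has to settle one way or the other.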
To make it easy to document, maintain, and extend the package, we opted to maintain the source code, tests, and documentation in Jupyter notebooks. We use nbdev to generate the Python package and documentation from the notebooks. See this document to learn more about our development workflow.
By doing this, we hope to make it easy for geospatial analysts, scientists, and engineers to learn, explore, and extend this package for their geospatial processing needs.
Aside from providing reference documentation for each module, we have included extensive tutorials and use case examples to make the package easy to learn and use.
Modules
Grid Tile Generation
Geometry Validation
Vector Zonal Stats
Raster Zonal Stats
Area Zonal Stats
Distance Zonal Stats
Vector to Raster Mask
Raster to Dataframe
Raster Processing
Demographic and Health Survey (DHS) Processing Utils
We develop the package modules alongside their documentation. Each page comes with an Open in Colab button that will open the Jupyter notebook in Colab for exploration (including this page).
# view the source of a grid component
import geowrangler.grids

# cell_size (in meters) is the first constructor argument, as the source output shows
grid = geowrangler.grids.SquareGridGenerator(cell_size=1)
grid??
Type: SquareGridGenerator
String form: <geowrangler.grids.SquareGridGenerator object>
File: ~/work/unicef-ai4d/geowrangler-1/geowrangler/grids.py
Source:
class SquareGridGenerator:
    def __init__(
        self,
        cell_size: float,  # height and width of a square cell in meters
        grid_projection: str = "EPSG:3857",  # projection of grid output
        boundary: Union[SquareGridBoundary, List[float]] = None,  # original boundary
    ):
        self.cell_size = cell_size
        self.grid_projection = grid_projection
        self.boundary = boundary
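
The constructor shown above only stores its arguments. As a rough illustration of what a `cell_size` in meters means for a square grid, the sketch below tiles a bounding box with square cells in projected coordinates. It is a simplified stand-in and not the Geowrangler source: the function `square_grid` and the `Bounds` alias are hypothetical, and the sketch skips reprojection and any clipping of cells to the actual shape of the area.

```python
import math
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def square_grid(bounds: Bounds, cell_size: float) -> List[Bounds]:
    """Return square cells of side `cell_size` that cover `bounds`."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    min_x, min_y, max_x, max_y = bounds
    # Round up so the cells on the far edges still cover the whole extent.
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * cell_size
            y0 = min_y + row * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


# A 2.5 km by 1 km extent tiled with 1 km cells needs 3 columns and 1 row.
cells = square_grid((0.0, 0.0, 2500.0, 1000.0), cell_size=1000.0)
print(len(cells))
print(cells[-1])
```

Because the column count is rounded up, the last cell extends past the right edge of the extent. Whether such overhanging cells are kept, clipped, or dropped is a design choice this sketch leaves open.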