A basic introduction to using geometry validation
import geopandas as gpd
import pandas as pd
gdf = gpd.read_file("../data/broken.geojson")
gdf = pd.concat([gdf, gpd.GeoDataFrame({"geometry": [None], "id": "null geometry"})])
gdf
We then run GeometryValidation. By default, each validator appends a new column if the validation fails, applies a fix if possible, and raises a warning if no fix is available.
from geowrangler.validation import GeometryValidation
GeometryValidation(gdf)
validated_gdf = GeometryValidation(gdf).validate_all()
validated_gdf
gdf.iloc[5].geometry.area
Running the validation again on the validated output shows that the first pass applied some fixes.
GeometryValidation(validated_gdf[["id", "geometry"]]).validate_all()
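The default behaviour described above (record the result of the check, apply a fix when one exists, warn otherwise) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not geowrangler's actual implementation: `run_validator` and its parameters are hypothetical names, geometries are plain `(x, y)` tuples, and the result column is written for every row.

```python
import warnings


def run_validator(rows, column, check, fix=None, message="Found invalid geometries"):
    """Hypothetical check/fix/warn loop; not part of geowrangler."""
    validated = []
    found_unfixable = False
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        is_valid = check(row["geometry"])
        new_row[column] = is_valid  # record the outcome of the check
        if not is_valid:
            if fix is not None:
                new_row["geometry"] = fix(row["geometry"])
            else:
                found_unfixable = True
        validated.append(new_row)
    if found_unfixable:
        # No fix is available, so the caller is only warned
        warnings.warn(message)
    return validated


rows = [
    {"id": "a", "geometry": (-0.1, 0.0)},
    {"id": "b", "geometry": (2.0, 1.0)},
]

# With a fix: the failing point is moved to x = 0
fixed = run_validator(
    rows,
    "is_positive_x",
    check=lambda point: point[0] > 0,
    fix=lambda point: (0.0, point[1]),
)

# Without a fix: the geometry is unchanged and a warning is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unfixed = run_validator(rows, "is_positive_x", check=lambda point: point[0] > 0)
```

In both runs the new column holds the result of the check on the original geometry, so a `False` marks the rows that needed attention.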
Passing Validators
You can pass a list of validators to run only the ones you need. By default, the following validators are used:
NullValidator
- Checks if the geometry is null. No fix
OrientationValidator
- Checks that the outermost ring of each polygon is oriented counter-clockwise. Converts it to counter-clockwise as the fix
SelfIntersectingValidator
- Checks if the polygon is self-intersecting. Runs shapely.validation.make_valid as the fix
CrsBoundsValidator
- Checks if the bounds of each geometry are within the CRS. No fix
AreaValidator
- Checks if polygons or multipolygons have an area greater than zero
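To make the orientation and area checks concrete, here is a minimal pure-Python sketch based on the shoelace formula. It is an illustration of the idea only, not how geowrangler or shapely implement it; all function names are hypothetical, and a ring is assumed to be a list of `(x, y)` tuples without a repeated closing point.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_counter_clockwise(ring):
    """Orientation check: True when the ring winds counter-clockwise."""
    return signed_area(ring) > 0


def orient_counter_clockwise(ring):
    """Orientation fix: reverse the ring if it winds clockwise."""
    return list(ring) if is_counter_clockwise(ring) else list(ring)[::-1]


def has_positive_area(ring):
    """Area check: False for degenerate rings such as collinear points."""
    return abs(signed_area(ring)) > 0


clockwise_square = [(0, 0), (0, 1), (1, 1), (1, 0)]
collinear = [(0, 0), (1, 1), (2, 2)]

print(signed_area(clockwise_square))       # -1.0
print(is_counter_clockwise(clockwise_square))  # False
print(orient_counter_clockwise(clockwise_square))
print(has_positive_area(collinear))        # False
```

With shapely geometries, the same information is available through `polygon.exterior.is_ccw` and `polygon.area`, and `shapely.geometry.polygon.orient` returns a consistently oriented copy.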
from geowrangler.validation import NullValidator, SelfIntersectingValidator
validated_gdf = GeometryValidation(
    gdf, validators=[NullValidator, SelfIntersectingValidator]
).validate_all()
validated_gdf
You can also use a single validator on its own.
SelfIntersectingValidator().validate(gdf)
from shapely.geometry.point import Point
from shapely.geometry.polygon import Polygon
from geowrangler.validation import BaseValidator
class PointValidator(BaseValidator):
    validator_column_name = "is_not_point"
    geometry_types = ["Point"]  # What kinds of geometries to validate and fix

    def check(self, geometry):
        # Checks if the geometry is valid. If False, applies the fix
        return geometry.x > 0

    def fix(self, geometry):
        return Point(0, geometry.y)
gdf = gpd.GeoDataFrame(
    geometry=[Point(-0.1, 0), Polygon([(-0.1, 0.1), (-0.1, 1), (1, 1)])]
)
validated_gdf = PointValidator().validate(gdf)
ax = gdf.plot()
ax = validated_gdf.plot()
In some cases no fix is available, or you want to fix the geometries manually. For these cases, you can create a validator that only warns the user.
from shapely.geometry.point import Point
from shapely.geometry.polygon import Polygon
from geowrangler.validation import BaseValidator
class PointValidator(BaseValidator):
    validator_column_name = "is_not_point"
    fix_available = False  # Tells the validator that no fix is available
    warning_message = "Found geometries that are points below 0"  # Message shown in the warning
    geometry_types = ["Point"]  # What kinds of geometries to validate

    def check(self, geometry):
        # Checks if the geometry is valid. If False, warns the user
        return geometry.x > 0
gdf = gpd.GeoDataFrame(geometry=[Point(-0.1, 0), Polygon([(0, 0.0), (0, 1), (1, 1)])])
validated_gdf = PointValidator().validate(gdf)
validated_gdf