Raster to Dataframe Tutorial

A basic introduction to generating a vector-to-raster mask and converting rasters to Dataframes.

Basic Usage

Generate a raster mask from a reference TIF image based on the labels assigned to certain regions of the area. The generated raster mask can also be used to convert the raster to a dataframe with respect to their assigned labels.

Terms: * raster mask - processed image wherein some parts of it are set to no value or in other words hidden * labels - parts of an area within the satellite image are categorized as such. Examples are mining, agriculture, etc.

import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

import geowrangler.raster_to_dataframe as rdf
from rasterio import features
from rasterio.plot import show
from rasterio.windows import Window, transform

Load reference raster image and shape file

The shape file that should be loaded should contain a column that consists of labels.

tiff_file = "../data/vector_to_raster_mask_sample/cabanglasan.tif"
shape_file = "../data/vector_to_raster_mask_sample/labels_20220816.gpkg"
target_file = shape_file.replace("gpkg", "tiff")

We create a dictionary of our target labels that we want to create a raster mask for.

labels = {
    "mining": 1,
    "neg": 2,
    "agriculture": 3,
    "product_extraction": 4,
    "kaingin": 5,
    "biophysical": 6,
}

Generate raster mask

To create the mask we input the reference raster image, shape file with labels, and the dictionary with our target labels into the function.

# Generate masks for a file
masks, grids, values = rdf.generate_mask(
    tiff_file=tiff_file,
    shape_file=shape_file,
    output_file=target_file,
    labels_column="label",
    labels_dict=labels,
    plot=True,
)

Load rasters and the outputed raster mask

tiff_files = [
    "../data/vector_to_raster_mask_sample/cabanglasan.tif",
]
mask_file = "../data/vector_to_raster_mask_sample/labels_20220816.tiff"

Convert raster to dataframe

To convert raster/s to a dataframe, we just need to input the images into the function and add the mask file. The mask file will be used to create the last column label in the dataframe. The result is a tabular dataset that contains the Band values per raster image and their labels.

data = rdf.read_bands(tiff_files, mask_file)
data
B1_0 B2_0 B3_0 B4_0 B5_0 B6_0 B7_0 B8_0 B9_0 B10_0 B11_0 B12_0 label
0 0.1198 0.09635 0.09330 0.0698 0.10665 0.20250 0.2490 0.23525 0.28125 0.0377 0.19925 0.1002 0
1 0.1198 0.09580 0.09245 0.0708 0.10665 0.20250 0.2490 0.23925 0.28125 0.0377 0.19925 0.1002 0
2 0.1148 0.09420 0.09460 0.0707 0.10380 0.20395 0.2478 0.23150 0.27165 0.0385 0.18240 0.0902 0
3 0.1148 0.09190 0.08850 0.0631 0.10380 0.20395 0.2478 0.23300 0.27165 0.0385 0.18240 0.0902 0
4 0.1148 0.09350 0.09080 0.0643 0.10565 0.20830 0.2466 0.24205 0.26990 0.0385 0.18050 0.0894 0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
775824 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.0000 0
775825 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.0000 0
775826 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.0000 0
775827 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.0000 0
775828 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.0000 0

775829 rows × 13 columns

Checking the label column we can see that the TIF image that we converted does not contain the desired labels.

data["label"].unique()
array([0], dtype=uint16)