---
author: "Chris Reudenbach"
title: "GRASS application for real world big data"
date: "`r Sys.Date()`"
editor_options:
  chunk_output_type: console
output:
  html_document: 
    theme: united
    toc: true
  rmarkdown: default
  pdf_document:
    latex_engine: xelatex
    toc: true
urlcolor: blue
vignette: >
  %\VignetteIndexEntry{GRASS application for real world big data }
  %\VignetteEncoding{UTF-8}{inputenc}\
  %\VignetteEngine{knitr::knitr}
---

# Real world example

A typical use case is working with an existing GRASS project database.  
GRASS stores all data in a structured *gisdbase* consisting of locations and mapsets.

Raster and vector data are fully managed inside this structure.  
This example shows how to integrate large external datasets into GRASS.

## Downloading census data

We use German 2011 census data as a real-world big data example.  
The dataset contains over 35 million grid points with multiple attributes.

```{sh eval=FALSE}
# Download census grid data
wget https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip?__blob=publicationFile&v=3
```

Key properties of the dataset:

* Provided in a typical governmental open-data format
* Large enough to demonstrate big data workflows
* Spatially explicit and well suited for raster and vector analysis

```{r eval=FALSE}
require(link2GI)
require(curl)

# Create project folder structure
dirs <- link2GI::createFolders(
  root_folder = file.path(tempdir(), "link2GI_examples"),
  folders     = c("run/")
)

# Download census data
url <- "https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip?__blob=publicationFile&v=3"
zipfile <- file.path(dirs$path_run, "testdata.zip")
curl::curl_download(url, zipfile)

# Extract CSV file
unzip(
  zipfile,
  files = grep("[.]csv$", unzip(zipfile, list = TRUE)$Name, value = TRUE),
  exdir = dirs$path_run,
  junkpaths = TRUE,
  overwrite = TRUE
)
```

## Preprocessing the data

The dataset is essentially gridded xyz data.
Each row represents one 100 m × 100 m grid cell.

```{r eval=FALSE}
# Fast CSV import
xyz <- data.table::fread(
  file.path(dirs$path_run, "Zensus_Bevoelkerung_100m-Gitter.csv")
)

head(xyz)
```

Because the data is already gridded, rasterization is straightforward.
We only need to identify x, y, and value columns.

```r
require(RColorBrewer)
require(terra)
require(mapview)

# Detect coordinate and value columns
cn <- names(xyz)
num_cols <- cn[vapply(xyz, is.numeric, logical(1))]
xcol <- num_cols[1]
ycol <- num_cols[2]
vcol <- num_cols[3]

# Build xyz matrix
xyz3 <- xyz[, c(xcol, ycol, vcol)]
names(xyz3) <- c("x", "y", "z")

# Create raster from xyz table
r <- terra::rast(xyz3, type = "xyz")
terra::crs(r) <- "EPSG:3035"

# Visualize raster
p <- colorRampPalette(brewer.pal(8, "Reds"))
mapview::mapviewOptions(mapview.maxpixels = min(2e6, ncell(r)))
mapview::mapview(
  r,
  col.regions = p,
  at = c(-1, 10, 25, 50, 100, 500, 1000, 2500),
  legend = TRUE
)
```

## Setting up a GRASS project

We now create a permanent GRASS location from the raster object.
`linkGRASS()` searches for an installed GRASS version and initializes it.

```r
require(link2GI)

link2GI::linkGRASS(
  x = r,
  epsg = 3035,
  gisdbase = file.path(tempdir(), "link2GI_examples"),
  location = "microzensus2011"
)
```

## Importing raster data into GRASS

The raster is written to GeoTIFF before import.
Any GDAL-supported format would also work.

```r
require(rgrass)
require(terra)

# Write raster to GeoTIFF
tif_file <- file.path(dirs$path_run, "Zensus_Bevoelkerung_100m-Gitter.tif")
terra::writeRaster(r, tif_file, overwrite = TRUE)

# Register raster as external GRASS dataset
rgrass::execGRASS(
  "r.external",
  flags = "o",
  input  = tif_file,
  output = "Zensus_Bevoelkerung_100m_Gitter",
  band   = "1"
)

# Inspect raster metadata
rgrass::execGRASS("r.info", map = "Zensus_Bevoelkerung_100m_Gitter")
```

## Importing data as GRASS vector points

We first convert the xyz table into an `sf` point object.
Each grid cell becomes a single point feature.

```r
require(sf)

xyz_sf <- sf::st_as_sf(
  xyz,
  coords = c("x_mp_100m", "y_mp_100m"),
  crs = "EPSG:3035",
  agr = "constant"
)

plot(sf::st_geometry(xyz_sf))
```

The GRASS database already exists.
We therefore link to it and import the vector points.

```r
require(link2GI)
require(rgrass)

# Reuse existing GRASS location
link2GI::linkGRASS(
  x = xyz_sf,
  epsg = 3035,
  gisdbase = file.path(tempdir(), "link2GI_examples"),
  location = "microzensus2011",
  gisdbase_exist = TRUE
)

# Import sf object as GRASS vector
vname <- "Zensus_Bevoelkerung_100m"
sf2gvec(
  x = xyz_sf,
  obj_name = vname,
  gisdbase = file.path(tempdir(), "link2GI_examples"),
  location = "microzensus2011",
  gisdbase_exist = TRUE
)

# Inspect imported vector data
rgrass::execGRASS("g.list", type = "vector")
rgrass::execGRASS("v.info", map = vname)
```

