From 1,000 point-clicks to 1 script…
Learning objectives
{sf} for geospatial work
The ability to work in one place or with one program from start to finish is powerful, and more efficient than splitting your workflow across multiple tools. By sticking with a single framework or set of tools, we reduce the mental workload of switching between programs, staying organized in each, and dealing with import/export across multiple programs. Although tools such as ESRI’s ArcGIS (or ArcPy extensions) are powerful, they require a paid license and typically rely on point-and-click user interfaces.
The advantage R has over these tools is that it is freely available, easily integrates with vast statistical/modeling toolboxes, has access to many spatial analysis and mapmaking tools, and allows us to work in a single place.
If we use a functional programming approach (described in the iteration module) for spatial problems, R can be a very robust and powerful tool for analysis and spatial visualization of data! Furthermore, once analyses have been completed, we can re-use the scripts and functions for common spatial tasks (like making maps or exporting specific spatial files).
Common tasks in a GUI-based approach will always require the same number of points and clicks. With a script-based approach, it’s much easier to recycle previously written code, or to simply change a variable and re-run the code. This efficiency is magnified immensely when the same task can be automated or iterated through time or across multiple datasets.
For example, some common tasks may include reading spatial data in, reprojecting it, buffering, cropping, and writing results back out.
The {sf} package truly makes working with vector-based spatial data easy. We can use a pipeline that includes:
- st_read(): read spatial data in (e.g., shapefiles)
- st_transform(): transform or reproject data
- st_buffer(): buffer around data
- st_union(): combine data into one layer
- st_intersection(): crop or intersect one dataset by another
- group_split() & st_write(): split data by a column or attribute and write out
There are many more options that can be added or subtracted from these pieces, but at the core, we can use this very functional approach to provide data, make maps, conduct analysis, and so much more (see the sketch below).
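For instance, here’s a minimal sketch of such a pipeline, where the file paths, the 100 m buffer distance, and the stream_name column are hypothetical placeholders rather than the lesson data (and which assumes the packages loaded below):
# hypothetical sketch: paths, buffer distance, and column are placeholders
aoi <- st_read("data/area_of_interest.shp") %>% # read an area of interest
  st_transform(3310) %>%                        # project to CA Albers (meters)
  st_buffer(dist = 100) %>%                     # buffer by 100 m
  st_union()                                    # dissolve into a single layer
streams <- st_read("data/streams.shp") %>%      # read spatial data in
  st_transform(3310) %>%                        # match the projection
  st_intersection(aoi)                          # crop by the buffered area
streams %>%
  group_split(stream_name) %>%                  # split by an attribute...
  walk(~ st_write(.x, glue("streams_{unique(.x$stream_name)}.shp"))) # ...and write out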
Let’s use an example where we read in some groundwater station data, spatially find the closest surface water stations, download some river data, and visualize!
First we’ll need a few packages we’ve not used yet. Please install these with install.packages() if you don’t have them.
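For example, one way to install just the missing packages (the vector below simply mirrors the library() calls that follow):
# install any packages that aren't already present
pkgs <- c("tidyverse", "purrr", "janitor", "glue", "sf", "mapview",
          "tigris", "units", "dataRetrieval", "tmap", "tmaptools")
install.packages(setdiff(pkgs, rownames(installed.packages())))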
# GENERAL PACKAGES
library(tidyverse) # data wrangling & viz
library(purrr) # iteration
library(janitor) # name cleaning
library(glue) # pasting stuff together
# SPATIAL PACKAGES
library(sf) # analysis tools
library(mapview) # interactive maps!
mapviewOptions(fgb = FALSE) # to save interactive maps
library(tigris) # county & state boundaries
library(units) # for converting spatial and other units
library(dataRetrieval) # download USGS river data
library(tmap) # mapping
library(tmaptools) # mapping tools
We’ll leverage the ability to pull in many different datasets and stitch them all together through joins (spatial joins or joins on common attributes). Each dataset may be composed of one or more “layers”, which we can ultimately use on a map.
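As a quick illustration, here’s a hedged sketch of both join types (the object and column names are hypothetical):
# attribute join: match rows on a shared column
stations_joined <- left_join(stations, station_metadata, by = "STATION_ID")
# spatial join: match rows on location
stations_in_county <- st_join(stations, counties, join = st_intersects)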
First we need state and county boundaries. The {tigris} package is excellent for this.
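The download code isn’t shown here, but a minimal sketch of how the ca and ca_co objects used below might be built with {tigris}’s states() and counties() functions (both return {sf} objects by default):
# sketch: download state and CA county boundaries with {tigris}
ca <- states(cb = TRUE) %>%  # cartographic boundary (generalized) states
  filter(STUSPS == "CA")     # keep just California
ca_co <- counties(state = "CA", cb = TRUE)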
Let’s also pull in a shapefile that’s just Sacramento County. We’ll use this to crop/trim things down as we move forward. Note that we’ll need to check the coordinate reference system and projection here to make sure it matches our other spatial data.
# get just sacramento: here we read in a shapefile:
sac_co <- st_read("data/shp/sac/sac_county.shp")
Reading layer `sac_county' from data source
`/Users/richpauloo/Documents/GitHub/r4wrds/intermediate/data/shp/sac/sac_county.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 9 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -13565710 ymin: 4582007 xmax: -13472670 ymax: 4683976
Projected CRS: WGS 84 / Pseudo-Mercator
# check CRS
st_crs(sac_co)$epsg
[1] 3857
# match with other data
sac_co <- st_transform(sac_co, st_crs(ca_co))
st_crs(sac_co) == st_crs(ca_co) # should be TRUE!
[1] TRUE
# make a box around sacramento county
# (a grid with an n=1) for inset
sac_box <- st_make_grid(sac_co, n = 1)
And let’s quickly visualize these pieces! We’ll use the base plot() functions here.
# make sure we have all the pieces with a quick test plot
plot(ca$geometry, col = alpha("gray", 0.5), border = "black", lwd=2)
plot(ca_co$geometry, add = T, border = "purple", col = NA)
plot(sac_co$geometry, add = T, border = "cyan4", col = alpha("skyblue", 0.4), lwd = 2)
plot(sac_box, add = T, border = "orange", col = NA, lwd = 1.4)
Let’s practice our iteration skills. We’ll read in groundwater stations for 3 counties (El Dorado, Placer, and Sacramento), convert to {sf} objects, plot them, and then crop/select a subset of stations using spatial intersection.
Iteration… remember {purrr}? Let’s use it here!
# read the stations
gw_files <- list.files(path = "data/gwl/county",
                       full.names = TRUE, pattern = "\\.csv$")
# read all files into dataframes and combine with purrr
gw_df <- map_df(gw_files, ~read.csv(.x))
# the readr package will also do this same thing by default
# when passed a list of files with the same data types
gw_df <- read_csv(gw_files)
# now make "spatial" as sf objects
gw_df <- st_as_sf(gw_df, coords=c("LONGITUDE","LATITUDE"),
crs=4326, remove=FALSE) %>%
# and transform!
st_transform(., st_crs(ca_co))
# preview!
mapview(gw_df, zcol="COUNTY_NAME", layer.name="GW Stations") +
mapview(sac_co, legend=FALSE)
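The crop/select step described earlier uses a spatial intersection; here’s a brief sketch, assuming the objects built above:
# keep only the stations that fall within Sacramento County
gw_sac <- st_intersection(gw_df, sac_co)
# equivalent shorthand: subsetting one sf object by another
# (uses st_intersects under the hood)
gw_sac <- gw_df[sac_co, ]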