11. {sf} and map making in R

Spatial data and Mapmaking 101

Learning objectives

Geospatial data with {sf}

R is a powerful tool for working with spatial data and making maps. Within R, you can do nearly everything that a GUI-based program (e.g., ArcGIS or QGIS) can do, and you can also write scripts to automate or scale up routine analyses, saving time on repetitive tasks and ensuring your workflows are reproducible.

There are a variety of spatial mapping and plotting packages in R, but the most widely used option for vector-based spatial data is the {sf} package. {sf} is a powerful, clean, and fairly straightforward approach: it has an easy-to-use syntax, it is fast, and it can do most, if not all, of the tasks commonly done in other geospatial software programs. Even better, {sf} spatial objects are simply data.frames with a geometry column, so all of the tidying, wrangling, joining, and plotting capabilities of {dplyr} and {ggplot2} also work on {sf} objects. This makes it possible to handle all of your data processing needs in a single environment, without moving between tools for tabular data (e.g., Excel) and geospatial data (e.g., ArcGIS or QGIS).

Figure 1: Illustration by @allison_horst.

Reading spatial data

The {sf} package can read spatial data in various formats (e.g., shapefile, GeoJSON, PostGIS, KML). If the data are already in a spatial format, reading them is very straightforward and requires only a call to the function st_read().

# First we load the sf and tidyverse packages
library(sf)
library(tidyverse)

# may need to unzip first:
unzip("data/shp/sac_county.zip", exdir = "data/shp/sac")

# Read a shapefile of Sacramento county 
sac <- st_read("data/shp/sac/sac_county.shp", quiet = TRUE) 

colnames(sac)
 [1] "OBJECTID"   "COUNTY_NAM" "COUNTY_ABB" "COUNTY_NUM" "COUNTY_COD"
 [6] "COUNTY_FIP" "Shape"      "Shape.STAr" "Shape.STLe" "geometry"  
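Reading other vector formats works the same way; only the file path (and the driver inferred from its extension) changes. For example, a GeoJSON could be read as sketched below; the path here is a hypothetical placeholder, not part of this module's data.

# reading a GeoJSON (or KML, GeoPackage, etc.) uses the same function;
# the path below is a hypothetical placeholder
# streams <- st_read("data/streams.geojson", quiet = TRUE)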

Converting a dataframe to sf

At other times, you may need to convert tabular data to a spatial format. Making an {sf} object means creating a geometry column that holds all of the geospatial information and lives within the dataframe. Importantly, this geometry column is "sticky": whatever we do to our data (tidy, filter, mutate, etc.), the associated geometry column sticks with the data. What's awesome is that this column can contain anything from a point to a line to a complex polygon, all in one column. To make an {sf} object, we need to know two important pieces of information: which columns contain the coordinates, and which coordinate reference system (crs) the data are in.

To practice, let's read in all of the groundwater level monitoring stations in Sacramento County (stations_sac.csv) and convert this tabular data to an sf object with the function st_as_sf(). We specify which columns contain the coordinates (coords), and which projection, or coordinate reference system (crs), the data are in. In this case, we know the data are in NAD83, or EPSG 4269.

# read groundwater level stations in Sacramento county as dataframe
stations <- read_csv("data/gwl/stations_sac.csv")

# check object class
class(stations)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
# convert stations into an sf object by specifying coordinates and crs
stations <- st_as_sf(stations, 
                     coords = c("LONGITUDE", "LATITUDE"), # note x goes first
                     crs = 4269, # projection, this is NAD83
                     remove = FALSE) # don't remove lat/lon cols from dataframe

# check the class of the new object
class(stations)
[1] "sf"          "spec_tbl_df" "tbl_df"      "tbl"        
[5] "data.frame" 
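To see the "sticky" geometry described above in action, here is a minimal sketch: even after selecting a single attribute column (BASIN_CODE, a station column we plot later in this module), the result is still an sf object that keeps its geometry.

# select one column -- the geometry column "sticks" to the result
stations %>% 
  select(BASIN_CODE) %>% 
  class()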

Projecting spatial data

A common problem in geospatial analysis arises when two datasets are in different projections. We can check the projection of our sac and stations objects with the function st_crs(), and transform (or re-project) our data with the st_transform() function.

We know stations is in NAD83, but what about sac? Let's check with st_crs(). The second line of the output (User input) shows that sac is in WGS 84 / Pseudo-Mercator (EPSG:3857), a different projection.

st_crs(sac)
Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]

We can re-project, or transform, the projection (crs) with st_transform() by specifying the EPSG code to transform the data to. Here we use NAD83 (EPSG:4269), so that the Sacramento County boundary (sac) is in the same projection as the groundwater level monitoring points (stations).

sac <- st_transform(sac, crs = 4269)

Lastly, we verify our transformation worked.

st_crs(sac)
Coordinate Reference System:
  User input: EPSG:4269 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Geodesy."],
        AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands.  British Virgin Islands."],
        BBOX[14.92,167.65,86.46,-47.74]],
    ID["EPSG",4269]]

We can even ask R whether the projections (crs) of sac and stations are identical, and they are.

identical(st_crs(stations), st_crs(sac))
[1] TRUE

We can also transform data using another sf object or dataframe directly, without needing to look up the specific EPSG or CRS code. For example, if we want to transform our Sacramento County polygon into the same projection as our stations data, we can do the following:

sac <- st_transform(sac, crs = st_crs(stations))

# verify these are the same
identical(st_crs(stations), st_crs(sac))
[1] TRUE
# or look at just the EPSG code:
st_crs(sac)$epsg
[1] 4269
st_crs(stations)$epsg
[1] 4269

Spatial join with st_intersection()

With all of our spatial data in the same projection, we can perform a spatial join. Perhaps the most common spatial join is an intersection. For example, the stations object above only contains stations in Sacramento County, but it came from a much larger set of stations (n = 43,807). Let's bring in all groundwater stations in the state of California, convert them to an sf object, and plot the data.

# all gw stations in California
all_gw_stations <- read_csv("data/gwl/stations.csv") %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), 
           crs = 4269, 
           remove = FALSE) 
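A quick way to plot these statewide points (a minimal sketch; the plot() method for {sf} objects is covered in more detail below) is to plot only the geometry column:

# quick look at all groundwater stations in California
plot(all_gw_stations$geometry, pch = 16, cex = 0.2)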

As we can see above, groundwater monitoring stations are concentrated in Bulletin 118 subbasins.

Using dplyr::filter() we can subset these stations to Sacramento County.

# filter the statewide stations to Sacramento County
stations_sac <- all_gw_stations %>% 
  filter(COUNTY_NAME == "Sacramento")

# verify this worked
unique(stations_sac$COUNTY_NAME)
[1] "Sacramento"

But what if we didn't have the county data already, or we wanted to filter these data by a polygon that isn't described by one of the existing variables in our dataframe? In this case, we can use st_intersection(), a spatial join that takes two arguments: the sf object we want to filter (x), and another sf object to filter by (y). If we use x = all_gw_stations and y = sac, it will return all of the points in all_gw_stations that fall within Sacramento County.

Before performing the spatial join, it is good practice to re-project our data from a geographic coordinate reference system (CRS) to a projected coordinate reference system.

# it's good practice to ensure your spatial data are in a projected CRS
# like meters before performing spatial operations, so we transform to 3310
all_gw_stations_3310 <- st_transform(all_gw_stations, 3310)
sac_3310 <- st_transform(sac, 3310)

# perform the intersection 
stations_sac_3310 <- st_intersection(all_gw_stations_3310, sac_3310) 

# number of observations in each county
table(stations_sac_3310$COUNTY_NAME)

     Placer  Sacramento San Joaquin      Sutter 
          2         485           1           1 

Interestingly, there are 4 counties in our data after the spatial join. Why is that? If we visualize the data, we can see that the few points labeled with other counties sit right on the border of Sacramento County, and the process that previously assigned county names to these data must have used a slightly different Sacramento County polygon than the one we are using in this module.
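For example, a minimal sketch of that visual check: plot the county boundary, then overlay the intersected points colored by their original COUNTY_NAME attribute.

# plot the Sacramento County boundary
plot(sac_3310$geometry)

# overlay the intersected stations, colored by original county attribute
plot(stations_sac_3310["COUNTY_NAME"], add = TRUE, pch = 16)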

There are many, many more methods available beyond intersections, including areas, distances, buffers, crops, Voronoi polygons, nearest neighbor calculations, convex hulls, centroids, and much, much more. The list of operations available for {sf} geometry objects is shown below.

methods(class = 'sfc')
 [1] [                     [<-                   as.data.frame        
 [4] c                     coerce                format               
 [7] fortify               identify              initialize           
[10] mapView               obj_sum               Ops                  
[13] print                 rep                   scale_type           
[16] show                  slotsFromS3           st_area              
[19] st_as_binary          st_as_grob            st_as_s2             
[22] st_as_sf              st_as_text            st_bbox              
[25] st_boundary           st_buffer             st_cast              
[28] st_centroid           st_collection_extract st_convex_hull       
[31] st_coordinates        st_crop               st_crs               
[34] st_crs<-              st_difference         st_geometry          
[37] st_inscribed_circle   st_intersection       st_intersects        
[40] st_is_valid           st_is                 st_line_merge        
[43] st_m_range            st_make_valid         st_nearest_points    
[46] st_node               st_normalize          st_point_on_surface  
[49] st_polygonize         st_precision          st_reverse           
[52] st_sample             st_segmentize         st_set_precision     
[55] st_shift_longitude    st_simplify           st_snap              
[58] st_sym_difference     st_transform          st_triangulate       
[61] st_union              st_voronoi            st_wrap_dateline     
[64] st_write              st_z_range            st_zm                
[67] str                   summary               type_sum             
[70] vec_cast.sfc          vec_ptype2.sfc       
see '?methods' for accessing help and source code
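As a small taste of these operations (a sketch, not part of the original workflow): because sac_3310 is in a projected CRS with units of meters, st_area() returns square meters and st_buffer() takes a distance in meters.

# area of Sacramento County in square meters (EPSG:3310 uses meters)
st_area(sac_3310)

# buffer the county boundary outward by 5 km
sac_buffer_5km <- st_buffer(sac_3310, dist = 5000)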

To learn more about advanced spatial operations, see the spatial data module in the Intermediate to Advanced wrds course, as well as online books and resources.

Plotting {sf} data

With all of our spatial data in the same projection, we can start making maps! We will cover the built-in plot() method for {sf} objects, interactive maps with {mapview}, and plotting with {ggplot2}.

Inspect data with plot()

After reading spatial data you may want to plot it to make sure that it imported correctly, and to understand the fields. For a quick plot, you can simply use plot().

# plot the geometry
plot(sac$geometry)

If we don't specify the "geometry" column, plot() will plot the first several attribute columns in the dataframe (9 by default; you can control the number of subplots shown with the max.plot argument). Here we can see there are 4 distinct basins (BASIN_CODE) in Sacramento County.

plot(stations)
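For example, to limit how many attribute panels are drawn (a quick sketch):

# only plot the first 4 attribute columns
plot(stations, max.plot = 4)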

Interactive mapping with {mapview}

One of the easiest and coolest packages you'll find for interactive mapping is {mapview}. As long as data are in sf format, you can quickly make an interactive map. First, let's confirm our data are sf objects.

class(stations)
[1] "sf"          "spec_tbl_df" "tbl_df"      "tbl"        
[5] "data.frame" 
class(sac)
[1] "sf"         "data.frame"

Basic use

Next, we can use the mapview() function to create an interactive webmap!

We can add {mapview} objects to one another in the same way we add layers to a ggplot, using +. We can then toggle layers on and off from the layer control icon in the top left of the interactive map, and change the basemap layers from that same menu.

# load {mapview}, then combine the sac and stations data
library(mapview)
mapview(sac) + mapview(stations)
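Beyond the defaults, mapview() has arguments for styling layers. For instance (a minimal sketch; the values here are just illustrative), zcol colors features by an attribute column and layer.name sets the name shown in the layer control:

# color stations by basin and give each layer a readable name
mapview(sac, layer.name = "Sacramento County") +
  mapview(stations, zcol = "BASIN_CODE", layer.name = "GW stations")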