# 9. Function Junction

Use, re-use, and write code to do stuff!

Functions

• Understand what a function is
• Understand how to use functions in R
• Write your own simple function

## Functions

What is a function? A function is simply a set of instructions or code written for a computer to follow. These can be very simple (multiply one column of data by another) to much more complex (multi-step model and visualization). Packages are usually sets of functions that focus on similar topics or themes, sort of like a cookbook for a specific kind of food.

Functions ultimately make our code more reusable and reproducible, because we can use functions for repeated processes (like cleaning and tidying data!).

### What’s in a Function?

In R, functions have a name, an input which may consist of one or more arguments (sometimes called parameters) and a output or the thing that the function returns.

Here’s an example function to convert discharge from cubic meters per second (cms) to cubic feet per second (cfs). Can you identify the different parts?

``````cms_to_cfs <- function(discharge) {
cfs_out <- discharge * 35.3146662
return(cfs_out)
}
``````

### Running a Function

Using functions requires we know what pieces go where inside the function (the arguments), our data is of the right class or shape for the function (input), and we know what the function should return to our Environment! Once we have this basic understanding, there are thousands of functions that are already written and part of packages we can use. In fact, we’ve been using functions in every lesson thus far. In this module, we focus on building our own functions to meet our data science needs.

To use the `cms_to_cfs` function we wrote above, we need to provide a value to the `discharge` argument. Let’s convert 10 cms to cfs, using our function.

``````cms_to_cfs(discharge = 10)
``````
``[1] 353.1467``

The function returns a numeric value to our console. If we wanted to save this value, we could assign it (`<-`) to an object. Functions allow us to iterate over data, or repeat the same task on different datasets without needing to re-write code every time.

### Naming Functions in R

It’s important to adopt consistent and descriptive names in R. This isn’t just for functions, but it’s a particularly good habit to practice when naming functions. For example, a general recommendation is to 🐍 `use_snake_case` 🐍. Other options include:

• `MaybeUseCamelCase`
• `or.separate.by.periods`
• `orJust_try.everythingBecause.CHAOS`

## Use a Function from a Package

We’ve already been doing this throughout the previous lessons with things like `read.csv()`, `dplyr::filter()`, or `ggplot2::ggplot()`, but let’s use a new package to pull some surface water river flow data. Here we’ll use the `{dataRetrieval}` package.

### Getting Water Data with `{dataRetrieval}`

The {dataRetrieval} package is a way to access hydrologic and water quality data from the USGS and EPA. There are several vignettes (or tutorials) available on the package website. Here, we will show how to download some surface water data from the American River near Sacramento. First we load the package with the functions we want to use. We load packages once per session.

``````# load the functions via packages!
library(tidyverse)
library(dataRetrieval) # a spellbook of water data functions

flow_site <- readNWISdata(site = "11427000", # USGS Station ID
service = "iv", # instantaneous measurements
parameterCd = "00060", # discharge in cfs
startDate = "2019-10-01",
endDate = "2021-09-30",
tz = "America/Los_Angeles")
``````

There are a lot of arguments in that function! How did we know what each one was or what to put there? There are a few different options. For functions or packages you aren’t familiar with, it’s worth using built-in help.

• Look up help for a function with `?readNWISdata` and look at the `Usage` and `Arguments` sections!
• Use tab autocomplete as much as you can in RStudio! Once inside a function’s parenthesis, hit tab! There are generally a list of arguments that will pop up, each with some info about them. This is the same info we should find in the Help tab when we look up the function.

### How do we know it worked?

Always visualize and inspect your data… we should have an object in our environment called `flow_site`. Let’s take a look at this data, tidy it, and plot!

``````# check data
summary(flow_site)
``````
``````  agency_cd           site_no             dateTime
Length:70080       Length:70080       Min.   :2019-10-01 00:00:00
Class :character   Class :character   1st Qu.:2020-03-31 13:26:15
Mode  :character   Mode  :character   Median :2020-09-30 13:07:30
Mean   :2020-09-30 12:39:55
3rd Qu.:2021-04-01 11:48:45
Max.   :2021-09-30 23:45:00
X_00060_00000     X_00060_00000_cd      tz_cd
Min.   :   9.68   Length:70080       Length:70080
1st Qu.:  55.50   Class :character   Class :character
Median : 115.00   Mode  :character   Mode  :character
Mean   : 323.72
3rd Qu.: 411.00
Max.   :4580.00                                        ``````
``````head(flow_site)
``````
``````  agency_cd  site_no            dateTime X_00060_00000
1      USGS 11427000 2019-10-01 00:00:00           101
2      USGS 11427000 2019-10-01 00:15:00           106
3      USGS 11427000 2019-10-01 00:30:00           106
4      USGS 11427000 2019-10-01 00:45:00           106
5      USGS 11427000 2019-10-01 01:00:00           106
6      USGS 11427000 2019-10-01 01:15:00           106
X_00060_00000_cd               tz_cd
1                A America/Los_Angeles
2                A America/Los_Angeles
3                A America/Los_Angeles
4                A America/Los_Angeles
5                A America/Los_Angeles
6                A America/Los_Angeles``````
``````# fix column names so we can use a more informative parameter name
flow_site <- dataRetrieval::renameNWISColumns(flow_site)

# check data
names(flow_site)
``````
``````[1] "agency_cd"    "site_no"      "dateTime"     "Flow_Inst"
[5] "Flow_Inst_cd" "tz_cd"       ``````
``````# visualize:
ggplot(data = flow_site) +
geom_line(aes(x = dateTime, y = Flow_Inst), color = "darkblue") +
theme_classic() +
labs(x = "", y = "Flow (cfs)",
title = "Discharge from USGS Station 11427000",
caption = "data pulled using {dataRetrieval} package in R")
``````

Awesome! We plotted some discharge data from a river. That’s great. We could save this data if we wanted (see the import/export module), or do additional analysis (see next module on joins!). Let’s try using a different function here.

Challenge 1: You Try!

1. Using the code template below, create a `flow_huc` object using the `readNWISdata()` function but fill in the arguments with the following information:

• Instead of `site`, let’s use `huc`. We’ll try getting stations from HUC 18020111.
• Instead of `iv` (instantaneous), let’s use `dv`.
• Use the same `startDate`, `endDate`, and `parameterCd` as we used above.
• `%>%` the output from `readNWISdata()` directly to the `renameNWISColumns()` function and then to the `addWaterYear()` function.
2. How many unique `site_no` are there?

3. Can you visualize the data using a `ggplot()` with `geom_line` and use a different color for each `site_no`?

Code template:

``````flow_huc <- readNWISdata(___,
___,
parameterCd = "00060",
startDate = "2019-10-01",
endDate = "2021-09-30") %>%
___ %>%
___
``````
``````# look for stations with daily values of discharge in the whole watershed
flow_huc <- readNWISdata(huc = "18020111", # using a huc id number now
service = "dv", # daily values
parameterCd = "00060",
startDate = "2019-10-01",
endDate = "2021-09-30") %>%
renameNWISColumns() %>%  # the pipe infers that the data is passed through

names(flow_huc) # names look ok!
``````
``````[1] "agency_cd" "site_no"   "dateTime"  "waterYear" "Flow"
[6] "Flow_cd"   "tz_cd"    ``````
``````unique(flow_huc\$site_no) # 2 stations
``````
``[1] "11446500" "11447360"``
``````# visualize:
ggplot() +
geom_line(data = flow_huc,
aes(x = dateTime, y = Flow, color = site_no),
show.legend = FALSE) +
theme_classic() +
labs(x = "", y = "Flow (cfs)",
title = "Discharge from USGS Stations: 11446500, 11447360",
caption = "data pulled using {dataRetrieval} package in R") +
# a new way to view data with facets...add a free scale!
facet_grid(rows = vars(site_no), scales = "free")
``````

## Using Custom Functions

Since we already wrote a custom function to convert cms to cfs, let’s make another similar function but for cfs to cms, and apply it to the flow data we just downloaded. Let’s look at how we might use a function on an existing data frame or dataset that we have in R.

### Convert Flow Data to cms

Let’s use some of the `dplyr` we looked at in the last lesson. Here we want to make a new column that is our `Flow` data, but instead of in cfs, our units will be in cms. Let’s use the `flow_site` data frame.

``````# the function
cfs_to_cms <- function(discharge) {
cms_out <- discharge * 0.028316847
return(cms_out)
}

# add a new column to existing dataset with mutate
flow_site <- flow_site %>%
mutate(Flow_cms = cfs_to_cms(Flow_Inst)) # apply the function to the column

# quick plot to check!
ggplot() +
geom_line(data = flow_site,
aes(x = dateTime, y = Flow_cms), color = "darkblue") +
theme_classic() +
labs(x = "", y = "Flow (cms)",
title = "Discharge from USGS Station 11427000",
caption = "data pulled using {dataRetrieval} package in R")
``````

So what happened here? We made a function `cfs_to_cms()` to apply to our data. In this case, we wanted to apply our function to a column in our dataframe, which if we remember the earlier lesson on data, in a dataframe a column is a vector of the same type of data. That means the function can very quickly iterate through every observation in the vector and convert the data from cfs to cms.

## More Practice

This section uses some more functions, but with some more advanced applications. We’ll want to have the `{sf}` package installed for the following code to work fully.

### Getting Water Quality Stations

What if we wanted to find all the possible water quality stations that are upstream or downstream of the gage we pulled data from earlier (`USGS: 11446500`)? A great function that will not only help us figure that out, but also help us pull in spatial data we can use to make maps is the `findNLDI()` function from the `{dataRetrieval}` package.

The function `findNLDI()` has arguments for the type of data we want to use (`nwis`, `huc`,`location`), the direction along a stream to look (upstream or downstream, or in mainstem or tributaries), the data we want to get back (`find`), and the distance we should search upstream or downstream (`distance_km`).

Let’s look only in the mainstem river for 50 km upstream or downstream of our USGS gage and return any NWIS (water quality or flow) sites and the associated flowlines we searched.

``````library(sf)

# get NWIS and flowline data 50km downstream of the USGS gage
amer_ds_nimbus <- findNLDI(
nwis = "11446500", # Below Nimbus Dam
nav = c("UM","DM"),
find = c("nwis", "flowlines"),
distance_km = 50)

names(amer_ds_nimbus)
``````
``````[1] "origin"       "UM_nwissite"  "UM_flowlines" "DM_nwissite"
[5] "DM_flowlines"``````

Note, this returned a named list of dataframes. We can access individual parts of a list just as we can access a single column in a dataframe using the `\$`. This function returned 5 parts, the `origin` which is the site we used to start our search, the `UM_nwissite` which is the NWIS sites upstream on the mainstem from our USGS gage, the `DM_nwissite` which is the NWIS sites downstream of our USGS gage, and finally, a mainstem flowline upstream and downstream of our gage. These are dataframes with an underlying spatial component, which is nice because it allows us to easily map and visualize these data if we want. See the mapmaking lesson for more info.The dataframes are an `sf` class, from the simple features {sf} package, but they can also be converted to into a dataframe.

``````# check the object class
class(amer_ds_nimbus\$DM_nwissite)
``````
``[1] "sf"         "tbl_df"     "tbl"        "data.frame"``
``````# get just downstream NWIS sites and make into dataframe
nwis_sites <- amer_ds_nimbus\$DM_nwissite %>%
st_drop_geometry() # drop the sf class

# verify the class has changed
class(nwis_sites)
``````
``[1] "tbl_df"     "tbl"        "data.frame"``

Let’s visualize these sites with a fancy map! See the mapmaking lesson for more details on making maps in R.

### Save Our Data!

Finally, let’s go ahead and save (export) our list of NWIS sites, and all our spatial components from the American River below Nimbus Dam so we can use it for future lessons.

``````# save out
write_csv(nwis_sites, path = here("data", "nwis_sites_american_river.csv"))
write_rds(amer_ds_nimbus, path = here("data", "nwis_sites_amer_ds_nimbus_sf.rds"))
``````

## Additional Water Packages of Interest

There are many packages that work with water or water data. Here’s a list of just a few that you may find interesting or helpful. The R CRAN repository has a list of all packages that relate to Hydrology that is also a good place to look.

• {dataRetrieval}
• {nhdplusTools}
• {EGRET}
• {geoknife}

### Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

### Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/r4wrds/r4wrds, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".