3. Project Management: Let’s Get Organized

Know where you are…

Learning objectives

Where Am I?

Figure 1: A disorganized system for finding things

Any time you are working on your computer, you are navigating amidst a forest of files and folders.

One of the most common issues we deal with is finding where a file or folder is located, or telling your computer where something is.

One of the best habits you can form (whether you are using R or not!) is intentionally keeping a clear structure, no matter what project or task you face. This becomes especially important when using a programming language like R in your work. You will need to tell R, very specifically, where you are and where your files are in the forest of your computer.

Where you are is typically referred to as the working directory. In R, think of this as your home base: everything is relative to this folder/location on your computer.

Use Project Workflows

One of the great advantages of using tools like RStudio is they make it easy to use a more organized “project” workflow. What do we mean by “using projects”? Think of a general pattern or structure that we can use for each work project or analysis we have. This approach isn’t just specific to R. Any good data scientist will generally have a folder structure and organization scheme they follow, no matter what programming language they use.

The general idea is to always keep the same structure, and naming schemes, for every project. Do this every single time with every single project you make, in order to make it a habit. This will save you time and brainpower! Imagine quickly moving between tasks or projects with minimal time spent “trying to find where things are and get oriented”. You’ll always know where things should be!

Basic Folders for Every Project

At a minimum, it’s useful to have separate directories for each of the following:

Figure 2: An example project folder structure
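
A folder layout like this can also be created from R itself. The sketch below is only a minimal illustration: the folder names (data, scripts, output) are assumptions picked for the example rather than a list prescribed by this lesson, and everything is built under a temporary directory so nothing in a real project is touched.

```r
# Hypothetical project root; in practice this would be your own project folder.
project_root <- file.path(tempdir(), "my_project")

# Folder names are illustrative assumptions; adjust them to your own scheme.
folders <- c("data", "scripts", "output")

for (folder in folders) {
  # recursive = TRUE also creates project_root if it does not exist yet;
  # showWarnings = FALSE keeps reruns quiet when the folders already exist.
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the sub-folders that now exist in the project root.
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```

Because the loop is safe to rerun, the same few lines could sit at the top of a setup script and be reused for every new project.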

RStudio Projects

Within the R environment, something that makes project management and organization much easier is the use of RStudio Projects (.Rproj). Within RStudio, this is baked in and pretty easy to do. One of the nicest parts of using RStudio Projects is that they automatically set the working directory to the folder containing the .Rproj file. You can turn any existing folder into an RStudio Project, or make a new one! RStudio Projects are a great way to quickly and easily organize things, and each project can be a specific task or a larger set of objectives you need to complete. With RStudio Projects, it’s easy to switch between projects, or work on different projects simultaneously without much mental overload, especially if you use the same folder structure across each project!

Setting up an .Rproj

Let’s go ahead and create an RStudio Project as part of this course that you can use throughout each module. We highly recommend that, no matter what you are working on in R, you do it within an RStudio Project!

  1. Open RStudio and find the button in the upper right-hand corner that says “Project: (None)”. Click it to see some options, then select New Project.
  2. We can either create the project in a New Directory, or set up a project in an Existing Directory. Both give similar options.
  3. Select “New Project”.

Figure 3: RStudio Project setup

  4. Make a new sub-directory (if you don’t already have it) under your Documents folder called: Rprojects
  5. Finally, name the new project (remember, no spaces in folder or file names!): intro_r4wrds
Figure 4: Make a new project named intro_r4wrds

Great! Now we can create a folder structure in our project as discussed in the Basic Folders for Every Project section above. More importantly, we can also put all our course data into the data folder in our RStudio project, and we’re ready to go!

File Paths

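In R, file paths are always wrapped in quotes. There are 2 basic kinds of file paths: absolute paths, which give the full location of a file starting from the root of your computer’s file system, and relative paths, which give the location relative to the working directory.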

The good news is that if you set up an RStudio project as shown above, you can use relative file paths with ease!

The Working Directory

Our working directory should be the root place where our project lives (as shown above). You can always check with getwd(), and you should see the folder where your .Rproj file lives! Importantly, everything you do should be relative to that working directory.

That means we really don’t want to use things like setwd() (set working directory) to locate a file or folder on our computer, or use a hard path (i.e., a full path like C:/MyUserName/My_Documents/A_Folder_You_May_Have/But_This_One_You_Definitely_Dont/). That’s because this will pretty much never work on anyone’s computer other than your own, and sometimes it may not even work on your computer if you change a file name or folder! We really want to set a good habit, to make things reproducible for others, and for our future self.
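
As a small illustration of the difference, the sketch below builds a relative path and then shows the full path R would resolve it to on the current machine. The data folder and the file name superdata.csv are placeholders for the example; the file does not need to exist for the paths to be built.

```r
# Where R currently thinks "home base" is.
wd <- getwd()

# A relative path: no drive letter or user name, so it reads the same on every machine.
relative_path <- file.path("data", "superdata.csv")

# The full (hard) path it resolves to here; this part differs on someone else's computer.
full_path <- file.path(wd, relative_path)

relative_path
full_path
```

Only the relative version belongs in a script we plan to share: it keeps working as long as the project folder travels together, wherever that folder ends up.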

Using the {here} package

Good news! There’s a package that can make this easier. The {here} package makes it easy to build paths relative to the top-level directory of your current project, which is what here() returns whenever you call it. We can also use here() to build a path to a file, relative to that top-level directory, for saving or loading. Let’s say we’re working in our MyName/Documents/Projects/2020 folder.

library(here)

# identify the top-level directory of your project.
here()
#> [1] "/Users/MyName/Documents/Projects/2020"

# load a file from `MyName/Documents/Projects/2020/data/superdata.csv`
read.csv(here("data", "superdata.csv"))

Figure 5: Illustration by @allison_horst.

Using the {here} package means we can share our project with other folks and it will work, and if something changes around inside the project, it will remain functional and accessible.

Best Practices: Project Organization/Workflow Tips

Although there is no “best” way to lay out a project, there are some general principles that will make project management easier. Here’s some sage advice from Jenny Bryan and Jim Hester, from What They Forgot to Teach You About R (worth checking out!):

Always start R as a blank slate

When you quit R, do not save the workspace to an .RData file. When you launch, do not reload the workspace from an .RData file.

In fact, we should all make our default setting a blank slate. We should only be loading and working on data and code that we knowingly and willingly open or import into R.

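In RStudio, set this via Tools > Global Options: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.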

Figure 6: Change defaults to never save your workspace to .RData! (Credit to Jenny Bryan and Jim Hester at rstats.wtf)

Safe File Naming

This is really important and will make life easier for everyone in the long run. Jenny Bryan has the best set of slides on this, so take a few minutes and go read them. Then be the change!

Slides You Need to Read

TL;DR (Too long; didn’t read)
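
One naming rule this lesson has already used is avoiding spaces in file and folder names. As a minimal sketch of how that rule could be applied in code, the helper below swaps whitespace for underscores. The function name make_safe_name is hypothetical (it is not from the slides or from any package), and it covers only this single rule.

```r
# Hypothetical helper: replace whitespace in a file name with underscores.
make_safe_name <- function(x) {
  # Drop leading/trailing whitespace first so names never start or end with "_".
  x <- trimws(x)
  # Replace each run of whitespace with a single underscore.
  gsub("[[:space:]]+", "_", x)
}

make_safe_name("my final report.csv")
#> [1] "my_final_report.csv"
```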

Treat raw data as Read Only

This is probably the most important tip for making a project reproducible and hassle free. Raw data should never be edited, because you don’t want to permanently change your starting point in an analysis, and you want to have a record of any changes you make to data. Therefore, treat your raw data as “read only”, perhaps even making a raw_data directory that is never modified. If you do some data cleaning or modification, save the modified file separate from the raw data, and ideally keep all the modifying actions in a script so that you can review and revise them as needed in the future.
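
Here is a minimal sketch of that pattern: read from the raw folder, clean in code, and write the result somewhere else. The folder name data_clean, the file names, and the tiny example table are all made up for illustration, and everything lives in a temporary directory so the example is self-contained.

```r
# Work in a temporary folder so the example does not touch a real project.
project_root <- file.path(tempdir(), "raw_data_demo")
raw_dir   <- file.path(project_root, "raw_data")
clean_dir <- file.path(project_root, "data_clean")  # hypothetical folder name
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a raw file you received; in a real project it already exists.
raw_file <- file.path(raw_dir, "measurements.csv")
write.csv(
  data.frame(site = c("A", "B", "C"), depth = c(10.2, NA, 7.5)),
  raw_file,
  row.names = FALSE
)

# Read the raw data and do the cleaning in code, so every change is recorded.
raw <- read.csv(raw_file)
clean <- raw[!is.na(raw$depth), ]

# Save the modified version separately; nothing is written back to raw_dir.
clean_file <- file.path(clean_dir, "measurements_clean.csv")
write.csv(clean, clean_file, row.names = FALSE)
```

If the cleaning rule turns out to be wrong, we change one line of the script and rerun it; the raw file is still exactly as it arrived.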

Treat generated output as disposable

Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated with code. Don’t get attached to anything other than your raw data, and your code! There are lots of different ways to manage this output, and what’s best may depend on the particular kind of project.
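
One way to check that output really is disposable is to delete it and rebuild it from code. The sketch below assumes a hypothetical function, build_output(), that holds every step needed to produce the output; the folder name and the small summary table are placeholders, and a temporary directory stands in for a project’s output folder.

```r
# Stand-in for a project's output folder.
output_dir <- file.path(tempdir(), "output_demo")

# Hypothetical function holding every step needed to rebuild the output from code.
build_output <- function(dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  summary_table <- data.frame(group = c("a", "b"), mean_value = c(1.5, 3.5))
  out_file <- file.path(dir, "summary.csv")
  write.csv(summary_table, out_file, row.names = FALSE)
  out_file  # return the path of the file that was written
}

first_run <- build_output(output_dir)

# Throw the whole output folder away...
unlink(output_dir, recursive = TRUE)

# ...and get it all back by rerunning the code.
second_run <- build_output(output_dir)
file.exists(second_run)
```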

Summary

All in all, there are many ways to set up and work with a project-based workflow. The important thing is to find a strategy that works for the majority of the work you do, and then be consistent about organizing things the same way, from folder structure to file naming. This will help the folks you work with, and it will help you (including your future self): you will spend less time figuring out where you are and where the pieces you need are, and more time doing the work itself!

Lesson adapted from R-DAVIS, Jenny Bryan and Jim Hester’s What they forgot to teach you about R, and the Data Carpentry: R for data analysis and visualization of Ecological Data lessons.



Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/r4wrds/r4wrds, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".