While you work through this tutorial, you will create an R Markdown (.Rmd) document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents that include R code and results. Since all the data analysis and results are automatically included in the compiled output document, your work is reproducible and it’s easy to re-do analysis if the data change, or if a mistake is uncovered. For more details on using R Markdown see http://rmarkdown.rstudio.com.
To create a real Rmd file, you will have to work in RStudio (outside this tutorial environment). So, as you work on this tutorial, you will probably switch back and forth between the tutorial itself and a “real” RStudio session.
Go to http://rstudio.calvin.edu and enter your user name and password to log in. If you need help, go to the “Links and Resources” section of the course Moodle site and click on “Calvin RStudio server information.”
You can do things in R by typing commands in the Console panel; however, working that way makes it hard to keep a record of your work (and hard to redo things if anything changes or if a mistake was made). For this class, you will work in R markdown files, which can contain text, R code, and R output (such as figures).
Every R Markdown file (Rmd file) must be completely stand-alone. It doesn’t share any information with the Console or the Environment that you see in your RStudio session. All R code that you need to do whatever you are trying to do must be included in the Rmd file itself!
For example, if you use the point-and-click user interface in the RStudio Environment tab to import a data file, the read-in dataset will not be available for use within your Rmd file.
Similarly, if you load a package by typing “library(mosaic)” (or “require(mosaic)”) in the Console window, mosaic
functions and data will not be available to use within the Rmd file.
Keep it stand-alone!
In RStudio, navigate to File -> New File -> R Markdown…, or click on the white rectangle with a green + and select R Markdown from the drop-down menu.
Choose “From Template” and select the “mosaic fancy” template. (This template has a lot of stuff in it to show what you can do in an Rmd document. Once you have some practice, you may want to switch to the simpler “mosaic plain (PDF)” template - it contains less stuff you’d need to delete before inserting your own content.)
Save your file by clicking on the disk icon at the top of the file tab (maybe give it a useful file name like DeRuiterProject1.Rmd. Avoid spaces and special characters in your file name). The file will be saved to the server, not to your computer. All your files will be accessible in the RStudio Files tab (lower right panel) whenever you log into RStudio, regardless of which computer you are using.
However, you will have to download your files if you want a copy on your own computer, or to be able to upload a copy to Moodle to turn in. To download, go to the File tab, check the box for the file you want, then select More - Export. from the menu at the top of the File tab.
How do Rmd files actually work? What’s so cool about them?
Click on the small black arrow next to the word “Knit” (and the ball of yard icon) at the top of the file window. Select “Knit to PDF”. Check out the compiled PDF result, and compare it to the original markdown file. Wow!
At the top of the markdown file, enter an appropriate title, author(s), and date (within the quotation marks). Knit again to see the effect.
The R Markdown file is a text file where you save all the R commands you want to use, plus any text commenting on the work you are doing and the results you get. Parts of the file with a plain white background are normal text.
You can format the text. For example, enclosing a word in asterisks will generate italics, so *my text* in the Rmd file will become my text in the PDF. Using two asterisks instead of one will generate boldface, so **my text** becomes my text. You can also make bulleted lists, numbered lists, section headers, and more. For example,
### Some Text
becomes
(a sub-section header). Fewer hashtags make the text even larger, and more make it smaller.
Check out the R Markdown cheat sheet https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf for more examples.
For instructions on how to include R output and special characters (symbols, subscripts, etc.) in your text, see the document in the “Links and Resources” section of the course Moodle site.
Before moving on, try a few of the tricks you just learned in your Rmd file. Make it pretty!
An Rmd file can also contain one or more R code chunks. These sections of the file have a grey background onscreen. Each one begins with
```{r}
and ends with
```
like so:
To add a code chunk to your file, you can type in the header and footer by hand to start and end the chunk. Or, you can click on the green box with the C inside (at the top of the Rmd file) to insert an empty chunk. When you click the Knit button a document will be generated that includes both text content as well as the output of any embedded R code chunks within the document.
The first R code chunk in a Rmd file is usually used to specify settings. In this chunk, you can also give R permission to use certain packages (software toolkits) with
library(packagename)
Alternatively,
require(packagename)
does the same thing.
For example, we will almost always use the mosaic package. So, verify that the first R code chunk in your file includes the line library(mosaic)
or require(mosaic)
(both options work equally well).
If you look carefully at the PDF output, you will see that the settings chunk does not appear there. That’s intentional - for homework, you don’t need to show me what settings you used, although I would always like the R code you use to solve problems to show up in the output PDF.
Going back to the Rmd file, look at the header of the settings chunk – this is the part between the {curly braces}. Notice that in addition to the required “r” label, which is followed by an (optional) chunk name, the option
include=FALSE
appears, which means that the code from this chunk will be invisible in your compiled output document.
You can add other options for R chunks (we will learn about some more choices later) and they should all be separated by commas. You can control things like what is included in the output PDF and how big figures are in the PDF. For this class, the defaults defined in the mosaic plain and fancy templates will work well most of the time.
If interested, see https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf or http://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf.
At this point, you probably want to get rid of all the extra content in the template. Delete everything in the file that comes after the settings R code chunk. Now you have space to include your own R code and text.
There are multiple ways to run and test R code from a markdown file. Sometimes you want to knit the whole file and get the PDF; other times you want to run just a specific bit of code to make sure it’s working correctly.
Obviously, every time you knit the file, all R code will be run automatically.
Finally, here’s a third way to use shortcuts/buttons (option 3):
Copy the code you want to run, paste to the console window, and hit Enter.
(Or on Windows, place your cursor in the line you want to run and hit ctrl + enter.)
We already covered this once, but we’re covering it again because it’s one of the most common student mistakes in Rmd files!
If you run R code in the console or the RStudio GUI (for example, reading in a data set by pasting code into the console or using the Import Dataset button in the Environment tab), you won’t be able to use the results in your markdown file. Any and all commands you need, including reading in data, need to be included in the file.
You can load online datafiles in .csv format into R using the function read.csv()
. The input to read.csv()
is the full url where the file is located, in quotation marks. (Single or double quotes – it doesn’t matter which you choose, as they are equivalent in R.)
For example, to read in a dataset about bees stored at http://www.calvin.edu/~sld33/data/FlyingBees.csv, use the command below. The data includes measurements of the size of each bee, and the time it took the bee to fly through a maze.
bees <- read.csv('http://www.calvin.edu/~sld33/data/FlyingBees.csv')
bees <- read.csv('http://www.calvin.edu/~sld33/data/FlyingBees.csv')
Note that we assigned the name bees to the dataset we read in using <- (the “arrow” points from the object toward the name being assigned to the object).
After reading the data in, you can use R functions to have a look at it, for example:
head(bees)
str(bees)
nrow(bees)
Try each of the lines of code above in R. What do the functions head()
, str()
, and nrow()
do? Try to figure it out based on the output they produce.
If you get stuck, consult R’s built-in help files. Remember, you can access the help for a function by running the code ?functionName
– for example, if you want help on head()
, run:
?head
You can also upload your own data file to the server, and then read it in to R using read.csv. The basic process is:
read.csv()
function to read the file into RYour data table should have one row per observation (or “case”) – the people, animals, places, or things about which data have been collected.
There should be one column for each variable observed. Choose short but informative names for your variables (and the values they can take on), and avoid using any special symbols or spaces in the names.
Use any spreadsheet software to create a simple data file.
Use these data: Researchers studied wild marmots to see if they made whistles (alarm calls) in response to hiker passing near their burrows. Some hikers had dogs, others not. 20 out of 21 hikers with dogs caused whistles, while 12 out of 20 hikers without dogs caused whistles. When you are happy with your table, save it in CSV (comma separated values) format.
Go to the Files tab in RStudio (lower right). Click Upload and browse to select the file you created. Then, use the read.csv()
function to read in the file. Remember to put the file name in quotes, and use <- to assign a name to the dataset! Note that if you made folders in your Files tab and stored the csv file inside one of them, you will have to include the relative path from your Rmd file to your data file, e.g., “myfolder/myfilename.csv”.)
When you knit a Rmd file, it will probably automatically open in RStudio’s built-in PDF pre-viewer. If you print from there, it will look awful (blurry). But when you knit, a PDF of the output is also saved in your Files tab. If you go to the Files tab and double click the PDF, it should open in your computer’s normal PDF viewer and you can print it from there…and it will look good.
Start with the Rmd file you’ve been working with so far. Make sure it has an informative title, the names of everyone in your group, and the date.
Add answers to the following questions in the text part of the file, as well as R code chunks containing all R code that you used to answer them. If you used a ? (help) command, explain how/why in the text (since its results will not appear in the PDF). To answer the questions, you will need to use things you learned in the R Basics tutorial as well as this slide show.