Part 1: Introduction

R: Programming language for data analysis

RStudio: Integrated development environment (IDE)

Quarto: Computational document format

Getting Started With RStudio

RStudio: On the Web and on Your Desktop

Posit.cloud

Hosted by Posit (in the cloud)

Posit Workbench

Hosted by you, your company, your university, on prem or in the cloud

RStudio Desktop

Installed on your computer

Your Turn #1

Go to https://posit.cloud/content/6121691 in your browser. If you’re not already logged in, log in to Posit.cloud.

Please click on “Save a Permanent Copy”. If you don’t do this, you could “fill up” our seats on the shared copy, and prevent other people from participating!

Click “thumbs up” once you see something like the image below.

Reproducible Data Analysis and R Markdown

The Duke Cancer Scandal

  • Chemo sensitivity from microarrays
  • Serious errors in data analysis
  • Clinical trials based on flawed models
  • Papers retracted, lawsuits settled

| Duke | MD Anderson |
|---|---|
| "1881_at" | "1882_g_at" |
| "31321_at" | "31322_at" |
| "31725_s_at" | "31726_at" |
| "32307_r_at" | "32308_r_at" |

Do you see the off-by-one indexing error?
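To make the idea concrete, here is a minimal sketch of how an off-by-one shift changes which rows are reported. The labels in `probe_ids` and the positions in `selected` are made up for illustration; this is not a reconstruction of the Duke analysis, only a toy example of the general mistake.

```r
# Hypothetical, made-up row labels standing in for a real probe list.
probe_ids <- c("probe_1", "probe_2", "probe_3", "probe_4", "probe_5")

# Suppose an analysis selected rows 2 and 4 (R counts positions from 1).
selected <- c(2, 4)

# Correct lookup: use the positions exactly as selected.
correct <- probe_ids[selected]

# Off-by-one lookup: the same positions shifted by one, as might happen
# when a tool that counts from 0, or a stray header row, is involved.
shifted <- probe_ids[selected - 1]

print(correct)
print(shifted)
```

Every label in `shifted` is a real label from the list, so nothing looks obviously broken; the error only shows up when the reported names are checked against the data.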

“Common problems are simple…

Off-by-one indexing error

Sensitive / resistant label reversal

Confounding in experimental design

Inclusion of data from non-reported sources

Wrong figure shown

… and simple problems are common.”

Point-and-Click…

… is not reproducible!

Why Bother With Reproducibility?

  • Can we redo the analysis with this month’s data?
  • Why do the data in Table 1 not seem to agree with Figure 2?
  • Why did I decide to omit these six samples from my analysis?

Your closest collaborator is you from 6 months ago…

Anatomy of a Quarto Document

Running a Single Code Chunk

Can you see the green “play” button?

That’s how you run this chunk!

Rendering

In R Markdown documents, you will see “Knit” or “Preview” instead of “Render”.

Your Turn #2

Go to File > New File > Quarto Document. Click OK.

This gives you a handy template that serves as a working example of a simple Quarto document. You will be asked to provide a title and author; choose any values you like, such as “Test” or “My First Markdown”.

Run each chunk by clicking the green “play” button. Note what happens.

Render the document. Type “test” and click Save to save the HTML file. Inspect the HTML document.

Importing Data

The Data Analysis Pipeline

CSV

Tidyverse

  • A consistent way to organize data
  • Human readable, concise, consistent code
  • Build pipelines from atomic data analysis steps

Installing and Loading Packages

read_csv()

Functions

Your Turn #3

In the Files pane, click on the folder exercises.

Open the file titled 01 – Introduction.qmd. Instructions for this exercise are in the text of the document.

Click the thumbs up button in Teams when you are done.

Recap

Packages extend the functionality of R. Install with install.packages() and load with library().

Functions do stuff. They accept Arguments as input and return an Output. Capture an output in an Object using the assignment operator ( <- ).
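As a small illustration of these terms, the sketch below calls the base R function `round()` with two arguments and captures its output in an object. The object names `rounded` and `doubled` are arbitrary choices for this example.

```r
# round() is the function; 3.14159 and digits = 2 are its arguments.
# The assignment operator <- captures the output in the object `rounded`.
rounded <- round(3.14159, digits = 2)

# An object can be printed or reused in later code.
print(rounded)
doubled <- rounded * 2
print(doubled)
```

Without the assignment, R would print the output to the console and then discard it.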

Importing Data is the first step of data analysis. Use read_csv() from the readr package (part of the tidyverse) to import data stored in a CSV file.
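The three recap points can be combined in one short sketch. It assumes the readr package is already installed (loading the whole tidyverse with `library(tidyverse)` would also work), and it writes a tiny made-up CSV to a temporary file so that it does not depend on any workshop data. The names `csv_path` and `example_data`, and the file contents, are hypothetical.

```r
# install.packages("readr")  # one-time install, commented out here
library(readr)               # load the package in each new R session

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,result", "1,negative", "2,positive"), csv_path)

# Import the file and capture the output (a tibble) in an object.
example_data <- read_csv(csv_path, show_col_types = FALSE)
print(example_data)
```

In the exercises, the path argument would instead point to a CSV file in the project.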

What Else?

Obviously in a short workshop we can barely scratch the surface… here are some other ideas to get you thinking.

Cheat Sheets

R for Data Science

In English:

English unofficial solutions (1st ed) at https://jrnold.github.io/r4ds-exercise-solutions/index.html

In Spanish (1st ed): https://es.r4ds.hadley.nz/

File Formats

Databases

Other Output Formats

R Interface to Python

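```{python}
# Assumes covid_testing is a data frame created in an earlier R chunk;
# in a Python chunk, R objects are reached through reticulate's `r` object.
import pandas

covid_testing = r.covid_testing
covid_testing.info()
```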

Next Up: Visualize

Our next topic is:

Part 2: Visualize