Introduction to R for Clinical Data

Presented by the Children’s Hospital of Philadelphia (CHOP) R User Group and Arcus Data Education

Part I: Introduction

R: programming language for data analysis

RStudio: integrated development environment (IDE)

R Markdown: computational document format

Getting Started With RStudio

RStudio: On the Web and on Your Desktop

RStudio Server

Hosted on a server (in the cloud)

RStudio Desktop

Installed on your computer

Whoops!

There are some small server issues that won’t affect our work today, but they will produce a couple of scary-looking messages in your first exercise:

Failed to create bus connection: No such file or directory

Warning message: In system("timedatectl", intern = TRUE) : running command ‘timedatectl’ had status 1

Your Turn #1

Click the link in the chat to access the RStudio training environment.

Log in using your username and password.

Click “yes” once you see the RStudio panes.

Reproducible Data Analysis and R Markdown

The Duke Cancer Scandal

  • Chemo sensitivity from microarrays
  • Serious errors in data analysis
  • Clinical trials based on flawed models
  • Papers retracted, lawsuits settled

| Duke | MD Anderson |
|---|---|
| "1881_at" | "1882_g_at" |
| "31321_at" | "31322_at" |
| "31725_s_at" | "31726_at" |
| "32307_r_at" | "32308_r_at" |

Do you see the off-by-one indexing error?
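
One quick way to check the pattern is to compare only the numeric part of each identifier. The sketch below (the variable names and the helper `probe_number()` are illustrative, not from the original analysis) strips everything from the first underscore onward and subtracts:

```r
# Probe identifiers as listed on this slide
duke        <- c("1881_at", "31321_at", "31725_s_at", "32307_r_at")
md_anderson <- c("1882_g_at", "31322_at", "31726_at", "32308_r_at")

# Hypothetical helper: keep the digits before the first underscore
probe_number <- function(ids) as.integer(sub("_.*", "", ids))

offset <- probe_number(md_anderson) - probe_number(duke)
offset
# [1] 1 1 1 1
```

Every Duke identifier is exactly one below its MD Anderson counterpart, which is consistent with one list being shifted by a single row. Comparing the full strings would hide this, because the suffixes (`_at`, `_g_at`, `_s_at`) differ too.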

“Common problems are simple…

  • Off-by-one indexing error
  • Sensitive / resistant label reversal
  • Confounding in experimental design
  • Inclusion of data from non-reported sources
  • Wrong figure shown

… and simple problems are common.”

Point-and-click… is not reproducible!

Why Bother With Reproducibility?

  • Can we redo the analysis with this month’s data?
  • Why do the data in Table 1 not seem to agree with Figure 2?
  • Why did I decide to omit these six samples from my analysis?

Your closest collaborator is you from 6 months ago…

Anatomy of an R Markdown document

Running a Single Code Chunk

Rendering (“Knitting”)
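
A minimal sketch of what such a document might look like (the title, author, heading, and chunk label are placeholders). It has three parts: a YAML header between `---` lines, ordinary Markdown text, and R code chunks, each opened by three backticks followed by `{r}` and closed by three backticks:

````
---
title: "My First Markdown"
author: "Your Name"
output: html_document
---

## Introduction

This is ordinary text written in **Markdown**.

```{r summary-chunk}
# An R code chunk: runs when you click its "play" button or knit the document
summary(cars)
```
````

Clicking a chunk's play button runs just that chunk in your current R session. Knitting calls `rmarkdown::render()` on the file, which runs every chunk from top to bottom and writes the format named under `output` (here, an HTML file). RStudio's Knit button does this in a separate R session, so the document can only use objects it creates itself.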

Your Turn #2

Go to File > New File > R Markdown. When prompted for a title and author, enter any values you like, such as “Test” or “My First Markdown”, and click OK.

This opens a template containing a working example of a simple R Markdown document.

Run each chunk by clicking its green “play” button. Note what happens.

Knit the document with the “ball of yarn” (Knit) button. When prompted, type “test” and click Save to save the R Markdown file; RStudio then renders it to an HTML document. Inspect the HTML document.

Time: 3 minutes

Importing Data

The Data Analysis Pipeline

CSV

Tidyverse

  • A consistent way to organize data
  • Human-readable, concise, consistent code
  • Build pipelines from atomic data analysis steps
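
To make “pipelines from atomic steps” concrete, here is a small sketch using dplyr (a core tidyverse package) and R's built-in `mtcars` data rather than the workshop's clinical data:

```r
library(dplyr)

# Each step does one small job; the pipe (%>%) hands the result to the next step
four_cyl_summary <- mtcars %>%
  filter(cyl == 4) %>%        # keep only 4-cylinder cars
  summarise(
    n_cars   = n(),           # count the rows that remain
    mean_mpg = mean(mpg)      # average fuel economy of those cars
  )

four_cyl_summary
```

The code reads top to bottom as “take mtcars, then filter, then summarise”. R 4.1 and later also provide a native pipe, `|>`, which works the same way here.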

Installing and Loading Packages
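
A sketch of the usual pattern. The `requireNamespace()` guard and the explicit CRAN mirror in `repos` are conveniences added here, not part of the slides; in an interactive RStudio session a plain `install.packages("tidyverse")` is enough:

```r
# Install once per computer: downloads the tidyverse from CRAN.
# The check skips the download if the package is already installed.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Load once per R session (or at the top of each R Markdown document)
library(tidyverse)
```

Installing is a one-time step per computer, while `library()` has to run again in every new session, which is why it usually sits in the first chunk of an R Markdown document.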

read_csv()

Functions
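
As a small illustration of the vocabulary on these slides, using base R's `round()` (chosen here only as an example):

```r
# round() is the function; x and digits are its arguments.
# The output is captured in an object named `rounded` with the assignment operator <-
rounded <- round(x = 3.14159, digits = 2)

# Arguments can also be matched by position: x first, then digits
rounded_again <- round(3.14159, 2)

rounded
```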

read_csv()
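
A self-contained sketch of importing a CSV file. Because the workshop's data files aren't reproduced here, the example first writes a tiny made-up file to a temporary location; with real data you would pass the path to your own file instead. `read_csv()` comes from the readr package, which `library(tidyverse)` also loads:

```r
library(readr)

# Write a small, made-up CSV file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "patient_id,age,result",
  "1,34,negative",
  "2,57,positive",
  "3,8,negative"
), csv_path)

# The file path is read_csv()'s first argument; the output is captured in an object
testing <- read_csv(csv_path, show_col_types = FALSE)

testing
```

The result is a tibble, the tidyverse's version of a data frame. `read_csv()` guesses each column's type from its contents; `show_col_types = FALSE` only silences the message that reports those guesses.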

Your Turn #3

In the Files pane, click the exercises folder.

Open the R Markdown file titled 01 – Introduction.Rmd. Instructions for this exercise are in the text of the R Markdown document.

Click “yes” when you are done.

Time: 5 minutes

Recap

Packages extend the functionality of R. Install them with install.packages() and load them with library().

Functions perform actions. They accept Arguments as input and return an Output. Capture an output in an Object using the assignment operator ( <- ).

Importing Data is the first step of data analysis. Use read_csv() from the readr package (part of the tidyverse) to import data stored in a CSV file.

What Else?

In a five-hour workshop we can barely scratch the surface. Here are some other ideas to get you thinking.

Cheat Sheets

R for Data Science

File Formats

Databases
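
A minimal sketch of querying a database from R with the DBI package. It assumes DBI and RSQLite are installed and uses a throwaway in-memory SQLite database loaded with R's built-in `mtcars` data; a real clinical data warehouse would need its own driver and connection details:

```r
library(DBI)

# Open a temporary, in-memory SQLite database (requires the RSQLite package)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a database table named "cars"
dbWriteTable(con, "cars", mtcars)

# Send SQL to the database and get the result back as a data frame
four_cyl <- dbGetQuery(con, "SELECT mpg, hp FROM cars WHERE cyl = 4")

# Always close the connection when finished
dbDisconnect(con)

four_cyl
```

`dbGetQuery()` returns an ordinary data frame, so the result can flow straight into the tidyverse tools covered today.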

Other Output Formats
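
Assuming this refers to R Markdown output formats, switching formats is a change to the YAML header rather than to the code. A sketch listing several formats that ship with the rmarkdown package (PDF output also needs a LaTeX installation):

```yaml
---
title: "My First Markdown"
output:
  html_document: default
  word_document: default
  pdf_document: default
---
```

With several formats listed, the Knit button's drop-down menu lets you pick one, and (assuming the file is named test.Rmd) `rmarkdown::render("test.Rmd", output_format = "all")` renders them all.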

R Interface to Python

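```{python}
# Assumes covid_testing was created as an R data frame in an earlier R chunk;
# reticulate exposes R objects to Python through the `r` object
import pandas

r.covid_testing.info()
```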