# My useR!2020 tutorial on ggplot2

Today, the useR! 2020 committee announced the July 7th tutorials on Twitter. I am so honored and grateful to be one of the selected tutorial leaders! What an amazing group to be a part of!

I will be giving the tutorial, *Creating Beautiful Data Visualizations in R: a ggplot2 Crash course.* The other tutorials are:

- Morning session:
  - *First steps in spatial data handling and visualization* by S. Rochette, D. Scott, and J. Nowosad
  - *Predictive modeling with text using tidy data principles* by J. Silge and E. Hansen
  - *So, you want to learn Python? An introduction to Python for the R lover* by S. Ellis
  - *Application of Gaussian graphical models to metabolomics* by D. Scholtens and R. Balasubramanian
  - *Periscope and CanvasXpress: Creating an enterprise-grade big-data visualization application in a day* by C. Brett
  - *Seamless R and C++ integration with Rcpp* by D. Eddelbuettel
  - *Create and share reproducible code with R Markdown and workflowr* by J. Blischak
  - *Causal inference in R* by L. D’Agostino McGowan
  - *Reproducible computation at scale with drake: hands-on practice with a machine learning project* by W. Landau

- Afternoon session:
  - *How green was my valley - Spatial analytics with PostgreSQL, PostGIS, R, and PL/R* by J. Conway
  - *Building interactive web application with Dash for R* by R. Kyle
  - *Easy Larger-than-RAM data manipulation with disk.frame* by ZJ Dai
  - *R Markdown recipes* by Y. Xie
  - *Getting the most out of Git* by C. Gillespie and R. Davies
  - *End-to-end machine learning with Metaflow: Going from prototype to production with Netflix’s open source project for reproducible data science* by S. Goyal
  - *Package development* by J. Hester and H. Wickham
  - *What they forgot to teach you about teaching R* by M. Cetinkaya-Rundel

Hopefully I made the right guess at the tutorial leaders, and I apologize to those I could not find easily!

The rest of this post is my tutorial proposal that I submitted to the committee. I hope you’ll attend!

## Title:

### Creating Beautiful Data Visualizations in R: a `ggplot2` Crash Course

## Audience:

Users of any subject matter background who are interested in data visualization with R are welcome. New R users (less than 1 year of continual experience) will get the most out of this course. Some basic knowledge, such as understanding of different R data types & structures (character, data frame, etc.) and authoring simple functions & loops, is required. The course, however, is not just for beginners. More advanced R users who have little to no experience with `ggplot2` and other packages in the `tidyverse` will learn many new tools for data visualization in R.

## Instructor background:

Dr. Tyner has been using `ggplot2` since version 0.9.2 (2012). She co-authored and maintains the `ggplot2` extension package `geomnet`, and has taught `ggplot2` to undergraduate students, graduate students, and professionals. Her material for undergraduate students can be found at csafe-isu.github.io/reu18/slides, and a list of all workshops taught is available in her CV. Dr. Tyner also has expertise in the theory of data visualization, acquired through her dissertation research and as a part of the statistical graphics working group at Iowa State University. She has written about her `ggplot2` work in The R Journal and on her blog.

## Domain:

Most examples will use publicly available data from the U.S. Bureau of Labor Statistics, including time series and map data, but no economics knowledge is required.

## Points of appeal:

Learning `ggplot2` brings joy and “aha moments” to new R users, keeping them more engaged and eager to grow their R skills. Newer R users will be and feel more empowered with data visualization skills. In addition to experiencing joy in creating beautiful graphics, advanced R users will learn to take advantage of `ggplot2`’s elegant defaults, saving time on manual plotting tasks like drawing legends. Thus, time and energy can be spent on advanced analyses, not fights with plotting commands.

## Learning objectives:

Upon completion of this tutorial, the participants will be able to:

- identify the appropriate plot types and corresponding `ggplot2` `geom`s to consider when visualizing their data;
  - Participants will match variable types (categorical, continuous, etc.) to the best visualizations (boxplots, scatterplots, etc.).
- implement the `ggplot2` grammar of graphics by using `ggplot()` and building up plots with the `+` operator;
  - Participants will add `geom`s, `stat`s, `facet_*`s, and `theme` objects together to create beautiful graphics.
- iterate through multiple visualizations of their data by changing the aesthetic mappings, geometries, and other graph properties;
  - Participants will practice interchanging geometries and aesthetic mappings to communicate the data in different ways to find effective visualizations.
- incorporate custom elements (colors, fonts, etc.) into their visualizations by adjusting `ggplot2` theme elements;
  - Participants will be able to customize their graphs for appearance in journals, corporate reports, etc. with specific requirements.
- investigate the world of `ggplot2` independently to expand upon the skills learned in the course.
  - Participants will acquire “prosthetic knowledge,” or the ability to learn more about `ggplot2` and data visualization in R using the Internet.
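
To make the second objective concrete, here is a minimal sketch of building a plot up with the `+` operator. It is an illustration only, not the tutorial material itself: the `mpg` data set (which ships with `ggplot2`) and the particular geom, stat, facet, and theme are assumptions chosen for the example.

```r
library(ggplot2)

# Start from the data and the default aesthetic mappings.
# `mpg` is a stand-in data set bundled with ggplot2.
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Each `+` adds one component: a geom, a stat, facets, a theme.
p <- p +
  geom_point(aes(colour = class)) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ drv) +
  theme_minimal()

# Printing the object draws the plot.
p
```

Because the plot is an ordinary R object, swapping one component (say, `geom_point()` for `geom_jitter()`) means changing one line and printing again, which is what makes iterating through visualizations quick.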

## Computing requirements:

Participants should have a laptop with access to the internet. Having RStudio on the laptop is recommended, but if this is not possible the student may use the RStudio Cloud workspace. Students not using the RStudio Cloud should have the `tidyverse` suite of packages and the `plotly` package installed. Or, for the minimalist student, `ggplot2`, `dplyr`, `tidyr`, and `plotly` will be sufficient for the bulk of the material.
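
One possible way to check a local setup against the minimalist list above is sketched below. The variable names are hypothetical, and the install line is left commented out so that nothing is downloaded without the reader choosing to.

```r
# The minimal set of packages named above.
needed <- c("ggplot2", "dplyr", "tidyr", "plotly")

# TRUE for each package whose namespace can be loaded.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All packages for the tutorial are available.")
}
```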

## Teaching assistant:

I have a great network of possible TAs for this course in current and former Iowa State University statistics Ph.D. students. I am confident I will be able to recruit two people to participate as TAs.

## Lesson plan:

Session 1: `ggplot2` basics (90 minutes)

- Introduction to `ggplot2`’s grammar of graphics (20 minutes)
  - Theory: grammar of graphics
  - Optimal data format
  - The basics of `geom_*`s, `stat_*`s, & `aes()`
- One-variable visualization (10 minutes)
  - Bar charts
  - Histograms, density estimates
- Hands-on (5 minutes)
- Two-variable visualization (20 minutes)
  - Two categorical variables: `geom_count`, etc.
  - Two numeric variables: scatterplots, etc.
  - One categorical, one numeric: boxplots, etc.
- Hands-on (5 minutes)
- Three or more variable visualization (20 minutes)
  - Augmenting one- and two-variable visualization with color and other aesthetic mappings
  - Grouping
  - Faceting
  - Maps
- Hands-on (10 minutes)
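
As a rough preview of the Session 1 progression, the sketch below pairs each variable combination with one matching geom. The tutorial itself is planned around Bureau of Labor Statistics data; the bundled `mpg` data set and the object names here are stand-ins assumed for illustration.

```r
library(ggplot2)

# One variable: a bar chart for a categorical variable,
# a histogram for a numeric one.
one_cat <- ggplot(mpg, aes(x = class)) + geom_bar()
one_num <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Two variables: the geom depends on the variable types.
two_cat <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()
two_num <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
cat_num <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()

# Three or more variables: add a colour mapping and facets
# to the two-variable scatterplot.
three_plus <- two_num +
  aes(colour = drv) +
  facet_wrap(~ year)
```

Printing any of these objects (for example, `three_plus`) draws the corresponding plot.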

Break (30 minutes)

Session 2: Advanced customization (90 minutes)

- Combining layers (20 minutes)
  - Using the same data object
  - Different data objects
- Hands-on (10 minutes)
- Graph appearance (15 minutes)
  - Themes & Scales
  - Titles, legends, etc.
- Hands-on (5 minutes)
- ggplot2 extensions (15 minutes)
  - Animation
  - Domain-specific (e.g. network analysis, time series data)
  - Appearance customization (custom themes, scales, etc.)
- Hands-on (10 minutes)
- Interactivity with plotly (10 minutes)
  - `ggplotly()`
  - Tooltips
- Hands-on (5 minutes)
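
A few of the Session 2 ideas can be sketched together: layers drawn from different data objects, titles and theme adjustments, and an optional conversion with `ggplotly()`. Again this is only an illustration; the `mpg` data, the summary table `class_means`, and the styling choices are assumptions, and the interactive step runs only if `plotly` is installed.

```r
library(ggplot2)

# A second data object: mean highway mileage per vehicle class.
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Layer 1 uses the plot's default data.
  geom_jitter(width = 0.2, alpha = 0.4) +
  # Layer 2 supplies its own data object.
  geom_point(data = class_means, colour = "red", size = 3) +
  labs(title = "Highway mileage by vehicle class",
       x = "Vehicle class", y = "Highway miles per gallon") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"))

# Optional interactivity: convert the static plot and choose
# which aesthetics appear in the tooltip.
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_p <- plotly::ggplotly(p, tooltip = c("x", "y"))
}
```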

## Expected level of audience’s R background:

I anticipate two audiences for this tutorial: beginning R users, defined as having less than one year of continual experience, and advanced R users interested in data visualization and new to the `tidyverse`. I expect the following skills:

- Basic understanding of R data types and structures
- Ability to write simple functions and loops
- Willingness to learn new ways of doing things in R

## Other considerations:

For the tutorial I will alternate between lecturing and hands-on time. For the lecture portion, I use slides, and will need a projector and a way to connect my personal laptop, as well as a place to put notes, a water bottle, etc. For the hands-on time, participants will require Internet access and a comfortable work setting with enough space to accommodate everyone’s laptops, notebooks, etc. In addition, I would ask that the organizers please consider body diversity when choosing a space. This includes having tables that can accommodate wheelchairs, chairs without arms for folks in larger bodies, and tables available at standing height for those unable to sit for long periods.

Additional ggplot2 resources:

- Documentation, ggplot2.tidyverse.org
- Book, ggplot2-book.org
- Data Visualization with ggplot2 cheatsheet from RStudio (within RStudio, Help → Cheatsheets → Data Visualization with ggplot2)
- Stack Overflow has 30,000+ questions tagged “[ggplot2]”
- ggplot2 GitHub repository has thousands of closed issues, representing problems that users at all levels have encountered.