My useR! 2020 tutorial on ggplot2

Today, the useR! 2020 committee announced the July 7th tutorials on Twitter. I am so honored and grateful to be one of the selected tutorial leaders! What an amazing group to be a part of!

I will be giving the tutorial, Creating Beautiful Data Visualizations in R: a ggplot2 Crash Course. The other tutorials are:

  • Morning session:
    • First steps in spatial data handling and visualization by S. Rochette, D. Scott, and J. Nowosad
    • Predictive modeling with text using tidy data principles by J. Silge and E. Hansen
    • So, you want to learn Python? An introduction to Python for the R lover by S. Ellis
    • Application of Gaussian graphical models to metabolomics by D. Scholtens and R. Balasubramanian
    • Periscope and CanvasXpress: Creating an enterprise-grade big-data visualization application in a day by C. Brett
    • Seamless R and C++ integration with Rcpp by D. Eddelbuettel
    • Create and share reproducible code with R Markdown and workflowr by J. Blischak
    • Causal inference in R by L. D’Agostino McGowan
    • Reproducible computation at scale with drake: hands-on practice with a machine learning project by W. Landau
  • Afternoon session:
    • How green was my valley - Spatial analytics with PostgreSQL, PostGIS, R, and PL/R by J. Conway
    • Building interactive web applications with Dash for R by R. Kyle
    • Easy Larger-than-RAM data manipulation with disk.frame by ZJ Dai
    • R Markdown recipes by Y. Xie
    • Getting the most out of Git by C. Gillespie and R. Davies
    • End-to-end machine learning with Metaflow: Going from prototype to production with Netflix’s open source project for reproducible data science by S. Goyal
    • Package development by J. Hester and H. Wickham
    • What they forgot to teach you about teaching R by M. Cetinkaya-Rundel

I hope I guessed the tutorial leaders correctly, and I apologize to those I could not find easily!

The rest of this post is my tutorial proposal that I submitted to the committee. I hope you’ll attend!


Title:

Creating Beautiful Data Visualizations in R: a ggplot2 Crash Course

Audience:

Users of any subject matter background who are interested in data visualization with R are welcome. New R users (less than 1 year of continual experience) will get the most out of this course. Some basic knowledge, such as understanding of different R data types & structures (character, data frame, etc.) and authoring simple functions & loops, is required. The course, however, is not just for beginners. More advanced R users who have little to no experience with ggplot2 and other packages in the tidyverse will learn many new tools for data visualization in R.

Instructor background:

Dr. Tyner has been using ggplot2 since version 0.9.2 (2012). She co-authored and maintains the ggplot2 extension package geomnet, and has taught ggplot2 to undergraduate students, graduate students, and professionals. Her material for undergraduate students can be found at csafe-isu.github.io/reu18/slides, and a list of all workshops taught is available in her CV. Dr. Tyner also has expertise in the theory of data visualization, acquired through her dissertation research and as a part of the statistical graphics working group at Iowa State University. She has written about her ggplot2 work in The R Journal and on her blog.

Domain:

Most examples will use publicly available data from the U.S. Bureau of Labor Statistics, including time series and map data, but no economics knowledge is required.

Points of appeal:

Learning ggplot2 brings joy and “aha moments” to new R users, keeping them more engaged and eager to grow their R skills. Newer R users will be, and will feel, more empowered once they have data visualization skills. In addition to experiencing joy in creating beautiful graphics, advanced R users will learn to take advantage of ggplot2’s elegant defaults, saving time on manual plotting tasks like drawing legends. Thus, time and energy can be spent on advanced analyses, not fights with plotting commands.

Learning objectives:

Upon completion of this tutorial, the participants will be able to:

  1. identify the appropriate plot types and corresponding ggplot2 geoms to consider when visualizing their data;
    • Participants will match variable types (categorical, continuous, etc.) to the best visualizations (boxplots, scatterplots, etc.).
  2. implement the ggplot2 grammar of graphics by using ggplot() and building up plots with the + operator;
    • Participants will add geoms, stats, facet_*s, and theme objects together to create beautiful graphics.
  3. iterate through multiple visualizations of their data by changing the aesthetic mappings, geometries, and other graph properties;
    • Participants will practice interchanging geometries and aesthetic mappings to communicate the data in different ways to find effective visualizations.
  4. incorporate custom elements (colors, fonts, etc.) into their visualizations by adjusting ggplot2 theme elements;
    • Participants will be able to customize their graphs for appearance in journals, corporate reports, etc. with specific requirements.
  5. investigate the world of ggplot2 independently to expand upon the skills learned in the course.
    • Participants will acquire “prosthetic knowledge,” or the ability to learn more about ggplot2 and data visualization in R using the Internet.
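
To make objectives 2 through 4 concrete, here is a minimal sketch of what building up a plot with the + operator can look like. It is not taken from the tutorial materials: the mpg dataset that ships with ggplot2 stands in for the tutorial data, and the object names (base_plot, boxes, violins, styled) are made up for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; it is a stand-in here, not the tutorial's data
base_plot <- ggplot(mpg, aes(x = class, y = hwy))

# Objective 2: build a plot by adding a geom and labels with +
boxes <- base_plot +
  geom_boxplot() +
  labs(x = "Vehicle class", y = "Highway miles per gallon")

# Objective 3: same data and aesthetic mapping, different geometry
violins <- base_plot + geom_violin()

# Objective 4: adjust theme elements to change the plot's appearance
styled <- boxes +
  theme_minimal() +
  theme(axis.title = element_text(face = "bold"))

print(styled)
```

Because each step returns a new plot object, the boxplot and violin versions can be compared side by side without retyping the data or the mapping.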

Computing requirements:

Participants should have a laptop with access to the internet. Having RStudio on the laptop is recommended, but if this is not possible the student may use the RStudio Cloud workspace. Students not using the RStudio Cloud should have the tidyverse suite of packages and the plotly package installed. Or, for the minimalist student, ggplot2, dplyr, tidyr, and plotly will be sufficient for the bulk of the material.
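
For anyone working on their own laptop, a quick check along these lines shows which of the minimal packages are still missing. This is only a sketch: the variable names are illustrative, and it prints an install command instead of installing anything itself.

```r
# The minimal package set named above
needed <- c("ggplot2", "dplyr", "tidyr", "plotly")

# TRUE for each package that can be loaded from the current library
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  # Build the install command for the user to run, without running it
  pkgs <- paste0('"', not_installed, '"', collapse = ", ")
  message("To install the missing packages, run: install.packages(c(", pkgs, "))")
} else {
  message("All four packages are installed.")
}
```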

Teaching assistant:

I have a great network of possible TAs for this course in current and former Iowa State University statistics Ph.D. students. I am confident I will be able to recruit two people to participate as TAs.

Lesson plan:

Session 1: ggplot2 basics (90 minutes)

  • Introduction to ggplot2’s grammar of graphics (20 minutes)
    • Theory: grammar of graphics
    • Optimal data format
    • The basics of geom_*s, stat_*s, & aes()
  • One-variable visualization (10 minutes)
    • Bar charts
    • Histograms, density estimates
  • Hands-on (5 minutes)
  • Two-variable visualization (20 minutes)
    • Two categorical variables: geom_count, etc.
    • Two numeric variables: scatterplots, etc.
    • One categorical, one numeric: boxplots, etc.
  • Hands-on (5 minutes)
  • Three or more variable visualization (20 minutes)
    • Augmenting one- and two-variable visualizations with color and other aesthetic mappings
    • Grouping
    • Faceting
    • Maps
  • Hands-on (10 minutes)
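
As a rough preview of the Session 1 progression, the sketch below goes from one variable to two to three or more. It again assumes the mpg dataset bundled with ggplot2 as a stand-in for the tutorial data, and the plot names are hypothetical.

```r
library(ggplot2)

# One variable: bar chart for a categorical variable, histogram for a numeric one
bar <- ggplot(mpg, aes(x = class)) + geom_bar()
histogram <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Two variables
counts  <- ggplot(mpg, aes(x = class, y = drv)) + geom_count()    # two categorical
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()    # two numeric
boxes   <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()  # one of each

# Three or more variables: map a third to color and facet by a fourth
multi <- scatter +
  aes(color = drv) +
  facet_wrap(vars(year))

print(multi)
```

The last plot reuses the scatterplot as is; adding aes() and facet_wrap() brings in drive type and model year without touching the original two-variable code.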

Break (30 minutes)

Session 2: Advanced customization (90 minutes)

  • Combining layers (20 minutes)
    • Using the same data object
    • Different data objects
  • Hands-on (10 minutes)
  • Graph appearance (15 minutes)
    • Themes & Scales
    • Titles, legends, etc.
  • Hands-on (5 minutes)
  • ggplot2 extensions (15 minutes)
    • Animation
    • Domain-specific (e.g. network analysis, time series data)
    • Appearance customization (custom themes, scales, etc.)
  • Hands-on (10 minutes)
  • Interactivity with plotly (10 minutes)
    • ggplotly()
    • Tooltips
  • Hands-on (5 minutes)
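
The Session 2 topics can be previewed in the same spirit. The sketch below combines two layers that draw on different data objects, adjusts the plot's appearance, and hands the result to plotly. The data (mpg), the summary table class_means, and the object names are assumptions made for illustration, not the tutorial's actual examples.

```r
library(ggplot2)

# A second data object: mean highway mileage for each vehicle class
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

layered <- ggplot(mpg, aes(x = class, y = hwy)) +
  # First layer uses the data passed to ggplot()
  geom_jitter(width = 0.2, alpha = 0.4) +
  # Second layer brings its own data but inherits the x and y mappings
  geom_point(data = class_means, color = "red", size = 3) +
  labs(
    title = "Highway mileage by vehicle class",
    x = "Vehicle class",
    y = "Highway miles per gallon"
  ) +
  theme_bw()

print(layered)

# Interactivity: only attempted when plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(layered, tooltip = c("x", "y"))
}
```

In an interactive R session, printing interactive_plot opens the plot in the viewer, with tooltips showing the class and mileage for each point.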

Expected level of audience’s R background:

I anticipate two audiences for this tutorial: beginning R users, defined as having less than one year of continual experience, and advanced R users interested in data visualization and new to the tidyverse. I expect the following skills:

  • Basic understanding of R data types and structures
  • Ability to write simple functions and loops
  • Willingness to learn new ways of doing things in R

Other considerations:

For the tutorial I will alternate between lecturing and hands-on time. For the lecture portion, I use slides, and will need a projector and a way to connect my personal laptop, as well as a place to put notes, a water bottle, etc. For the hands-on time, participants will require Internet access and a comfortable work setting with enough space to accommodate everyone’s laptops, notebooks, etc. In addition, I would ask that the organizers please consider body diversity when choosing a space. This includes having tables that can accommodate wheelchairs, chairs without arms for folks in larger bodies, and tables available at standing height for those unable to sit for long periods.

Additional ggplot2 resources:

Sam Tyner
AAAS Science & Technology Policy Fellow

I am an applied statistician and data scientist, with a wide range of skills and experiences. I’m passionate about using data to make a difference.
