My useR!2020 tutorial on ggplot2
Today, the useR! 2020 committee announced the July 7th tutorials on Twitter. I am so honored and grateful to be one of the selected tutorial leaders! What an amazing group to be a part of!
I will be giving the tutorial, Creating Beautiful Data Visualizations in R: a ggplot2 Crash Course. The other tutorials are:
- Morning session:
  - First steps in spatial data handling and visualization by S. Rochette, D. Scott, and J. Nowosad
  - Predictive modeling with text using tidy data principles by J. Silge and E. Hansen
  - So, you want to learn Python? An introduction to Python for the R lover by S. Ellis
  - Application of Gaussian graphical models to metabolomics by D. Scholtens and R. Balasubramanian
  - Periscope and CanvasXpress - Creating an enterprise-grade big-data visualization application in a day by C. Brett
  - Seamless R and C++ integration with Rcpp by D. Eddelbuettel
  - Create and share reproducible code with R Markdown and workflowr by J. Blischak
  - Causal inference in R by L. D’Agostino McGowan
  - Reproducible computation at scale with drake: hands-on practice with a machine learning project by W. Landau
- Afternoon session:
  - How green was my valley - Spatial analytics with PostgreSQL, PostGIS, R, and PL/R by J. Conway
  - Building interactive web applications with Dash for R by R. Kyle
  - Easy Larger-than-RAM data manipulation with disk.frame by ZJ Dai
  - R Markdown recipes by Y. Xie
  - Getting the most out of Git by C. Gillespie and R. Davies
  - End-to-end machine learning with Metaflow: Going from prototype to production with Netflix’s open source project for reproducible data science by S. Goyal
  - Package development by J. Hester and H. Wickham
  - What they forgot to teach you about teaching R by M. Cetinkaya-Rundel
Hopefully I made the right guess at the tutorial leaders, and I apologize to those I could not find easily!
The rest of this post is my tutorial proposal that I submitted to the committee. I hope you’ll attend!
Title:
Creating Beautiful Data Visualizations in R: a ggplot2 Crash Course
Audience:
Users of any subject matter background who are interested in data visualization with R are welcome. New R users (less than 1 year of continual experience) will get the most out of this course. Some basic knowledge, such as an understanding of different R data types & structures (character, data frame, etc.) and authoring simple functions & loops, is required. The course, however, is not just for beginners. More advanced R users who have little to no experience with ggplot2 and other packages in the tidyverse will learn many new tools for data visualization in R.
Instructor background:
Dr. Tyner has been using ggplot2 since version 0.9.2 (2012). She co-authored and maintains the ggplot2 extension package geomnet, and has taught ggplot2 to undergraduate students, graduate students, and professionals. Her material for undergraduate students can be found at csafe-isu.github.io/reu18/slides, and a list of all workshops taught is available in her CV. Dr. Tyner also has expertise in the theory of data visualization, acquired through her dissertation research and as a part of the statistical graphics working group at Iowa State University. She has written about her ggplot2 work in The R Journal and on her blog.
Domain:
Most examples will use publicly available data from the U.S. Bureau of Labor Statistics, including time series and map data, but no economics knowledge is required.
Points of appeal:
Learning ggplot2 brings joy and “aha moments” to new R users, keeping them more engaged and eager to grow their R skills. Newer R users will both be and feel more empowered by their data visualization skills. In addition to experiencing joy in creating beautiful graphics, advanced R users will learn to take advantage of ggplot2’s elegant defaults, saving time on manual plotting tasks like drawing legends. Thus, time and energy can be spent on advanced analyses, not fights with plotting commands.
Learning objectives:
Upon completion of this tutorial, the participants will be able to:
- identify the appropriate plot types and corresponding ggplot2 geoms to consider when visualizing their data;
  - Participants will match variable types (categorical, continuous, etc.) to the best visualizations (boxplots, scatterplots, etc.).
- implement the ggplot2 grammar of graphics by using ggplot() and building up plots with the + operator;
  - Participants will add geoms, stats, facet_*s, and theme objects together to create beautiful graphics.
- iterate through multiple visualizations of their data by changing the aesthetic mappings, geometries, and other graph properties;
  - Participants will practice interchanging geometries and aesthetic mappings to communicate the data in different ways to find effective visualizations.
- incorporate custom elements (colors, fonts, etc.) into their visualizations by adjusting ggplot2 theme elements;
  - Participants will be able to customize their graphs for appearance in journals, corporate reports, etc. with specific requirements.
- investigate the world of ggplot2 independently to expand upon the skills learned in the course.
  - Participants will acquire “prosthetic knowledge,” or the ability to learn more about ggplot2 and data visualization in R using the Internet.
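As a rough illustration of the second objective, here is a minimal sketch of building a plot up with the + operator. It assumes the mpg dataset that ships with ggplot2 (not the Bureau of Labor Statistics data planned for the tutorial), and the particular geom, stat, facet, and theme choices are only examples.

```r
library(ggplot2)

# Assumption: mpg is a stand-in dataset bundled with ggplot2,
# not the BLS data the tutorial plans to use.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +                  # data + aesthetic mappings
  geom_point(aes(color = class)) +                           # geom: one point per car
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # stat: fitted linear trend
  facet_wrap(~ drv) +                                        # facet: one panel per drive type
  theme_minimal()                                            # theme: non-data appearance

print(p)
```

Swapping geom_point() for another geom, or moving color = class to a different aesthetic such as shape, is the kind of iteration the third objective describes.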
Computing requirements:
Participants should have a laptop with access to the internet. Having RStudio on the laptop is recommended, but if this is not possible the student may use the RStudio Cloud workspace. Students not using the RStudio Cloud should have the tidyverse suite of packages and the plotly package installed. Or, for the minimalist student, ggplot2, dplyr, tidyr, and plotly will be sufficient for the bulk of the material.
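One possible way to check a local setup against the minimal package list above is sketched here. The variable names are hypothetical, and the snippet only reports what is missing rather than installing anything.

```r
# Minimal package set named above; swap in "tidyverse" for the full suite.
needed <- c("ggplot2", "dplyr", "tidyr", "plotly")

# requireNamespace() returns TRUE when a package can be loaded.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install <- needed[!installed]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "),
          ". Install them with install.packages().")
} else {
  message("All required packages are installed.")
}
```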
Teaching assistant:
I have a great network of possible TAs for this course in current and former Iowa State University statistics Ph.D. students. I am confident I will be able to recruit two people to participate as TAs.
Lesson plan:
Session 1: ggplot2 basics (90 minutes)
- Introduction to ggplot2’s grammar of graphics (20 minutes)
  - Theory: grammar of graphics
  - Optimal data format
  - The basics of geom_*s, stat_*s, & aes()
- One-variable visualization (10 minutes)
  - Bar charts
  - Histograms, density estimates
- Hands-on (5 minutes)
- Two-variable visualization (20 minutes)
  - Two categorical variables: geom_count, etc.
  - Two numeric variables: scatterplots, etc.
  - One categorical, one numeric: boxplots, etc.
- Hands-on (5 minutes)
- Three or more variable visualization (20 minutes)
  - Augmenting one- and two-variable visualization with color and other aesthetic mappings
  - Grouping
  - Faceting
  - Maps
- Hands-on (10 minutes)
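The progression in Session 1 could look something like the sketch below. It again assumes the bundled mpg dataset rather than the tutorial's own data, and the object names are hypothetical.

```r
library(ggplot2)

# One variable: bar chart for a categorical variable, histogram for a numeric one
p_bar  <- ggplot(mpg, aes(x = class)) + geom_bar()
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Two variables, matched to variable types
p_count   <- ggplot(mpg, aes(x = class, y = drv)) + geom_count()    # categorical x categorical
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()    # numeric x numeric
p_box     <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()  # categorical x numeric

# Three or more variables: add a color mapping and facet the scatterplot
p_multi <- p_scatter + aes(color = drv) + facet_wrap(~ year)

print(p_multi)
```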
Break (30 minutes)
Session 2: Advanced customization (90 minutes)
- Combining layers (20 minutes)
  - Using the same data object
  - Different data objects
- Hands-on (10 minutes)
- Graph appearance (15 minutes)
  - Themes & Scales
  - Titles, legends, etc.
- Hands-on (5 minutes)
- ggplot2 extensions (15 minutes)
  - Animation
  - Domain-specific (e.g. network analysis, time series data)
  - Appearance customization (custom themes, scales, etc.)
- Hands-on (10 minutes)
- Interactivity with plotly (10 minutes)
  - ggplotly()
  - Tooltips
- Hands-on (5 minutes)
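A few of the Session 2 topics can be sketched together: layers drawn from different data objects, scale and theme adjustments, and an interactive version via ggplotly(). This is an illustrative sketch on the bundled mpg dataset, not the tutorial's actual material; class_means and the label text are hypothetical, and the plotly step runs only if that package is installed.

```r
library(ggplot2)

# A second data object: mean highway mileage per vehicle class
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_jitter(width = 0.2, alpha = 0.4) +                     # layer 1: raw data
  geom_point(data = class_means, color = "red", size = 3) +   # layer 2: summary data
  scale_y_continuous(limits = c(0, 50)) +                     # scale: fix the y-axis range
  labs(title = "Highway mileage by vehicle class",
       x = "Class", y = "Highway MPG") +                      # titles and axis labels
  theme_bw() +                                                # complete theme
  theme(plot.title = element_text(face = "bold"))             # single theme element

# Interactive version with tooltips showing the x and y values
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p, tooltip = c("x", "y"))
}
```

The second layer inherits the x and y mappings from ggplot(), which works here because class_means has columns named class and hwy.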
Expected level of audience’s R background:
I anticipate two audiences for this tutorial: beginning R users, defined as having less than one year of continual experience, and advanced R users interested in data visualization and new to the tidyverse. I expect the following skills:
- Basic understanding of R data types and structures
- Ability to write simple functions and loops
- Willingness to learn new ways of doing things in R
Other considerations:
For the tutorial I will alternate between lecturing and hands-on time. For the lecture portion, I use slides, and will need a projector and a way to connect my personal laptop, as well as a place to put notes, a water bottle, etc. For the hands-on time, participants will require Internet access and a comfortable work setting with enough space to accommodate everyone’s laptops, notebooks, etc. In addition, I would ask that the organizers please consider body diversity when choosing a space. This includes having tables that can accommodate wheelchairs, chairs without arms for folks in larger bodies, and tables available at standing height for those unable to sit for long periods.
Additional ggplot2 resources:
- Documentation, ggplot2.tidyverse.org
- Book, ggplot2-book.org
- Data Visualization with ggplot2 cheatsheet from RStudio (within RStudio, Help → Cheatsheets → Data Visualization with ggplot2)
- StackOverflow has 30,000+ questions tagged “[ggplot2]”
- The ggplot2 GitHub repository has thousands of closed issues, representing problems that users at all levels have encountered.