Data Visualization

Wednesday, August 2nd, 2023 (1:00 - 4:00 pm PT) and Thursday, August 3rd, 2023 (1:00 - 4:00 pm PT)

Workshop Description

Data visualization helps us understand our data, infer patterns, and communicate our insights. Join us for our Visualization for Data Science workshop to learn the basics of this powerful tool using Python, arguably the most popular language for data science. We will guide you through the essentials of creating meaningful, impactful, and aesthetically pleasing visual representations of data, from exploratory data analysis to presentation-ready graphics. You will get the chance to try out these new skills on simple data sets and get feedback on your designs from peers. No data visualization experience is required, but some familiarity with the basics of Python is useful.

Class outcome

After this class, you will:

  1. Know the principles of data visualization necessary for effective data communication
  2. Become familiar with the possibilities and processes for creating professional-looking visualizations in Python
  3. Gain an appreciation of exploratory data analysis, a key technique in data science

About the Instructor

Kaleigh Mentzer

Kaleigh Mentzer is a research engineer working on large-scale data problems for a stealth startup. She earned her PhD in Computational and Mathematical Engineering from Stanford, advised by Irene Lo and Itai Ashlagi. Her research focuses on using algorithmic and optimization-based tools to improve equitable access to education and has informed educational policy decisions in San Francisco.

Prerequisites

This workshop is designed to be accessible to beginners who have basic experience with Python, at the level of workshop SWS 02 (Introduction to Python).

Our goal is to create an inclusive and supportive learning environment, and we want all students to succeed. However, to set you up for success, we also want to clearly communicate the necessary level of prior programming familiarity. If you are unsure whether you have the required background, please feel free to reach out for guidance.

Requirements

To join the workshop, you’ll need a device with a recent web browser and two-way audio and video access to Zoom. This could be a laptop or desktop computer running any operating system, such as Windows, macOS, or Linux. The interactive activities benefit from a larger screen, so joining from a smartphone or tablet may not provide the best learning experience.

You do not need to install Python or any other software before the workshop. We will provide more detailed instructions prior to the start to ensure that you are set up and ready to learn.

Syllabus Outline

This class will cover key data visualization skills for data science applications.

  1. Why Data Visualization
    • Why do we need data visualization to understand our data?
    • Roles of data visualization
  2. Data Visualization for Yourself: Exploratory Data Analysis
    • What is exploratory data analysis (EDA)?
    • Nominal, ordinal, and quantitative data
    • Missing or invalid data
    • Bar charts, scatter plots, line plots, histograms, heatmaps, pairplots
    • Implementation in Python
    • Exercise: EDA of sample data set
  3. Data Visualization for Others: Communicating Findings
    • What should you consider when designing for communication? How is it different from EDA?
    • Human visual perception considerations for data visualization design
    • Colors and accessibility
    • Titles, labels, and annotation
    • Data storytelling
    • Positive and negative data visualization examples
    • Exercise: Build a visualization and get peer feedback on design
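
As a rough preview of the EDA portion of the syllabus, the sketch below tags a column as ordinal, counts missing values, and prepares a simple bar chart. It is only an illustration: the `flights` table, its column names, and the `plot_mean_delay` helper are invented here and are not the workshop's dataset or notebook code. It assumes pandas is available, and matplotlib for the plotting step.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
flights = pd.DataFrame({
    "carrier": ["AA", "DL", "UA", "AA", "DL", "UA"],                 # nominal
    "delay_level": ["low", "high", "medium", "low", None, "high"],  # ordinal
    "dep_delay_min": [5.0, 62.0, None, 0.0, 18.0, 47.0],             # quantitative
})

# Tell pandas the order of the ordinal levels so sorting and plotting respect it.
levels = ["low", "medium", "high"]
flights["delay_level"] = pd.Categorical(
    flights["delay_level"], categories=levels, ordered=True
)

# Count missing values per column before drawing any chart.
missing_counts = flights.isna().sum()

# Mean delay per carrier; mean() skips missing values by default.
mean_delay = flights.groupby("carrier")["dep_delay_min"].mean()


def plot_mean_delay(series):
    """Draw a labeled bar chart of mean delay per carrier (hypothetical helper)."""
    # Imported here so the summaries above work without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    series.plot.bar(ax=ax)
    ax.set_xlabel("Carrier")
    ax.set_ylabel("Mean departure delay (minutes)")
    ax.set_title("Mean departure delay by carrier")
    return fig
```

Calling `plot_mean_delay(mean_delay)` in a notebook cell draws the chart. Checking `missing_counts` first makes it clear how many rows each average is based on.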

Workshop Materials

Lecture

Recordings

Notebooks

In-Class Exercises

Day 1:

  1. Get Google Colab notebook running.
  2. Make a ydata_profiling report for the airline delay data.
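
For orientation, building a profiling report might look roughly like the sketch below. The tiny table returned by `make_sample_delays` is made up and merely stands in for the airline delay data, whose actual file and column names are not given here. `build_profile` is a hypothetical helper, and the `ydata-profiling` package may need to be installed in your environment first.

```python
import pandas as pd


def make_sample_delays():
    """Return a tiny, made-up stand-in for the airline delay data."""
    return pd.DataFrame({
        "carrier": ["AA", "DL", "UA", "AA"],
        "dep_delay_min": [5.0, 62.0, None, 0.0],
        "arr_delay_min": [2.0, 70.0, 12.0, None],
    })


def build_profile(df, title="Airline delay profile"):
    """Build a ydata_profiling report for df (hypothetical helper)."""
    # Imported lazily so the sample-data helper works without the package.
    from ydata_profiling import ProfileReport

    # minimal=True skips the more expensive computations, which helps on larger tables.
    return ProfileReport(df, title=title, minimal=True)
```

In a Colab cell, `build_profile(make_sample_delays()).to_notebook_iframe()` should display the report inline, and `.to_file("report.html")` saves it as a standalone page.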

Day 2:

Solo Exercise:

  1. Refine a visualization based on one of the findings from your exploratory data analysis. Think about a descriptive title, labels, and legends!
  2. Don’t worry if you don’t finish. You’ll have the chance to share your ideas with a peer later and get feedback.

Group Exercise:

  1. Introduce yourself to your partner.
  2. The person whose first name comes first alphabetically shows their visualization first. Their partner then explains back what they think the point of the visualization is.
  3. Discuss what went well and what the designer can improve to communicate more clearly.
  4. Swap roles and repeat.

Homework for Wednesday Night

  1. Explore the Airline Delay dataset using the tools you learned today.
  2. Create a scatter plot of departure delay vs arrival delay. How correlated are the two? What does this suggest about why flights are delayed?
  3. Develop 1-3 candidate ideas for a visualization to polish tomorrow.
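
One possible starting point for the scatter plot in item 2 is sketched below. The numbers and the column names `dep_delay_min` and `arr_delay_min` are invented for illustration, so substitute the corresponding columns from the Airline Delay dataset. `plot_delays` is a hypothetical helper that assumes matplotlib is available.

```python
import pandas as pd

# Made-up values for illustration; replace with the real dataset's columns.
delays = pd.DataFrame({
    "dep_delay_min": [0, 10, 20, 30, 40],
    "arr_delay_min": [-5, 8, 15, 33, 38],
})

# Pearson correlation between departure and arrival delay.
r = delays["dep_delay_min"].corr(delays["arr_delay_min"])


def plot_delays(df, corr):
    """Scatter plot of departure vs. arrival delay (hypothetical helper)."""
    # Imported here so the correlation above works without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Semi-transparent points keep dense regions readable on large datasets.
    ax.scatter(df["dep_delay_min"], df["arr_delay_min"], alpha=0.6)
    ax.set_xlabel("Departure delay (minutes)")
    ax.set_ylabel("Arrival delay (minutes)")
    ax.set_title(f"Departure vs. arrival delay (r = {corr:.2f})")
    return fig
```

Calling `plot_delays(delays, r)` draws the chart. A correlation near 1 would suggest that most of the arrival delay is already present at departure, but the real data may tell a different story.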

Slides

Datasets

Resources

Colors:

Data Storytelling - Resources:

Data Storytelling - Examples:

Pre-workshop Checklist

  1. Please test your Google Colab setup by following these instructions.
  2. If you have never used the Python package pandas before, you may want to watch a brief intro video such as this one.

Please email us (kmentzer@stanford.edu) if you run into any problems with the pre-workshop checklist.

Schedule

Wednesday August 2nd - Intro and Exploratory Data Analysis

Thursday August 3rd - Data Visualization for Communication