[Tutorial Kit] Using RStudio to Analyze and Visualize Air Quality

Shamini V De Silva

Video Chapters

Recorded live Feb 24 2026, timestamps below

00:00 Intro

01:06 Bar Chart Overview

02:47 PM2.5: What is it?

03:55 Step-by-Step

04:00 Step 1. Data Files

06:19 Step 2. Posit Cloud

07:05 Step 3. Install Packages

09:39 Step 4. R code

16:38 Step 5. Bar Chart

🎯 Data Challenge

Create a bar chart in RStudio on air pollution for your state that compares the three counties with the highest air pollution levels to the primary (health-based) annual standard set by the U.S. Environmental Protection Agency (EPA).

Example bar chart created in Posit Cloud

Steps

Collect data on fine particulate matter (PM2.5), identify 3 counties in your state with the highest pollution concentrations, calculate how many people in your state are exposed to high levels of PM2.5, and create a bar chart summarizing results.

3 Learning Objectives

Query & Collect 📊 data on county-level air pollution in your state.
Analyze data using the data tool: 🧰 🛠️ RStudio Online (Posit Cloud).
Visualize & Humanize data by creating a bar chart exploring how local air pollution concentrations compare to the EPA primary (health-based) standards.

Prerequisites. Beginner-friendly, some knowledge required:

R packages and functions
assigning a variable in R

Keywords and core concepts also covered:

🧰 🛠️ R programming basics: Installing packages, assigning a variable, using functions, and creating a bar chart
How do we measure air pollution and why does it matter to health?
Defining the EPA primary (health-based) standards for air pollution.

What you'll need to complete this data challenge

⏰ Time: 30-45 minutes
🧰 Tools:
- An account in Posit Cloud (a tool for running a cloud version of RStudio - no R software installation needed)
📊 Data:
- Indicator: PM2.5: Highest Annual Average Concentration (Monitor + Modeled Data), 2020, by County.
  - Data Source: National Environmental Public Health Tracking Network, Data Explorer tool. https://ephtracking.cdc.gov/DataExplorer/
- 📁 Files
  - Please download the zipped folder 'tutorial-kit-R-air-pollution' containing the files and data mentioned above.
    - air_pollution_and_health.R - sample R Code
    - pm_2020.csv - PM2.5 data in a CSV (.csv) file
    - population_data_2020.csv - Number of people living in a region
    - fips_counties_2021.txt - County FIPS code geographic identifiers and county names, a Text (.txt) file

DOWNLOAD ZIP FILE

click the folder below to download all files

Key Terms and Definitions

PM2.5 (also written PM₂.₅) is fine particulate matter that is 2.5 micrometers (i.e. microns, µm) in diameter or smaller.
PM2.5 concentration (µg/m³) is measured as particle mass (micrograms, µg) per cubic meter of air (m³).
The primary (health-based) standard for PM2.5 is the Environmental Protection Agency's (EPA's) annual National Ambient Air Quality Standard (NAAQS) for fine particulate matter, set at 9.0 µg/m³. Concentrations above 9.0 µg/m³ are considered harmful to health (EPA, 2025).
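
To make these definitions concrete, the short R sketch below compares a few annual PM2.5 concentrations against the 9.0 µg/m³ standard. The variable names and the three concentrations are illustrative placeholders, not values from the tutorial data.

```r
# EPA primary (health-based) annual standard for PM2.5,
# in micrograms per cubic meter
epa_annual_standard <- 9.0

# Hypothetical annual average concentrations for three places
# (placeholder values, same units as the standard)
pm25_concentrations <- c(site_a = 7.2, site_b = 9.0, site_c = 12.4)

# TRUE where the concentration is above (not equal to) the standard
exceeds_standard <- pm25_concentrations > epa_annual_standard
print(exceeds_standard)
```

Note that a value exactly at 9.0 does not count as exceeding the standard, because the comparison uses `>` rather than `>=`.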

Step-by-Step Walkthrough

Overview

The steps below will guide you through analyzing air pollution data from the National Environmental Public Health Tracking Network and creating a bar chart showing county-level PM2.5 levels. You’ll also calculate key metrics, such as the proportion of people in a state exposed to pollution high enough to harm health, and include this information in a dynamic caption below the bar chart.

🔍 Data Hunt: Finding Data

Air Pollution Data

How do you find PM2.5 data from the National Environmental Public Health Tracking Network?

Navigate to the National Environmental Public Health Tracking Network Data Explorer Tool

In the Query Panel, select:
- STEP 1: CONTENT
  - Content Area: Air Quality
  - Indicator: Current and Historical Air Quality
  - Measure: PM2.5 (highest annual average concentration, monitored + modeled)
- STEP 2: GEOGRAPHY TYPE: National by county
- STEP 3: GEOGRAPHY: All Counties
- STEP 4: TIME: 2020
- STEP 5: ADVANCED OPTIONS: No Advanced Options
- Click Button: GO
Navigate to table view (button in upper right corner) and EXPORT data

Start in Query Panel to obtain PM2.5 data from the National Environmental Public Health Tracking Network, Data Explorer Tool

Arrows indicate how to export data as CSV files

Population Data

The U.S. Census Bureau publishes many sources of population data. For this tutorial, we will use the indicators shared in the National Environmental Public Health Tracking Network Data Explorer Tool.

While still in the Explorer tool, open the Query Panel:
- click: SELECT DATA (button upper left corner)
In Query Panel:
- STEP 1: CONTENT
  - Content Area: Demographic & Socioeconomics
  - Indicator: Demographics
  - Measure: Number of People by Demographic Group
- STEP 2: GEOGRAPHY TYPE: National by county
- STEP 3: GEOGRAPHY: All Counties
- STEP 4: TIME: 2020
- STEP 5: ADVANCED OPTIONS: No advanced options selected
- Click Button: GO

Query used to obtain population number data for all U.S. counties from the National Environmental Public Health Tracking Network, Data Explorer Tool

🧰 🛠️ Data Tool

Once you have the data, it is time to analyze it using the data tool: RStudio in Posit Cloud.

Step 1. 📁 Data Files

Get the data files by downloading the zipped folder 'tutorial-kit-R-air-pollution' from the DOWNLOAD ZIP FILE link under "What you'll need to complete this data challenge" above.

Step 2: ☁️ Posit Cloud

Set up Posit Cloud and create a new RStudio Project

Instead of installing R and RStudio locally, this tutorial uses Posit Cloud, a browser-based environment that runs RStudio online. We are using the Posit Cloud service to ensure that everyone is working in the same R environment.

The template code has been designed and tested specifically in the Posit Cloud environment (using R version 4.5.3 and tidyverse version 2.0.0). If you choose to run the code in a local RStudio setup, some parts may not work as expected and may require modification.

To get started:

Go to posit.cloud
Click 'Sign Up' in the top-right corner
Choose the free version for this project → Click 'Learn more'
Then click ‘Sign Up’ and create an account using email or services like Google or GitHub.
Once your workspace loads, click 'New Project' on the right-hand side of the screen.
Select 'New RStudio Project' to start a new project. This will also open the RStudio interface within your browser.

Step 3: 📦 Install Packages

Open the R script, then install and load the required packages and data files

In the Files pane, click the .R script file ('air_pollution_and_health.R') to open it in the Script Editor. The script contains the code used to generate the bar graph.

The script also includes a header at the beginning that explains:

the purpose of the code,
how to use the code,
how we accessed the data, and
copyright and attribution information (please provide appropriate credit if you plan to adapt and share the code).

Highlight the lines of code below the header that install the tidyverse and scales packages, then run them by pressing Ctrl + Enter or clicking ‘Run’ in the top-right corner of the Script pane. You only need to do this once.

The tidyverse package is a collection of R packages for data cleaning (or wrangling), analysis, and visualization. The scales package contains scaling functions that can make plots easier to read, for instance, by turning raw numbers into readable formats (6,500,000 → 6.5 million). The installation process will run in the Console pane and may take a few minutes. Once complete, run the next two lines containing the library( ) function to load these packages into the R environment.
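
As a rough sketch of the install-and-load step described above (the sample script's exact lines may differ), the commented-out install.packages() call is the one-time installation, and the rest shows one way the scales package can produce the readable number format mentioned in the previous paragraph. The variable names are illustrative.

```r
# One-time installation; uncomment and run once per Posit Cloud project.
# install.packages(c("tidyverse", "scales"))

# Load the scales package for this session.
# (The tutorial script also loads tidyverse with library(tidyverse).)
library(scales)

population <- 6500000

# Add thousands separators
with_commas <- comma(population)

# Build a formatter that rescales to millions and appends a suffix
to_millions <- label_number(accuracy = 0.1, scale = 1e-6, suffix = " million")
in_millions <- to_millions(population)

print(with_commas)
print(in_millions)
```

Installation only has to happen once per project, but library() must be run again in every new R session.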

To learn more about some of the basics of R programming, please watch the step-by-step walkthrough in the video above (timestamp - 03:55 Step-by-Step)

About the Interface

The RStudio interface contains several panes:

Script Editor (upper left): where you write and edit code
Console (bottom left): where commands are executed, and outputs/errors are displayed
Environment (upper right): shows variables and datasets loaded in the session
Files/Plots/Packages (bottom right): used to upload files, view plots, and access documentation

Because Posit Cloud runs online, any files you want to use must be uploaded manually.

Upload Files

After downloading the zipped folder containing the necessary files and data, unzip it and upload the files to Posit Cloud.

To upload:

Navigate to the 'Files' tab in the bottom right pane.
Locate the ‘Upload’ button → Click on ‘Browse’ → Select and upload the four files in the folder (.R file, .csv files, and .txt file). Leave the target directory at the default location “/cloud/project/”.
After uploading, you should see the files listed in the Files pane.
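
If you would like to confirm the upload from the Console as well, a minimal sketch like the one below checks that the four files are in the project directory. The helper function `find_missing_files()` is a hypothetical name introduced here for illustration; it is not part of the tutorial script.

```r
# File names listed in the tutorial's download folder
expected_files <- c(
  "air_pollution_and_health.R",
  "pm_2020.csv",
  "population_data_2020.csv",
  "fips_counties_2021.txt"
)

# Hypothetical helper: return the expected files that are NOT found in `dir`
find_missing_files <- function(dir = ".") {
  expected_files[!file.exists(file.path(dir, expected_files))]
}

# Check the current working directory (in Posit Cloud, the project folder)
missing_files <- find_missing_files()

if (length(missing_files) == 0) {
  message("All four files are in the project directory.")
} else {
  message("Still missing: ", paste(missing_files, collapse = ", "))
}
```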

If needed, please watch the video above for a demonstration of these steps (timestamp - 03:55 Step-by-Step).

Step 4: 👩🏽‍💻 R Code

Edit and run the code to calculate key metrics

Specify your state and check file name variables

Locate and update the section where you define:

your state of interest, assigned to the variable your_state (e.g., "Washington" or "Louisiana")
file names for your datasets - update these if needed so they match the uploaded files in the Files pane (see screenshot below)
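
A minimal sketch of what this section of the script might look like is shown below. Only `your_state` is named in the tutorial; the three file-name variables (`pm_file`, `population_file`, `fips_file`) are hypothetical names used for illustration, and the script's own names may differ. The spelling check against R's built-in `state.name` vector is an optional extra, not something the tutorial script is known to do.

```r
# State to analyze, for example "Washington" or "Louisiana"
your_state <- "Washington"

# Hypothetical variable names for the uploaded files
# (the sample script may use different names)
pm_file <- "pm_2020.csv"
population_file <- "population_data_2020.csv"
fips_file <- "fips_counties_2021.txt"

# state.name is a built-in vector of the 50 U.S. state names
# (it does not include the District of Columbia or territories)
if (!your_state %in% state.name) {
  warning("'", your_state, "' is not one of the 50 state names; ",
          "check spelling and capitalization.")
}
```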

Run code

Start by running the code that imports:

PM2.5 data (county-level pollution) - pm_2020.csv
Population data - population_data_2020.csv

These will appear as data frames (‘pm_2020.csv’ and ‘population_data_2020.csv’) in the Environment pane.
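
The import step can be sketched as follows. To keep the example runnable without the tutorial files, it first writes a tiny stand-in CSV; the column names (`CountyFIPS`, `County`, `Value`) and all values are hypothetical placeholders, so check the real column names in pm_2020.csv before adapting it. It uses read_csv() from readr, one of the packages loaded by tidyverse.

```r
library(readr)

# Write a tiny stand-in CSV so the example runs without the tutorial files.
# Column names and values are hypothetical placeholders.
example_csv <- tempfile(fileext = ".csv")
writeLines(
  c(
    "CountyFIPS,County,Value",
    "01001,County A,8.5",
    "01003,County B,10.2"
  ),
  example_csv
)

# Read FIPS codes as text so leading zeros (e.g. "01001") are preserved;
# read the concentration column as a number
pm_example <- read_csv(
  example_csv,
  col_types = cols(
    CountyFIPS = col_character(),
    County = col_character(),
    Value = col_double()
  )
)

print(pm_example)
```

Reading FIPS codes as character rather than numeric matters later, because a code such as "01001" would otherwise lose its leading zero and fail to match the county lookup file.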

Run the next section of code to calculate the percent of the population in the specified state exposed to PM2.5 levels greater than the EPA standard (9.0 µg/m³) and store it in a data frame assigned to ‘percent_pop_high_pollution’.

Then run the following section to find the national percentage for comparison. This will be summarized in a data frame assigned to ‘percent_pop_high_pollution_US’.

These metrics will later be highlighted in the bar chart caption.
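
The calculation behind these two metrics can be sketched as below, assuming a single table with one row per county. The table, its column names, the helper `percent_exposed()`, and every number in it are made-up placeholders for illustration; the tutorial script may structure this step differently.

```r
library(dplyr)

epa_annual_standard <- 9.0

# Hypothetical county-level data (placeholder names and values)
county_data <- data.frame(
  state = c("State X", "State X", "State X", "State Y", "State Y"),
  county = c("County A", "County B", "County C", "County D", "County E"),
  pm25 = c(10.5, 8.0, 9.4, 7.1, 12.0),
  population = c(100000, 300000, 100000, 400000, 100000)
)

# Hypothetical helper: percent of people living in counties
# whose PM2.5 concentration is above the standard
percent_exposed <- function(data) {
  data |>
    summarise(
      population_exposed = sum(population[pm25 > epa_annual_standard]),
      population_total = sum(population),
      percent = 100 * population_exposed / population_total
    )
}

your_state <- "State X"

# Percent for the selected state
state_result <- county_data |>
  filter(state == your_state) |>
  percent_exposed()

# Percent across all counties, for comparison
national_result <- percent_exposed(county_data)

print(state_result)
print(national_result)
```

The percentage is weighted by population: each county contributes its number of residents, not a single count, so a large county above the standard moves the result more than a small one.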

Step 5: 📊 Bar Chart

Run the code to create the bar graph

Start by preparing the data for visualization. This includes merging datasets using FIPS codes contained in the ‘fips_counties_2021.txt’ file, adding the county names, and filtering for counties with the highest pollution levels in your selected state.
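
One way to express this preparation step with dplyr is sketched below: join the PM2.5 table to a FIPS lookup table, filter to the selected state, and keep the three highest values. The tables, column names, FIPS codes, and values are invented placeholders, and the tutorial script may use different functions.

```r
library(dplyr)

# Hypothetical PM2.5 table keyed by county FIPS code (placeholder values)
pm_data <- data.frame(
  fips = c("99001", "99003", "99005", "99007", "98001"),
  pm25 = c(9.8, 7.4, 11.2, 10.1, 8.9)
)

# Hypothetical lookup table mapping FIPS codes to state and county names
fips_lookup <- data.frame(
  fips = c("99001", "99003", "99005", "99007", "98001"),
  state = c("State X", "State X", "State X", "State X", "State Y"),
  county = c("County A", "County B", "County C", "County D", "County E")
)

your_state <- "State X"

top_counties <- pm_data |>
  left_join(fips_lookup, by = "fips") |>      # add state and county names
  filter(state == your_state) |>              # keep the selected state
  slice_max(pm25, n = 3, with_ties = FALSE)   # three highest concentrations

print(top_counties)
```

Setting `with_ties = FALSE` guarantees exactly three rows even when two counties share the same concentration.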

The subsequent lines of code will build the visualization using ggplot2, which works in layers:

Base layer initializes the plot
Additional layers add:
- Bars (PM2.5 levels by county)
- Labels and formatting
- A vertical line marking the EPA standard (9.0 µg/m³)
- A dynamic caption with your calculated metrics
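
The layering described above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own plotting code: the data values, the `percent_state` metric, the colors, and the caption wording are placeholders, and the bars are drawn horizontally so that the EPA standard appears as a vertical line.

```r
library(ggplot2)

epa_annual_standard <- 9.0

# Hypothetical top-three counties (placeholder values)
top_counties <- data.frame(
  county = c("County A", "County C", "County D"),
  pm25 = c(9.8, 11.2, 10.1)
)

# Hypothetical metric that would come from the Step 4 calculations
percent_state <- 40

# Dynamic caption assembled from the calculated values
caption_text <- paste0(
  percent_state,
  "% of residents live in counties above the EPA standard of ",
  format(epa_annual_standard, nsmall = 1),
  " micrograms per cubic meter."
)

pm_chart <- ggplot(top_counties, aes(x = pm25, y = reorder(county, pm25))) + # base layer
  geom_col(fill = "steelblue") +                                             # bars
  geom_vline(xintercept = epa_annual_standard, linetype = "dashed") +        # EPA standard
  labs(                                                                      # labels
    title = "Counties with the highest PM2.5 levels",
    x = "Annual average PM2.5 (micrograms per cubic meter)",
    y = NULL,
    caption = caption_text
  ) +
  theme_minimal()                                                            # formatting
```

Running `pm_chart` on its own line displays the chart in the Plots tab. Because the caption is built from variables, it updates automatically when you change the state and rerun the calculations.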

Run all plotting code to generate the final chart, which you can preview in the bottom right pane under the ‘Plots’ tab.

Save and Export the Graph

Run the ggsave( ) function to save the chart as a .png file. This should appear under the ‘Files’ pane.
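
A minimal ggsave() call might look like the sketch below. The chart, file name, and dimensions are placeholders chosen for illustration. The example writes to a temporary folder so it can run anywhere; in Posit Cloud you would typically pass just a file name so the image lands in the project directory and shows up in the Files pane.

```r
library(ggplot2)

# Small stand-in chart with placeholder values
example_chart <- ggplot(
  data.frame(county = c("County A", "County B"), pm25 = c(9.8, 8.1)),
  aes(x = pm25, y = county)
) +
  geom_col()

# Hypothetical output file name, written to a temporary folder here
output_file <- file.path(tempdir(), "pm25_bar_chart.png")

# Save the chart as a PNG; width and height are in inches
ggsave(
  filename = output_file,
  plot = example_chart,
  width = 8,
  height = 5,
  units = "in",
  dpi = 300
)
```

Passing the chart explicitly through the `plot` argument avoids relying on ggsave()'s default of saving whichever plot was displayed last.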

Screenshot showing PNG of bar chart in Files pane in Posit Cloud

Click on the .png file in the Files pane to open the image in your browser → right-click on the image to save to your computer.

Final Output

Your completed visualization will show:

Three counties with the highest PM2.5 levels
The EPA primary (health-based) threshold
A caption summarizing population exposure and health context

Example

For example, in Washington in 2020 (please see the example graph below):

Benton County, Skamania County, and Grant County were areas with PM2.5 levels greater than the EPA threshold of 9.0 µg/m³
About 7 million people (87.9%) in Washington live in areas where air pollution is high, a share well above the U.S. average of 38%

You've Earned a Certificate!
- BroadStreet Certificate (FREE): Submit Infographic for a Certificate
- CPH - Certified in Public Health Recertification Credits (1 credit hr) ($10): Submission in progress to National Board of Public Health Examiners (NBPHE)

Note: We review projects every 2-4 weeks, and typically at the end of the month.

Instructors

Teresa Tse, MS

Public Health Data Analyst

Teresa Tse uses R every week to support the data and epidemiology teams of a metropolitan public health department. With a background in biomedical engineering, Teresa has a passion for using programming, research, and data analysis skills to help improve health outcomes. Teresa is a long-time contributor to BroadStreet Institute as a training program manager on the Maternal and Infant Health Track.

Shamini De Silva, BSc, Tutorial Instructor

Program Planner, BroadStreet Institute

Shamini is a BroadStreet Program Planner and aspiring researcher learning RStudio. With a background in Biomedical Science and experience working in Clinical Research, Shamini has realized the potential and impact of high-quality data and the growing demand for data handling and analysis skills. As Shamini learns R, she is sharing that learning with others.
