
[Tutorial Kit] Using RStudio to Analyze and Visualize Air Quality

  • Writer: Shamini V De Silva



Video Chapters

Recorded live Feb 24 2026, timestamps below

00:00 Intro

01:06 Bar Chart Overview

02:47 PM2.5: What is it?

03:55 Step-by-Step

04:00 Step 1. Data Files

06:19 Step 2. Posit Cloud

07:05 Step 3. Install Packages

09:39 Step 4. R code

16:38 Step 5. Bar Chart



🎯 Data Challenge


Create a bar chart in RStudio on air pollution for your state that compares the three counties with the highest air pollution levels to the primary (health-based) annual standard set by the U.S. Environmental Protection Agency (EPA).


Example bar chart created in Posit Cloud

Steps

Collect data on fine particulate matter (PM2.5), identify 3 counties in your state with the highest pollution concentrations, calculate how many people in your state are exposed to high levels of PM2.5, and create a bar chart summarizing results.


3 Learning Objectives


  1. Query & Collect 📊 data on county-level air pollution in your state.

  2. Analyze data using the data tool: 🧰 🛠️ RStudio Online (Posit Cloud)

  3. Visualize & Humanize data by creating a bar chart exploring how local air pollution concentrations compare to the EPA primary (health-based) standards.


Prerequisites. Beginner-friendly, some knowledge required:

  • R packages and functions

  • assigning a variable in R


Keywords and core concepts also covered:

  • 🧰 🛠️ R programming basics: Installing packages, assigning a variable, using functions, and creating a bar chart

  • How do we measure air pollution and why does it matter to health?

  • Defining the EPA primary (health-based) standards for air pollution.


What you'll need to complete this data challenge


  • ⏰ Time: 30-45 minutes

  • 🧰 Tools:

    • An account in Posit Cloud (a tool for running a cloud version of RStudio - no R software installation needed)

  • 📊 Data:

    • Indicator: PM2.5: Highest Annual Average Concentration (Monitor + Modeled Data), 2020, by County.

    • 📁 Files

      • Please download the zipped folder 'tutorial-kit-R-air-pollution' containing the files and data mentioned above.

        • air_pollution_and_health.R - sample R Code

        • pm_2020.csv - PM2.5 data in a CSV (.csv) file

        • population_data_2020.csv - Number of people living in a region

        • fips_counties_2021.txt - County FIPS codes (geographic identifiers) and county names in a text (.txt) file


DOWNLOAD ZIP FILE

click the folder below to download all files





Key Terms and Definitions

  • PM2.5 (also written PM₂.₅) is fine particulate matter that is 2.5 micrometers (i.e. microns, µm) in diameter or smaller.

  • PM2.5 concentration (µg/m³) is the mass of particles (micrograms, µg) in each cubic meter of air (m³).

  • The primary (health-based) standard for PM2.5. The Environmental Protection Agency's (EPA's) annual National Ambient Air Quality Standard (NAAQS) for fine particulate matter (PM2.5) is 9.0 µg/m³. Above 9.0 µg/m³ is considered harmful to health (EPA, 2025).
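
To make the comparison concrete, here is a minimal sketch in base R that checks a few annual averages against the 9.0 µg/m³ standard. The concentration values and the names `epa_standard` and `annual_pm25` are invented for illustration and do not come from the tutorial data.

```r
# EPA primary (health-based) annual standard for PM2.5, in µg/m³
epa_standard <- 9.0

# Hypothetical annual average concentrations (µg/m³) for three places
annual_pm25 <- c(place_a = 7.2, place_b = 9.0, place_c = 12.4)

# TRUE where the concentration is above the standard
above_standard <- annual_pm25 > epa_standard
print(above_standard)

# How far each value sits above (+) or below (-) the standard
print(annual_pm25 - epa_standard)
```

Because the comparison uses `>`, a value exactly equal to 9.0 is not counted as above the standard.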


Step-by-Step Walkthrough


Overview


The steps below guide you through analyzing air pollution data from the National Environmental Public Health Tracking Network and creating a bar chart of county-level PM2.5 levels. You’ll also calculate key metrics, such as the proportion of people in a state exposed to pollution high enough to harm health, and include this information in a dynamic caption below the bar chart.


🔍 Data Hunt: Finding Data


Air Pollution Data

How do you find PM2.5 data from the National Environmental Public Health Tracking Network?


  • In the Query Panel, select:

    • STEP 1: CONTENT

      • Content Area: Air Quality

      • Indicator: Current and Historical Air Quality

      • Measure: PM2.5 (highest annual average concentration, monitored + modeled)

    • STEP 2: GEOGRAPHY TYPE: National by county

    • STEP 3: GEOGRAPHY: All Counties

    • STEP 4: TIME: 2020

    • STEP 5: ADVANCED OPTIONS: No Advanced Options

    • Click Button: GO

  • Navigate to table view (button in upper right corner) and EXPORT data



Start in Query Panel to obtain PM2.5 data from the National Environmental Public Health Tracking Network, Data Explorer Tool


Arrows indicate how to export data as CSV files


Population Data


The U.S. Census Bureau publishes many sources of population data. Here we use the indicators shared in the National Environmental Public Health Tracking Network Data Explorer Tool.

  • While still in the Explorer tool, open the Query Panel:

    • click: SELECT DATA (button upper left corner)

  • In Query Panel:

    • STEP 1: CONTENT

      • Content Area: Demographic & Socioeconomics

      • Indicator: Demographics

      • Measure: Number of People by Demographic Group

    • STEP 2: GEOGRAPHY TYPE: National by county

    • STEP 3: GEOGRAPHY: All Counties

    • STEP 4: TIME: 2020

    • STEP 5: ADVANCED OPTIONS: No advanced options selected

    • Click Button: GO

Query used to obtain population number data for all U.S. counties from the National Environmental Public Health Tracking Network, Data Explorer Tool


🧰 🛠️ Data Tool

Once you have the data, it is time to analyze it using the data tool.



Step 1: 📁 Data Files

Get the data files by downloading the ZIP file below.


DOWNLOAD ZIP FILE

click the folder below to download all files





Step 2: ☁️ Posit Cloud

Set up Posit Cloud and create a new RStudio Project


Instead of installing R and RStudio locally, this tutorial uses Posit Cloud, a browser-based environment that runs RStudio online. We are using the Posit Cloud service to ensure that everyone is working in the same R environment.


The template code has been designed and tested specifically in the Posit Cloud environment (using R version 4.5.3 and tidyverse version 2.0.0). If you choose to run the code in a local RStudio setup, some parts may not work as expected and may require modification.


To get started:

  • Go to posit.cloud

  • Click 'Sign Up' in the top-right corner

  • Choose the free version for this project → Click 'Learn more'

  • Then click ‘Sign Up’ and create an account using email or services like Google or GitHub.

  • Once your workspace loads, click 'New Project' on the right-hand side of the screen.

  • Select 'New RStudio Project' to start a new project. This will also open the RStudio interface within your browser.




Step 3: 📦 Install Packages

Open the R Script, install, and load the required packages and data files


In the Files pane, click the .R script file ('air_pollution_and_health.R') to open it in the Script Editor. The script contains the code used to generate the bar chart.


The script also includes a header at the beginning that explains:

  • the purpose of the code,

  • how to use the code,

  • how we accessed the data, and

  • copyright and attribution information (please provide appropriate credit if you plan to adapt and share the code).


Highlight the lines of code below the header and run them (press Ctrl + Enter or click ‘Run’ in the top-right corner of the Script pane) to install the tidyverse and scales packages. You only need to do this once.


The tidyverse package is a collection of R packages for data cleaning (or wrangling), analysis, and visualization. The scales package contains scaling functions that can make plots easier to read, for instance, by turning raw numbers into readable formats (6,500,000 → 6.5 million). The installation process will run in the Console pane and may take a few minutes. Once complete, run the next two lines containing the library( ) function to load these packages into the R environment.
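
As a rough illustration of the install-then-load pattern and of what the scales package does, consider the sketch below. It is not copied from the tutorial script, whose exact lines may differ. The install line is commented out so that running the sketch does not reinstall anything, and the name `in_millions` is made up for this example.

```r
# Run once per project (uncomment the next line the first time)
# install.packages(c("tidyverse", "scales"))

# Load the package at the start of each new R session
library(scales)

# Format a raw count with thousands separators
comma(6500000)

# Express the same count in millions, rounded to one decimal place
in_millions <- label_number(accuracy = 0.1, scale = 1e-6, suffix = " million")
in_millions(6500000)
```

`label_number()` returns a function, which is why it is stored under a name and then called on the number. The same function can be reused for every value that needs the "million" format.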


To learn more about some of the basics of R programming, please watch the step-by-step walkthrough in the video above (timestamp - 03:55 Step-by-Step)



About the Interface


The RStudio interface contains several panes:

  • Script Editor (upper left): where you write and edit code

  • Console (bottom left): where commands are executed, and outputs/errors are displayed

  • Environment (upper right): shows variables and datasets loaded in the session

  • Files/Plots/Packages/Help (bottom right): used to upload files, view plots, manage packages, and access documentation



Because Posit Cloud runs online, any files you want to use must be uploaded manually.


Upload Files


After downloading the zipped folder containing the necessary files and data, upload the files to Posit Cloud.


To upload:

  • Navigate to the 'Files' tab in the bottom right pane.

  • Locate the ‘Upload’ button → Click on ‘Browse’ → Select and upload the four files in the folder (.R file, .csv files, and .txt file). Leave the target directory at the default location “/cloud/project/”.

  • After uploading, you should see the files listed in the Files pane.


If needed, please watch the video above for a demonstration of these steps (timestamp - 03:55 Step-by-Step).




Step 4: 👩🏽‍💻 R Code

Edit and run the code to calculate key metrics



Specify your state and check file name variables

Locate and update the section where you define:

  • your state of interest, assigned to the variable your_state (e.g., "Washington" or "Louisiana")

  • file names for your datasets - update these if needed so they match the uploaded files in the Files pane (see screenshot below)
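
A minimal sketch of what this section might look like is shown here. Only `your_state` and the three file names come from the tutorial. The variable names `pm_file`, `population_file`, and `fips_file` are hypothetical, so use whatever names appear in the script you downloaded.

```r
# State to analyze, spelled the way it appears in the data
your_state <- "Washington"

# File names (hypothetical variable names; the values match the files in the ZIP folder)
pm_file         <- "pm_2020.csv"
population_file <- "population_data_2020.csv"
fips_file       <- "fips_counties_2021.txt"

# Check which of the files R can see in the working directory
file.exists(c(pm_file, population_file, fips_file))
```

If `file.exists()` returns FALSE for a file, compare the spelling in the script with the name shown in the Files pane before running the import code.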



Run code

Start by running the code that imports:

  • PM2.5 data (county-level pollution) - pm_2020.csv

  • Population data - population_data_2020.csv


These will appear as data frames in the Environment pane, one for each file (‘pm_2020.csv’ and ‘population_data_2020.csv’).
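
The import step can be sketched as follows. To keep the example runnable anywhere, it writes a tiny stand-in CSV to a temporary file instead of reading the real 'pm_2020.csv'. The column names (`fips`, `state`, `pm25`) and the values are assumptions, and the tutorial script may use a tidyverse reader rather than base R's `read.csv()`.

```r
# Stand-in for pm_2020.csv so this sketch runs anywhere.
# Column names and values are hypothetical; check the header row of the real file.
pm_path <- tempfile(fileext = ".csv")
writeLines(c(
  "fips,state,pm25",
  "01001,Alabama,8.1",
  "53005,Washington,10.3"
), pm_path)

# Read FIPS codes as text so leading zeros ("01001") are preserved
pm_data <- read.csv(pm_path, colClasses = c(fips = "character"))

# Inspect the structure and first rows, as you would after any import
str(pm_data)
head(pm_data)
```

Reading the FIPS column as text matters later: if a code such as "01001" is read as the number 1001, it will no longer match the county lookup file.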


Run the next section of code to calculate the percent of the population in the specified state exposed to PM2.5 levels greater than the EPA standard (9.0 µg/m³) and to store it in a data frame assigned to ‘percent_pop_high_pollution’.


Then run the following section to find the national percentage for comparison. This will be summarized in a data frame assigned to ‘percent_pop_high_pollution_US’.


These metrics will later be highlighted in the bar chart caption.
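
The arithmetic behind both metrics is the same: add up the population of counties above the standard, then divide by the total population. The sketch below shows one way to do that in base R. All county rows are invented, the helper `percent_exposed()` is hypothetical, and the results are stored as plain numbers rather than the data frames the tutorial script creates.

```r
epa_standard <- 9.0
your_state   <- "Washington"

# Hypothetical county rows: PM2.5 (µg/m³) and population
counties <- data.frame(
  state      = c("Washington", "Washington", "Washington", "Louisiana", "Louisiana"),
  county     = c("County A", "County B", "County C", "County D", "County E"),
  pm25       = c(10.5, 8.0, 9.6, 7.5, 11.0),
  population = c(200000, 100000, 300000, 250000, 150000)
)

# Percent of a population living in counties above the standard
percent_exposed <- function(df, standard) {
  exposed <- sum(df$population[df$pm25 > standard])
  100 * exposed / sum(df$population)
}

# State-level metric uses only rows for the selected state
state_rows <- counties[counties$state == your_state, ]
percent_pop_high_pollution <- percent_exposed(state_rows, epa_standard)

# National metric uses every row in the data
percent_pop_high_pollution_US <- percent_exposed(counties, epa_standard)

round(percent_pop_high_pollution, 1)
round(percent_pop_high_pollution_US, 1)
```

With these invented numbers, 500,000 of 600,000 state residents live in counties above the standard, which is about 83.3%, and 650,000 of 1,000,000 residents overall, which is 65%.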


Step 5: 📊 Bar Chart

Run the code to create the bar graph


Start by preparing the data for visualization. This includes merging datasets using FIPS codes contained in the ‘fips_counties_2021.txt’ file, adding the county names, and filtering for counties with the highest pollution levels in your selected state.
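
The preparation steps above can be sketched in base R as a merge followed by a filter and a sort. The data frames, column names, and PM2.5 values below are invented stand-ins, and the tutorial script may use tidyverse functions for the same steps.

```r
your_state <- "Washington"

# Hypothetical PM2.5 rows keyed by county FIPS code (stored as text)
pm_data <- data.frame(
  fips = c("53001", "53003", "53005", "53007", "22001"),
  pm25 = c(8.2, 10.1, 11.4, 9.7, 12.0)
)

# Hypothetical lookup table standing in for fips_counties_2021.txt
fips_lookup <- data.frame(
  fips   = c("53001", "53003", "53005", "53007", "22001"),
  state  = c("Washington", "Washington", "Washington", "Washington", "Louisiana"),
  county = c("County A", "County B", "County C", "County D", "County E")
)

# Merge on the shared FIPS column to attach state and county names
pm_named <- merge(pm_data, fips_lookup, by = "fips")

# Keep the selected state, sort from highest to lowest PM2.5, take the top three
state_rows <- pm_named[pm_named$state == your_state, ]
top_three  <- head(state_rows[order(state_rows$pm25, decreasing = TRUE), ], 3)
top_three
```

Merging on the FIPS code rather than the county name avoids mismatches, since many states share county names.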


The subsequent lines of code will build the visualization using ggplot2, which works in layers:

  • Base layer initializes the plot

  • Additional layers add:

    • Bars (PM2.5 levels by county)

    • Labels and formatting

    • A vertical line marking the EPA standard (9.0 µg/m³)

    • A dynamic caption with your calculated metrics
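
The layering described above can be sketched as follows. This is an illustrative example rather than the tutorial's own plotting code: the county rows, the two percentages, the colors, and the titles are all made up, and only ggplot2 is loaded (it is part of the tidyverse).

```r
library(ggplot2)

epa_standard <- 9.0

# Hypothetical top-three counties and exposure metrics
top_three <- data.frame(
  county = c("County A", "County B", "County C"),
  pm25   = c(11.4, 10.1, 9.7)
)
percent_state <- 83.3
percent_us    <- 65.0

# Build the caption from calculated values so it updates when the data change
caption_text <- sprintf(
  "%.1f%% of state residents live in counties above %.1f µg/m³ (U.S.: %.1f%%).",
  percent_state, epa_standard, percent_us
)

# Base layer: data and axis mappings (counties ordered by PM2.5)
bar_chart <- ggplot(top_three, aes(x = pm25, y = reorder(county, pm25))) +
  # Bars: PM2.5 level for each county
  geom_col(fill = "steelblue") +
  # Vertical dashed line marking the EPA standard
  geom_vline(xintercept = epa_standard, linetype = "dashed") +
  # Labels, including the dynamic caption
  labs(
    title   = "Counties with the highest PM2.5 levels",
    x       = "PM2.5 annual average (µg/m³)",
    y       = NULL,
    caption = caption_text
  ) +
  theme_minimal()

bar_chart
```

Putting PM2.5 on the x-axis makes the bars horizontal, which is why the standard is drawn with `geom_vline()` rather than `geom_hline()`.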


Run all plotting code to generate the final chart, which you can preview in the bottom right pane under the ‘Plots’ tab.


Save and Export the Graph


Run the ggsave( ) function to save the chart as a .png file. The file should then appear in the ‘Files’ pane.
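
A minimal sketch of the save step is shown here, using a small placeholder chart with invented values. The file name 'pm25_bar_chart.png' and the size settings are assumptions, so the tutorial script may use different ones.

```r
library(ggplot2)

# Small placeholder chart with hypothetical values
bar_chart <- ggplot(
  data.frame(county = c("County A", "County B"), pm25 = c(11.4, 10.1)),
  aes(x = pm25, y = county)
) +
  geom_col()

# Save to the working directory; width and height are in inches by default
ggsave("pm25_bar_chart.png", plot = bar_chart, width = 8, height = 5, dpi = 300)

# Confirm the file was written
file.exists("pm25_bar_chart.png")
```

Passing the chart explicitly through `plot =` avoids relying on whichever plot was displayed last.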


Screenshot showing PNG of bar chart in Files pane in Posit Cloud

Click on the .png file in the Files pane to open the image in your browser → right-click on the image to save to your computer.



Final Output


Your completed visualization will show:

  • Three counties with the highest PM2.5 levels

  • The EPA primary (health-based) threshold

  • A caption summarizing population exposure and health context


Example

For example, in Washington in 2020 (please see the example graph below):

  • Benton County, Skamania County, and Grant County were areas with PM2.5 levels greater than the EPA threshold of 9.0 µg/m³

  • About 7 million people (87.9%) in Washington live in areas where PM2.5 exceeds the EPA standard, well above the U.S. figure of 38%


Example bar chart created in Posit Cloud



You've Earned a Certificate!


BroadStreet Certificate (FREE)


CPH - Certified in Public Health Recertification Credits (1 credit hr) ($10)

Submission in progress to National Board of Public Health Examiners (NBPHE)

Note: We review projects every 2-4 weeks, and typically at the end of the month.




Instructors


Teresa Tse, MS, Tutorial Instructor

Teresa Tse, MS

Public Health Data Analyst


Teresa Tse uses R every week to support the data and epidemiology teams of a metropolitan public health department. With a background in biomedical engineering, Teresa has a passion for using programming, research, and data analysis skills to help improve health outcomes. Teresa is a long-time contributor to BroadStreet Institute as a training program manager on the Maternal and Infant Health Track.


Shamini De Silva, BSc, Tutorial Instructor

Shamini V De Silva

Program Planner, BroadStreet Institute


Shamini is a BroadStreet Program Planner and aspiring researcher learning RStudio. With a background in Biomedical Science and experience working in Clinical Research, Shamini has realized the potential and impact of high-quality data and the growing demand for data handling and analysis skills. As Shamini learns R, she is sharing that learning with others.



 
 

The BroadStreet Institute is a registered 501(c)3. We are a growing community of volunteer data enthusiasts who believe that good data can heal the world. Volunteer teams manage the Community Health Impact Fund. All are welcome to give, volunteer, and to join our community.

We believe that good data can heal the world. Our vision is a healthy world for all. Our mission is to empower the next generation of leaders in community health through training in tools and skills for data-guided decision-making.

 

We are proud to have The BroadStreet Data Co-op as a major partner and sponsor. Website made with 🤍 in Milwaukee.

© 2025 by BroadStreet Institute
