Model — Erma Swartz

Global CO2 emissions change map 1995-2020

feature importance analysis

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Train RF on 25 years of country-level emissions data
rf = RandomForestRegressor(n_estimators=500, max_depth=12)
rf.fit(X_train, y_train)

# Extract and rank feature importances
importances = pd.Series(
    rf.feature_importances_, index=feature_names
).sort_values(ascending=False)

# Top predictor: energy_mix_fossil (0.31)
# R² = 0.89 on held-out test set

Machine Learning · Spring 2025

Predicting Global CO2 Emissions

Which countries are on track to blow past their emissions targets, and why? I trained Random Forest and XGBoost models on 25 years of country-level data to predict per-capita CO2 trajectories — then used feature importance analysis to identify the socioeconomic and energy-mix variables that actually drive national emissions profiles.

So what: The model revealed that energy mix composition and GDP growth rate explain more emissions variance than population or industrialization level — suggesting that energy policy, not demographic change, is the key lever.

Python scikit-learn Random Forest XGBoost pandas

Download Full Report (PDF)

0.89 R² on held-out test

25 years of data (1995–2020)

309 countries assembled

200× RF error reduction vs. linear baseline

Feature Importance

Energy mix and GDP growth rate dominate — demographic variables like population barely register, challenging conventional assumptions about what drives emissions.

Model Validation

Actual vs. predicted CO2 emissions on held-out test data. Tight clustering along the diagonal confirms strong predictive accuracy across diverse country profiles.
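For readers who want to see what the held-out score measures, here is a minimal sketch of the R² calculation in plain Python. The function name `r_squared` and the four sample values are illustrative assumptions, not the project's data or code.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical per-capita emissions for four held-out countries
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
score = r_squared(y_true, y_pred)
```

A score near 1 corresponds to the tight clustering along the diagonal described above; a score near 0 means the model does no better than predicting the mean.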

Feature Correlations

Mapping how every candidate predictor co-varies with per-capita emissions — exposing redundant features and isolating the independent signals the Random Forest leans on.
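One way to expose redundant features is to compute pairwise Pearson correlations and flag pairs above a cutoff. The sketch below assumes a 0.9 threshold and uses made-up values; `energy_mix_fossil` appears in the feature list above, while `coal_share` and `gdp_growth` are hypothetical column names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def redundant_pairs(features, threshold=0.9):
    """Return feature pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


# Illustrative values only; coal_share is built to track energy_mix_fossil
features = {
    "energy_mix_fossil": [0.2, 0.4, 0.6, 0.8, 0.9],
    "coal_share": [0.1, 0.2, 0.3, 0.4, 0.45],
    "gdp_growth": [3.0, 1.0, 4.0, 2.0, 2.5],
}
redundant = redundant_pairs(features)
```

Flagged pairs are candidates for dropping or merging before training, since correlated inputs tend to split importance between them in a Random Forest.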

Error Distribution

Residuals cluster tightly around zero with no systematic bias — the model isn't just accurate on average, it's accurate consistently, country to country.
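A quick check for systematic bias is to summarize the residuals: a mean near zero suggests no consistent over- or under-prediction, and the spread shows how consistent the errors are. This is a sketch with illustrative numbers; `residual_summary` is a hypothetical helper, not part of the project code.

```python
from statistics import mean, stdev


def residual_summary(actual, predicted):
    """Summarize residuals (actual minus predicted) for a bias check."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "mean": mean(residuals),        # near 0 means no systematic bias
        "stdev": stdev(residuals),      # spread of the errors
        "max_abs": max(abs(r) for r in residuals),  # worst single miss
    }


# Hypothetical held-out values
summary = residual_summary([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```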

Urban Tech · Spring 2025

Citi Bike Expansion in Upper Manhattan

Where should NYC add Citi Bike capacity in upper Manhattan to maximize ridership and climate co-benefits? Working from a full year of 2023 Citi Bike trip records (12 GB across 12 months), I scored every existing station in Inwood, Washington Heights, and the Upper West Side — then ran a pairwise optimization that ranks candidate expansions by demand and by CO₂ and PM2.5 avoided per trip diverted from cars.

So what: A handful of nodes — Broadway & W 185 St, Seaman Ave & Isham St, and Dyckman St & Staff St — anchor most of the top 20 recommended pairs. Together those expansions project ≈100,000 new bike-miles, 57 kg of CO₂ avoided, and 4 kg of PM2.5 emissions prevented annually — concentrated in the neighborhoods where the new capacity would land.
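The pairwise ranking idea can be sketched as follows. This is not the project's optimization: the trip counts, trip lengths, diversion rate, and emission factor are all placeholder assumptions chosen for illustration, and only the station names come from the results above.

```python
from itertools import combinations

# Hypothetical per-station inputs: (projected new annual trips, mean trip miles)
stations = {
    "Broadway & W 185 St": (1200, 1.5),
    "Seaman Ave & Isham St": (900, 2.0),
    "Dyckman St & Staff St": (1000, 1.0),
}
CAR_DIVERSION_RATE = 0.10   # assumed share of new bike trips that replace a car trip
CO2_KG_PER_CAR_MILE = 0.4   # assumed emission factor, for illustration only


def score_pair(a, b):
    """Combine two stations' demand and estimate CO2 avoided for the pair."""
    trips = stations[a][0] + stations[b][0]
    bike_miles = stations[a][0] * stations[a][1] + stations[b][0] * stations[b][1]
    co2_avoided_kg = bike_miles * CAR_DIVERSION_RATE * CO2_KG_PER_CAR_MILE
    return {"pair": (a, b), "trips": trips,
            "bike_miles": bike_miles, "co2_avoided_kg": co2_avoided_kg}


# Rank every candidate pair by CO2 avoided, breaking ties on demand
ranked = sorted(
    (score_pair(a, b) for a, b in combinations(stations, 2)),
    key=lambda row: (row["co2_avoided_kg"], row["trips"]),
    reverse=True,
)
```

A PM2.5 term would follow the same pattern with its own per-mile factor, and the sort key can be swapped to weight demand ahead of emissions.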

Python pandas Optimization Climate Co-Benefits

Seasonal Ridership

Trip volume peaks in summer and drops sharply in winter — a key variable for station capacity planning and rebalancing logistics.
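A seasonal profile like this comes from counting trips by start month. The sketch below assumes each trip record carries an ISO-formatted start timestamp; the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime


def trips_per_month(start_times):
    """Count trips by calendar month from ISO-formatted start timestamps."""
    counts = Counter(datetime.fromisoformat(ts).month for ts in start_times)
    # Include months with zero trips so the profile always has 12 entries
    return {month: counts.get(month, 0) for month in range(1, 13)}


# Hypothetical trip start times
sample = [
    "2023-01-15 08:30:00",
    "2023-07-04 14:05:00",
    "2023-07-19 18:45:00",
    "2023-12-02 09:10:00",
]
monthly = trips_per_month(sample)
peak_month = max(monthly, key=monthly.get)
```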

Trip Duration

Average trip length increases in warm months as riders take longer recreational routes, while commute-pattern trips stay consistent year-round.

Distance Traveled

Distance patterns mirror duration trends — longer rides in summer suggest stations in parks and waterfront areas see disproportionate seasonal demand.

Urban Tech · Spring 2025

Air Quality & PM2.5 Reduction Scenarios

What if 20% of subway commuters switched to micromobility? I modeled the air quality and climate co-benefits of mode-shift scenarios, estimating PM2.5 exposure avoided and CO2 reductions per trip.

So what: Even a modest 20% shift produces measurable health benefits, equivalent to removing thousands of car trips from the most polluted corridors. The findings support the case for micromobility subsidies.
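The arithmetic behind a mode-shift scenario can be sketched as shifted trips times the per-trip difference between the two modes. Every number below (trip volume, trip length, per-km factors) is a placeholder assumption rather than a value from the analysis; only the 20% shift share comes from the scenario above.

```python
def mode_shift_savings(daily_trips, shift_share, trip_km, from_factor, to_factor):
    """Estimate emissions avoided when a share of trips changes mode.

    from_factor and to_factor are emissions per passenger-km for the
    original and replacement modes (same unit, e.g. kg CO2 or g PM2.5).
    """
    shifted_trips = daily_trips * shift_share
    avoided_per_trip = trip_km * (from_factor - to_factor)
    return {
        "shifted_trips": shifted_trips,
        "avoided_per_trip": avoided_per_trip,
        "avoided_total": shifted_trips * avoided_per_trip,
    }


# Hypothetical inputs: 10,000 daily trips, 20% shift, 5 km average trip
result = mode_shift_savings(
    daily_trips=10_000, shift_share=0.20, trip_km=5.0,
    from_factor=0.05, to_factor=0.01,
)
```

Running the same function with a PM2.5 factor in place of the CO2 factor gives the air-quality side of the scenario.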

Python Scenario Modeling Environmental Data

Transit Ridership Model

Interactive regression scatter plot

NJTPA · 2025–2026

Transit Ridership Prediction

Can you predict how many people will ride transit in a given neighborhood based on how the system is built around them? I built OLS and Random Forest models across 2,181 NJ census tracts, engineering spatial features from 31,000+ bus stops and 165 rail stations.

So what: Bus stop density alone accounts for 55% of the Random Forest's predictive power — more than all demographic variables combined. Access, not demographics, drives ridership.

Python statsmodels scikit-learn GeoPandas

spatial feature engineering

# Engineer transit accessibility features per tract
# Assumes all layers share a projected CRS with units in meters
rail_union = rail_stations.unary_union  # build once, not per tract
for idx, tract in gdf.geometry.items():
    # Count bus stops within 2-mile buffer
    buffer = tract.buffer(3218)  # meters (≈ 2 miles)
    stops_nearby = bus_stops[bus_stops.within(buffer)]
    gdf.loc[idx, 'bus_density_2mi'] = len(stops_nearby)

    # Distance to nearest rail station
    gdf.loc[idx, 'rail_dist_m'] = tract.distance(rail_union)

Embodied carbon scenario impact modeling

Thesis · 2025–2026

Embodied Carbon Scenario Modeling

How much carbon could NYC save if it mandated low-carbon concrete in all new construction? I built scenario models estimating embodied carbon reductions under four policy pathways — procurement reform, subsidies, regulation, and a hybrid approach.

So what: The hybrid scenario projects 57 million tCO₂e in savings over 20 years and reaches market viability in 5 years. Procurement reform alone saves 19 million tCO₂e; doing nothing costs us all of it.

Read the full thesis deep-dive →

Python Random Forest OLS Regression Scenario Modeling

What Drives Carbon

Floor area and structural system explain the most variance in embodied carbon — meaning design-phase choices matter more than construction-phase efficiency.

Waste Scenarios

Aggressive construction waste recycling could reduce embodied carbon by 8-15%, with concrete and steel diversion yielding the greatest gains.

Emissions Avoided

Each policy lever contributes a wedge of cumulative savings — stacked together, they show the full potential of a combined circular economy strategy.
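A stacked wedge chart of this kind is a running total per lever, summed across levers for each year. The sketch below uses invented annual savings for three of the levers over three years; the figures and the helper names are assumptions for illustration only.

```python
from itertools import accumulate


def cumulative_wedges(annual_savings_by_lever):
    """Turn each lever's annual savings into a cumulative series."""
    return {
        lever: list(accumulate(series))
        for lever, series in annual_savings_by_lever.items()
    }


def stacked_total(wedges):
    """Sum the cumulative wedges year by year to get the combined curve."""
    return [sum(year_values) for year_values in zip(*wedges.values())]


# Hypothetical annual savings per lever (arbitrary units, three years)
annual = {
    "procurement": [1.0, 1.0, 1.0],
    "subsidies": [0.0, 0.5, 1.0],
    "regulation": [0.0, 0.0, 2.0],
}
wedges = cumulative_wedges(annual)
total = stacked_total(wedges)
```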

Predict & Simulate