Project Overview
01 — What This Is
Transit usage depends on whether people can realistically access the system
This project uses spatial analysis, regression, and machine learning to understand where transit is working, where access is limiting ridership, and where targeted investments would most effectively increase usage across ~2,181 census tracts in New Jersey.
Scale
~2,181 census tracts analyzed across New Jersey
Data Points
NJ Transit bus stops incorporated into accessibility metrics
02 — Key Innovation
Measuring access instead of just infrastructure
Distance to the nearest stop and the number of stops within 2 miles explain more variation in transit use than demographic variables alone. Proximity and network coverage matter more than simply having a stop in a tract.
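The two access measures can be illustrated with a minimal stdlib-only sketch. It assumes each tract is represented by a single point (for example its centroid) and that tract and stop coordinates are already projected into a feet-based CRS such as EPSG:3424, so straight-line distance is meaningful. The function name `access_features` and the sample coordinates are hypothetical. The project itself used GeoPandas joins, which scale far better than this brute-force loop.

```python
import math

FEET_PER_MILE = 5280.0
RADIUS_FT = 2 * FEET_PER_MILE  # 2-mile buffer = 10,560 ft


def access_features(tract_xy, stop_xy, radius_ft=RADIUS_FT):
    """Return (dist_to_bus, bus_density_2mi) for one tract point.

    Assumes all coordinates are (x, y) pairs in a projected CRS measured
    in feet (e.g. EPSG:3424), so straight-line distance is meaningful.
    """
    tx, ty = tract_xy
    dists = [math.hypot(sx - tx, sy - ty) for sx, sy in stop_xy]
    dist_to_bus = min(dists)  # feet to the nearest stop
    bus_density_2mi = sum(1 for d in dists if d <= radius_ft)  # stops inside the buffer
    return dist_to_bus, bus_density_2mi


# Hypothetical tract point and three hypothetical stops.
stops = [(0.0, 0.0), (3000.0, 4000.0), (20000.0, 0.0)]
print(access_features((600.0, 800.0), stops))  # (1000.0, 2)
```

Counting stops instead of dividing by the buffer area only rescales every unclipped buffer by the same constant (π × 2² square miles), which leaves a Random Forest's splits unchanged.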
RF Model Performance
The Random Forest model explains 58% of the variation in transit share (R² = 0.58), compared with 56% for OLS; the modest gain is consistent with nonlinear relationships between access and ridership.
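"Explains 58% of variation" refers to the coefficient of determination, R² = 1 - SS_res / SS_tot. The short sketch below computes it from actual and predicted values. The source does not say whether the reported figures are in-sample or held-out. Because a Random Forest can fit its training data very closely, a like-for-like comparison with OLS is usually made on held-out or out-of-bag predictions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: share of variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)  # total variation around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # variation left unexplained
    return 1.0 - ss_res / ss_tot


print(r_squared([0.0, 2.0], [0.5, 1.5]))  # 0.75
```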
Top Feature
54.8% of RF feature importance comes from bus stop density alone, more than all other variables combined
03 — What Makes This Different
A small set of accessibility measures explains most of the system
bus_density_2mi (54.8%) and dist_to_rail (10.8%) together account for ~66% of Random Forest feature importance. Both are computed with spatial joins and distance calculations, and together they carry most of the model's predictive power. dist_to_bus drops to 3.3% once density is included in the model, and demographic variables combined account for ~27%. The analysis therefore distinguishes between where demand exists and where access is actually provided.
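The share arithmetic above can be checked with a few lines. This sketch uses only the importance values reported here; the helper `group_share` is hypothetical. It assumes the shares are normalized to sum to 1, as scikit-learn does for a Random Forest's `feature_importances_`.

```python
# Importance shares reported for this project. scikit-learn normalizes a
# RandomForest's feature_importances_ so that they sum to 1.
reported = {
    "bus_density_2mi": 0.548,
    "dist_to_rail": 0.108,
    "dist_to_bus": 0.033,
}


def group_share(importances, names):
    """Combined importance of a group of features."""
    return sum(importances[name] for name in names)


access_pair = group_share(reported, ["bus_density_2mi", "dist_to_rail"])
print(f"{access_pair:.1%}")  # 65.6%

# Because shares sum to 1, "more than all other variables combined"
# simply means the density share exceeds 0.5.
density = reported["bus_density_2mi"]
print(density > 1 - density)  # True
```

If these are scikit-learn's impurity-based importances, its documentation notes they can favor high-cardinality continuous features; `sklearn.inspection.permutation_importance` is a common cross-check.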
Key Findings at a Glance
Finding 01
Transit use increases where access is strong
Bus stop density within 2 miles (bus_density_2mi) accounts for 54.8% of RF feature importance, more than all other variables combined. Distance to rail ranks second at 10.8%.
Finding 02
Areas with unmet demand are the clearest opportunities
Tracts with positive predicted-minus-actual gaps, where the model expects more transit use than is observed, are not low-demand areas; they are access-constrained areas. The gap map turns model output directly into an investment priority list.
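Ranking tracts by their gap can be sketched as follows, assuming one predicted and one actual transit share per tract. The function name `rank_gaps` and the sample tract IDs are hypothetical. Because a Random Forest fits its training tracts closely, out-of-sample (for example cross-validated) predictions may give a more honest gap than in-sample ones.

```python
def rank_gaps(tracts):
    """Rank tracts by predicted-minus-actual transit share, largest gap first.

    `tracts` is an iterable of (tract_id, predicted_share, actual_share).
    Only positive gaps (model expects more use than observed) are returned.
    """
    gaps = [(tract_id, pred - actual) for tract_id, pred, actual in tracts]
    positive = [g for g in gaps if g[1] > 0]
    return sorted(positive, key=lambda g: g[1], reverse=True)


# Hypothetical tract IDs and shares, for illustration only.
sample = [("A", 0.12, 0.05), ("B", 0.08, 0.10), ("C", 0.20, 0.18)]
print([tract_id for tract_id, _ in rank_gaps(sample)])  # ['A', 'C']
```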
Finding 03
There is a clear distance threshold
Beyond roughly one mile from a bus stop, transit use drops sharply even in areas with otherwise strong demand. A linear OLS term fits a single slope and cannot represent this kind of threshold, but the Random Forest's tree splits capture it.
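The reason a tree-based model can find such a threshold is its split search: at each node a regression tree picks the cut that minimizes the summed squared error of the two sides. The sketch below runs that search on a single feature. The data are synthetic and in miles for readability, not project results. A Random Forest repeats this search across many bootstrapped trees, so its learned threshold is effectively averaged.

```python
def _sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(distances, shares):
    """Single-feature regression-tree split: the distance cut minimizing total SSE.

    Returns (threshold, sse); threshold is the midpoint between the two
    adjacent sorted distances on either side of the best cut.
    """
    pairs = sorted(zip(distances, shares))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut exists between identical distances
        left = [s for _, s in pairs[:i]]
        right = [s for _, s in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if sse < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = sse
    return best_threshold, best_sse


# Synthetic data (miles to nearest stop, transit share), not project results.
miles = [0.2, 0.5, 0.8, 0.9, 1.2, 1.5, 2.0]
share = [0.15, 0.14, 0.13, 0.14, 0.03, 0.02, 0.02]
threshold, _ = best_split(miles, share)
print(round(threshold, 2))  # 1.05
```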
Finding 04
Targeted investment reaches the highest-need communities
Scenario 1 (uniform +20% expansion) produces the largest system-wide ridership gain. Scenario 2 concentrates impact in 215 equity-priority tracts. Demographic data shows where demand exists; improving access is what changes behavior.
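A scenario run can be sketched as "change a feature, re-predict, sum the difference." The source does not state what the +20% applies to or how Scenario 2 is constructed, so this sketch assumes the uplift applies to `bus_density_2mi` and models Scenario 2 as the same uplift restricted to a hypothetical set of equity-priority tracts. The function `simulate`, the toy model, and the tracts are all hypothetical; the result is a summed change in predicted share, and converting it to riders would require weighting each tract by its commuter count.

```python
def simulate(tracts, predict, uplift=0.20, only=None):
    """Summed change in predicted transit share after raising bus_density_2mi by `uplift`.

    tracts:  dict of tract_id -> feature dict containing "bus_density_2mi".
    predict: any callable mapping a feature dict to a predicted transit share.
    only:    optional set of tract_ids to treat; None treats every tract.
    """
    total = 0.0
    for tract_id, feats in tracts.items():
        if only is not None and tract_id not in only:
            continue  # untreated tracts contribute no change
        boosted = dict(feats, bus_density_2mi=feats["bus_density_2mi"] * (1 + uplift))
        total += predict(boosted) - predict(feats)
    return total


def toy_predict(features):
    """Hypothetical stand-in for the fitted model's prediction."""
    return 0.001 * features["bus_density_2mi"]


tracts = {"A": {"bus_density_2mi": 100.0}, "B": {"bus_density_2mi": 50.0}}
scenario_1 = simulate(tracts, toy_predict)              # every tract
scenario_2 = simulate(tracts, toy_predict, only={"B"})  # hypothetical equity subset
print(round(scenario_1, 4), round(scenario_2, 4))  # 0.03 0.01
```

When every tract's predicted change is non-negative, treating all tracts can never total less than treating a subset, which is consistent with Scenario 1 producing the largest system-wide gain; a targeted scenario's value lies in where the gains land.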
Model Diagnostic
What This Demonstrates
Portfolio Signal
GIS + data science + policy reasoning, integrated end-to-end
This project combines spatial data engineering (GeoPandas joins in EPSG:3424, the New Jersey State Plane coordinate system in US survey feet), econometric modeling (OLS with statsmodels), machine learning (Random Forest with scikit-learn), and scenario simulation, then translates the results into maps and findings a planning audience can use. The goal is not to demonstrate every technique but to surface the insight that drives better investment decisions.