Transit Access Analysis

01 — What This Is

Transit usage depends on whether people can realistically access the system

This project uses spatial analysis, regression, and machine learning to identify where transit is working, where limited access is holding ridership down, and where targeted investments would most effectively increase usage across 2,181 census tracts in New Jersey.

Scale

2,181

Census tracts analyzed across New Jersey

Data Points

31k+

NJ Transit bus stops incorporated into accessibility metrics

02 — Key Innovation

Measuring access instead of just infrastructure

Distance to the nearest stop and the number of stops within 2 miles explain more variation in transit use than demographic variables alone. Proximity and network coverage matter more than simply having a stop in a tract.
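As a rough illustration of the two access measures described above, the sketch below computes distance to the nearest stop and the number of stops within 2 miles for a single tract centroid. It assumes coordinates are already projected to a planar system measured in US survey feet (as EPSG:3424 is); the function name `access_metrics` and the sample coordinates are hypothetical, and the project itself may well use GeoPandas spatial joins rather than plain Python.

```python
import math

FEET_PER_MILE = 5280.0


def access_metrics(centroid, stops, radius_miles=2.0):
    """Return (miles to nearest stop, number of stops within radius_miles).

    Assumes centroid and stops are (x, y) pairs in a projected CRS whose
    units are feet. Hypothetical helper, not the project's actual code.
    """
    if not stops:
        return math.inf, 0
    cx, cy = centroid
    radius_ft = radius_miles * FEET_PER_MILE
    # Straight-line (Euclidean) distance from the centroid to every stop.
    distances_ft = [math.hypot(sx - cx, sy - cy) for sx, sy in stops]
    nearest_miles = min(distances_ft) / FEET_PER_MILE
    stops_in_radius = sum(1 for d in distances_ft if d <= radius_ft)
    return nearest_miles, stops_in_radius


# Invented example: three stops at 0.5, 1, and 3 miles from the centroid.
centroid = (0.0, 0.0)
stops = [(2640.0, 0.0), (0.0, 5280.0), (15840.0, 0.0)]
nearest, density = access_metrics(centroid, stops)
print(nearest, density)
```

A brute-force loop like this is fine for illustration; with roughly 31k stops and a couple of thousand tracts, a spatial index would be the more practical choice.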

RF Model Performance

0.58

The Random Forest model explains 58% of variation in transit share, compared with 56% for OLS. The two-point gain reflects nonlinear relationships between access and ridership.
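For readers unfamiliar with the metric, "explains 58% of variation" refers to the coefficient of determination, R². The sketch below shows how that figure is typically computed from actual and predicted transit shares. The sample values are invented for illustration and are not project data; the project's scores may also have been computed on a held-out test set, which the source does not specify.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented transit shares for four tracts.
actual = [0.02, 0.05, 0.10, 0.20]
perfect = list(actual)                                # a perfect model
baseline = [sum(actual) / len(actual)] * len(actual)  # always predicts the mean

print(r_squared(actual, perfect))   # perfect fit scores 1
print(r_squared(actual, baseline))  # mean-only model scores 0
```

A score of 0.58 therefore sits a little over halfway between a model that only predicts the average tract and one that predicts every tract exactly.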

Top Feature

55%

of RF feature importance from bus stop density alone — more than all other variables combined

03 — What Makes This Different

A small set of accessibility measures explains most of the system

bus_density_2mi (54.8%) and dist_to_rail (10.8%) together account for about 66% of Random Forest feature importance. Both are computed from spatial joins and distance calculations. dist_to_bus drops to 3.3% once density is included in the model, and demographic variables combined account for about 27%. The analysis distinguishes between where demand exists and where access is actually provided.
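The arithmetic behind these shares can be checked directly. The sketch below uses only the percentages quoted above and assumes the importances are normalized to sum to 100%, as scikit-learn's impurity-based importances are; the source does not state which importance measure was used.

```python
# Feature-importance shares quoted in the text, in percent.
importances = {
    "bus_density_2mi": 54.8,
    "dist_to_rail": 10.8,
    "dist_to_bus": 3.3,
}

# Combined share of the two leading accessibility measures.
top_two = importances["bus_density_2mi"] + importances["dist_to_rail"]

# Assumption: importances sum to 100%, so everything except density
# shares whatever remains.
all_others = 100.0 - importances["bus_density_2mi"]

print(round(top_two, 1))      # 65.6, reported as about 66%
print(round(all_others, 1))   # 45.2, less than density's 54.8 alone
```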

Finding 01

Transit use increases where access is strong

Bus stop density within 2 miles (bus_density_2mi) accounts for 54.8% of RF feature importance, more than all other variables combined. Distance to rail is second at 10.8%.

Finding 02

Areas with unmet demand are the clearest opportunities

Tracts with positive predicted-minus-actual gaps are not low-demand areas — they are access-constrained areas. The gap map surfaces the investment priority queue directly from model output.
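A minimal sketch of how such a priority queue could be derived from model output: compute predicted minus actual transit share per tract, keep the positive gaps, and sort from largest to smallest. The function name `rank_by_gap`, the field names, and the tract values are all hypothetical.

```python
def rank_by_gap(tracts):
    """Return tracts with a positive predicted-minus-actual gap, largest first.

    Each tract is a dict with 'tract_id', 'predicted', and 'actual' keys
    (hypothetical field names).
    """
    with_gap = [dict(t, gap=t["predicted"] - t["actual"]) for t in tracts]
    positive = [t for t in with_gap if t["gap"] > 0]
    return sorted(positive, key=lambda t: t["gap"], reverse=True)


# Invented example: tract B already outperforms its prediction and drops out.
tracts = [
    {"tract_id": "A", "predicted": 0.12, "actual": 0.05},
    {"tract_id": "B", "predicted": 0.08, "actual": 0.10},
    {"tract_id": "C", "predicted": 0.20, "actual": 0.18},
]
queue = rank_by_gap(tracts)
print([t["tract_id"] for t in queue])
```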

Finding 03

There is a clear distance threshold

Beyond roughly one mile from a bus stop, transit use drops sharply even in otherwise strong demand areas. This nonlinear effect — missed by OLS — is exactly what the Random Forest captures.
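One simple way to look for a threshold like this, independent of any model, is to average transit share within distance bins and see where the mean falls off. The sketch below does that with invented records; the half-mile bin width and the function name are assumptions, and the source does not say how the threshold was actually identified.

```python
from collections import defaultdict


def mean_share_by_distance_bin(records, bin_width_miles=0.5):
    """Average transit share per distance bin.

    records: iterable of (distance_to_bus_miles, transit_share) pairs.
    Returns {bin_start_miles: mean_share}, ordered by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, share in records:
        bin_index = int(distance // bin_width_miles)
        sums[bin_index] += share
        counts[bin_index] += 1
    return {i * bin_width_miles: sums[i] / counts[i] for i in sorted(sums)}


# Invented tracts: shares decline gently, then drop past one mile.
records = [
    (0.2, 0.12), (0.4, 0.10),
    (0.7, 0.09), (0.9, 0.09),
    (1.2, 0.03), (1.4, 0.03),
]
binned = mean_share_by_distance_bin(records)
for start, mean_share in binned.items():
    print(f"{start:.1f}+ mi: {mean_share:.3f}")
```

A linear model fits one slope across all bins, which is why a sharp break of this kind tends to be averaged away in OLS.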

Finding 04

Targeted investment reaches the highest-need communities

Scenario 1 (uniform +20% expansion) produces the largest system-wide ridership gain. Scenario 2 concentrates impact in 215 equity-priority tracts. Demographic data shows where demand exists; improving access is what changes behavior.
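The two scenarios can be sketched as a re-prediction exercise: scale bus stop density, re-run the model, and compare predicted transit share before and after. In the sketch below, `toy_predict` is a made-up stand-in for the fitted Random Forest, the tract values are invented, and applying the same +20% uplift to priority tracts in Scenario 2 is an assumption, since the source does not give that scenario's parameters.

```python
def toy_predict(bus_density):
    """Hypothetical stand-in for the fitted model: share rises with density, capped at 30%."""
    return min(0.30, 0.001 * bus_density)


def simulate(tracts, uplift, only_priority=False):
    """Return predicted gain in transit share per tract after scaling density.

    uplift: fractional increase in bus_density_2mi (0.20 means +20%).
    only_priority: if True, only tracts flagged 'priority' receive the uplift.
    """
    gains = {}
    for t in tracts:
        base = t["bus_density_2mi"]
        receives_uplift = t["priority"] or not only_priority
        new_density = base * (1 + uplift) if receives_uplift else base
        gains[t["tract_id"]] = toy_predict(new_density) - toy_predict(base)
    return gains


# Invented tracts; 'priority' marks a hypothetical equity-priority flag.
tracts = [
    {"tract_id": "A", "bus_density_2mi": 100, "priority": False},
    {"tract_id": "B", "bus_density_2mi": 50, "priority": True},
]
scenario_1 = simulate(tracts, uplift=0.20)                      # uniform expansion
scenario_2 = simulate(tracts, uplift=0.20, only_priority=True)  # targeted expansion
print(scenario_1)
print(scenario_2)
```

In this toy setup the uniform scenario yields the larger total gain while the targeted one confines its gain to the flagged tract, which mirrors the trade-off described above without reproducing the project's actual numbers.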

Predicted vs Actual Transit Share — All 2,165 Tracts

Points above the diagonal indicate suppressed demand. Orange-red tracts are where infrastructure is the binding constraint. These form the spatial input for Scenario 2.

Portfolio Signal

GIS + data science + policy reasoning, integrated end-to-end

This project combines spatial data engineering (GeoPandas, EPSG:3424 joins), econometric modeling (OLS, statsmodels), machine learning (Random Forest, scikit-learn), and scenario simulation — then translates all of it into maps and findings a planning audience can use. The goal is not to demonstrate every technique, but to surface the insight that drives better investment decisions.