Findings
Transit use increases where access is strong
The most predictive variables are not raw infrastructure counts but measures of how dense and how close the network is, particularly the rail network. bus_density_2mi accounts for 54.8% of Random Forest feature importance, more than all other variables combined. dist_to_rail is second at 10.8%. Notably, dist_to_bus falls to just 3.3% once density is in the model, because the density variable already captures most of the bus proximity signal. A tract with many stops that are far away underperforms a tract with fewer but walkable stops.
Areas with unmet demand are the clearest opportunities for investment
Comparing predicted transit share (from the Random Forest model) with observed ridership surfaces tracts with large positive gaps (predicted minus observed): places where the model expects high usage but observed usage is low. These are not low-demand areas; they are infrastructure-constrained areas. They cluster in middle-ring suburbs with fragmented bus networks, particularly in Bergen, Morris, Somerset, and Middlesex counties, where demand exists but the network has not kept up. The gap map is the most actionable output of this project.
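The gap calculation can be sketched in a few lines, treating the gap as predicted share minus observed share. This is a minimal illustration, not the project's code: the Tract fields, the min_gap cutoff of 5 percentage points, and the example values are all assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    tract_id: str            # hypothetical identifier
    predicted_share: float   # model-predicted transit commute share, percent
    observed_share: float    # observed transit commute share, percent


def transit_gap(tract: Tract) -> float:
    """Gap = predicted minus observed; positive means usage falls short of the prediction."""
    return tract.predicted_share - tract.observed_share


def flag_gap_tracts(tracts, min_gap=5.0):
    """Return (tract_id, gap) pairs with gap >= min_gap, largest gap first.

    min_gap is an illustrative cutoff in percentage points, not a value from the analysis.
    """
    gaps = [(t.tract_id, transit_gap(t)) for t in tracts]
    flagged = [(tid, gap) for tid, gap in gaps if gap >= min_gap]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up example values, for illustration only.
example_tracts = [
    Tract("A", predicted_share=18.0, observed_share=6.0),
    Tract("B", predicted_share=9.0, observed_share=8.5),
    Tract("C", predicted_share=14.0, observed_share=7.0),
    Tract("D", predicted_share=4.0, observed_share=6.0),
]

priority = flag_gap_tracts(example_tracts)
```

Sorting by gap size turns the flagged tracts into a ranked list, which is the form a gap map would need for prioritization.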
There is a clear distance threshold
The relationship between distance and ridership is not linear. Beyond roughly one mile from the nearest bus stop, transit use drops sharply — even in areas with strong underlying demand. Tracts past the 1-mile mark show near-zero ridership regardless of other favorable conditions. Closing access gaps in the 0.5–1.5 mile range produces far greater ridership returns than expanding service in already well-served areas.
Supporting visual — Finding 03
Distance to nearest bus stop (miles) vs transit commute share. Color = bus stop density within 2 miles. Ridership collapses past the 1-mile mark.
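One way to check for a threshold like this is to average transit share within fixed-width distance bins and look for the bin where the mean falls off. The sketch below assumes half-mile bins and uses made-up records; the function name and data layout are hypothetical, not taken from the project.

```python
from collections import defaultdict


def mean_share_by_distance_bin(records, bin_width=0.5):
    """Average transit share within fixed-width distance bins.

    records: iterable of (dist_to_bus_miles, transit_share_percent) pairs.
    Returns {bin_lower_edge: mean_share}, ordered by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, share in records:
        lower = int(dist // bin_width) * bin_width  # lower edge of the bin
        sums[lower] += share
        counts[lower] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Made-up (distance, share) pairs, for illustration only.
example_records = [
    (0.2, 12.0), (0.4, 10.0),
    (0.7, 8.0), (0.9, 6.0),
    (1.2, 1.0), (1.4, 0.5),
    (1.8, 0.2),
]

profile = mean_share_by_distance_bin(example_records)
```

A nonlinear relationship shows up as a bin-to-bin drop that is much larger at one step than at the others, rather than a steady decline.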
Demographics indicate need — infrastructure determines whether it can be met
Demographic variables together account for ~27% of Random Forest feature importance: meaningful, but secondary to infrastructure (~73%). pct_hispanic (7.8%) and pct_foreign_born (6.1%) are the most influential demographic predictors, reflecting both transit dependency and residential proximity to transit corridors. pct_black is not statistically significant in OLS (p = 0.45), suggesting that tracts with larger Black populations are not uniformly over- or under-served relative to their infrastructure profile. Demographic data shows where demand exists; improving access is what changes behavior.
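The importance figures quoted in these findings can be cross-checked with simple arithmetic. The sketch below uses only the percentages stated in the text; the grouping of the three distance and density variables as infrastructure is an assumption, and the group totals are the approximate values quoted (~73% and ~27%), so the remainders are approximate too.

```python
# Importance shares (percent) quoted in the findings.
infrastructure_listed = {"bus_density_2mi": 54.8, "dist_to_rail": 10.8, "dist_to_bus": 3.3}
demographic_listed = {"pct_hispanic": 7.8, "pct_foreign_born": 6.1}

# Approximate group totals quoted in the findings.
infrastructure_total = 73.0
demographic_total = 27.0

infra_named = round(sum(infrastructure_listed.values()), 1)   # 68.9
demo_named = round(sum(demographic_listed.values()), 1)       # 13.9

# Share left for variables the findings do not list individually.
infra_unlisted = round(infrastructure_total - infra_named, 1)  # about 4.1
demo_unlisted = round(demographic_total - demo_named, 1)       # about 13.1

# bus_density_2mi alone versus every other variable combined.
all_others = round(100.0 - infrastructure_listed["bus_density_2mi"], 1)  # 45.2
density_dominates = infrastructure_listed["bus_density_2mi"] > all_others
```

The named infrastructure variables alone reach 68.9%, so most of the infrastructure share comes from the three variables listed, while roughly half of the demographic share is spread across predictors not named here.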
Targeted investment reaches high-need tracts; uniform expansion produces the larger total gain
The uniform +20% expansion (Scenario 1) produces the largest system-wide average gain: +0.61 percentage points across all 2,165 tracts, with 1,110 tracts showing improvement. Scenario 2 (targeted bus) concentrates investment in 215 high-need tracts with low bus density and high demand scores, producing a smaller average gain of +0.15 pp in those tracts but directing it to the places with the greatest need. Scenario 3 (targeted rail) shows mixed results: areas already well served by transit show limited additional responsiveness to rail improvements. When total ridership gain is the goal, uniform expansion produces more. When equity is the goal, targeted investment reaches the communities with the greatest unmet need.
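A scenario of this kind can be sketched as: change the input, re-predict, and compare against the baseline. The example below assumes the uniform scenario scales bus_density_2mi by 20% in every tract, which the text does not state explicitly. The toy_predict function is a placeholder response curve, not the project's Random Forest, and the tract values are made up.

```python
def simulate_uniform_expansion(tracts, predict, pct_increase=0.20):
    """Re-predict transit share after scaling bus density in every tract.

    tracts: list of dicts with a "bus_density_2mi" key.
    predict: callable mapping a tract dict to a predicted share (percent);
             it stands in for the fitted model.
    Returns (mean_gain_pp, n_improved).
    """
    gains = []
    for tract in tracts:
        baseline = predict(tract)
        # Copy the tract with density scaled up; the original dict is left unchanged.
        expanded = dict(tract, bus_density_2mi=tract["bus_density_2mi"] * (1 + pct_increase))
        gains.append(predict(expanded) - baseline)
    mean_gain = sum(gains) / len(gains)
    n_improved = sum(1 for g in gains if g > 0)
    return mean_gain, n_improved


def toy_predict(tract):
    """Placeholder curve (assumption): share rises with density and is capped at 30%."""
    return min(30.0, 0.5 * tract["bus_density_2mi"])


# Made-up densities, for illustration only.
toy_tracts = [{"bus_density_2mi": d} for d in (0.0, 10.0, 40.0, 80.0)]
mean_gain, n_improved = simulate_uniform_expansion(toy_tracts, toy_predict)
```

In this toy setup only two of the four tracts improve: a tract with no stops gains nothing from a percentage increase, and a tract already at the cap has no room to respond. That is one plausible reason a uniform expansion improves only some tracts, though the actual model may behave differently.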
Summary
The Central Argument
Where transit is close and dense, people use it. Where it's sparse, they don't — not because they don't want to, but because it isn't there.
Bus stop density within 2 miles (54.8% of RF feature importance) and distance to the nearest stop carry far more predictive weight for ridership than income, race, or age, individually or combined. Demographic maps reveal who needs transit; the gap map reveals where to build. Prioritize the gap zones: the areas where infrastructure, not an absence of demand, is the binding constraint.
Planning Application
Decision Support
This analysis supports corridor screening, investment prioritization, and performance-based planning.
It identifies where access constraints suppress demand and provides a data-driven way to target transit improvements where they will have the greatest impact on ridership. The gap map is the key output: it translates model predictions into a spatial priority queue that can inform service planning, grant applications, and long-range network design.