PREDICTING BIKER DENSITY AT BIKESHARE STATION INTERSECTIONS IN SAN FRANCISCO

Bike sharing platforms are becoming increasingly common alternatives to public transportation in cities, improving accessibility to areas not reachable by bus, train, or tram. While this can improve city connectivity, it also increases the likelihood of biker-related accidents and vehicle collisions, especially in areas where protected bike lanes and safety infrastructure are not already in place. We compare machine learning models that predict biker density at road intersections in San Francisco, using publicly available trip data from the city’s most widely used bikeshare service, formerly called Ford GoBike.

Alongside our predictive models, we develop a heatmap visualization application that displays our predictions, giving users an additional way to explore the forecasted information. Our work is intended to predict the areas of highest biker density at different times so that drivers and bikers can share the road more safely. Once deployed, our models can also inform city planning and the development of alternative public transportation.

MACHINE LEARNING MODELS

Neural networks are mathematical models that can be trained to learn highly complex, nonlinear relationships in data, which made them a natural fit for the regression problem we aim to solve. For each bike station we built two types of models, each trained to predict the next hour of biker density from the recorded usage statistics of the previous 24 hours. The first is a multi-layered network consisting of three fully connected (dense) layers with a tapering number of units and a final activation layer on the output. The second is a sequence model built on a Long Short-Term Memory (LSTM) network. LSTMs are suited to time series prediction tasks because they are designed specifically to learn from ordered sequences.
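To make the inputs and outputs of these models concrete, the sketch below shows one way to turn a station's usage series into training pairs: each run of 24 consecutive hours is paired with the hour that follows it. This is an illustration under stated assumptions, not the paper's preprocessing code. The function names `make_windows` and `as_sequence`, the one-value-per-hour series, and the single-feature sequence layout are all hypothetical.

```python
from typing import List, Tuple

# From the text: the previous 24 hours are used to predict the next hour.
WINDOW_HOURS = 24


def make_windows(
    hourly_density: List[float], window: int = WINDOW_HOURS
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window` consecutive hours with the hour after it.

    Assumes `hourly_density` holds one value per hour for a single station,
    in chronological order (an assumption, not a detail from the paper).
    """
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(hourly_density) - window):
        inputs.append(hourly_density[start:start + window])
        targets.append(hourly_density[start + window])
    return inputs, targets


def as_sequence(flat_window: List[float]) -> List[List[float]]:
    """Reshape a flat window into (time steps x 1 feature).

    A dense network can take the flat 24-value vector directly, while
    sequence models such as LSTMs are usually fed one feature vector per
    time step. A single feature per step is assumed here.
    """
    return [[value] for value in flat_window]


if __name__ == "__main__":
    # Hypothetical series: 30 hours of made-up density values for one station.
    series = [float(hour % 12) for hour in range(30)]
    X, y = make_windows(series)
    print(len(X), "training pairs, each with", len(X[0]), "input hours")
    print("sequence layout:", len(as_sequence(X[0])), "steps x", len(as_sequence(X[0])[0]), "feature")
```

With this layout, a series of N hourly values yields N - 24 training pairs per station, and the same pairs can feed either model type.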

EVALUATION

Once the models were trained, we evaluated their error in predicting bike density on unseen data. On average, the dense regression models have a mean squared error of 0.00501 on training data and 0.00939 on testing data. In most cases, the LSTM outperforms the corresponding regression model, with average mean squared errors of 0.00403 and 0.00899 for training and testing respectively. Results from both models indicate that accurate biker density prediction is possible and relatively simple to implement. Given the potential to reduce casualties by providing road users with such information, we hope our approach can be integrated into existing systems to better inform communities about these patterns. We also developed a visualization to accompany our predictions, which further illustrates our understanding of cyclist travel patterns in the city by depicting biker density as a heatmap, with tooltips showing historical accident data.
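For reference, mean squared error is the average of the squared differences between predicted and actual values. The sketch below computes it and then compares the average errors reported above. The helper names `mean_squared_error` and `relative_improvement` are illustrative only, and the relative-improvement figure is a derived comparison rather than a result stated in the paper.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error: float, candidate_error: float) -> float:
    """Fraction by which `candidate_error` is lower than `baseline_error`."""
    return (baseline_error - candidate_error) / baseline_error


# Average errors as reported in the text.
DENSE_TRAIN_MSE, DENSE_TEST_MSE = 0.00501, 0.00939
LSTM_TRAIN_MSE, LSTM_TEST_MSE = 0.00403, 0.00899

if __name__ == "__main__":
    train_gain = relative_improvement(DENSE_TRAIN_MSE, LSTM_TRAIN_MSE)
    test_gain = relative_improvement(DENSE_TEST_MSE, LSTM_TEST_MSE)
    print(f"LSTM lowers average training MSE by {train_gain:.1%}")
    print(f"LSTM lowers average testing MSE by {test_gain:.1%}")
```

On the reported averages, the LSTM's advantage is larger on training data (about 20%) than on testing data (about 4%), and both model types show a gap between training and testing error.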

Text on this page is adapted from the full paper published and presented at IEEE GHTC 2019 in Seattle, WA.
