Urban link travel time estimation using large-scale taxi data with partial information
Introduction
Accurate estimation and prediction of urban link travel times are important for improving urban traffic operations and identifying key bottlenecks in the traffic network. They can also benefit users by providing accurate travel time information, thereby allowing better route choice in the network and minimizing overall trip travel time. To assess link travel times accurately, however, good real-time information is needed from in-road sensors (such as loop detectors, microwave sensors, or roadside cameras), mobile sensors (e.g. floating cars), or Global Positioning System (GPS) devices (e.g. cell phones). In most of these cases, only limited information related to speed or location is available; hence, one has to develop appropriate methodologies to accurately estimate the performance metric of interest at the link, path or network level.
In the last few years, there has been a growing trend of deploying GPS-equipped taxicabs in urban areas. While GPS-equipped taxicabs have many advantages, including the ability to locate taxis and track lost packages, they also serve as useful real-time probes in the traffic network. Taxis equipped with GPS units generate a significant amount of data over days and months, thereby providing a rich source of data for estimating network-wide performance metrics. However, there are currently few methodologies that use this new source of data to estimate link or path travel times in the urban network. Within this context, this paper proposes a new method for estimating hourly urban link travel times using large-scale taxicab data with partial information. The taxicab data used in this research provides limited trip information: only the origin and destination location coordinates, travel time and distance of a trip. However, the extensive number of data records compensates for the incompleteness of the data and makes link travel time estimation possible. A novel algorithm for estimating link travel times is presented and tested in this paper using a test network in New York City.
Previous research on urban link travel time estimation and prediction has largely relied on various data sources, including: loop detectors (Coifman, 2002, Zhang and Rice, 2003, Oh et al., 2003, Wu et al., 2004), automated vehicle identification (AVI) (Park and Rilett, 1998, Li and Rose, 2011, Sherali et al., 2006), video cameras, Remote Traffic Microwave Sensors (RTMS) (Yeon et al., 2008), and automated number plate recognition (Hasan et al., 2011). All of these data collection methods require installing corresponding sensors to retrieve data, and a large number of sensors is needed to achieve a reasonable accuracy level. The cost of installing and maintaining so many sensors is prohibitive. Hence, predicting link travel times with reasonable accuracy and network coverage based on sensor data could be expensive.
On the other hand, there is significant potential to use emerging large-scale data sources to estimate dynamic demand and dynamic network conditions in urban areas. For instance, GPS devices in dedicated fleets of vehicles or in users’ mobile phones can be viable sources of data for monitoring traffic in large cities (Herrera et al., 2010). Industry models, such as Inrix, have also gained popularity in recent years, where private entities collect, utilize and sell “large-scale” historical traffic data from GPS-equipped vehicles or mobile phones. With an increasing amount of GPS data available from taxis, transit, and mobile phones, using such large-scale decentralized data for link travel time estimation becomes a realistic option. Herring et al. (2010) used GPS trace data from a fleet of 500 taxis in San Francisco, CA, to estimate and predict traffic conditions. However, in that work, discrete traffic states were predicted instead of link travel times. Zheng and Van Zuylen (in press) also proposed an ANN model to estimate urban link travel times based on sparse probe vehicle data (e.g., GPS traces from GPS-equipped vehicles or smartphones). Hunter et al. (2009) proposed a statistical approach for path and travel time inference using GPS probe vehicle trajectory data. The GPS data used in their study was recorded every minute, and the inferred path consists of at most five link segments. This method is not applicable if the GPS data has a longer recording interval or only has the starting and ending coordinates. Estimating link travel times from GPS data is much cheaper and covers a larger area of the urban network than approaches using fixed sensor data. However, all of the above-mentioned approaches are only applicable to GPS trace data, in which the trajectories of vehicles are available.
To the best of our knowledge, no study in the literature has used OD-level GPS data for urban link travel time estimation, even though an extensive amount of such less detailed data (e.g. taxicab data) is generated and recorded every day.
In New York City, GPS devices are installed in each taxicab. The taxicab data is collected and archived by the New York City Taxi and Limousine Commission (NYTLC), the agency responsible for all taxi-related issues in New York City. New York City has the largest taxi market in North America, with 12,779 yellow medallion taxicabs (in 2006) serving about 240 million passengers a year. The taxi service transports 25% of all fare-paying bus, subway, taxi and for-hire vehicle passengers traveling within Manhattan (Schaller Consulting, 2006, King et al., 2012).
In this paper, data collected from New York City taxicabs is used to estimate the link travel times. The dataset provides an extensive amount of taxi trip data, recording each trip's starting and ending geo-location along with information about trip distance, time and fare. Unlike the detailed GPS trajectory data used in previous studies, the dataset only provides the trip origin and destination information (i.e. starting and ending location and time) without the exact trajectory of the taxicab; only path travel time and distance are known. However, the massive amount of data (the number of observations recorded within a day ranges between 450,000 and 550,000) makes it possible to infer the possible routes that the taxicab is taking and, further, to estimate the link travel times in the New York City network. There is potential bias associated with measuring network link travel times from taxis, as taxi drivers are just one particular group of all drivers in the network. However, given the high penetration rate of taxicabs, it is reasonable to assume that taxis are good probe vehicles and therefore that taxi travel times are a good representation of the actual network condition.
In this research we propose a methodology to estimate urban link travel times based on taxi GPS data that includes only the origin and destination of the trip and the total travel time to reach the destination. The goal of this study is to show the potential of using taxicab data as a complementary data source in urban transportation operation and management. The link travel times estimated from taxicabs provide an hourly aggregate measure of the urban network condition, which can be fused in the future with information from other existing data sources such as fixed sensors.
The paper is organized as follows: the next section describes the methodological approach developed in the paper to estimate link travel times; the subsequent sections present the test data and network, and the model results respectively. The final section presents the concluding remarks.
Section snippets
Methodology
This section presents the proposed link travel time estimation model. We treat the path taken by a taxi as latent and derive the expected path travel time as the sum, over the probable paths, of each path's travel time multiplied by the probability of taking that path. The link travel time estimation problem then becomes one of finding the link travel times that minimize the squared error between the observed and expected path travel times. A multinomial logit (MNL) model is embedded to compute the probability
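The structure of this objective can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the utility of a path is assumed here to be its negative travel time scaled by a hypothetical parameter `theta`, and the function names, link labels, link times and candidate paths are all invented for the example.

```python
import math


def path_time(path, link_times):
    """Travel time of a path: the sum of the travel times of its links."""
    return sum(link_times[link] for link in path)


def mnl_probabilities(paths, link_times, theta=1.0):
    """MNL choice probabilities over candidate paths.

    Assumed utility form: U = -theta * path travel time.
    """
    utilities = [-theta * path_time(p, link_times) for p in paths]
    u_max = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_path_time(paths, link_times, theta=1.0):
    """Probability-weighted sum of candidate path travel times."""
    probs = mnl_probabilities(paths, link_times, theta)
    return sum(pr * path_time(p, link_times) for pr, p in zip(probs, paths))


def squared_error(trips, link_times, theta=1.0):
    """Sum of squared differences between observed and expected trip times.

    trips: list of (observed_time, candidate_paths) pairs.
    """
    return sum(
        (observed - expected_path_time(paths, link_times, theta)) ** 2
        for observed, paths in trips
    )


# Toy example: four links with made-up travel times in seconds.
link_times = {"a": 60.0, "b": 90.0, "c": 45.0, "d": 120.0}
# One observed trip of 160 s with two candidate paths (150 s and 165 s).
candidate_paths = [["a", "b"], ["c", "d"]]
trips = [(160.0, candidate_paths)]

probs = mnl_probabilities(candidate_paths, link_times, theta=0.05)
expected = expected_path_time(candidate_paths, link_times, theta=0.05)
error = squared_error(trips, link_times, theta=0.05)
```

In an estimation setting, `link_times` would be the decision variables, and an optimizer would adjust them to reduce `squared_error` over all observed trips.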
Testing data and network
The data used in this research was collected by the New York City Taxi and Limousine Commission on a trip-by-trip basis. Each record contains the trip origin and destination GPS coordinates, trip distance and duration, fare, payment method, and other related information. The data set covers February 2008 to November 2010. In this study, one week of data (from 3/15/2010 to 3/21/2010) is selected to test the proposed method.
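As a rough illustration of what such a trip record and the one-week selection might look like in code, the following Python sketch may help. The class name, field names and sample values are hypothetical and do not reflect the actual schema of the taxi data set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TripRecord:
    """One taxi trip with origin/destination information only (hypothetical schema)."""
    pickup_time: datetime
    dropoff_time: datetime
    origin: tuple        # (latitude, longitude) of the pickup
    destination: tuple   # (latitude, longitude) of the drop-off
    distance: float      # trip distance as recorded by the meter
    fare: float

    @property
    def duration_seconds(self):
        """Total trip travel time in seconds."""
        return (self.dropoff_time - self.pickup_time).total_seconds()


def select_period(trips, start, end):
    """Keep trips whose pickup time falls in the half-open interval [start, end)."""
    return [t for t in trips if start <= t.pickup_time < end]


# Made-up sample records, for illustration only.
sample_trips = [
    TripRecord(datetime(2010, 3, 16, 8, 0), datetime(2010, 3, 16, 8, 10),
               (40.760, -73.975), (40.765, -73.968), 1.2, 7.5),
    TripRecord(datetime(2010, 3, 25, 9, 0), datetime(2010, 3, 25, 9, 20),
               (40.758, -73.980), (40.770, -73.960), 2.4, 11.0),
]

# The study week runs from 3/15/2010 through 3/21/2010 inclusive.
week = select_period(sample_trips, datetime(2010, 3, 15), datetime(2010, 3, 22))
```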
A small region in the southeast of Central Park of Midtown Manhattan is
Model results
To implement the model discussed in the previous section, the code was written in Matlab using the Parallel Computing Toolbox. A k-shortest path set must be computed for each nodal pair in the network, and this process takes a considerable amount of time. Once the process is complete, however, the path sets are stored and need no further computation. The steps of data mapping and constructing reasonable path sets take little time to complete, as they make use of the information from already computed
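The compute-once, reuse-afterwards pattern for path sets can be sketched as follows. This is a Python illustration rather than the Matlab code described above, and it uses a simple best-first enumeration of loopless paths instead of whichever k-shortest path algorithm the authors used. The graph, node labels and function names are assumptions made for the example.

```python
import heapq
from itertools import count


def k_shortest_paths(graph, source, target, k):
    """Return up to k loopless paths from source to target, cheapest first.

    graph: dict mapping node -> dict of neighbor -> nonnegative link cost.
    Uses best-first search over partial paths. This is easy to follow but can
    be slow on large networks, which is one reason to cache the results.
    """
    tie_breaker = count()  # keeps heap entries comparable when costs tie
    heap = [(0.0, next(tie_breaker), [source])]
    found = []
    while heap and len(found) < k:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in path:  # forbid loops
                heapq.heappush(
                    heap, (cost + link_cost, next(tie_breaker), path + [neighbor])
                )
    return found


_path_cache = {}


def cached_path_set(graph, source, target, k):
    """Compute the path set for a nodal pair once and reuse it afterwards."""
    key = (source, target, k)
    if key not in _path_cache:
        _path_cache[key] = k_shortest_paths(graph, source, target, k)
    return _path_cache[key]


# Toy directed network with made-up link costs.
graph = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}
path_set = cached_path_set(graph, "A", "D", 3)
costs = [cost for cost, _ in path_set]
```

The cache key here ignores the graph itself, so this sketch assumes a single fixed network for the lifetime of the cache.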
Discussion and conclusion
In this study, a new model is proposed that uses the limited information provided in taxi GPS data to estimate urban link travel times. The taxicab data used in this study lacks information on the actual paths taken by the taxi drivers. The proposed model treats the path taken as latent, constructs a reasonable path set, formulates an MNL model to compute the probability of a path being taken by the driver, and estimates the link travel times by solving a nonlinear least squares problem.
Acknowledgments
The research presented in this paper was supported by the RITA/USDOT project “The Use of Large Scale Datasets for Understanding Network State”, for which the authors are grateful. The authors are solely responsible for the findings of the research work.
References (22)
- Vehicle reidentification and travel time measurement on congested freeways. Transportation Research Part A: Policy and Practice (2002)
- et al. Valuing travel time variability: characteristics of the travel time distribution on an urban road. Transportation Research Part C: Emerging Technologies (2012)
- et al. Evaluation of traffic data obtained via GPS-enabled mobile phones: the mobile century field experiment. Transportation Research Part C: Emerging Technologies (2010)
- et al. Incorporating uncertainty into short-term travel time predictions. Transportation Research Part C: Emerging Technologies (2011)
- et al. A discrete optimization approach for locating automatic vehicle identification readers for the provision of roadway travel times. Transportation Research Part B: Methodological (2006)
- et al. Travel time estimation on a freeway using discrete time Markov chains. Transportation Research Part B: Methodological (2008)
- et al. Short-term travel time prediction. Transportation Research Part C: Emerging Technologies (2003)
- Fletcher, R., 1971. A Modified Marquardt Subroutine for Nonlinear Least Squares. Rpt. AERE-R 6799, Harwell. Matlab...
- Grynbaum, M.M., 2010. Gridlock May Not Be Constant, but Slow Going Is Here to Stay. New York Times. Retrieved July 31,...
- et al. Modeling of travel time variations on urban links in London. Transportation Research Record: Journal of the Transportation Research Board (2011)
- Estimating arterial traffic conditions using sparse probe data. Proceedings of the ITS