How to Provide Accurate and Robust Traffic Forecasts Practically?



Introduction
With the development of modern cities, growing traffic problems increasingly affect people's travel convenience, and they have become one of the most crucial factors considered in urban planning and design in recent years. Urban traffic congestion is a severe problem that significantly reduces the quality of life, particularly in metropolitan areas. However, continually constructing new roads is neither realistic nor tenable in social and economic terms. In the effort to deal with this intractable problem, intelligent transportation systems (ITS) technologies have been widely and successfully implemented throughout the world. ITS, with its two important components, advanced traffic management systems (ATMS) and advanced traveler information systems (ATIS), aims to relieve increasing congestion and decrease travel time by providing information to drivers through radio broadcasts or dynamic route guidance systems.
The provision of accurate real-time information and predictions of traffic states such as traffic flow, travel time and occupancy is fundamental to the success of ITS (Chen et al., 2010; Dong et al., 2010; Vlahogianni et al., 2004; Tan et al., 2009; Tang et al., 2003; Thomas et al., 2010; Zhang & Liu, 2008, 2009c). As an important part of ITS, traffic state analysis and traffic forecasting are important in directing commuters to optimal routes, and they have attracted many researchers in recent decades. In general, as Van Arem et al. put it, traffic forecasting "can be separated into two paradigms: the empirical based, incorporating fairly standard statistical methodology on the one hand, and that based on traffic process theory, either of demand or of supply, on the other" (Van Arem et al., 1997).
Because data can be collected from numerous kinds of equipment and dynamic management requires timely information, empirical approaches to traffic forecasting are in line with the development trends of ITS. These approaches aim to uncover the hidden regularity of traffic states in random and uncertain traffic data through systematic analysis and a variety of mathematical and physical methods.
The empirical approaches can be roughly divided into two types: basic forecasts approaches and combined forecasts approaches. A basic forecasts approach predicts the traffic state using one particular prediction model, so its robustness and accuracy depend on the prediction model itself. Basic forecasts approaches can in turn be roughly classified into parametric and nonparametric techniques, both of which have shown their own advantages in different settings in recent years (Tsekeris & Stathopoulos, 2010; Zhang & Liu, 2009f; Zhang & Liu, 2010). On the basis of this classification, the chapter provides a systematic review of models such as historical-mean (HM), filtering algorithms, linear and nonlinear regression, autoregressive processes, neural networks (NN), fuzzy systems, support vector regression (SVR) and Bayesian networks.
The combined forecasts approach combines different forecasts into a single one that is assumed to be more accurate. The robustness and accuracy of combined forecasts approaches depend not only on the predictive performance of the individual models, but also on the efficiency of the combination. Because the combined method exploits each predictor's unique features to capture different patterns in the data, a suitably weighted combination can achieve a smaller error variance than any of the individual methods (Bates & Granger, 1969). This advantage, together with the low computational cost of the combination step, may make the approach scalable to very large amounts of traffic data in practice. Owing to its simplicity and practicability, the combined forecasts approach has become very important in traffic forecasting, and researchers have studied it both theoretically and in applications.
Although data-driven traffic forecasting has achieved a great deal, some problems remain unsolved. From a practical point of view, data from some detectors are incomplete, i.e., partially or completely missing or substantially contaminated by noise. Missing data sometimes render an entire dataset useless, which is a major hurdle in analyzing traffic information. As missing data treatment is an important preparation step for the effective management of ITS, the chapter presents several solutions to missing data problems. The chapter ends with a brief introduction to the Shanghai Integrated Transportation Information Platform (SITIC), which represents the level of informatization development in transportation.

A brief review of data-driven traffic forecasting
Data-driven traffic forecasting refers to predicting the future state of a transportation system based on historical data, current traffic data and related statistical data (Brockwell & Davis, 2002; Chrobok, 2004). Traffic forecasting is a branch of forecasting and an important part of modern transportation planning and intelligent transportation systems. Traffic flow, average speed, travel time, etc., are usually defined as the basic parameters of the traffic state, and traffic forecasting is essentially the prediction of these basic parameters from dynamic road traffic time series data. For instance, most of the literature focuses on traffic flow forecasting (Jiang & Adeli, 2004; Qiao et al., 2001; Abdulhai et al., 1999; Castillo et al., 2008; Chen & Chen, 2007; Dimitriou et al., 2008; Ding et al., 2002; Huang & Sadek, 2009; Ghosh et al., 2005, 2007; Smith et al., 2002), travel time forecasting, and related analysis such as validation and optimization (Chan et al., 2003; Chang et al., 2010; Kwon, 2000; Kwon & Petty, 2005; Lam, 2008; Lam et al., 2002, 2008; Lam & Chan, 2004; Lee et al., 2009; Nath et al., 2010; Schadschneider et al., 2005; Tam & Lam, 2009; Tang & Lam, 2001; Yang et al., 2010).
Overall, traffic state variation is a real-time, nonlinear, high-dimensional and non-stationary stochastic process. As the statistical time interval becomes shorter, the randomness and uncertainty of the traffic state become stronger. Short-term traffic state variation is related not only to the state of the local road section over the past few hours, but also to the traffic states of upstream and downstream road sections, weather conditions, unexpected events, etc.
From the spatial and temporal points of view, the traffic state exhibits regular variation. For example, the traffic states of the various road sections of an urban road network show distinct periodic variation during peak and off-peak periods, and urban highway traffic shows different periodic variation on weekdays and weekends, which reflects the temporal regularity of road network traffic. Meanwhile, the urban road network topology, the length of each road link, lane width, traffic direction, etc. determine the variation of the traffic state on a particular road link, which reflects the spatial regularity of road network traffic. Therefore, traffic prediction research must fully consider both the randomness and the temporal and spatial regularity of real-time traffic state variation. Namely, real-time traffic forecasting should predict the future traffic state on the basis of the historical traffic data of specific sections, the spatio-temporal variation of traffic conditions over the whole road network, weather conditions and other influencing factors. Fig. 1 describes the framework of data-driven traffic forecasting models.

Fig. 1. Framework of data-driven traffic forecasting models (inputs: Historical Traffic Information; Real-time Traffic Information)

Traffic forecasting approaches
The following factors are usually used to classify traffic forecasting approaches: single road link or transportation network, freeways or urban streets, physical models or mathematical methodologies, univariate or multivariate methods, etc. From the methodological point of view, traffic forecasting approaches can be divided into two types: empirical approaches and approaches based on traffic process theory. Because data can conveniently be collected from numerous kinds of equipment, a large amount of historical and real-time traffic information is available, and empirical approaches have become the new trend in ITS. In this part, we focus on the achievements of the empirical approach according to its classification.

Basic forecasts approaches
A large body of scientific literature has been concerned with basic forecasts approaches. On the basis of the classification above, the chapter briefly and systematically reviews parametric and nonparametric traffic forecasting techniques.
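To make the idea of a basic forecasts approach concrete, a minimal sketch of two such models follows, assuming NumPy: the historical-mean (HM) model named in the introduction, which averages past observations at the same time-of-day slot, and k-nearest-neighbor nonparametric regression, chosen here only as one illustrative nonparametric technique. The function names and array layouts are hypothetical, not taken from the chapter.

```python
import numpy as np


def historical_mean_forecast(history: np.ndarray, slot: int) -> float:
    """Historical-mean (HM) forecast for one time-of-day slot.

    history: array of shape (days, slots_per_day) holding past observations.
    The forecast is the average of all past days at the same slot.
    """
    return float(np.mean(history[:, slot]))


def knn_forecast(states: np.ndarray, next_values: np.ndarray,
                 current_state: np.ndarray, k: int = 3) -> float:
    """k-nearest-neighbor nonparametric regression forecast.

    states: (n_samples, dim) past state vectors, e.g. recent lagged flows.
    next_values: (n_samples,) value observed one step after each state.
    The forecast averages the successors of the k stored states closest to
    current_state in Euclidean distance; no functional form is assumed.
    """
    distances = np.linalg.norm(states - current_state, axis=1)
    nearest = np.argsort(distances)[:k]
    return float(np.mean(next_values[nearest]))
```

HM needs no fitting at all, whereas the nearest-neighbor forecast lets the stored data decide the shape of the relationship; the choice of k and of the state vector is left open in this sketch.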

Combined forecasts approaches
The basic idea of the combined forecasts approach is to apply each predictor's unique features to capture different patterns in the data (Zhang & Liu, 2009d). Such complementarity in capturing the patterns of data sets is theoretically essential for more accurate prediction (Timmermann, 2005; Huang, 2007). "Both theoretical and empirical findings suggest that combining different methods can be an effective way to improve forecast performances." (Yu et al., 2005a). The linear combining forecasts methodology has a long historical background. Compared with nonlinear ensemble forecasting models based on computational intelligence (Chen & Zhang, 2005; Chen & Chen, 2007), linear combination retains conceptual and computational simplicity. In this part, we focus on the application of linear combination methods. Researchers have proposed various combination methods since the pioneering work of Bates and Granger, and Clemen provided a review and annotated bibliography of the literature (Clemen, 1989). "Research in various fields has strongly suggested that the performance of prediction can be enhanced when (sometimes even in simple fashion) forecasts are combined." (Yang, 2004).
Basically, the main problem of combined forecasts can be described as follows. Suppose there are N forecasts $\hat{V}_{P1}(t), \hat{V}_{P2}(t), \ldots, \hat{V}_{PN}(t)$ (correlated or uncorrelated), where $\hat{V}_{Pi}(t)$ represents the forecasting result obtained from the ith model during the time interval t. Combining the different forecasts into a single forecast $\hat{V}_P(t)$ is assumed to produce a more accurate forecast. The general form of such a combined forecast is

$$\hat{V}_P(t) = \sum_{i=1}^{N} w_i \hat{V}_{Pi}(t),$$

where $w_i$ denotes the weight assigned to $\hat{V}_{Pi}(t)$, and the weights commonly sum to one, i.e., $\sum_{i=1}^{N} w_i = 1$. Our studies mainly investigate combined models with the additional restriction that no individual weight can lie outside the interval [0, 1]. Various methods can be applied to determine the weights used in the combined forecasts; four common methods are presented in the following.

Equal Weights (EW) methods
The EW method, which applies a simple arithmetic average of the individual forecasts, is relatively robust and requires little computational effort. Namely, each $w_i$ is equal to 1/N (i = 1, 2, …, N), where N is the number of forecasts. The beauty of the simple average is that it is easy to understand and implement, and it requires no estimation of weights or other parameters (Jose & Winkler, 2008). This makes the method robust, because it is not sensitive to estimation errors, which can sometimes be substantial. It often provides better results than more complicated and sophisticated combining models (Clemen, 1989). Although its weights are not optimal, it may give better results than time-varying weights, which are sometimes adversely affected by unsystematic changes over time. Consequently, the method has the virtues of impartiality, robustness and a good "track record" in time series forecasting, and it has consistently been the choice of many researchers in the combination of forecasts.
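A minimal NumPy sketch of the general linear combination $\hat{V}_P(t) = \sum_i w_i \hat{V}_{Pi}(t)$ under the chapter's restrictions (weights summing to one and lying in [0, 1]), with the EW method as the special case $w_i = 1/N$. The names `combine_forecasts` and `equal_weights` are hypothetical helpers, not part of any library.

```python
import numpy as np


def combine_forecasts(forecasts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear combination V_P(t) = sum_i w_i * V_Pi(t) for every period t.

    forecasts: (T, N) array, column i holds the ith model's forecasts.
    weights:   (N,) array; must sum to one and each entry must lie in [0, 1].
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to one")
    if np.any(weights < 0) or np.any(weights > 1):
        raise ValueError("each weight must lie in [0, 1]")
    return np.asarray(forecasts, dtype=float) @ weights


def equal_weights(n_forecasts: int) -> np.ndarray:
    """EW method: every forecast receives weight 1/N; nothing is estimated."""
    return np.full(n_forecasts, 1.0 / n_forecasts)
```

For example, two periods forecast by three models as `[[100, 110, 120], [90, 96, 102]]` combine under equal weights to `[110, 96]`.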

Optimal Weights (OW) methods
Bates & Granger proposed using a minimum variance (MV) criterion to determine the weights, so that the additional information contained in otherwise discarded forecast(s) is exploited (Bates & Granger, 1969), and Dickinson extended the method to combinations of N forecasts (Dickinson, 1973). Assuming that the individual forecast errors are unbiased, the vector of weights that minimizes the error variance of the combination is

$$\mathbf{w} = \frac{M_V^{-1} I_n}{I_n^{\top} M_V^{-1} I_n},$$

where $I_n$ is the n×1 vector with all elements equal to one (n = N, the number of forecasts) and $M_V$ is the covariance matrix of forecast errors (e.g., $M_{V,ij}$ is the covariance between the errors of forecast i and forecast j at a particular point in time). Granger & Ramanathan pointed out that the method is equivalent to a least squares regression in which the constant is suppressed and the weights are constrained to sum to one (Granger & Ramanathan, 1984). In the case of a combination of two forecasts whose errors are assumed to be uncorrelated, with error variances $\sigma_1^2$ and $\sigma_2^2$, the weights reduce to $w_1 = \sigma_2^2/(\sigma_1^2+\sigma_2^2)$ and $w_2 = \sigma_1^2/(\sigma_1^2+\sigma_2^2)$.
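The optimal weights can be computed directly from an estimated error covariance matrix. A minimal NumPy sketch, assuming $M_V$ is known and invertible; the function names are hypothetical. It solves the linear system $M_V x = I_n$ rather than forming $M_V^{-1}$ explicitly, which is numerically preferable.

```python
import numpy as np


def optimal_weights(error_cov: np.ndarray) -> np.ndarray:
    """Bates-Granger / Dickinson weights w = M^-1 1 / (1' M^-1 1).

    error_cov: (N, N) covariance matrix M_V of the individual forecast errors.
    The weights sum to one but are NOT constrained to [0, 1]; strongly
    correlated errors can produce negative weights.
    """
    ones = np.ones(error_cov.shape[0])
    # Solve M_V x = 1 instead of inverting M_V explicitly.
    x = np.linalg.solve(error_cov, ones)
    return x / (ones @ x)


def combined_error_variance(weights: np.ndarray, error_cov: np.ndarray) -> float:
    """Error variance w' M_V w of the combined forecast."""
    return float(weights @ error_cov @ weights)
```

For two uncorrelated forecasts with error variances 1 and 4 (`np.diag([1.0, 4.0])`), the sketch gives weights 0.8 and 0.2 and a combined error variance of 0.8, smaller than either individual variance. With strongly correlated errors, for example `[[1.0, 1.8], [1.8, 4.0]]`, one weight becomes negative, which is why the constrained variants are often preferred in practice.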

Minimum Error (ME) methods
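The ME method minimizes the forecasting errors when combining individual forecasts into a single one. One solution for this method applies linear programming (LP), whose principle and computational process are described as follows (Yu et al., 2005a). Let $E_i(t) = V_O(t) - \hat{V}_{Pi}(t)$ be the error of the ith forecast, so that the error of the combined forecast during the time interval t is $E(t) = \sum_i w_i E_i(t) = V_O(t) - \sum_i w_i \hat{V}_{Pi}(t)$. The sum of absolute forecasting errors is set as

$$F_{LP} = \sum_{t=1}^{T} \lvert E(t) \rvert = \sum_{t=1}^{T} \Bigl\lvert V_O(t) - \sum_{i=1}^{N} w_i \hat{V}_{Pi}(t) \Bigr\rvert,$$

where $F_{LP}$ is the objective function of the LP, $V_O(t)$ denotes the observed value during the time interval t, and T is the number of forecasting periods. To eliminate the absolute value sign from the objective function, assume that

$$u(t) = \frac{\lvert E(t) \rvert + E(t)}{2}, \qquad v(t) = \frac{\lvert E(t) \rvert - E(t)}{2}.$$

The introduction of u(t) and v(t) transforms the absolute value in the objective function so that it is consistent with the standard form of LP. Obviously, $u(t) \ge 0$, $v(t) \ge 0$, $u(t) + v(t) = \lvert E(t) \rvert$ and $u(t) - v(t) = E(t)$. On the basis of the above specification, the LP model can be constructed as follows:

$$\begin{aligned} \min\; & F_{LP} = \sum_{t=1}^{T} \bigl(u(t) + v(t)\bigr) \\ \text{s.t.}\; & \sum_{i=1}^{N} w_i \hat{V}_{Pi}(t) + u(t) - v(t) = V_O(t), \quad t = 1, 2, \ldots, T, \\ & \sum_{i=1}^{N} w_i = 1, \\ & w_i \ge 0, \; u(t) \ge 0, \; v(t) \ge 0, \end{aligned}$$

where i = 1, 2, …, N indexes the individual forecasts and t the forecasting periods. In this system, the constraint $w_i \ge 0$ ensures that no forecast method enters the combined forecasting result with a negative weight. The ME method is thus equivalent to a simple dynamic linear programming problem, and the optimal solution to the LP can be obtained by the simplex algorithm. The method is an effective combination methodology with time-variant weights.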
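The LP for the ME method maps directly onto SciPy's `scipy.optimize.linprog`. A minimal sketch, assuming a SciPy release that supports `method="highs"` (the HiGHS solvers, which include a dual simplex, stand in for a hand-written simplex here). Decision variables are stacked as the N weights followed by the T values of u(t) and the T values of v(t); the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def min_error_weights(forecasts: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """ME combination weights from the LP: minimize sum_t (u(t) + v(t)).

    forecasts: (T, N) individual forecasts; observed: (T,) observed values.
    Variables are stacked as [w_1..w_N, u(1)..u(T), v(1)..v(T)].
    """
    T, N = forecasts.shape
    # Objective: only u(t) and v(t) carry cost; the weights have zero cost.
    c = np.concatenate([np.zeros(N), np.ones(2 * T)])

    # T equality rows: sum_i w_i * V_Pi(t) + u(t) - v(t) = V_O(t).
    a_fit = np.hstack([forecasts, np.eye(T), -np.eye(T)])
    # One more equality row: sum_i w_i = 1.
    a_sum = np.concatenate([np.ones(N), np.zeros(2 * T)])[None, :]
    a_eq = np.vstack([a_fit, a_sum])
    b_eq = np.concatenate([observed, [1.0]])

    # 0 <= w_i <= 1, and u(t), v(t) >= 0 with no upper bound.
    bounds = [(0.0, 1.0)] * N + [(0.0, None)] * (2 * T)
    result = linprog(c, A_eq=a_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:N]
```

If one forecast is `obs - 2` and the other is `obs + 2`, the LP returns weights of about 0.5 each, because that combination reproduces the observations exactly. Re-solving the LP over a moving window of recent periods is one way to obtain time-variant weights.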

Minimum Variance (MV) methods
Since a negative weight has no practical meaning, researchers usually add the restriction that no individual weight can lie outside the interval [0, 1]. The main idea can be described as

$$\begin{aligned} \min_{\mathbf{w}}\; & \mathbf{w}^{\top} M_V \mathbf{w} \\ \text{s.t.}\; & \sum_{i=1}^{N} w_i = 1, \quad 0 \le w_i \le 1, \; i = 1, 2, \ldots, N, \end{aligned}$$

where $M_V$ is the covariance matrix of forecast errors. By solving this quadratic programming (QP) problem, an optimal weight set can be obtained for the combined forecast (Yu et al., 2005b). The problem with this optimizing approach is that it still requires $M_V$ to be properly estimated. In practice, $M_V$ is often not stationary, in which case it is estimated from a short history of forecasts, and the method becomes an adaptive approach to combining forecasts (De Menezes et al., 2000).
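A minimal NumPy-only sketch of the constrained MV problem, together with the adaptive variant that re-estimates $M_V$ from a short window of recent errors. As an illustrative choice of QP solver (not necessarily the one used in the cited work), it applies projected gradient descent onto the probability simplex, which enforces $\sum_i w_i = 1$ and $w_i \ge 0$ (hence $w_i \le 1$) at every step. The function names, the window length and the iteration count are hypothetical.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : sum(w) = 1, w >= 0}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]   # largest index meeting the condition
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def min_variance_weights(error_cov: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Minimize w' M_V w subject to sum(w) = 1 and 0 <= w_i <= 1."""
    n = error_cov.shape[0]
    # Step 1/L, where L = 2 * largest eigenvalue is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(error_cov).max())
    w = np.full(n, 1.0 / n)                  # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * error_cov @ w           # gradient of the quadratic form
        w = project_to_simplex(w - step * grad)
    return w


def adaptive_mv_weights(errors: np.ndarray, window: int) -> np.ndarray:
    """Adaptive variant: estimate M_V from only the most recent `window` errors.

    errors: (T, N) past forecast errors, one column per individual model.
    """
    error_cov = np.cov(errors[-window:], rowvar=False)
    return min_variance_weights(error_cov)
```

With the correlated covariance `[[1.0, 1.8], [1.8, 4.0]]`, the unconstrained optimal weights include a negative entry, whereas this sketch returns weights of approximately [1, 0].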

Data imputation
Various imputation techniques have been developed in the past decade. Techniques including naïve imputation, the expectation maximization (EM) algorithm (Schafer, 1997; Dempster et al., 1977), the data augmentation (DA) algorithm (Tanner & Wong, 1987) and regression imputation lead logically to modern approaches such as multiple imputation (MI). Regression imputation and MI have proved to be more effective than the others, especially the latter (Ni et al., 2005; Zhong et al., 2004). State space methodology has been found to be highly significant in obtaining more accurate results with nearest-neighbor nonparametric regression (Kamarianakis & Prastakos, 2005), and including historical information in the state space may further improve imputation accuracy. Zhang & Liu proposed a least squares support vector machine (LS-SVM) method combined with the multivariate state space approach to recover missing traffic flow data in arterial streets of Xuhui district, Shanghai (Zhang & Liu, 2009a, 2009b). The state space not only incorporates lagged values but is also supplemented with aggregate measures such as historical and spatial information. By fully exploiting spatial and temporal information, state-space-based approaches can model traffic flow successfully.

In this part, we focus on imputation techniques based on the state space method (Zhang & Liu, 2009a, 2009b). In time series, a state is defined as a series of system values measured during the past k intervals ($k \in \mathbb{N}$): the measurements at times t, t−1, …, t−k compose a state vector, and k is an appropriate number of lags. A state vector of traffic flow measured by loop detector(s) l every F minute(s) can be described by

$$X_l(t, k_l) = \bigl[V_l(t), V_l(t-1), \ldots, V_l(t-k_l)\bigr],$$

where $V_l(t)$ denotes the traffic flow from detector(s) l during the time interval t, $V_l(t-1)$ represents the traffic flow from detector(s) l during the previous F-minute interval, and so on. If L loop detectors around the object detector(s) in the traffic network are considered, l ranges from 1 to L ($L \in \mathbb{N}$).
When $t \le k_l$, $X_l(t, k_l)$ contains the last $k_l - t + 1$ values measured on the day before the chosen day. The object detector(s) are defined as the detector(s) with missing data.
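The state vector construction, including the previous-day fallback for $t \le k_l$, can be sketched in plain Python as follows. The helper names `state_vector` and `joint_state`, and the layout of one list of T interval flows per detector per day, are assumptions for illustration.

```python
from typing import List, Sequence


def state_vector(today: Sequence[float], yesterday: Sequence[float],
                 t: int, k: int) -> List[float]:
    """Build X_l(t, k_l) = [V_l(t), V_l(t-1), ..., V_l(t-k_l)] for one detector.

    today / yesterday hold the T interval flows of the chosen day and of the
    day before; t is 1-based as in the text (today[t-1] is V_l(t)).
    Lags that fall before the first interval (t - j <= 0) are taken from the
    end of the previous day, so for t <= k the vector holds k - t + 1 values
    from yesterday.
    """
    T = len(today)
    if len(yesterday) != T or not 1 <= t <= T:
        raise ValueError("need two full days and 1 <= t <= T")
    if k >= T:
        raise ValueError("lag k must be smaller than the number of intervals T")
    vector = []
    for j in range(k + 1):
        s = t - j                                   # interval index of lag j
        if s >= 1:
            vector.append(today[s - 1])
        else:
            vector.append(yesterday[T + s - 1])     # s = 0 -> last interval of yesterday
    return vector


def joint_state(today_by_detector, yesterday_by_detector, lags, hist_value, t):
    """X(t, L): concatenate every detector's state vector, then V_Ghist,w(t)."""
    x: List[float] = []
    for today, yesterday, k in zip(today_by_detector, yesterday_by_detector, lags):
        x.extend(state_vector(today, yesterday, t, k))
    x.append(hist_value)
    return x
```

With four intervals per day, `state_vector([10, 20, 30, 40], [1, 2, 3, 4], t=2, k=3)` returns `[20, 10, 4, 3]`: the last $k_l - t + 1 = 2$ entries come from the previous day.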
Considering the historical information of the past week(s), the state space X(t, L) can be defined as

$$X(t, L) = \bigl[X_1(t, k_1), X_2(t, k_2), \ldots, X_L(t, k_L), V_{G\mathrm{hist},w}(t)\bigr],$$

where $V_{G\mathrm{hist},w}(t)$ is the historical traffic flow from the object detector(s) at the time-of-day and day-of-the-week associated with time interval t in week w, which is usually selected as the previous week. The selection of appropriate L and $k_l$ for each detector l is based on spatio-temporal analysis of traffic flows collected from loop detectors at different intersections. Input-output pairs for the training process can use the vectors

$$\bigl\{X(t, L),\, V_G(t)\bigr\}, \quad t = k_{\max}, k_{\max}+1, \ldots, T,$$

where $V_G(t)$ is the traffic flow obtained from the object detector(s) G during the time interval t, T is the number of time intervals in a day, and $k_{\max}$ denotes the maximum among the lags $k_l$, l = 1, 2, …, L. This training process assumes that detector(s) G are in good condition and that $V_G(t)$ is closely related to $V_l(t)$. The total number of training samples is $T - k_{\max} + 1$. When detector(s) G cannot supply complete data $V_G(T+h)$ at time T+h ($h \in \mathbb{N}$) because of malfunctions, the vectors X(T+h, L) are used as input variables to obtain predicted results $\hat{V}_G(T+h)$ that can replace the missing values. Comparing the imputed values with the actual ones, $V_G(T+h)$, can be used to verify the efficiency of different imputation methods.
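The chapter applies LS-SVMs to these state vectors. A minimal NumPy sketch of standard LS-SVM regression (the Suykens formulation with a Gaussian kernel, which reduces training to one linear KKT system) is given below; each row of `X` would be a state vector X(t, L) and each target a flow $V_G(t)$. The kernel choice, the hyperparameters `gamma` and `sigma` and all function names are assumptions, not values from the cited studies; in practice the hyperparameters would be tuned, for example by cross-validation.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X: np.ndarray, y: np.ndarray, gamma: float = 100.0,
              sigma: float = 1.0) -> dict:
    """Train LS-SVM regression by solving the KKT linear system
        [0   1^T          ] [b    ]   [0]
        [1   K + I / gamma] [alpha] = [y]
    X: (n, d) training state vectors X(t, L); y: (n,) targets V_G(t).
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return {"X": X, "alpha": sol[1:], "b": sol[0], "sigma": sigma}


def lssvm_predict(model: dict, X_new: np.ndarray) -> np.ndarray:
    """Predicted (imputed) values: sum_i alpha_i k(x_i, x_new) + b."""
    K = rbf_kernel(X_new, model["X"], model["sigma"])
    return K @ model["alpha"] + model["b"]
```

To impute, one would fit on the $T - k_{\max} + 1$ training pairs of a day with good data and call `lssvm_predict` on the rows X(T+h, L) of the intervals where $V_G$ is missing.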

A brief review of the Shanghai integrated transportation information platform
In recent years, we have been exploring traffic informatization and building the Shanghai Integrated Transportation Information Platform (SITIC), which provides a mechanism to connect isolated islands of information. After three phases of construction, the system software and hardware, backbone networks and information distribution channels have been completed successfully. The guiding principle for the development of SITIC is "Investigating the present state, revealing the objective laws, and guiding the urban transport more scientifically and efficiently".
Transportation is classified into Road Traffic, Public Traffic, Inter-city Traffic and District/Transport Hub, and various kinds of information on vehicles and people are collected from many sources; this information is the basis for the normal operation of SITIC and for further data mining. Real-time information on different aspects of Shanghai's transportation can be clearly displayed in SITIC. While researching transportation problems in the metropolis, especially traffic prediction, we found that grasping the overall transportation situation is important to traffic management, which makes it essential to divide the road network into macro (network), meso (district) and micro (link) levels. Meanwhile, data from some detectors are incomplete, i.e., partially or completely missing or substantially contaminated by noise. This may be caused by malfunctions in the data collection and recording systems, which often occur in practice. Missing data sometimes render an entire dataset useless, which is a major hurdle in analyzing traffic information, so missing data treatment is another important preparation step for the effective management of intelligent transportation systems (ITS). The following figure briefly describes the contents and functions of the platform.

Conclusion
The chapter summarizes data-driven approaches for traffic prediction in three parts. First, on the basis of a classification of the main traffic forecasting methods, the chapter reviews a large body of literature on traffic forecasting models, focusing on combined forecasts approaches, which we believe represent the trend of development of traffic forecasting in practice. Second, from a practical point of view, it describes proper solutions to missing data problems, especially state-space-based approaches. Finally, from the perspective of dynamic traffic management, it presents our corresponding work and experience of traffic informatization in Shanghai.

Intelligent Transportation Systems (ITS) have transformed surface transportation networks through the integration of advanced communications and computing technologies into the transportation infrastructure. ITS technologies have improved the safety and mobility of the transportation network through advanced applications such as electronic toll collection, in-vehicle navigation systems, collision avoidance systems, advanced traffic management systems and advanced traveler information systems. In this book, which focuses on different ITS technologies and applications, authors from several countries have contributed chapters covering different ITS technologies, applications and management practices, with the expectation that the open exchange of scientific results and ideas presented in the book will lead to improved understanding of ITS technologies and their applications.