Predictor fusion for short-term traffic forecasting

develop

traffic conditions, using recurrent traffic information. A common characteristic of these studies is that recorded traffic data influenced by abnormal conditions are detected and removed from the database used in traffic prediction applications (Castro-Neto et al., 2009). A wide range of statistical and machine learning prediction models have been applied in such scenarios. For example, many early traffic prediction models use the basic Auto-Regressive Integrated Moving Average (ARIMA) method (Box and Jenkins, 1970), e.g. Hamed et al. (1995), Szeto et al. (2009) and Van Der Voort et al. (1996). More recently, machine learning methods have been widely used to predict future traffic variables. Commonly used methods include Neural Networks (NN) (Abdulhai et al., 1999; Ishak and Alecsandru, 2004; Park and Rilett, 1999; van Hinsbergen et al., 2009; Zheng et al., 2006), Support Vector Regression (SVR) (Wu et al., 2004), k-Nearest Neighbours (kNN) (Clark, 2003; Habtemichael and Cetin, 2016; Krishnan and Polak, 2008; Smith and Demetsky, 1997; Smith et al., 2002) and Random Forests (RF) (Guo et al., 2014; Guo et al., 2017; Leshem and Ritov, 2007). More recently still, deep learning methods such as Convolutional Neural Networks (CNN) and stacked auto-encoders (SAE) have been applied to short-term traffic prediction (e.g. Lv et al. (2015); Polson & Sokolov (2017)).
Many studies have compared the prediction accuracy of these methods (e.g. Guo et al. (2010); Guo et al. (2017); Smith & Demetsky (1997); Smith et al. (2002); Vlahogianni et al. (2004)). In general, these comparisons show that no single method predicts traffic states best for all types of datasets and under all conditions (Tan et al., 2009; Vlahogianni et al., 2014; Zheng et al., 2006). Some studies have discussed why combining individual forecasting methods might improve accuracy (Clemen, 1989; Makridakis, 1989). In recent years, researchers have begun to investigate strategies for combining predictors to increase the accuracy of traffic forecasting under normal traffic conditions. Zheng et al. (2006) used a Bayesian approach to combine the short-term predictions of two individual neural network predictors; both training and testing data were collected from four locations on an expressway in Singapore under normal traffic conditions. Tan et al. (2009) used a neural network based approach to aggregate the results of three time-series forecasting methods, using one-hour traffic flow data collected from one site on a highway in China. Their results showed that the aggregation method could improve one-step-ahead prediction accuracy, but the authors stated that their model only worked under normal traffic conditions because only simple time series prediction methods were used. Djuric et al. (2011) used a probabilistic method to fuse the results of four individual traffic speed predictors, using data collected from a loop detector on a freeway in the USA. However, the resulting aggregation models could not provide accurate predictions during abnormal traffic conditions caused by incidents, sports events and severe weather. More recently, Tselentis et al. (2015) tested both linear regression and Bayesian combination methods with individual time series methods for short-term freeway traffic speed prediction, considering both spatiotemporal and exogenous information such as rainfall and volume. Similarly, Qiu et al. (2016) proposed an integrated precipitation-correction model for freeway traffic flow prediction that fuses four basic forecasting models, and Vlahogianni (2015) proposed a surrogate model that combines three prediction methods for short-term freeway traffic speed prediction. These studies have demonstrated that combining different predictors can improve the accuracy of traffic prediction under normal traffic conditions on freeways and motorways. Table 1 summarises the key features of the traffic fusion literature reviewed above, under headings covering the prediction context (such as urban/freeway), the fusion method and the traffic conditions (normal/abnormal) under which the models were implemented.
The challenges of prediction under abnormal traffic conditions (such as non-recurrent congestion caused by planned events such as road works, or by unplanned events such as incidents or accidents) have received much less attention in the literature, despite their significance, especially in urban areas. However, interest in prediction under abnormal traffic conditions has grown over the past decade. For example, Tao et al. (2005) used NN to predict short-term travel time during incidents using data collected from a highway corridor in the United States. Castro-Neto et al. (2009) proposed an Online Support Vector Regression (OL-SVR) model to forecast traffic flow variables during holidays and traffic incidents using data collected in the United States. Guo et al. (2014) used Random Forests (RF) to predict travel time under incident conditions using data collected in urban areas in the United Kingdom.
Because different machine learning tools use different strategies to learn relationships from the training dataset, different predictors perform differently in different circumstances, and no single method best predicts all traffic variables under all traffic conditions. Predictor fusion is widely used to improve prediction accuracy in many fields, such as power (e.g., Bonissone et al., 2011), computer science (e.g., Loh & Henry, 2002) and biology (e.g., Chan & Stolfo, 1997). Motivated by this, this paper proposes a fusion-based framework that leverages the strengths of different machine learning tools, given the same inputs, for traffic prediction under a range of traffic conditions.

Fusion-based methods
Fusion is used in machine learning to combine individual learning processes (Chan and Stolfo, 1997). The main advantage of fusing multiple methods is that it leverages their complementary predictive characteristics. Several strategies can be used to combine multiple predictors. Simple averaging is the most basic: it takes the mean of the predictions of the individual predictors. A variant of the simple average method is the weighted method (Chan and Stolfo, 1997), in which each predictor is assigned a weight according to how accurately it performs on a validation dataset, and the final prediction is the weighted sum of the individual predictions. However, these weights are predetermined and remain constant during forecasting. In recent studies, machine learning methods such as kNN have also been used to fuse different predictors (Bonissone et al., 2011). The kNN method is a typical lazy learning method (Hastie et al., 2001): it does not involve any explicit model construction, but instead locates the nearest neighbours of the current observation in a historical dataset. The weight of each predictor is therefore calculated dynamically and updated with each new observation, rather than being predetermined. Because the weights reflect how well each predictor performed in historically similar states, unexpected traffic conditions such as incidents and accidents can readily be incorporated into the fusion framework. For these reasons, kNN is selected in this paper as one of the fusion methods for traffic forecasting under different traffic conditions.
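The two static strategies just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: each predictor is represented only by its outputs, and the weights are taken as the normalised inverse of each predictor's validation MAPE, which is the weighting the paper adopts for its weighted fusion method. The validation numbers below are made up.

```python
import math
from typing import List, Sequence


def average_fusion(predictions: Sequence[float]) -> float:
    """Average fusion: the fused value is the mean of the m individual predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def inverse_mape_weights(validation_mapes: Sequence[float]) -> List[float]:
    """Fixed weights proportional to 1/MAPE of each predictor, normalised to sum to one."""
    inverse = [1.0 / e for e in validation_mapes]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_fusion(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted fusion: sum of weight_i * prediction_i, with weights fixed before forecasting."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to one")
    return sum(w * y for w, y in zip(weights, predictions))


# Hypothetical validation data for three predictors (e.g. NN, SVR, RF).
observed = [100.0, 200.0]
val_preds = [[105.0, 190.0], [110.0, 180.0], [90.0, 220.0]]
weights = inverse_mape_weights([mape(observed, p) for p in val_preds])  # about [0.5, 0.25, 0.25]
print(average_fusion([100.0, 120.0, 140.0]))            # 120.0
print(weighted_fusion([100.0, 120.0, 140.0], weights))  # approximately 115
```

With equal weights of 1/m, `weighted_fusion` returns the same value as `average_fusion`, which is the relationship between the two methods stated in the text.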

Fusion-based methods
The basic idea behind a fusion-based framework is to generate a final prediction result by combining the outputs of two or more stand-alone predictors. Fig. 1 shows a prediction framework that combines m (m ⩾ 2) predictors.

Fusion strategies
In this paper, three different fusion strategies (average fusion, weighted fusion and kNN fusion) are used within the framework to fuse multiple predictors.
• The average fusion-based method: the final prediction is the mean of the m individual predictions,
$$\hat{y} = \frac{1}{m}\sum_{i=1}^{m} y_i \quad (1)$$
• The weighted fusion-based method:
$$\hat{y} = \sum_{i=1}^{m} \alpha_i y_i \quad (2)$$
where $y_i$ is the prediction result of the $i$th predictor and $\alpha_i$ is the weight of the $i$th predictor. The weights are calculated using the training dataset. The weighted fusion-based method reduces to the simple average method when all the weights are equal to $1/m$. In this paper, the weights are calculated from the inverse of the Mean Absolute Percentage Error (MAPE) of each predictor:
$$\alpha_i = \frac{1/\mathrm{MAPE}_i}{\sum_{j=1}^{m} 1/\mathrm{MAPE}_j}$$
• The kNN fusion-based method: The kNN method is highly unstructured and does not require any pre-determined model specification. Given a current traffic state, the kNN fusion-based method searches the historical training dataset for the nearest neighbours of this state, calculates the prediction errors of each predictor over the nearest-neighbour set, estimates the weight of each predictor from these errors, and combines the outputs of the individual predictors using these weights. Fig. 2 depicts the flowchart of the kNN fusion-based method, which consists of two steps.
Step 1: Neighbourhood searching procedure: The search procedure finds the nearest neighbours, i.e. the historical observations that are most similar to the current observation. In this paper, the Euclidean distance is used to measure the distance between the current input feature vector $x_c = (x_{c1}, x_{c2}, \ldots, x_{cn})$ and each historical observation $x_j$:
$$d(x_c, x_j) = \sqrt{\sum_{l=1}^{n} (x_{cl} - x_{jl})^2}$$
where $n$ is the dimension of the feature space. The $k$ historical observations with the smallest distances to $x_c$ form its set of nearest neighbours, $\{x_1, x_2, \ldots, x_k\}$, with $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, $j = 1, 2, \ldots, k$. The parameters of kNN are set using the trial and error method introduced in Guo et al. (2012), and $k$ is set to 5 in this paper. More details of the kNN parameter selection can be found in Guo et al. (2017).
Step 2: Weighted parameter estimation process: This step calculates the weight of each predictor. For each vector $x_j$, the predicted values $\hat{y} = \ldots$
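Because the weight formula in Step 2 is cut off in this text, the sketch below fills it with an explicit assumption: each predictor's weight is the normalised inverse of its mean absolute percentage error over the k nearest neighbours, mirroring the inverse-MAPE weighting of the weighted method. Step 1 (Euclidean neighbour search, k = 5 by default) follows the description above. The function name `knn_fusion`, the layout of the historical arrays and the small `eps` guard are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal dimension n."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_fusion(
    x_c: Sequence[float],
    current_predictions: Sequence[float],
    hist_features: Sequence[Sequence[float]],
    hist_observed: Sequence[float],
    hist_predictions: Sequence[Sequence[float]],
    k: int = 5,
    eps: float = 1e-6,
) -> Tuple[float, List[float]]:
    """Fuse the m current predictions using weights estimated on the k nearest neighbours of x_c.

    hist_predictions[j][i] is predictor i's stored prediction for historical observation j,
    and hist_observed[j] is the value that was actually observed (hypothetical layout).
    """
    # Step 1: neighbourhood search, i.e. indices of the k historical vectors closest to x_c.
    ranked = sorted(range(len(hist_features)), key=lambda j: euclidean(x_c, hist_features[j]))
    neighbours = ranked[:k]

    # Step 2 (ASSUMED weighting rule): each predictor's error is its mean absolute
    # percentage error over the neighbours; the weights are the normalised inverses.
    errors = []
    for i in range(len(current_predictions)):
        ape = [abs(hist_observed[j] - hist_predictions[j][i]) / abs(hist_observed[j]) for j in neighbours]
        errors.append(sum(ape) / len(ape) + eps)  # eps keeps a perfect predictor's weight finite
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    weights = [w / total for w in inverse]

    # Final fused prediction: weighted sum of the individual predictions for x_c.
    fused = sum(w * y for w, y in zip(weights, current_predictions))
    return fused, weights


# Toy history: predictor 0 is accurate for small feature values, predictor 1 for large ones.
features = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
observed = [100.0] * 6
preds = [[100.0, 120.0]] * 3 + [[80.0, 100.0]] * 3
fused, w = knn_fusion([1.5], [200.0, 300.0], features, observed, preds, k=3)
print(fused, w)  # predictor 0 dominates, so the fused value is close to 200
```

Unlike the weighted method, the weights here are recomputed for every new input vector, so a predictor that coped well with similar (for example incident-like) states in the history receives most of the weight when such a state recurs.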

Machine learning methods used in prediction
Three commonly used machine learning methods, Neural Networks (NN), Support Vector Regression (SVR) and Random Forests (RF), are selected as individual predictors in this paper. These selected machine learning tools use different algorithms for learning the relationships between two related data series. Hence, these machine learning methods are selected in the proposed framework to achieve model diversity.
• Neural networks (NN): Neural networks (NN) (Mitchell, 1997) capture complex relationships between multiple inputs and multiple outputs. As a data-driven method, NNs do not require the prior specification of an explicit physical relationship between the data; instead, the relationship between inputs and outputs is inferred from a training dataset. NNs have been widely used in traffic engineering for tasks such as traffic prediction and incident management. In this paper, a well-known feedforward NN (Mitchell, 1997) trained with the backpropagation algorithm is selected to predict traffic data. Backpropagation has a well-developed theoretical background and a good capability for modelling relationships between continuous variables (Smith and Demetsky, 1997). Moreover, it has been shown to be effective in a wide range of applications that require the representation of complex relationships, including traffic prediction (Park and Rilett, 1999). The parameters are selected by cross-validation, minimising the Mean Absolute Percentage Error; on this basis, the number of hidden layers is set to 8 in this paper.
• Support Vector Regression (SVR): Support Vector Regression (SVR) was developed by Vapnik (1995) based on statistical learning theory. SVR solves non-linear regression problems by mapping the input data into a higher-dimensional feature space in which a linear regression model can be applied (Vapnik, 1995). A detailed explanation of SVR and its applications can be found in Vapnik (1995). In this paper, a Radial Basis Function (RBF) is used as the kernel function because it has been shown to be more suitable for traffic prediction under different conditions (Guo et al., 2017). Many methods for selecting optimal SVR parameters have been introduced in the literature (e.g. Cherkassky and Ma, 2004); the process of SVR parameter optimisation is described in Guo et al. (2012). After optimisation, the capacity value C of SVR is set to 80 and an ε-insensitive loss function with ε = 0.05 is used.
F. Guo et al. Transportation Research Part C 92 (2018) 90-100
• Random forests (RF): Random forests (RF) were proposed by Breiman (2001) as an ensemble learning method (see, e.g., Hastie et al., 2001). The main difference between RF and traditional tree methods is that RF randomly selects the variables considered when splitting each node of each tree, which changes how the trees are constructed and reduces overfitting. For regression, the prediction is the average of the predictions of all the trees. More details of RF are given in Verikas et al. (2011). One of the key parameters is the number of trees, which is chosen by trial and error; in this paper, the number of trees is set to 500.
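All three bullets fix a key parameter by minimising validation error (cross-validated MAPE for NN, the optimisation of Guo et al. (2012) for SVR, trial and error for RF). The sketch below shows that trial-and-error loop in generic form. It is an assumption-laden illustration: `fit_predict` is a hypothetical callable that would train NN, SVR or RF with one candidate setting and return validation predictions, and a toy moving-average forecaster stands in for the real models so the example runs with the standard library only.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # one candidate setting, e.g. a number of trees or a (C, epsilon) pair


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def select_by_validation_mape(
    candidates: Iterable[P],
    fit_predict: Callable[[P], Sequence[float]],
    validation_observed: Sequence[float],
) -> Tuple[P, float]:
    """Trial and error: try every candidate and keep the one with the lowest validation MAPE."""
    best, best_err = None, math.inf
    for candidate in candidates:
        err = mape(validation_observed, fit_predict(candidate))
        if err < best_err:  # strict '<' keeps the first candidate on ties
            best, best_err = candidate, err
    if best is None:
        raise ValueError("no usable candidate")
    return best, best_err


# Toy stand-in for NN/SVR/RF: forecast each validation point as the mean of the
# previous `window` observations, so `window` plays the role of the tuned parameter.
series = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0, 20.0]
val_idx = [4, 5, 6, 7]


def moving_average_fit_predict(window: int) -> List[float]:
    return [sum(series[t - window:t]) / window for t in val_idx]


best_window, best_mape = select_by_validation_mape(
    [1, 2, 3], moving_average_fit_predict, [series[t] for t in val_idx]
)
print(best_window, best_mape)  # 1 0.0
```

In a real pipeline, `fit_predict` might wrap library models such as scikit-learn's `MLPRegressor`, `SVR(kernel="rbf", C=..., epsilon=...)` or `RandomForestRegressor(n_estimators=...)`; the selection loop itself does not change.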

Prediction accuracy measurement
In this paper, three criteria, Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), are used to evaluate the final prediction accuracy. MPE measures the prediction bias at the disaggregate level, since positive and negative errors cancel out. MAPE is the mean of the absolute percentage differences between predicted and observed traffic variables. RMSE gives greater weight to larger absolute errors. Together, these three measures are used to evaluate the accuracy and precision of the prediction results. They are defined in terms of the observed traffic variables $t_i$, the predicted traffic variables $\hat{t}_i$ and the total sample size $n$.
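The formulas for the three measures are not reproduced in this text, so the following sketch uses the standard definitions in terms of $t_i$, $\hat{t}_i$ and $n$. The MPE sign convention, $(t_i - \hat{t}_i)/t_i$, is an assumption; the opposite convention simply flips the sign of the bias. The toy numbers show why the three measures are reported together: the two errors cancel in MPE but not in MAPE or RMSE.

```python
import math
from typing import Sequence


def mpe(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Percentage Error (%): signed, so it measures bias (sign convention assumed)."""
    return 100.0 * sum((o - p) / o for o, p in zip(observed, predicted)) / len(observed)


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error (%): average magnitude of the relative errors."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error, in the units of the traffic variable; penalises large errors."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


t_obs = [100.0, 200.0]  # hypothetical observed values t_i
t_hat = [110.0, 180.0]  # hypothetical predictions
# MPE is 0.0 because the errors cancel; MAPE is about 10; RMSE is about 15.81.
print(mpe(t_obs, t_hat), mape(t_obs, t_hat), rmse(t_obs, t_hat))
```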

Experiments
This paper focuses on short-term traffic prediction on urban road networks in the UK. Two test locations in urban areas are selected to examine the transferability of the proposed fusion-based prediction framework, and two different types of traffic variables, traffic flow and travel time, are used. Moreover, the accuracy of the proposed framework is tested under both normal and incident conditions to examine its robustness.
The traffic flow data used in this paper are from Inductive Loop Detectors (ILDs), which are part of the SCOOT traffic control system (Robertson and Bretherton, 1991) in Central London. Traffic flow data collected from Cromwell Road and Marylebone Road were selected. Cromwell Road is a major urban road located in the Royal Borough of Kensington and Chelsea; a map of Cromwell Road and the locations of the ILDs is shown in Fig. 3(a). Marylebone Road is a well-known thoroughfare in the centre of London, running from Euston Road at Regent's Park to the A40 Westway at Paddington. This corridor has three lanes of traffic in each direction. A map of the Marylebone corridor is shown in Fig. 3(b).
Link travel time data for London were obtained from the London Congestion Analysis Project (LCAP) (TfL, 2010). LCAP is operated by Transport for London (TfL) to estimate and store link travel times in London based on Automatic Number Plate Recognition (ANPR) camera data. Link 1309 on the A40, with a length of 5.63 km, was selected for this application. The topology of this road link in the LCAP system is shown in Fig. 3(c); the direction of travel is from west to east.
In this paper, the proposed fusion-based prediction model was assessed in four test cases using both traffic flow and travel time data, as described below.
Case 1: 5-min traffic flow data from Cromwell Road (Fig. 3(a)) are used in this test case. Owing to the availability of traffic flow data in Central London, only data from 05:00 to 22:00 were extracted. The data are divided into a training dataset and a testing dataset: the training dataset was collected from 1st November 2013 to 16th December 2013, and the testing dataset from 17th December 2013 to 23rd December 2013. Data from weekends are excluded because weekend traffic profiles differ from weekday profiles.
Case 2: 15-min traffic flow data from the Marylebone corridor are used in this test case. The training dataset was drawn from April, May and June 2008, and only traffic data under normal conditions were used for training. A severe traffic incident occurred on the testing day, Friday 20th June 2008. The incident happened around 17:30 and was cleared around 20:00. It was located near the intersection of Mac Farren Place and Marylebone Road (point A in Fig. 3(b)).
Case 3: Link travel time data from the A40 in London (Fig. 3(c)) under normal traffic conditions, without incidents or accidents, are used. Travel time data covering a period of three months between January and March are divided into two datasets: training data are from 3rd January 2011 to 17th March 2011, while testing data are from 17th March 2011 to 24th March 2011. Data from weekends are excluded from both the training and testing datasets.
Case 4: Link travel time data from the A40 in London during an incident are used in this test case, to evaluate the accuracy of the proposed model when traffic conditions are disrupted. Training data, collected during periods of normal traffic, are from weekdays in October, November and December 2010; testing data are from 21st December 2010. On the testing day, one lane of the A40 was blocked by a broken-down vehicle. The incident occurred around 18:00 and was cleared around 18:40; its location is shown as point A in Fig. 3(c).
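As an illustration of how the Case 1 datasets could be assembled, the sketch below filters 5-min timestamps to weekdays and to the daily 05:00 to 22:00 window, then assigns them to training or testing by date. Treating 22:00 as an exclusive end point, and the helper `five_minute_stamps`, are assumptions; a real pipeline would filter detector records rather than bare timestamps.

```python
from datetime import date, datetime, time, timedelta
from typing import Iterable, Iterator, List, Tuple

# Case 1 periods as stated in the text (inclusive dates).
TRAIN = (date(2013, 11, 1), date(2013, 12, 16))
TEST = (date(2013, 12, 17), date(2013, 12, 23))
DAY_START, DAY_END = time(5, 0), time(22, 0)  # half-open window [05:00, 22:00) is an assumption


def five_minute_stamps(first_day: date, last_day: date) -> Iterator[datetime]:
    """Hypothetical helper: every 5-min interval start from first_day 00:00 to last_day 23:55."""
    t = datetime.combine(first_day, time(0, 0))
    stop = datetime.combine(last_day + timedelta(days=1), time(0, 0))
    while t < stop:
        yield t
        t += timedelta(minutes=5)


def split_case1(stamps: Iterable[datetime]) -> Tuple[List[datetime], List[datetime]]:
    """Keep weekday intervals inside the daily window and assign them to train or test by date."""
    train, test = [], []
    for ts in stamps:
        if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6: weekends are excluded
            continue
        if not (DAY_START <= ts.time() < DAY_END):
            continue
        if TRAIN[0] <= ts.date() <= TRAIN[1]:
            train.append(ts)
        elif TEST[0] <= ts.date() <= TEST[1]:
            test.append(ts)
    return train, test


train, test = split_case1(five_minute_stamps(date(2013, 10, 28), date(2013, 12, 31)))
print(len(train), len(test))  # 6528 1020 (32 and 5 weekdays x 204 intervals per day)
```

Splitting by date rather than at random keeps the test week strictly after the training period, which matches the chronological split described for all four cases.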

Experimental results
Traffic data were predicted using the individual machine learning methods (NN, SVR and RF) and with the fusion-based structure using the averaged fusion method, the weighted fusion method and the kNN fusion-based method. All the methods were tested using both traffic flow and travel time data under normal and incident conditions. Figs. 4, 5, 6 and 7 show example time series plots of the prediction results of the different methods in Cases 1, 2, 3 and 4, respectively. Table 2 compares the results for the four test cases with and without the fusion-based structures. Table 2 also reports the mean MPE, MAPE and RMSE of the three individual methods, which serve as a baseline for examining the accuracy improvement of the fusion-based methods.
In test Case 1 (short-term traffic flow prediction under normal traffic conditions), the mean MAPE of the three individual predictors is 6.42%. The MAPE values of the averaged, weighted and kNN fusion-based methods are 6.41%, 6.40% and 6.28%, respectively. A Wilcoxon signed-rank test shows no significant accuracy improvement from these fusion methods for traffic flow prediction under normal traffic conditions.
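A Wilcoxon signed-rank test compares paired errors, here the APEs of two methods at the same time steps. The stdlib-only sketch below computes the statistic with average ranks for tied differences and a two-sided p-value from the normal approximation, without a tie correction to the variance. It is not necessarily the exact variant used in the paper, and a library routine such as `scipy.stats.wilcoxon` would normally be preferred. The approximation also needs a reasonable number of non-zero differences, which is one reason short incident periods yield too few samples.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped and tied
    |differences| receive their average rank.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| from smallest to largest, averaging ranks over runs of ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = float(min(w_plus, w_minus))

    # Under H0, W has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd  # z <= 0 because W is the smaller rank sum
    p = 1.0 + math.erf(z / math.sqrt(2.0))  # two-sided p = 2 * Phi(z)
    return w, min(p, 1.0)


# Hypothetical APEs (%) of NN and of a fusion method at ten time steps.
ape_nn = [5.0 + i for i in range(10)]
ape_knn = [4.0 + 0.5 * i for i in range(10)]
print(wilcoxon_signed_rank(ape_nn, ape_knn))  # W = 0, p well below 0.025
```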
Moreover, the kNN fusion-based method is more accurate in short-term travel time prediction during incidents. For example, in Case 4 the MAPE improves by 21.85% using the kNN fusion-based method compared with the baseline of the individual methods. Similarly, the RMSE improves from 199.12 s for the baseline to 153.75 s using the kNN fusion-based method, a 22.79% improvement.
From the viewpoint of fusion methods, the results in Table 2 indicate that simple fusion methods such as the averaged and weighted methods offer only moderate improvements to the overall prediction accuracy. By contrast, the kNN fusion-based method improved the final accuracy in all four test cases, especially in Cases 2 and 4 (incident traffic conditions). For example, in Case 4 the MAPE is reduced to 8.30% from 9.82%, the best result among the individual methods, a 15.5% reduction in MAPE. In Cases 2 and 4 (forecasting during incident traffic conditions), the absolute percentage errors (APE) of the predictions cannot be analysed with the Wilcoxon signed-rank test because the incident periods are too short to provide sufficient samples. When the APEs of NN and of the kNN fusion-based method over the whole testing day, including the incident periods, were used as the samples, the predictive errors of the kNN fusion-based method were significantly smaller than those of NN at a significance level of 0.025. Two types of traffic data are used in this paper: traffic flow and travel time. In short-term traffic flow prediction, the average MAPE improvement of the kNN fusion-based method across normal and incident conditions is 6.43%; in short-term travel time prediction, the corresponding improvement is 15.81%.
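The percentage improvements quoted above are relative reductions of an error metric, $100\,(E_{\text{base}} - E_{\text{new}})/E_{\text{base}}$. The baseline differs between figures: the 15.5% MAPE reduction in Case 4 is measured against the best individual method (9.82%), whereas the 22.79% RMSE reduction is measured against the mean of the three individual methods. The small helper below (a name chosen for illustration) reproduces both numbers from the values in the text.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction of an error metric (MAPE, RMSE, ...) relative to a baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Case 4 figures quoted in the text.
print(round(relative_improvement(9.82, 8.30), 1))      # 15.5  (MAPE vs best individual method)
print(round(relative_improvement(199.12, 153.75), 2))  # 22.79 (RMSE vs mean of individual methods)
```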

Conclusions
Traffic states are affected by traffic incidents and accidents, which create uncertainty for short-term traffic prediction models based on conventional machine learning tools. This paper presents a framework for fusing multiple predictors to reduce this uncertainty and improve the performance of short-term traffic prediction. A series of experiments using both traffic flow and travel time data from Central London under different traffic conditions were undertaken to evaluate the accuracy of different individual predictors and different fusion methods. The results of these experiments indicate that there are significant differences in prediction accuracy between the fusion strategies used: while the accuracy of the averaged and weighted fusion methods is comparable to that of the individual methods, the kNN fusion-based method can achieve significantly superior results, especially during disrupted traffic conditions. Overall, the results support the hypothesis that suitably configured fusion-based methods that exploit the complementary strengths of different predictors can indeed improve final prediction results. They also indicate, however, that the way in which the fusion is performed is critical. A limitation of the proposed fusion framework is that it requires calibrating multiple prediction models in parallel, which increases the computational complexity compared with the standalone prediction methods used within the framework. There is thus a trade-off between the prediction accuracy and the computational efficiency of the fusion method. Future work will aim to further refine our understanding of the most effective fusion methods, and more machine learning methods will be used as individual predictors within the fusion framework. In addition, the prediction performance of methods discussed in the literature needs to be compared with that of the proposed framework under disrupted conditions.