Forecasting Bus Passenger Flows by Using a Clustering-Based Support Vector Regression Approach

As a significant component of the intelligent transportation system, forecasting bus passenger flows plays a key role in resource allocation, network planning, and frequency setting. However, it remains challenging to capture the high fluctuations, nonlinearity, and periodicity of bus passenger flows, which arise from varied destinations and departure times. For this reason, a novel forecasting model named affinity propagation-based support vector regression (AP-SVR) is proposed based on clustering and nonlinear simulation. In this approach, a clustering algorithm is first used to generate clustering-based intervals. Support vector regression (SVR), with parameters optimized by particle swarm optimization (PSO), is then used to forecast the passenger flow for each cluster. Finally, the SVR predictions are rearranged in chronological order. The proposed model is tested on real passenger data collected from a bus line over four months. Experimental results demonstrate that the proposed model outperforms peer models in terms of the variance of absolute percentage error and the mean absolute percentage error. The results suggest that a deterministic clustering technique with stable clustering results (AP) can significantly improve forecasting performance.


I. INTRODUCTION
Forecasting bus passenger flows is an important component of the intelligent transportation system (ITS). It plays a key role in resource allocation, network planning, and frequency setting. As a result, it has attracted wide attention from researchers and engineers [1].
Forecasting models can generally be classified into three categories: parametric, non-parametric, and hybrid models [2], [3]. The main difference between parametric and non-parametric models lies in the functional dependency assumed between the independent and dependent variables [4]. Among parametric techniques, several methods have been used to forecast transportation demand, such as the Box-Jenkins method [5], smoothing techniques [6], the autoregressive integrated moving average (ARIMA) model [4], grey forecasting [7], and state space models [8]. Of these, the ARIMA model, which expresses the current value as a linear function of time-lagged variables and error terms, has been in common use since the 1970s [9]. However, bus passenger flows exhibit high fluctuations, nonlinearity, and periodicity. Traditional parametric models may therefore be unsuitable for capturing the structure of nonlinear flows, because they assume linear relationships between the time-lagged variables.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhe Xiao.
Among non-parametric models, neural networks [10], k-nearest neighbors [11], Kalman filters [12], support vector regression (SVR) [13], [14], and other methods [15] have been applied to predict passenger flow in transportation. The main strength of neural networks is their ability to describe the uncertain and complex nonlinearity of passenger flow time series [16], which gives them great potential for forecasting nonlinear and random behavior. However, they are sensitive to parameter selection and, to some extent, prone to local minima and overfitting [17]. Unlike neural networks, SVR implements the structural risk minimization principle, which seeks to minimize an upper bound of the generalization error rather than the training error [18]. It therefore has the potential to overcome some of the inherent defects of neural networks [19]. It also offers a global optimum, a simple structure, and a strong ability to generalize from small samples [20]. As a result, SVR can handle nonlinearity, small sample sizes, and high dimensionality, and it has been successfully applied to short-term traffic forecasting [21], forecasting the complex motion of floating platforms [22], and electric load forecasting [23], with good forecasting results in each case.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Each model tends to have its own advantages and limitations for different applications [24]. The basic idea of the combination approach is to combine different models while retaining the advantages of each [25], [26]. To improve forecasting accuracy, hybrid models have become common [27], and they have achieved good results in forecasting problems.
For these models, however, it is challenging to recognize complex nonlinear passenger flows caused by emergencies, holiday trips, traffic restrictions, and so on. Data preprocessing for pattern recognition and feature learning may be one way to reduce the complexity inherent in passenger flows [28]. Two approaches have been used to address this problem: time series decomposition and cluster analysis [29], [30]. In time series decomposition, the time series is viewed as a combination of quasi-periodic signals and noise; techniques such as wavelet analysis [16], singular spectrum analysis [31], and empirical mode decomposition [32] have been applied successfully. Cluster analysis, by contrast, classifies data elements into clusters based on their similarity [33]. Unlike random time series analysis, similarity-based cluster analysis considers only the similarity between data, which helps prediction models avoid resolution blurring and the influence of outliers. Various similarity recognition methods have been proposed for clustering, such as the K-means method [34], Gaussian kernel-based fuzzy c-means clustering (KFCM) [35], and the affinity propagation (AP) [36] clustering algorithm. Unlike K-means and KFCM, AP is an automatic clustering algorithm: it does not require the number of cluster centers to be specified beforehand, and it produces the same clustering result across repeated runs. Moreover, especially for large datasets, AP and AP-based methods can obtain better solutions, with lower error and in less time, than earlier algorithms [37]. Although AP does not guarantee a global optimum, it has consistently outperformed earlier algorithms and has become an attractive clustering method [38].
Based on the preceding analysis, and to capture the fine-grained characteristics of bus passenger flows, this paper introduces a novel forecasting model based on clustering and nonlinear simulation: the affinity propagation-based support vector regression (AP-SVR) model. First, the AP algorithm is used to partition the bus passenger flow and generate clustering-based intervals. Then, SVR, with parameters optimized by particle swarm optimization (PSO) [39]-[41], is applied to fit and forecast the passenger flow for each cluster. Finally, the predictions of the PSO-SVR models are rearranged in chronological order.
The rest of this paper is organized as follows. Section II describes the methodology in detail. Section III introduces the case study, the evaluation criteria, and model development. Section IV presents the experimental results and analysis of the proposed model. Conclusions are given in Section V.

A. AFFINITY PROPAGATION
The AP algorithm is an efficient clustering method introduced by Frey and Dueck [36] in 2007. It has been widely studied and applied in many fields because of its distinctive advantages [42]. AP takes as input a collection of similarities between data points (here, passenger flow observations), where the similarity s(i, k) indicates how well the data point with index k is suited to be the exemplar for data point i. The similarity between data points $x_i$ and $x_k$ can be measured by the negative squared Euclidean distance:

$$s(i,k) = -\lVert x_i - x_k \rVert^2 \quad (i \neq k), \qquad s(k,k) = p \tag{1}$$

where p denotes the "preference". Data points with larger values of s(k, k) are more likely to be chosen as exemplars [43]. Two types of messages are exchanged between data points: responsibility and availability. The responsibility r(i, k), sent from data point i to candidate exemplar k, reflects the accumulated evidence for how well-suited point k is to serve as the exemplar for point i. The availability a(i, k), sent from candidate exemplar k to point i, reflects the accumulated evidence for how appropriate it would be for point i to choose point k as its exemplar [44].
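As a minimal NumPy sketch of how the similarity matrix could be built, the example below uses the negative squared Euclidean distance, the form used in Frey and Dueck's formulation. It assumes each row of `X` is one data point, for example [normalized time interval, normalized passenger flow]. The function name is hypothetical. Falling back to the median off-diagonal similarity as the preference is an assumption borrowed from common AP practice; this paper does not state the value of p.

```python
import numpy as np

def similarity_matrix(X, preference=None):
    """Return S with S[i, k] = -||x_i - x_k||^2 and S[k, k] = preference.

    X: array of shape (n_points, n_features).
    preference: value p for the diagonal; if None, the median of the
    off-diagonal similarities is used (an assumption, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)          # negative squared distances
    if preference is None:
        off_diag = S[~np.eye(len(X), dtype=bool)]
        preference = np.median(off_diag)
    np.fill_diagonal(S, preference)          # s(k, k) = p
    return S
```

A larger (less negative) preference generally yields more exemplars, so p is the main lever on the number of clusters.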
With all availabilities initialized to a(i, k) = 0, the responsibilities are computed using the rule

$$r(i,k) = s(i,k) - \max_{j \neq k}\{a(i,j) + s(i,j)\} \tag{2}$$

where s(i, j) is the similarity between points i and j for j ≠ k.

For k = i, Eq. (2) sets the self-responsibility r(k, k) to the input preference s(k, k) that point k be chosen as an exemplar, reduced by the largest value of a(k, j) + s(k, j) over all other candidates j. The availabilities are then updated as

$$a(i,k) = \min\Bigl\{0,\; r(k,k) + \sum_{j \notin \{i,k\}} \max\{0, r(j,k)\}\Bigr\}, \quad i \neq k \tag{3}$$

The "self-availability" a(k, k) is updated differently:

$$a(k,k) = \sum_{j \neq k} \max\{0, r(j,k)\} \tag{4}$$

Eq. (4) is the self-availability counterpart of Eq. (3). When the messages are updated, they must be damped to avoid numerical oscillations: each message is set to λ times its value from the previous iteration plus 1 − λ times its newly computed value, where the damping factor λ lies between 0 and 1.
The responsibilities and availabilities are iterated until the cluster centers remain unchanged for a user-set number of iterations. The exemplar e(i) of each point i is then obtained by maximizing the sum of availability and responsibility:

$$e(i) = \arg\max_{k}\{a(i,k) + r(i,k)\} \tag{5}$$

If e(i) = i, point i is itself an exemplar (cluster center); otherwise, point k = e(i) is the exemplar for point i.
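The responsibility, availability, and self-availability updates described above can be sketched as a vectorized NumPy loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The damping defaults to λ = 0.9, matching the model-development settings. Convergence is declared when the set of exemplars (points with a(k, k) + r(k, k) > 0) stays unchanged for `convergence_iter` iterations, a hypothetical parameter. As in common AP implementations, each non-exemplar is finally assigned to its most similar exemplar, a more robust read-out than a raw argmax over a(i, k) + r(i, k).

```python
import numpy as np

def affinity_propagation(S, damping=0.9, max_iter=1000, convergence_iter=50):
    """Cluster with AP given similarities S (diagonal = preferences).

    Returns labels[i] = index of the exemplar for point i (-1 if none found).
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k), initialized to 0
    last_exemplars, stable = None, 0

    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{j != k} [a(i,j) + s(i,j)]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)               # best value excluding column `best`
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # Availabilities, Eqs. (3) and (4)
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))      # keep r(k, k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp  # drop the j = i term
        self_avail = np.diag(A_new).copy()    # Eq. (4)
        A_new = np.minimum(A_new, 0)          # Eq. (3), i != k
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new

        # Exemplars are points with a(k, k) + r(k, k) > 0
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        if (last_exemplars is not None and exemplars.size > 0
                and np.array_equal(exemplars, last_exemplars)):
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            stable = 0
        last_exemplars = exemplars

    if exemplars.size == 0:
        return np.full(n, -1)
    # Assign each point to its most similar exemplar; exemplars label themselves
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels
```

For two well-separated groups with a moderate preference, the sketch should return one shared label per group, with no need to specify the number of clusters in advance.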

SVR was developed for regression purposes by Vapnik et al. [45] in 1997. It has recently been used in many transportation applications and forecasting models with good accuracy. The least squares SVR (LSSVR) [46], an improved variant of the standard SVR, uses a least squares loss function so that the quadratic programming problem of the SVR is replaced by a system of linear equations. This greatly reduces the computational complexity and speeds up the algorithm [47]. In this study, a data-driven method is used to determine the general input structure of the LSSVR model F(·) for passenger flow prediction. The bus passenger flow in the ith hour is associated with the historical data, i.e.,

$$y_i = F(y_{i-1}, y_{i-2}, \ldots, y_{i-n}) \tag{6}$$

Given a sample set $T = \{(x_i, y_i) \mid i = n+1, n+2, \ldots, l\}$ with input vectors $x_i = [y_{i-1}, \ldots, y_{i-n}]^T$ and outputs $y_i$, the LSSVR can be represented in the feature space as

$$f(x) = \omega^T \varphi(x) + b \tag{7}$$

where $\omega \in \mathbb{R}^{n_f}$ is the weight vector, $b \in \mathbb{R}$ is the bias, and $\varphi(x)$ is the nonlinear mapping from the input space to a high-dimensional feature space. The parameters are obtained by solving

$$\min_{\omega, b, \xi}\; J(\omega, \xi) = \frac{1}{2}\omega^T\omega + \frac{\gamma}{2}\sum_{i=n+1}^{l}\xi_i^2 \tag{8}$$

$$\text{s.t.}\quad y_i = \omega^T\varphi(x_i) + b + \xi_i, \quad i = n+1, \ldots, l \tag{9}$$

where $\xi_i$ is the error variable and γ is the penalty coefficient. The Lagrange function is

$$L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=n+1}^{l}\alpha_i\left[\omega^T\varphi(x_i) + b + \xi_i - y_i\right] \tag{10}$$

where the $\alpha_i$ are Lagrange multipliers. After eliminating ω and ξ, the optimality conditions can be rewritten as the following set of linear equations:

$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1} I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{11}$$

where $\mathbf{1}$ is a column vector of ones, $\alpha = [\alpha_{n+1}, \ldots, \alpha_l]^T$, $y = [y_{n+1}, \ldots, y_l]^T$, and $K_{ij} = k(x_i, x_j) = \varphi(x_i)^T\varphi(x_j)$, with k a kernel function that satisfies the Mercer condition. The resulting regression function is

$$f(x) = \sum_{i=n+1}^{l}\alpha_i\, k(x, x_i) + b \tag{12}$$

There is a large variety of kernel functions that can be used with LSSVR, e.g., linear, polynomial, sigmoid, and radial basis function (RBF) kernels. Because the reproducing kernel completely characterizes the hypothesis feature space, its choice has a crucial impact on the ability of the LSSVR [48]. The kernels most widely used for traffic flow prediction are the RBF and polynomial (Poly) kernels, because of their good learning ability and good generalization ability, respectively [49].
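Eq. (6) turns the hourly series into supervised samples, each input being the n previous values in reverse order. Below is a minimal sketch that assumes a single contiguous series; the text does not say whether the lags are taken within each cluster or from the full series, and `make_lagged_samples` is a hypothetical helper.

```python
import numpy as np

def make_lagged_samples(series, n):
    """Build inputs x_i = [y_{i-1}, ..., y_{i-n}] and targets y_i for Eq. (6)."""
    y = np.asarray(series, dtype=float)
    # y[i-n:i] is [y_{i-n}, ..., y_{i-1}]; reverse it so the most recent lag comes first
    X = np.array([y[i - n:i][::-1] for i in range(n, len(y))])
    targets = y[n:]
    return X, targets
```

With the series [1, 2, 3, 4, 5] and n = 2, the inputs are [2, 1], [3, 2], and [4, 3], and the targets are 3, 4, and 5.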
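They can be expressed as

$$k(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right) \tag{13}$$

$$k(x, x_i) = \left(x^T x_i + t\right)^q \tag{14}$$

where Eq. (13) is the RBF kernel with width parameter σ, and Eq. (14) is the Poly kernel, in which the degree q determines the dimension of the induced feature space and t is the bias term of the kernel function.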
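LSSVR training reduces to solving one bordered linear system for the bias b and the multipliers α, followed by a kernel expansion for prediction. Because of this, a small solver fits in a few lines. The sketch below is illustrative, not the authors' code. The class and function names are hypothetical. The RBF kernel is assumed to take the form exp(−‖x − x_i‖²/(2σ²)). No input scaling or parameter search is included.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2):
    """RBF kernel exp(-||x - x_i||^2 / (2 sigma^2)); assumed form of Eq. (13)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma2))

def poly_kernel(X1, X2, t, q):
    """Polynomial kernel (x^T x_i + t)^q, Eq. (14)."""
    return (X1 @ X2.T + t) ** q

class LSSVR:
    """Minimal least squares SVR (hypothetical helper, not a library API)."""

    def __init__(self, gamma, kernel):
        self.gamma = gamma    # penalty coefficient
        self.kernel = kernel  # callable: kernel(X1, X2) -> Gram matrix

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        m = len(y)
        # Block system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
        M = np.zeros((m + 1, m + 1))
        M[0, 1:] = 1.0
        M[1:, 0] = 1.0
        M[1:, 1:] = self.kernel(X, X) + np.eye(m) / self.gamma
        sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
        self.b, self.alpha, self.X_train = sol[0], sol[1:], X
        return self

    def predict(self, X_new):
        # f(x) = sum_i alpha_i k(x, x_i) + b
        K_new = self.kernel(np.asarray(X_new, dtype=float), self.X_train)
        return K_new @ self.alpha + self.b
```

The first row of the system enforces that the multipliers sum to zero. A larger γ trades smoothness for a closer fit to the training data.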
Two parameters (γ, σ²) must be determined for the RBF-based LSSVR, and three (γ, t, q) for the Poly-based LSSVR. Because PSO is simple to implement, has few parameters to set, supports parallel processing, and searches globally, it is used to determine the parameters of the LSSVR model in this paper. Standard PSO first initializes a swarm of N random particles (usually 10 < N < 100). Each particle has a position and a velocity, which determine the direction and distance of its search, and a fitness value derived from the optimization problem measures the merit of each particle. The velocity and position are updated by

$$v_{i,d}(t+1) = \omega\, v_{i,d}(t) + c_1 r_1\left[pbest_{i,d}(t) - x_{i,d}(t)\right] + c_2 r_2\left[gbest_d(t) - x_{i,d}(t)\right] \tag{15}$$

$$x_{i,d}(t+1) = x_{i,d}(t) + v_{i,d}(t+1) \tag{16}$$

where the velocity is restricted to $[-v_{max}, v_{max}]$; $r_1$ and $r_2$ are random values within [0, 1]; the positive constants $c_1$ and $c_2$ are the personal and social learning factors, respectively; ω is the inertia weight (or constriction coefficient); $x_{i,d}$ and $v_{i,d}$ are the position and velocity of the ith particle in the dth dimension of the search space; and t is the iteration index. The individual extreme $pbest_{i,d}(t)$ is the best position found so far by particle i itself, and the global extreme $gbest_d(t)$ is the best position found by the whole swarm.
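The velocity and position updates described above translate directly into a vectorized loop. In this sketch, the inertia weight ω is named `w`. The defaults for swarm size, iteration count, learning factors, and velocity limit (20% of each search range) are assumed values, not settings reported in the paper. Clipping positions back into the search box is also an assumption.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO; returns (best position, best fitness)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    v_max = 0.2 * (upper - lower)  # assumed velocity limit
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + personal pull + social pull, then clamp
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        # Position update, kept inside the search box
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val
        pbest[better] = x[better]
        pbest_val[better] = vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()
```

To tune the LSSVR, `f` would map a candidate (γ, σ²) to a validation error such as MAPE; that wiring is omitted here.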

C. OVERVIEW OF THE PRESENT APPROACH
There are five steps in constructing the separation-fusion approach for forecasting bus passenger flows. They are summarized as follows and illustrated in Fig. 1.
Step 1: Collect the passenger flow dataset T and partition it into a training dataset (80%) and a testing dataset (20%).
Step 2: Apply the AP clustering method to divide T into N disjoint clusters $C_1, C_2, \ldots, C_N$, each of which contains both training and testing data.
Step 3: For each disjoint cluster, predict the testing data with an LSSVR model whose parameters are optimized by PSO.
Step 5: Rearrange the cluster forecasts $[p_{C_1}, p_{C_2}, \ldots, p_{C_N}]$ in chronological order and output the final forecasting results.

III. CASE STUDY

As shown in Fig. 2, the passenger flow of the Guangzhou bus line is almost cyclical and similar from week to week, except for the abnormal time intervals during National Day. This pattern is enlarged and can be seen clearly in Fig. 3(a). Passenger flow on National Day is significantly lower than on normal days; however, the trend of passenger flow during a randomly chosen week of the dataset is the same as that during National Day, so the National Day period is retained in the proposed model. In addition, the hourly time series displays a similar regular trend and repeatable pattern from Monday to Sunday.

Typical observed samples for all time intervals on typical weekdays and weekends (Fig. 3(b)) show two obvious peaks each day, one in the morning and one in the afternoon. However, unlike weekday flows, weekend passenger flows are strongly affected by random factors and therefore exhibit greater flexibility and variability. On weekdays, the peak periods are 7:00-9:00 and 17:00-19:00, and the off-peak periods are 10:00-16:00 and 20:00-21:00. On weekends, there are also two peaks each day, but they differ from and occur later than those on weekdays (Fig. 3(b)). Presumably, people find weekends more convenient for travel.

B. PERFORMANCE CRITERIA
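Two criteria, the variance of absolute percentage error (VAPE) and the mean absolute percentage error (MAPE), are applied to evaluate prediction performance in this paper. The MAPE measures mean prediction accuracy, while the VAPE measures prediction stability. They are defined as

$$\mathrm{MAPE} = \frac{1}{M}\sum_{i=1}^{M}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$

$$\mathrm{VAPE} = \mathrm{Var}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right) \times 100\%$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and M is the number of forecast points.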
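Both criteria can be computed from the vector of absolute percentage errors (APEs). In this sketch, VAPE uses the population variance (dividing by the number of points). Whether the paper divides by the number of points or by one less is not stated, so that choice is an assumption. The helper names are hypothetical. The code also assumes that no actual value is zero.

```python
import numpy as np

def ape(y_true, y_pred):
    """Absolute percentage errors |y_i - yhat_i| / y_i, as fractions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred) / y_true

def mape(y_true, y_pred):
    """Mean of the APEs, in percent."""
    return 100.0 * ape(y_true, y_pred).mean()

def vape(y_true, y_pred):
    """Variance of the APEs times 100 (population variance, an assumption)."""
    return 100.0 * ape(y_true, y_pred).var()
```

Take actual values [100, 200, 400] and forecasts [110, 180, 400]. The APEs are 0.1, 0.1, and 0, which gives a MAPE of about 6.67% and a VAPE of about 0.22%.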

C. MODEL DEVELOPMENT
For the AP algorithm, the similarity matrix is first computed as the input: the time interval and passenger flow of each of the 2016 data points are used to construct s(i, j). The preference p, an important parameter that determines the number of cluster centers, is set as the diagonal value s(k, k). The damping factor λ is set to 0.9, as suggested by [33]. For the SVR, there is no universal way to determine the optimal model order n in Eq. (6); a trial-and-error method combined with correlation analysis can be adopted for this task. Here, the maximum value of n is set empirically to 8, and n is chosen where the MAPE first reaches its minimum. After n has been determined, PSO is used to optimize the parameters of the LSSVR.
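The trial-and-error choice of n can be written as a small search over candidate orders. Here `evaluate_mape` is a hypothetical callable that builds and scores a model with n lags. The phrase "when the minimum of the MAPE first appears" is read as the smallest n attaining the minimum over 1 to 8, which is one possible interpretation.

```python
import numpy as np

def select_order(evaluate_mape, max_order=8):
    """Return the smallest lag order n in 1..max_order with the lowest MAPE."""
    orders = np.arange(1, max_order + 1)
    scores = np.array([evaluate_mape(int(n)) for n in orders])
    # np.argmin returns the first index of the minimum, i.e. the smallest such n
    return int(orders[np.argmin(scores)]), scores
```

Taking the first occurrence favors the simpler model whenever two orders tie.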
To test the performance of the proposed prediction models, the 2016 collected data points are split into training and testing sets. The training set contains the first 1680 consecutive data points (the first 15 weeks, August 25-December 7, 2014), and the testing set contains the remaining 336 data points (the last 3 weeks, December 8-28, 2014). In the following study, the original time series and intervals are normalized to [0, 1].
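The preprocessing above can be sketched with hypothetical helper names. The 2016 points equal 18 weeks × 7 days × 16 hourly intervals per day, so 1680 and 336 points correspond to 15 and 3 weeks. As the text describes, the min-max bounds here are computed over the whole series. Computing them from the training set alone is a common alternative that avoids leaking test information.

```python
import numpy as np

def minmax_normalize(x):
    """Scale values to [0, 1]; return the scaled array and (min, max)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def denormalize(z, bounds):
    """Map normalized values back to the original scale."""
    lo, hi = bounds
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def chronological_split(series, n_train=1680):
    """First n_train points for training, the rest for testing (no shuffling)."""
    series = np.asarray(series)
    return series[:n_train], series[n_train:]
```

Keeping the split chronological ensures that the model is always evaluated on data later than anything it was trained on.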

A. CLUSTERING RESULTS
The bus passenger flows for all data points are clustered into 20 clusters ($C_1, C_2, \ldots, C_{20}$) by the AP algorithm, as shown in Fig. 4. Fig. 4 shows that the peak periods are clustered into $C_{15}$, $C_1$, $C_{18}$, $C_{10}$, and $C_{11}$, and the off-peak periods into the remaining clusters. Passenger flows in the same time interval tend to be assigned to different clusters (except for $C_4$ and $C_3$), because AP considers the similarity between data points: the closer the passenger flow volumes in the same time interval, the more likely they are to fall into the same cluster. When the distance between two time intervals exceeds 0.1, the AP algorithm tends to partition equal passenger flows into different clusters, such as $C_{11}$ and $C_{20}$.

B. FORECASTING RESULTS
After the patterns are identified, SVR is used to predict bus passenger flows. Table 1 lists the performance of the proposed method with the RBF kernel and with the Poly kernel for passenger flow prediction. The RBF kernel achieves better prediction accuracy than the Poly kernel in this case, which suggests that the RBF kernel is better suited to the LSSVR model for this passenger flow prediction task.
The best-fitting RBF-based structure for each model is identified according to Table 2, and the optimal n is the one that yields the minimum MAPE on the testing data.

C. PATTERN RESULTS
It is hard to determine the appropriate kernel function for specific data patterns [14]. However, the RBF kernel is relatively easy to implement and can nonlinearly map the training data into an infinite-dimensional space. Therefore, the RBF kernel is used in this study.
After the predictions for all clusters have been obtained with SVR, they are rearranged in chronological order. The combined result is plotted in Fig. 5, where the solid line represents the observed values and the dashed line the forecasts.
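The chronological rearrangement step amounts to sorting the pooled cluster forecasts by their original time index. Below is a minimal sketch. It assumes each cluster's forecasts are stored together with the positions of its test points in the original series, a hypothetical data layout.

```python
import numpy as np

def rearrange_chronologically(cluster_predictions):
    """Merge per-cluster forecasts back into one time-ordered series.

    cluster_predictions: dict mapping cluster id -> (time_indices, predictions).
    Returns (sorted time indices, predictions in chronological order).
    """
    idx = np.concatenate([np.asarray(t) for t, _ in cluster_predictions.values()])
    pred = np.concatenate([np.asarray(p, dtype=float)
                           for _, p in cluster_predictions.values()])
    order = np.argsort(idx, kind="stable")
    return idx[order], pred[order]
```

Because the clusters are disjoint, every time index appears exactly once, and sorting alone restores the forecast series.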
Fig. 5 also shows the deviation between the observed values and the predictions. The curves show that the forecasts fit the actual values well, except during the weekend rush hours (marked with arrowheads). Weekend passenger flow is more flexible and changeable than weekday flow because it is strongly affected by random factors.
To further analyze the performance of the proposed approach, the MAPE and VAPE were computed for the forecasts. The MAPE is 7.13% and the VAPE is 6.77%, indicating high accuracy and stability (detailed comparisons are given in Table 4).
Both the quantitative and qualitative analyses show that the proposed model captures the structure of nonlinear passenger flow well. This performance can be attributed to grouping the observations by similarity before forecasting.

D. COMPARISONS WITH PEER METHODS
To highlight the forecasting performance of the proposed model, four models are used for comparison on the same dataset: three non-parametric models (KFCM-based SVR, SVR, and BPNN) and one parametric model (seasonal ARIMA, SARIMA). The structures and parameters of these models are listed in Table 3. The model structures are determined experimentally (except for the SARIMA model, which follows Ref. [4]), and their training procedures and parameter settings are the same as those of the proposed model.
The KFCM model replaces the Euclidean distance of the FCM model with a kernel-induced distance. It uses a nonlinear function to map the data into a feature space that may have more dimensions, so that sample points that are not linearly separable in the original space can become linearly separable in the kernel space [31]. The clustering results of the KFCM model are shown in Fig. 6, which follows the same conventions as Fig. 4. A comparison of Figs. 4 and 6 shows that the AP algorithm produces a more detailed clustering than the KFCM model. For example, the peak periods fall into five clusters ($C_{15}$, $C_1$, $C_{18}$, $C_{10}$, $C_{11}$) under AP but only two ($K_2$, $K_6$) under KFCM. The MAPEs are 6.981%, 0.291%, 3.869%, 4.599%, and 3.534% for $C_{15}$, $C_1$, $C_{18}$, $C_{10}$, and $C_{11}$, respectively, and 8.574% and 10.966% for $K_2$ and $K_6$, respectively. Thus, dividing the peak periods into more clusters yields more stable results that are better suited to forecasting. In addition, each AP cluster center (exemplar) is an actual data point, whereas KFCM cluster centers are derived from a partition matrix. Finally, AP is an automatic clustering algorithm whose number of clusters emerges from the real-valued messages exchanged between data points, whereas the number of KFCM clusters is chosen by minimizing the Xie-Beni validity index [50].
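For a Gaussian kernel, the kernel-induced distance has a closed form because K(x, x) = 1. It is ‖φ(x) − φ(v)‖² = K(x, x) − 2K(x, v) + K(v, v) = 2(1 − K(x, v)). The sketch below assumes K(x, v) = exp(−‖x − v‖²/σ²). The kernel width convention used for KFCM in this study is not stated, and the function names are hypothetical. Unlike the Euclidean distance, this distance is bounded below 2, which is one reason kernel-induced distances are often described as less sensitive to outliers.

```python
import numpy as np

def gaussian_kernel(x, v, sigma2):
    """Assumed Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.exp(-np.sum((x - v) ** 2) / sigma2)

def kernel_induced_sq_distance(x, v, sigma2):
    """||phi(x) - phi(v)||^2 = K(x,x) - 2K(x,v) + K(v,v) = 2(1 - K(x,v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma2))
```

The distance grows with the Euclidean separation but saturates near 2 for far-apart points.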
Fig. 7 compares the forecasts of the different models on the first week of testing data. Compared with the AP-SVR and SVR models, the KFCM-SVR model performs poorly in forecasting both peak values and the weekend rush hour, owing to its coarser clustering. Fig. 7 also shows that the SARIMA model has limitations in predicting nonlinearity and randomness. Overall, the AP-based SVR performs best among these models.
Note that the MAPE is the mean of all the absolute percentage errors (APEs). Fig. 8 shows a contour map of the forecast APEs of the different models, in which the AP-SVR model provides the best predictions. In general, the highest APEs for the four models occur in the early morning period (6:00-7:00) on weekends, especially for the SARIMA model (APE above 90%). Weekend passenger flow fluctuates greatly because of varied destinations and departure times, and it is hard to capture all the different features of different intervals, especially in the transition periods. Even so, the AP-SVR model performs better than the other models in this period. On weekends, apart from the early morning, the 15:00 and 20:00 intervals also exhibit higher APEs; unlike on weekdays, these two intervals are peak hours for travelers heading out and returning. On weekdays, the APEs during 6:00-9:00 are under 12%, except for Tuesday in the AP-SVR and BPNN models, Thursday in the KFCM-SVR model, Tuesday and Wednesday in the LSSVR model, and Monday in the SARIMA model. This may be due to relatively stable recurrent demand during the AM peak period. The APEs of the four models during 12:00-14:00 and at 20:00 are higher than in the other time intervals (except 6:00-9:00). Fig. 3(b) shows that 12:00-14:00 is a rush hour, and at 20:00 the passenger flow decreases after the peak period. Overall, the AP-SVR model handles these rush hours better than the other models.
Comparing Fig. 5(a) with Fig. 7 shows the advantage of the present AP-SVR model only faintly. For this reason, the MAPE and VAPE were again used as statistical measures for the comparison. Table 4 lists the prediction performance of the different models. The AP-SVR model achieves the highest accuracy, with the lowest MAPE (7.13%), whereas the SARIMA model performs worst, with the highest MAPE (12.57%). By considering the similarity of data points and representing the data in a small, finite number of oscillatory modes, the AP-SVR model outperforms the other models in forecasting accuracy. Compared with the LSSVR model, the KFCM-SVR model is limited in pattern identification. Table 4 also lists the VAPE values of the models: the AP-SVR model has the smallest VAPE and hence the best forecasting stability. The SARIMA model, with a MAPE of 12.57% and a VAPE of 15.55% on the test dataset, has the worst forecasting accuracy and stability of all the models considered in this paper.
These results indicate that SARIMA is not suitable for forecasting nonlinear passenger flow because of its linearity assumption.

V. CONCLUSION
A novel AP-SVR approach has been proposed to forecast bus passenger flows. The dataset is first segmented into clusters by the AP algorithm. Then, an SVR optimized by PSO forecasts the passenger flow of each cluster. Finally, the results of all SVRs are rearranged in chronological order. Using bus line 15 in Guangzhou, China, as a case study, this paper evaluated the AP-SVR model. Both the quantitative and qualitative analyses show that the proposed model is superior to the comparison models. Through similarity clustering, the proposed model weakens, to some extent, the nonlinear characteristics of bus passenger flows. Pattern recognition and feature learning can improve forecasting accuracy and stability. However, the model does not comprehensively forecast disruptions caused by special events or by the external environment, such as weather and temperature. These factors, and additional forecasting capabilities, should be investigated in future work.