ARIMA-FSVR Hybrid Method for High-Speed Railway Passenger Traffic Forecasting


Introduction
At present, commonly used passenger flow prediction methods are based on historical data and include time-series methods, support vector machines, and neural networks [1][2][3]. For instance, Ni et al. [4] applied the autoregressive integrated moving average (ARIMA) method to traffic flow prediction and showed that it can model nonstationary time series. Xie et al. [5] designed a fuzzy time-series ARIMA method for long-term waterway traffic volume prediction. Li et al. [6] proposed a robust v-support vector regression (RSVR) method to forecast vessel traffic flow. Liu et al. [7] adopted support vector machine (SVM)-based regression to predict bus passenger flow in a target time window. Li et al. [8] put forward a backpropagation neural network (BPNN) model with population per distance band for traffic flow prediction at urban rail transit stations. Hu et al. [9] developed a re-sample recurrent neural network (RRNN) model to forecast passenger traffic on mass rapid transit systems.
Because each prediction method has its own advantages and disadvantages, a method built on a single mechanism often performs poorly. Combining two or more methods into a hybrid prediction method can overcome the deficiencies of a single mechanism and improve the performance of passenger flow prediction [10,11]. Khan et al. [12] combined the wavelet transform (WT) with an artificial neural network (ANN) and ARIMA into a hybrid model for meteorological drought forecasting; the model inherits the merits of both WT and ANN-ARIMA. Wu et al. [13] created a hybrid model of ARIMA and a wavelet neural network (WNN), combined with a genetic algorithm, to predict river water quality. Yu et al. [14] built a novel SVR-ANN combined model with EEMD for rainfall prediction. Luo et al. [15] explored a combined prediction model based on empirical mode decomposition, support vector regression, and a wavelet neural network (EMD-SVR-WNN) to forecast structural settlement and deformation. The above models achieved satisfactory results. SVR and neural networks are suitable for solving complex nonlinear problems, and time-series models have great advantages for time-based prediction. However, neural network models still have some inherent defects, such as a tendency to fall into local optima and to overfit. Therefore, SVR and a time-series method are selected for hybrid prediction.
In this paper, the autoregressive integrated moving average (ARIMA) model and the fuzzy support vector regression (FSVR) machine are combined into a hybrid forecasting strategy for railway passenger flow, which is then applied to the actual passenger flow forecast of the Shanghai-Guangzhou high-speed railway in order to obtain good forecast performance. Support vector regression (SVR) is a general learning method based on statistical learning theory (SLT) for limited samples [16]. FSVR is a newer type of support vector regression machine that combines fuzzy mathematics with SVR; it introduces fuzzy membership and improves the generalization ability of the learning machine. According to the theory of time-series analysis, the ARMA model is suitable for the prediction and analysis of stationary time series, whereas passenger flow data are generally nonstationary and must first be made stationary by differencing. Therefore, the ARIMA model, which includes this differencing step, is used to predict passenger flow.

ARIMA
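The autoregressive integrated moving average (ARIMA) model is an important method for studying time series. In ARIMA(p, d, q), AR denotes autoregression and p is the number of autoregressive terms, MA denotes the moving average and q is the number of moving average terms, and d is the number of differences needed to make the series stationary. The ARIMA(p, d, q) model is an extension of the ARMA(p, q) model. The basic form of the ARMA model is
$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \qquad (1)$$
where $X_t$ is the value of the series at time $t$, $c$ is the constant, $\phi_1, \phi_2, \ldots, \phi_p$ and $\theta_1, \theta_2, \ldots, \theta_q$ are the coefficients, $\varepsilon_t$ is the white noise sequence, $p$ is the autoregressive order, and $q$ is the moving average order. After differencing, the basic form of the ARIMA model is
$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t, \qquad (2)$$
where $L$ is the lag operator and $d$ is the difference order, $d \in \mathbb{Z}$, $d > 0$.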
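To make the roles of d, p, and q concrete, the following is a minimal sketch in plain Python of the differencing operator and of a one-step ARMA forecast. The function names and the numeric coefficients are illustrative assumptions, not values from this study, and no parameter estimation is shown.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - L)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(history, residuals, c, phi, theta):
    """One-step ARMA(p, q) forecast from past values and past residuals.

    history[-1] is the latest observation and residuals[-1] the latest
    residual; the unknown future noise term is replaced by its mean, zero.
    """
    ar_part = sum(phi_i * history[-i] for i, phi_i in enumerate(phi, start=1))
    ma_part = sum(theta_j * residuals[-j] for j, theta_j in enumerate(theta, start=1))
    return c + ar_part + ma_part


# Toy series with a quadratic trend: two differences make it constant.
flow = [1, 4, 9, 16, 25]
print(difference(flow, d=2))  # [2, 2, 2]

# Hypothetical ARMA(2, 1) coefficients, chosen only for illustration.
print(arma_one_step([10.0, 12.0], [0.5], c=1.0, phi=[0.5, 0.2], theta=[0.4]))
```

In practice the coefficients would be estimated from the differenced training series, and forecasts on the differenced scale would be summed back to the original scale.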

Fuzzy Support Vector Regression
The principle of FSVR is to find a function by minimizing the prediction error: the nonlinear mapping function ϕ maps the data x i in the input space to a high-dimensional space H, and linear regression is performed in H to achieve the effect of nonlinear regression in the original low-dimensional space [17].
In practical applications, different data points contribute differently to the training results. FSVR addresses the overlearning caused by noisy data by introducing fuzzy parameters that reduce the influence of noise [18]: each data point is assigned a fuzzy membership degree, which yields a training set with fuzzy memberships.
For FSVR, let the training set be $S = \{(x_1, y_1, s_1), (x_2, y_2, s_2), \ldots, (x_N, y_N, s_N)\}$, where $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, and $s_i \in [0, 1]$. In the time-series problem, the membership degree $s_i$ is a function of the time $t_i$ $(1 \le i \le N)$. In this paper, the fuzzy membership function $f(t_i)$ is a quadratic function of $t_i$, namely, $s_i = f(t_i)$, subject to boundary conditions. FSVR solves the quadratic programming problem
$$\min_{\omega, b, \xi, \xi^*} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{N} s_i (\xi_i + \xi_i^*)$$
$$\text{s.t.} \quad y_i - \omega \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \quad \omega \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N, \qquad (5)$$
where $\omega$ is the weight vector of the regression hyperplane, $b$ is the bias, $C$ is the penalty parameter (a constant), $\varepsilon$ is the bandwidth of the regression hyperplane, $\xi_i$ and $\xi_i^*$ are the slack variables, and $s_i$ is the fuzzy membership.
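As an illustration of a quadratic, time-dependent membership, the sketch below uses a form that is common in the fuzzy SVM literature: the oldest sample receives a small membership sigma and the newest receives 1. Both this specific form and the parameter sigma are assumptions made for illustration; they are not taken from this paper. The second helper shows how the membership rescales the penalty C for each sample.

```python
def quadratic_membership(times, sigma=0.1):
    """Quadratic membership rising from sigma at the oldest point to 1 at the newest.

    Assumed form (not from this paper):
        s_i = (1 - sigma) * ((t_i - t_1) / (t_N - t_1))**2 + sigma
    """
    t_first, t_last = times[0], times[-1]
    span = t_last - t_first
    return [(1.0 - sigma) * ((t - t_first) / span) ** 2 + sigma for t in times]


def per_sample_penalty(memberships, C):
    """Effective penalty s_i * C applied to the slack variables of sample i."""
    return [s * C for s in memberships]


days = list(range(1, 6))  # t_1 .. t_N for a toy series of N = 5 days
s = quadratic_membership(days, sigma=0.2)
print(s)
print(per_sample_penalty(s, C=10.0))
```

With this choice, recent observations are penalized almost as in ordinary SVR, while errors on older observations are tolerated more readily.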
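The dual form of equation (5) is
$$\max_{\alpha, \alpha^*} \ -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*)$$
$$\text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le s_i C, \quad i = 1, \ldots, N, \qquad (6)$$
where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function. Solving the dual problem (6), we obtain the FSVR regression function
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(x_i, x) + b. \qquad (7)$$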
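Once the dual problem has been solved, evaluating the regression function only requires the support vectors, the differences of their Lagrange multipliers, and the bias. The sketch below assumes a Gaussian (RBF) kernel, which this paper does not specify; the helper names and the toy numbers are hypothetical. It also shows the membership-dependent upper bound that distinguishes FSVR from ordinary SVR.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2); an assumed choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def fsvr_predict(x, support_vectors, dual_coefs, b, gamma=0.5):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i^* for support vector i.
    """
    return b + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


def satisfies_box_constraints(alphas, alpha_stars, memberships, C):
    """Check 0 <= alpha_i, alpha_i^* <= s_i * C for every training point."""
    return all(
        0.0 <= a <= s * C and 0.0 <= a_star <= s * C
        for a, a_star, s in zip(alphas, alpha_stars, memberships)
    )


# Hypothetical solution with two support vectors in a one-dimensional input space.
svs = [[0.0], [1.0]]
coefs = [0.8, -0.3]
print(fsvr_predict([0.5], svs, coefs, b=0.1))
print(satisfies_box_constraints([0.8, 0.0], [0.0, 0.3], [1.0, 0.2], C=1.0))
```

In the second call the multiplier 0.3 exceeds its bound of 0.2 × 1.0, so the check fails: a low membership caps how strongly a single point can influence the fitted function.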

Experiments
The experimental data are the daily high-speed rail passenger flows between Shanghai and Guangzhou. A total of 176 days of sample data were collected; the first 165 days are used to build the model, and the last 8 days are used as test samples for comparative analysis of the predictions. To reduce computational complexity and improve the accuracy of parameter selection, the raw data are normalized. Table 1 shows part of the passenger flow data.
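The data preparation can be sketched as follows. The paper does not state which normalization was used, so min-max scaling fitted on the training portion is an assumption here, and the passenger counts are invented for illustration.

```python
def split_series(series, n_train, n_test):
    """Use the first n_train points for fitting and the last n_test for testing."""
    return series[:n_train], series[-n_test:]


def fit_min_max(train):
    """Learn the scaling range from the training data only."""
    return min(train), max(train)


def scale(values, lo, hi):
    """Map values into [0, 1] relative to the training range."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Map normalized predictions back to passenger counts."""
    return [v * (hi - lo) + lo for v in scaled]


# Toy daily passenger counts, invented for illustration only.
flows = [1200, 1350, 1280, 1500, 1620, 1580, 1710, 1690]
train, test = split_series(flows, n_train=6, n_test=2)
lo, hi = fit_min_max(train)
print(scale(train, lo, hi))
print(scale(test, lo, hi))  # test values may fall outside [0, 1]
```

Fitting the range on the training days alone keeps information from the test days out of the model; the predictions are mapped back with `unscale` before errors are reported.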
The prediction results of the ARIMA model are as follows.
As the prediction results in Figure 1 show, the ARIMA model can predict and analyze railway passenger traffic. The fluctuation of its predictions is consistent with the actual passenger traffic curve, but the predictions lag noticeably behind the actual values, which causes a large prediction error, so the prediction effect is not ideal.
The results of FSVR-based passenger flow prediction are as follows.
As the prediction results in Figure 2 show, FSVR has a strong nonlinear approximation ability and performs well in railway passenger traffic forecasting, especially in the short term. The prediction error is small while passenger traffic continues to increase or continues to decrease, but it is large at extreme points, where the trend changes from increasing to decreasing or from decreasing to increasing. In other words, dramatic fluctuations in passenger traffic reduce the generalization ability of FSVR and degrade its prediction performance.
Using the above ARIMA forecast results as the input items of FSVR, the mixed forecast of railway passenger traffic is realized. The results are as follows.
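The two-stage structure can be sketched as follows: the first-stage ARIMA forecasts become the inputs of a second-stage model that is trained against the actual flows, with fuzzy memberships acting as sample weights. To keep the sketch free of dependencies, a weighted least-squares line is swapped in for FSVR; this stand-in, the function names, and the toy numbers are all assumptions and do not reproduce the model used in the experiments.

```python
def weighted_line_fit(x, y, weights):
    """Weighted least-squares line y ~ slope * x + intercept.

    Stand-in for the FSVR stage: the weights play the role of the fuzzy
    memberships s_i, so low-membership points influence the fit less.
    """
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    cov = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    var = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def hybrid_forecast(arima_train, actual_train, memberships, arima_future):
    """Stage 2: learn a mapping from ARIMA output to actual flow, then apply it."""
    slope, intercept = weighted_line_fit(arima_train, actual_train, memberships)
    return [slope * f + intercept for f in arima_future]


# Toy normalized values, invented for illustration only.
arima_train = [0.2, 0.4, 0.6, 0.8]    # stage-1 ARIMA forecasts on training days
actual_train = [0.5, 0.9, 1.3, 1.7]   # observed flows on the same days
memberships = [0.25, 0.5, 0.75, 1.0]  # recent days weighted more heavily
print(hybrid_forecast(arima_train, actual_train, memberships, [1.0]))
```

A nonlinear second stage such as FSVR can learn corrections that a straight line cannot, but the data flow between the two stages is the same.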
As the prediction results in Figures 3 and 4 show, the hybrid prediction method combines the advantages of the two prediction methods and obtains the best prediction results. Compared with the ARIMA method, the delay in the hybrid method's predictions is greatly reduced; compared with FSVR, the prediction at the extreme points is significantly improved, and the prediction error is greatly reduced.
To evaluate the performance of the proposed algorithm, it is compared with the ARIMA-WNN method and the EMD-SVR-WNN method. The results of the three hybrid prediction methods are shown in Figure 5.
As the prediction results in Figure 5 show, the ARIMA-WNN method is accurate in the early predictions but gradually begins to lag after 4 days. The overall trend of the EMD-SVR-WNN method is consistent with the original data; however, its predicted values are low overall. Compared with these two methods, the prediction results of the ARIMA-FSVR method are more accurate. The forecast error indexes of the various methods are shown in Table 2. Table 2 shows that the standard error of the ARIMA-FSVR prediction is smaller than those of the ARIMA and FSVR methods and also smaller than those of the other two hybrid methods. The correlation coefficient of the ARIMA-FSVR method is 0.9822, and the P value is less than 0.0001. Compared with the other four methods, the correlation coefficient is larger and the P value is lower, which indicates that the ARIMA-FSVR method follows the trend more closely and can accurately predict railway passenger traffic.
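The error indexes discussed above can be computed along the following lines. The paper does not define its "standard error" precisely, so the root-mean-square error is assumed here; the correlation coefficient is taken to be the Pearson coefficient. The P value additionally requires a t-distribution and is omitted from this sketch. The sample numbers are invented.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Toy normalized flows for an 8-day test window, invented for illustration.
actual = [0.62, 0.70, 0.66, 0.81, 0.90, 0.84, 0.77, 0.73]
predicted = [0.60, 0.72, 0.65, 0.79, 0.88, 0.86, 0.78, 0.71]
print(rmse(actual, predicted), pearson_r(actual, predicted))
```

A lower error together with a correlation coefficient closer to 1 indicates that a method tracks both the level and the trend of the actual passenger flow.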
It can be found from the experimental results that the ARIMA-FSVR method can accurately predict the railway passenger traffic, handle complex nonlinear relationships, and obtain satisfactory prediction results.

Conclusions
In this paper, a new hybrid method was proposed that improves on both the prediction accuracy and the robustness of the single-item models:
(1) The ARIMA-FSVR hybrid prediction method overcomes the shortcomings exposed in the single-item forecasting methods and reduces the ARIMA delay phenomenon.
(2) The ARIMA-FSVR hybrid prediction method surmounts the extreme point problem of the FSVR method.
(3) Empirical studies on real passenger flow data indicate that the ARIMA-FSVR hybrid method is clearly superior to the other benchmark hybrid models. This hybrid method obtained the lowest prediction error and produced more accurate and more reliable prediction results.
In conclusion, the ARIMA-FSVR hybrid method can accurately predict railway passenger traffic: it overcomes the shortcomings of the single-item forecasting methods while merging their advantages, thereby improving the accuracy of the forecast. This method effectively handles the nonlinearity of railway traffic data and provides a new and effective approach to nonlinear prediction problems in practical applications.

Data Availability
The case analysis data used to support this study are available from the railway passenger transport department upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.