An Adaptive Decomposition Ensemble Learning Approach for Metro Passenger Flow Forecasting

Abstract: Accurate and timely metro passenger flow forecasting is critical for the successful deployment of intelligent transportation systems. However, it is challenging to design an efficient and robust forecasting approach because of the inherent randomness and variation of metro passenger flows. In this study, we present a novel adaptive decomposition ensemble learning approach for forecasting the volume of metro passenger flows that combines the complementary advantages of variational mode decomposition (VMD), the seasonal autoregressive integrated moving average (SARIMA) model, a multilayer perceptron (MLP) network and a long short-term memory (LSTM) network. The proposed approach consists of three stages. The first stage applies VMD to decompose the metro passenger flow data into periodic, deterministic and volatility components. In the second stage, we employ the SARIMA model to forecast the periodic component, the LSTM network to learn and forecast the deterministic component and the MLP network to forecast the volatility component. In the last stage, these forecasted components are reconstructed by another MLP network. Empirical results based on historical passenger flow data from the Shenzhen subway system and several standard evaluation measures show that the proposed approach not only achieves the best forecasting performance among the relevant benchmark models but also appears to be the most promising and robust.


Introduction
Metro transportation systems play a vital role in urban traffic configurations. They not only provide a means of reducing ground traffic congestion and delays but also offer the advantages of high safety, reliability and efficiency, and they have become increasingly popular. According to the relevant data, there were approximately 5.1 million metro trips every day in Shenzhen in 2018, accounting for 48% of the total public passenger flow. With the widespread application of Intelligent Transportation Systems (ITS) worldwide, managers of metro transportation systems can obtain vast amounts of "real-time" data. While these data are extremely important, researchers and managers have realized that the potential value of ITS cannot be achieved if the metro passenger flow cannot be predicted in the short term. Passenger flow forecasting plays a critical role in urban metro systems because it is needed to develop a reasonable operating plan that matches transport capacity and passenger demand, fine-tunes passenger travel behaviors, improves transport services and reduces the level of congestion. Specifically, accurate passenger flow forecasting can improve the management staff's ability to deal with passenger flow peaks and emergencies, thereby avoiding potential safety hazards in passenger transportation. In addition, it provides real-time passenger flow information for the public, which helps passengers choose a reasonable travel route and contributes to the balanced distribution of urban public transportation.
In the field of transportation, research on metro passenger flow forecasting has attracted increasing attention and can be categorized as studying short-term, medium-term and long-term issues; the short-term issue is foremost in the recent literature. The change in metro passenger flow is a real-time, nonlinear and nonstationary random process. As the statistical period shortens, the metro passenger flow becomes more uncertain and random. Accordingly, forecasting short-term metro passenger flow is not an easy task, and there is still much to do to improve the accuracy of short-term metro passenger flow forecasting, which is a critical element in traffic systems.
Various econometric methods have been applied to predict metro passenger flow, such as moving average models, the autoregressive integrated moving average (ARIMA) model, the seasonal autoregressive integrated moving average (SARIMA) model, and the vector auto-regression (VAR) model (Wang et al. 2018;Tsui et al. 2014). However, because they assume a linear relationship between time-lagged variables, these traditional econometric methods fail to capture the nonlinearity and complexity of metro passenger flow, resulting in weak forecasts. To overcome this problem, many advanced artificial intelligence (AI) methods have been developed to predict metro passenger flow, such as artificial neural networks (ANNs), long short-term memory (LSTM), and support vector regression (SVR) (Vlahogianni et al. 2004). However, the AI methods still have numerous problems, such as overfitting, local optima and parameter optimization. Because the thresholds and weights of AI models need to be initialized, the forecasting results of these models are often unstable. To address the above obstacles, hybrid ensemble methods that combine the advantages of different methods have been proposed and appear to improve the accuracy and stability of predictions (Chan et al. 2012;Tchrakian et al. 2012;Wei and Chen 2012b). Jiang et al. (2014a) developed a hybrid short-term demand forecasting approach that integrates a grey support vector machine (GSVM) and ensemble empirical mode decomposition (EEMD). Yang and Chen (2019) proposed a hybrid model combining empirical mode decomposition (EMD) with a deep learning model, which can be applied to multistep-ahead traffic flow forecasting.
Previous studies have shown that the decomposition ensemble method is feasible, and ensemble empirical mode decomposition (EEMD) and empirical mode decomposition (EMD) are usually used to decompose the original time series. Nevertheless, EMD and EEMD interpolate between extreme points to obtain envelope curves, which makes them sensitive to sampling and noise. Moreover, EMD and EEMD are based on iterative algorithms and lack an exact mathematical model (Dragomiretskiy and Zosso 2014). Because the decomposition algorithm strongly affects model training, a more advanced decomposition algorithm can be expected to improve forecasting performance. Many forecasting studies have found that variational mode decomposition (VMD) is more effective than EEMD and EMD (Lahmiri 2017). Surprisingly, VMD is hardly mentioned in the literature on subway passenger flow prediction. Hence, this study applies the VMD method to decompose metro passenger flow series.
Metro passenger flow tends to have daily, weekly and seasonal periodic patterns, and the movement patterns of passengers on weekdays and weekends are completely different (Ke et al. 2017;Diao et al. 2019). The majority of passengers regularly take the metro as a commuter vehicle on weekdays, while on weekends metros are used more randomly (Sun et al. 2015). Furthermore, the movement patterns of passengers are sensitive to special events, extreme weather conditions, accidents, etc., and passengers may slightly adjust their travel time, transfer stations and trip mode to avoid rush hours. The time series of metro passenger flow therefore has the characteristics of temporal periodicity, high fluctuation and nonlinearity. Understanding the dynamic characteristics of passenger flow is the key to building an appropriate and accurate forecasting model. Therefore, following Zhang et al. (2014), this study regards passenger flow as consisting of three parts: periodic components, deterministic components and volatility components.
Previous studies have shown that SARIMA, LSTM, and MLP models are the most commonly used models. Combining the advantages of these three techniques may therefore yield better prediction performance. Hence, this study proposes a novel decomposition ensemble learning approach for metro passenger flow forecasting that uses variational mode decomposition (VMD) to obtain periodic components, deterministic components and volatility components. We use the SARIMA model to predict the periodic component, the LSTM network to learn and predict the deterministic component, and the MLP network to predict the volatility component. In the final stage, the predicted components are reconstructed through another MLP network. The main contributions and innovations in this paper can be summarized as follows:
(1) Based on the principle of "decomposition and ensemble" and the concept of "divide and conquer", this study proposes a novel decomposition ensemble learning approach for metro passenger flow forecasting with the VMD technique, the SARIMA model, the LSTM network, and the MLP network. It combines the advantages of these methods and improves prediction accuracy and model robustness.
(2) Considering that the walking patterns of passengers on weekdays and weekends are completely different, this paper establishes prediction models for weekdays and weekends respectively, so that the model can better identify the dynamic characteristics of passenger flow and improve the prediction accuracy.
(3) Our proposed decomposition ensemble learning approach not only has the best forecasting performance compared with the relevant benchmark models but also appears to be the most promising and robust based on the historical passenger flow data in the Shenzhen subway system and several standard evaluation measures.
The rest of this study is organized as follows: a comprehensive literature review is given in Section 2. The related methodology is introduced in Section 3. The empirical results and the performance of the proposed approach are given in Section 4. Finally, conclusions and suggestions for future work are offered in Section 5.

Literature review
Over the past few decades, short-term traffic forecasting has attracted widespread attention from researchers all over the world. Generally, traffic forecasting models can be divided into two major categories: parametric models and nonparametric models. In addition, hybrid models and decomposition techniques are also widely applied in short-term traffic forecasting. Each family of the above models is described in detail below.
First, among parametric models, many different models have been applied to traffic flow forecasting, such as exponential smoothing models, moving average models, gray models, autoregressive integrated moving average (ARIMA) models (Hamzacebi 2008), and state space models (Stathopoulos and Karlaftis 2003). ARIMA is one of the most common models used for stationary time series forecasting. Because it requires the time series to be stationary, ARIMA can capture linear relationships well, but it has difficulty capturing nonlinear relations (Zhang 2003). Other models also have shortcomings; for example, exponential smoothing models assign little weight to older observations, which results in poor long-term forecasts.
Second, in the family of nonparametric models, numerous approaches have been applied to forecast traffic flow, including nonparametric regression methods such as Gaussian maximum likelihood (Tang et al. 2003), artificial neural networks (Chen et al. 2012;Tsai et al. 2009;Li et al. 2019), support vector regression (Chen et al. 2012;Wu et al. 2004;Sun et al. 2015;Yao et al. 2017), and other models (Dumas and Soumis 2008;Sun 2016). Among these nonparametric models, artificial neural networks have become a research hotspot in passenger flow forecasting because of their nonlinearity, adaptability and fault tolerance. The applications of artificial neural networks have developed from simple structures to complex structures such as finite impulse response networks (Yun et al. 1998), Jordan's sequential neural networks (Yasdi 1999), spectral basis neural networks (Park et al. 1999), wavelet-based neural networks (Boto-Giralda et al. 2010), Elman neural networks (Chen and Grant-Muller 2001), and Kalman filtering-based multilayer perceptrons (Lippi et al. 2013). No method is perfect, and artificial neural networks also have shortcomings, such as overfitting, low interpretability and the difficulty of determining parameters. Additionally, larger in-sample datasets are needed to achieve strong generalization ability. Cortes and Vapnik (1995) proposed another widely used nonparametric model, the support vector machine (SVM), which is based on the principle of minimizing structural risk. Compared to artificial neural networks, SVM has better generalization ability and deals with overfitting more efficiently, especially in small samples. However, the SVM algorithm is difficult to apply to large-scale training samples.
Third, it is difficult to accurately forecast metro passenger flow using linear or nonlinear models alone (Bai et al. 2017). Because of the temporal periodicity, high volatility and nonlinearity of metro passenger flow, decomposing the metro passenger flow and using a hybrid model for prediction is an effective solution. Generally, hybrid models have been demonstrated to provide better performance than single models in traffic flow forecasting, including empirical mode decomposition combined with a gray support vector machine (Jiang et al. 2014b), a nonlinear vector auto-regression neural network combined with mean impact value (Sun et al. 2019), and ARMA combined with a kernel extreme learning machine (KELM) (Jin et al. 2020). No individual model is perfect, and each has its own weaknesses and advantages, but the ensemble learning framework enables the individual estimation models to be combined to improve the accuracy of traffic state estimation (Ni et al. 2017).
Fourth, to better capture traffic characteristics, numerous approaches have been applied to decompose traffic flow into different components, including seasonal decomposition, empirical mode decomposition (EMD), wavelet transform (WT) and variational mode decomposition (VMD). Seasonal decomposition is an effective method to decompose time series into seasonal components, trend components and irregular components. Xie et al. (2014) proposed two hybrid approaches to conduct short-term forecasting of air passengers based on least squares support vector regression (LSSVR) and seasonal decomposition. EMD is an effective, empirical and adaptive multiresolution signal decomposition technique, and it therefore has strong advantages in processing non-stationary and non-linear data. Compared with SARIMA, a hybrid of neural networks and EMD showed stronger stability and higher accuracy in forecasting metro passenger flows (Wei and Chen 2012a). Wavelet decomposition is an effective way of analyzing passenger flow data in both the time and frequency domains. Diao et al. (2019) decomposed a traffic volume series into several components by discrete wavelet transform and predicted the different components with a Gaussian process model and a tracking model. VMD is a novel non-recursive and adaptive signal decomposition method that processes non-stationary and non-linear signals well. Another study decomposed an air cargo time series using an enhanced decomposition framework consisting of empirical mode decomposition (EMD), sample entropy (SE) and variational mode decomposition (VMD). Container throughput time series have also been decomposed into high-frequency and low-frequency components by VMD (Niu et al. 2018).
This study proposes a novel decomposition ensemble learning approach based on the principle of "divide and conquer". The original passenger flow series is decomposed into three patterns by the VMD algorithm, and the periodic, deterministic and volatility components are then predicted by SARIMA, LSTM and MLP, respectively. The predicted components are then reconstructed through another MLP network. To verify the superiority of the proposed learning method, we establish five predictive models (i.e., the seasonal autoregressive integrated moving average (SARIMA) model, the multilayer perceptron (MLP) neural network, the long short-term memory (LSTM) network, and two decomposition ensemble learning approaches, VMD-MLP and VMD-LSTM) and use them as benchmarks in multistep prediction comparisons for three Shenzhen metro stations.

Related methodology

Variational mode decomposition
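Variational mode decomposition (VMD), originally proposed by Dragomiretskiy and Zosso (2014), is a novel non-recursive and adaptive signal decomposition method that is more robust to sampling and noise than popular decomposition methods such as empirical mode decomposition (EMD) and the wavelet transform (WT). The main goal of VMD is to decompose a time series into a discrete set of band-limited modes $u_k$, where each mode $u_k$ is considered to be compact around a center pulsation $\omega_k$, which is determined during the decomposition.
For instance, the time series $f$ is decomposed into a set of modes $u_k$ around center pulsations $\omega_k$ according to the following constrained variational problem:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f \qquad (1)$$
where $K$ is the number of modes, $\delta$ is the Dirac distribution, and $*$ is the convolution operator. $\{u_k\}$ and $\{\omega_k\}$ represent the set of modes $\{u_1, u_2, \ldots, u_K\}$ and the set of center pulsations $\{\omega_1, \omega_2, \ldots, \omega_K\}$, respectively.
The above constrained variational problem can be converted into an unconstrained variational problem by means of Lagrange multipliers $\lambda$, which is given as follows:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (2)$$
where $\alpha$ represents a balance parameter, $\lambda$ represents the Lagrange multipliers, and $\left\| f(t) - \sum_{k} u_k(t) \right\|_2^2$ denotes a quadratic penalty term that accelerates convergence.
Furthermore, Eq. (2) can be solved by the alternating direction method of multipliers (ADMM), which finds the saddle point of the augmented Lagrangian function $L$ in a sequence of iterative sub-optimizations. Therefore, the solutions for $u_k$, $\omega_k$ and $\lambda$ can be obtained as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \qquad (3)$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega} \qquad (4)$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \qquad (5)$$
where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$ and $\hat{\lambda}(\omega)$ denote the Fourier transforms of $f(t)$, $u_i(t)$ and $\lambda(t)$, $\tau$ is the update step of the Lagrange multipliers, and $n$ is the number of iterations.
The number of modes $K$ needs to be determined before applying the VMD method. There is no theory on the optimal selection of $K$. In this study, its value is set to 3. For further details on the VMD method, please refer to Dragomiretskiy and Zosso (2014).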
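The frequency-domain updates of VMD can be illustrated with a minimal NumPy sketch. It is a simplified illustration, not the MATLAB VMD package used in this study: it omits the mirror extension of the signal, works on the one-sided spectrum, and drops the Nyquist bin for even-length series. The function name `vmd` and its default parameters (`alpha=2000.0`, `tau=0.0`) are assumptions made for this example.

```python
import numpy as np

def vmd(signal, n_modes=3, alpha=2000.0, tau=0.0, max_iter=500, tol=1e-7):
    """Simplified VMD sketch: returns (modes, center_freqs in cycles/sample)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    freqs = np.fft.fftfreq(n)              # normalized frequencies in [-0.5, 0.5)
    keep = freqs >= 0                      # work on the one-sided spectrum
    f_hat = np.fft.fft(signal)
    f_hat[~keep] = 0.0
    f_hat[0] *= 0.5                        # DC is doubled back at reconstruction

    u_hat = np.zeros((n_modes, n), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes   # uniform initial center frequencies
    lam_hat = np.zeros(n, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter-like mode update centered on omega[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k, ~keep] = 0.0
            # center frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k, keep]) ** 2
            omega[k] = np.sum(freqs[keep] * power) / (np.sum(power) + 1e-12)
        # dual ascent on the Lagrange multiplier (inactive when tau = 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (
            np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    # real-valued modes from the one-sided spectra
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))
    order = np.argsort(omega)
    return modes[order], omega[order]

# Toy usage: two sinusoids at 0.02 and 0.2 cycles per sample
t = np.arange(500)
x = np.sin(2 * np.pi * 0.02 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)
modes, omega = vmd(x, n_modes=2)
```

On this toy signal the two recovered center frequencies should settle near the frequencies of the two sinusoids. For passenger flow data, `n_modes=3` would correspond to the three components used in this study.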

Seasonal autoregressive integrated moving average
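The SARIMA model is an extremely effective forecasting method for periodic time series, and its goal is to describe the autocorrelation of the time series. The model first applies non-seasonal and seasonal differencing to make the time series stationary, and then regresses the differenced series on its lag values and on the present and lag values of the random error term. It can be expressed by:
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\, x_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$
where $d$ is the difference order, $D$ is the seasonal difference order, $B$ is the backshift operator defined by $B^a x_t = x_{t-a}$, $s$ is the seasonal period, $\phi_p(B)$ and $\Phi_P(B^s)$ are the non-seasonal and seasonal autoregressive polynomials of orders $p$ and $P$, $\theta_q(B)$ and $\Theta_Q(B^s)$ are the non-seasonal and seasonal moving average polynomials of orders $q$ and $Q$, and $\varepsilon_t$ is the random error term.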
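The differencing part of the SARIMA model, $(1-B)^d (1-B^s)^D$, can be sketched in a few lines of NumPy. The helper name `sarima_difference` is hypothetical, and the default seasonal period `s=70` is an assumption: it is the number of 15-min intervals in one service day from 6:30 to 24:00, which matches the one-day cycle of the periodic component but is not a value stated for the fitted model.

```python
import numpy as np

def sarima_difference(x, d=1, D=1, s=70):
    """Apply (1 - B)^d (1 - B^s)^D to a series; the result is shorter by D*s + d."""
    w = np.asarray(x, dtype=float)
    for _ in range(D):
        w = w[s:] - w[:-s]       # seasonal difference (1 - B^s)
    for _ in range(d):
        w = w[1:] - w[:-1]       # ordinary difference (1 - B)
    return w

# Toy usage: a linear trend plus an exactly repeating daily pattern
t = np.arange(280)
x = 3.0 + 0.1 * t + 50.0 * np.sin(2 * np.pi * (t % 70) / 70)
w = sarima_difference(x, d=1, D=1, s=70)
```

In this toy case the seasonal difference removes the daily pattern and the ordinary difference removes the remaining constant, so `w` is numerically zero. The ARMA part of the model would then be fitted to the differenced series.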

Multilayer perceptron network
In addition to the input and output layers, a multilayer perceptron (MLP) network can contain multiple hidden layers. Because of the complex mapping from inputs to outputs, the MLP network can approximate almost any nonlinear function. For a network with one hidden layer, the relationship between the inputs $(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ and the output $y_t$ can be written as:
$$y_t = \alpha_0 + \sum_{j=1}^{q} \alpha_j\, g\!\left( \beta_{0j} + \sum_{i=1}^{p} \beta_{ij}\, y_{t-i} \right) + \varepsilon_t$$
where $\alpha_j$ and $\beta_{ij}$ are the connection weights, $p$ is the number of input nodes, $q$ is the number of hidden nodes, $g(\cdot)$ is the activation function of the hidden layer, and $\varepsilon_t$ is the error term.
Because the backpropagation (BP) algorithm can minimize the total squared error of the in-sample dataset, it is commonly used to train the parameters of MLP networks. Determining the learning rate, the momentum parameter, the number of hidden layers and the number of neurons in each layer is an important issue. In this study, these parameters are determined by trial and error to find the best architecture of the MLP network, and an autoregressive model is used to identify the input size.
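As an illustration of how such a network is trained on lagged inputs, the following NumPy sketch fits a one-hidden-layer MLP with a logistic activation by full-batch gradient descent on the mean squared error. It is a sketch only, not the MATLAB toolbox configuration used in this study; the function names, the learning rate, the number of hidden nodes and the toy series are all assumptions, and momentum is omitted for brevity.

```python
import numpy as np

def make_lagged(series, p):
    """Build inputs (y_{t-p}, ..., y_{t-1}) and targets y_t from a 1-D series."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    return X, series[p:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.02, epochs=2000, seed=0):
    """Backpropagation for one hidden layer; returns parameters and the loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # hidden activations
        pred = H @ w2 + b2                   # linear output node
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2.0 * err / len(y)          # d(MSE)/d(pred)
        g_w2 = H.T @ g_pred
        g_b2 = g_pred.sum()
        g_Z = np.outer(g_pred, w2) * H * (1.0 - H)   # back through the sigmoid
        g_W1 = X.T @ g_Z
        g_b1 = g_Z.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (W1, b1, w2, b2), losses

def predict_mlp(params, X):
    W1, b1, w2, b2 = params
    return sigmoid(X @ W1 + b1) @ w2 + b2

# Toy usage: a scaled sine wave with four lags as inputs
t = np.arange(200)
series = 0.5 + 0.4 * np.sin(2 * np.pi * t / 20)
X, y = make_lagged(series, p=4)
params, losses = train_mlp(X, y)
```

In practice the number of hidden nodes would be varied by trial and error, as described above, and the configuration with the lowest validation error retained.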

Long short-term memory network
The long short-term memory (LSTM) neural network, originally proposed by Hochreiter and Schmidhuber (1997), is a kind of recurrent neural network. The cells of the LSTM network store the information that passes the gating rules of the network and forget the information that does not. LSTM neural networks have powerful and stable capabilities for handling both short-term and long-term dependencies. The memory cell is the key component of the LSTM neural network because it adds information to, or removes information from, the cell state.
In this study, the LSTM neural network consists of three layers: one input layer, one hidden layer and one output layer. We define $x = (x_1, x_2, \ldots, x_T)$ as the historical input data and $y = (y_1, y_2, \ldots, y_T)$ as the output data. Then, the predicted metro passenger flow can be calculated by the following equations:
$$i_t = \sigma(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i)$$
$$f_t = \sigma(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ g(W_{cx} x_t + W_{cm} m_{t-1} + b_c)$$
$$o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o)$$
$$m_t = o_t \circ h(c_t)$$
$$y_t = W_{ym} m_t + b_y$$
where $i_t$ represents the input gate, $f_t$ represents the forget gate, $o_t$ represents the output gate, $c_t$ represents the activation vectors for each memory cell, $m_t$ represents the activation vectors for each memory block, $W$ represents the weight matrices, $b$ represents the bias vectors and $\circ$ represents the scalar product of two vectors.
$\sigma(\cdot)$, $g(\cdot)$ and $h(\cdot)$ represent logistic sigmoid functions as follows:
$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad g(z) = \frac{4}{1+e^{-z}} - 2, \qquad h(z) = \frac{2}{1+e^{-z}} - 1$$
The parameters of the LSTM network are trained with the backpropagation algorithm, and the objective function is the mean squared error of the in-sample dataset. For more information and details on the LSTM network, please refer to Hochreiter and Schmidhuber (1997).

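One forward step of an LSTM memory block can be sketched in NumPy as follows. The sketch assumes a common formulation with peephole connections taken as element-wise (diagonal) weights and with centered logistic functions for $g$ and $h$; these details, the parameter names and the random initialization are assumptions for illustration and may differ from the MATLAB implementation used in this study. Training by backpropagation is not shown.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def g(z):
    return 4.0 * sigma(z) - 2.0      # centered logistic, range (-2, 2)

def h(z):
    return 2.0 * sigma(z) - 1.0      # centered logistic, range (-1, 1)

def init_params(n_in, n_hidden, seed=0):
    """Random weights for illustration only (hypothetical names)."""
    rng = np.random.default_rng(seed)
    P = {}
    for gate in "ifco":
        P[f"W_{gate}x"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        P[f"W_{gate}m"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        P[f"b_{gate}"] = np.zeros(n_hidden)
    for gate in "ifo":
        P[f"w_{gate}c"] = rng.normal(0.0, 0.1, n_hidden)   # diagonal peepholes
    P["W_ym"] = rng.normal(0.0, 0.1, (1, n_hidden))
    P["b_y"] = np.zeros(1)
    return P

def lstm_step(x_t, m_prev, c_prev, P):
    """One time step: gates, cell state, block output and prediction."""
    i_t = sigma(P["W_ix"] @ x_t + P["W_im"] @ m_prev + P["w_ic"] * c_prev + P["b_i"])
    f_t = sigma(P["W_fx"] @ x_t + P["W_fm"] @ m_prev + P["w_fc"] * c_prev + P["b_f"])
    c_t = f_t * c_prev + i_t * g(P["W_cx"] @ x_t + P["W_cm"] @ m_prev + P["b_c"])
    o_t = sigma(P["W_ox"] @ x_t + P["W_om"] @ m_prev + P["w_oc"] * c_t + P["b_o"])
    m_t = o_t * h(c_t)
    y_t = P["W_ym"] @ m_t + P["b_y"]
    return y_t, m_t, c_t

# Toy usage: run five time steps of three lagged inputs through four memory blocks
P = init_params(n_in=3, n_hidden=4)
m, c = np.zeros(4), np.zeros(4)
inputs = np.random.default_rng(1).normal(size=(5, 3))
for x_t in inputs:
    y_t, m, c = lstm_step(x_t, m, c, P)
```

Because the output gate lies in (0, 1) and $h$ lies in (-1, 1), every entry of `m` stays strictly inside (-1, 1), which is one quick sanity check on an implementation.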
The framework of the decomposition ensemble learning approach
According to the discussion in the introduction, we assume that metro passenger flow consists of three components: periodic components, deterministic components and volatility components. These components are added together in our proposed structure:
$$x_t = p_t + d_t + v_t$$
where $x_t$ is the observed metro passenger flow during time $t$, $p_t$ is the periodic component of the metro passenger flow, which is expressed as a regression on periodic sines and cosines, $d_t$ is the deterministic component of the metro passenger flow after removing the periodic components, and $v_t$ is the volatility component of $x_t$. The SARIMA model is applied to capture the periodic patterns in the metro passenger flow data and to regress on the periodic trend. After these periodic components are removed from the data, the LSTM neural network is used to fit the deterministic components of the metro passenger flow data. The MLP network is employed to model the volatility components of the metro passenger flow data. Fig. 1 provides a flowchart of our proposed decomposition ensemble learning approach.
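The final ensemble stage can be illustrated with a small sketch. In the proposed approach the component forecasts are recombined by an MLP; here a linear least-squares combiner is swapped in as a simpler stand-in, and the component series are synthetic. The function names `fit_linear_combiner` and `combine` are hypothetical.

```python
import numpy as np

def fit_linear_combiner(component_forecasts, target):
    """Least-squares weights plus intercept mapping component forecasts to the flow."""
    A = np.column_stack([component_forecasts, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def combine(component_forecasts, coef):
    """Weighted sum of the component forecasts plus the intercept."""
    return component_forecasts @ coef[:-1] + coef[-1]

# Synthetic stand-ins for the three component forecasts
rng = np.random.default_rng(0)
t = np.arange(140)
p = 100.0 + 80.0 * np.sin(2 * np.pi * t / 70)   # periodic part
d = 0.2 * t                                      # deterministic part
v = rng.normal(0.0, 5.0, size=t.size)            # volatility part
x = p + d + v                                    # additive structure of the flow

components = np.column_stack([p, d, v])
coef = fit_linear_combiner(components, x)
x_hat = combine(components, coef)
```

When the component forecasts are exact, the fitted weights are all one and the intercept is zero, which recovers the additive structure. A nonlinear combiner such as the MLP can additionally correct systematic errors in the individual component forecasts.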

Empirical study
In this section, we evaluate the performance of our proposed decomposition ensemble learning approach for metro passenger flow forecasting and compare it with several benchmark models to demonstrate its superiority. To accomplish this task, we collect smart card data from the Shenzhen metro system. The data description and evaluation criteria are introduced in Section 4.1, and the empirical results are analyzed in Section 4.2.

Data description and evaluation criteria
In this study, our proposed decomposition ensemble learning approach was applied to smart card data collected from the Shenzhen metro as a case study. The Hui-Zhan-Zhong-Xin (HZZX) station, Fu-Ming (FM) station and Gang-Xia (GX) station are three of the most typical stations and serve a large share of the travel demand in the Shenzhen metro system. Hence, the metro passenger flows used in this study were collected from these three stations between Oct. 14, 2013 and Nov. 30, 2013 and aggregated into 15-min time intervals. For these stations, the service time of the metros is from 6:30 to 24:00. Because of the different passenger flow patterns on weekdays and weekends, the metro passenger flow data were divided into weekdays and weekends (Ke et al. 2017). For both the weekday and weekend data, the first two-thirds were selected as the in-sample dataset, and the remaining one-third was selected as the out-of-sample dataset. Table 1 shows the descriptive statistics of the metro passenger flow data, which clearly indicate the differences in the statistical features among the datasets. For these metro stations, most of the metro passenger flow data have a sharp peak and a fat tail, which means that the data do not follow the normal distribution but are better described by a leptokurtic t distribution. The detailed data are not listed here but can be obtained from the authors. Additionally, to compare the forecasting performance of our proposed decomposition ensemble learning approach with the benchmark models, two evaluation criteria, namely, the mean absolute percentage error (MAPE) and the root mean square error (RMSE), were employed to evaluate the forecasting performance on the in-sample and out-of-sample datasets:
$$\text{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$
where $\hat{y}_i$ and $y_i$ denote the forecasted and actual metro passenger flow for observation $i$, and $N$ is the number of observation samples.
MAPE and RMSE measure the deviation between the actual and forecasted values.
In this research, we use h-step-ahead prediction horizons to assess the advantage of our proposed decomposition ensemble learning approach. Given a time series $y_t$ $(t = 1, 2, \ldots, n)$, the h-step-ahead prediction $\hat{y}_{t+h}$ is obtained as follows:
$$\hat{y}_{t+h} = f\left( y_t, y_{t-1}, \ldots, y_{t-(l-1)} \right)$$
where $\hat{y}_{t+h}$ is the h-step-ahead predicted value at time $t$, and $l$ is the number of lags of the endogenous variable.
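The evaluation setup described above can be sketched in NumPy: the two error measures, a chronological two-thirds split, and the construction of input/target pairs for direct h-step-ahead forecasting. The function names are hypothetical, and MAPE is returned as a fraction rather than multiplied by 100, which is an assumption about the reporting convention.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)))

def rmse(actual, forecast):
    """Root mean square error in the units of the series."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def chronological_split(series):
    """First two-thirds in-sample, last one-third out-of-sample (no shuffling)."""
    cut = (2 * len(series)) // 3
    return series[:cut], series[cut:]

def direct_samples(series, lags, h):
    """Pair inputs (y_{t-lags+1}, ..., y_t) with the target y_{t+h}."""
    series = np.asarray(series, dtype=float)
    rows = range(lags - 1, len(series) - h)
    X = np.array([series[t - lags + 1:t + 1] for t in rows])
    y = np.array([series[t + h] for t in rows])
    return X, y

# Toy usage
print(mape([100, 200], [110, 180]))          # mean of |10/100| and |20/200|
print(rmse([100, 200], [110, 180]))          # sqrt((10**2 + 20**2) / 2)
X, y = direct_samples(np.arange(10), lags=3, h=2)
```

Note that MAPE is undefined when an actual value is zero, so intervals with no passengers would need to be excluded or handled separately.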

Empirical results
To confirm the superiority of our proposed decomposition ensemble learning approach, five models are built and used as benchmarks: three single models, namely the seasonal autoregressive integrated moving average (SARIMA) model, the multilayer perceptron (MLP) neural network and the long short-term memory (LSTM) network, and two decomposition ensemble learning approaches, namely VMD-MLP and VMD-LSTM. The reasons for choosing these benchmarks are as follows: (1) The SARIMA model is one of the principal periodic and seasonal models introduced in the econometrics literature and has shown its capacity in forecasting metro passenger flows (Smith et al. 2002). (2) The MLP and LSTM techniques are the most widely used neural networks in metro passenger flow forecasting, as introduced in Section 1. (3) The VMD-MLP and VMD-LSTM decomposition ensemble approaches verify the capability of adaptive modeling in our proposed approach.
The parameters of the SARIMA model are estimated by means of an automatic model selection algorithm implemented using the "forecast" program package in R software. For the MLP model, the partial mutual information method is used to determine the number of inputs (maximum embedding order d=24). Trial-and-error experiments are used to determine the number of hidden nodes (varying from 4 to 15), and the number of outputs is set to one. The logistic sigmoid function is selected as the activation function, and the backpropagation algorithm is employed to train the MLP. The MLP is carried out by the neural network toolbox in MATLAB 2017a software. Regarding the VMD algorithm, the optimal mode number is set to 3 using the difference between the center frequencies of the adjacent subseries, as the center frequency is closely related to the decomposition results of VMD (Dragomiretskiy and Zosso 2014). The VMD algorithm is implemented using the VMD package in MATLAB 2017a software. For the LSTM neural network, the partial mutual information method is used to determine the number of input nodes (maximum embedding order d=24). Trial-and-error experiments are used to determine the number of hidden nodes (varying from 4 to 25), and the number of hidden layers is set to one. The number of output nodes is set to one. The LSTM is carried out by the LSTM package in the MATLAB 2017a software.
Using the research design mentioned above, forecasting experiments were performed for metro passenger flow. Then, two accuracy measures were used to evaluate the forecasting performance of all the examined models.
The decomposition results of the weekday and weekend passenger flow series at the three metro stations using VMD are shown in Figs. 2-7. Each original passenger flow dataset is decomposed into periodic, deterministic and volatility components through the VMD algorithm. The periodic components of the metro passenger flow series manifest as a one-day cycle. Additionally, three measures are considered when analyzing each component: the mean period of the component, the correlation coefficient between the original passenger flow series and the component, and the variance percentage of the component. Table 2 presents these measures for the weekday and weekend metro passenger flows at the three stations. The mean period is defined as the total number of points divided by the number of peaks of each component, because the amplitude and frequency of a component may change continuously with time and the period is not constant. The Pearson correlation coefficient is used to measure the correlation between the original passenger flow series and each component. Because these components are independent of each other, the variance percentage may be used to explain the contribution of each component to the total volatility of the observed passenger flow series. The results of all six decompositions show that the dominant mode of the observed data is the periodic trend rather than the volatility or deterministic parts. For the weekday metro passenger flow decompositions, the coefficients between the original passenger flow series and the periodic component reach 0.86, 0.89 and 0.87 for the HZZX, FM and GX stations, respectively. For the weekend metro passenger flow decompositions, these coefficients are even higher, at more than 0.97, 0.95 and 0.94 for the HZZX, FM and GX stations, respectively. Moreover, the variance of the periodic component accounts for more than 45% of the total volatility of the observed passenger flow data. The highest value is more than 86%.
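The three component measures described above can be sketched as follows. The function names are hypothetical, and two details are assumptions: peaks are counted as strict interior local maxima, and the variance percentage is taken relative to the variance of the original series.

```python
import numpy as np

def mean_period(component):
    """Total number of points divided by the number of local maxima."""
    c = np.asarray(component, dtype=float)
    peaks = int(np.sum((c[1:-1] > c[:-2]) & (c[1:-1] > c[2:])))
    return len(c) / peaks if peaks else float("inf")

def pearson(original, component):
    """Pearson correlation between the original series and one component."""
    return float(np.corrcoef(original, component)[0, 1])

def variance_percentage(original, component):
    """Variance of a component as a percentage of the original series variance."""
    return 100.0 * float(np.var(component) / np.var(original))

# Toy usage: a slow periodic part plus a fast, uncorrelated part
t = np.arange(200)
periodic = np.sin(2 * np.pi * t / 20)
fast = 0.5 * np.sin(2 * np.pi * t / 4)
original = periodic + fast

print(mean_period(periodic))
print(pearson(original, periodic))
print(variance_percentage(original, periodic))
```

For the toy series the periodic part has ten peaks in 200 points, so its mean period is 20, and it carries 80% of the total variance because the two parts are uncorrelated.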
After the decomposition, as discussed in Section 3.5, the SARIMA model is used to forecast the extracted periodic component, the LSTM neural network is employed to forecast the extracted deterministic component, and the MLP neural network is employed to forecast the extracted volatility component. Finally, another MLP neural network integrates the forecasting results of the periodic, deterministic and volatility components into an aggregated output.
The forecasting performance of the examined models (i.e., AdaEnsemble, VMD-LSTM, VMD-MLP, LSTM, MLP, and SARIMA) at the three stations across the ten forecasting horizons (h-step-ahead, i.e., h=1, 2, …, 10) in terms of RMSE and MAPE is shown in Tables 3-8.
The results in these tables show that our proposed decomposition ensemble approach performs best for metro passenger flow forecasting across all forecasting horizons (h=1, 2, …, 10) for the three metro stations compared with the five benchmarks. This is understandable: LSTM and MLP are both pure neural networks, which cannot directly model the periodic components. Therefore, to build a better forecaster, data preprocessing such as time series decomposition is critical and necessary, and it is an essential step in our proposed decomposition ensemble approach.
Additionally, among all examined models, the SARIMA model is the worst forecasting model for every metro passenger flow series in terms of forecasting accuracy across horizons. This is understandable because SARIMA is a typical linear model, which cannot capture the nonlinear components in metro passenger flows.
From the above analysis of the empirical results, several interesting findings can be drawn.
(2) In a comparison between VMD-LSTM (VMD-MLP) and LSTM (MLP), VMD-LSTM (VMD-MLP) is the winner. This means that mode decomposition of the metro passenger flow time series before forecasting is highly effective to enhance the forecasting power for metro passenger flow forecasting.
(3) Due to the highly nonlinear and periodic patterns in the metro passenger flow series, AI-based nonlinear models are more suitable than linear models to forecast time series with highly periodic volatility.
(4) Our proposed decomposition ensemble approach is consistently the best compared with other benchmarks under study for metro passenger flow forecasting by means of statistical accuracy and forecasting horizons.
(5) Our proposed decomposition ensemble approach can be considered a promising solution for forecasting time series with highly periodic volatility.

Conclusions
In this research, we present a novel adaptive decomposition ensemble learning approach to accurately forecast the volume of metro passenger flows. This approach decomposes the time series of metro passenger flows into periodic components, deterministic components and volatility components by variational mode decomposition (VMD). Then, we employ the SARIMA model to forecast the periodic component, the LSTM network to learn and forecast the deterministic component and the MLP network to forecast the volatility component. In the last stage, the diverse forecasted components are reconstructed by another MLP network.
Due to the highly nonlinear and periodic patterns in the metro passenger flow series, the advantage of our proposed approach is that it decomposes the original data into periodic components, deterministic components, and volatility components and then employs suitable methods to predict the characteristics of diverse components. Finally, the diverse forecasted components are reconstructed by an MLP network. The empirical results show that (1) mode decomposition of the metro passenger flow time series before further forecasting can effectively enhance the forecasting power for metro passenger flow forecasting; (2) the hybrid model with linear models and nonlinear models is more suitable to forecast time series with highly periodic volatility; and (3) our proposed decomposition ensemble learning approach has the best forecasting performance compared with the state-of-the-art models in terms of forecasting horizons and statistical accuracy.
Metro passenger flows are influenced by many factors, such as special events, extreme weather conditions, and accidents. Our proposed decomposition ensemble learning approach is a univariate hybrid model, so it is difficult for it to accurately capture the uncertainty that these factors introduce into the metro passenger flow. In a future study, we will try to address these issues and improve prediction accuracy by employing new methods, new variables or an integrated forecasting framework.