ST-LSTM: A Deep Learning Approach Combined Spatio-Temporal Features for Short-Term Forecast in Rail Transit



Introduction
With the growth of urban scale, short-term traffic forecasting has become a core issue of intelligent transportation systems (ITS). Accurate short-term traffic forecasts provide technical support for monitoring passenger flow and issuing early warnings. Therefore, over the past few decades, many data analysis models have been proposed to improve forecast accuracy. Among these models, the LSTM network is widely recognized as one of the most suitable for traffic forecasting. An LSTM unit has three gates, namely the input gate, the forget gate, and the output gate, which adjust the state of the unit dynamically, so an LSTM network can capture features over longer time spans. Because traffic data are usually collected as time series, the LSTM network can therefore provide higher accuracy in traffic forecasting.
In recent years, researchers have paid more attention to the spatial features of traffic flow. It is widely acknowledged that traffic forecasting is a problem with spatio-temporal complexity, i.e., the movement of traffic through space over time. In [1], Zheng Zhao et al. built a network by connecting several LSTM units, aiming to imitate the structure of urban traffic; however, it failed to capture the structure of a large-scale city. Xiaobo Chen et al. proposed a new method that processes spatial features using a sparse hybrid genetic algorithm [2]. Liu Qingchao et al. proposed a model based on manifold similarity to capture spatial regularity in freeway data [3]. These two approaches are sensitive to spatial features, but, compared with the LSTM network, they cannot process temporal features well.
In this paper, the object of study is short-term forecasting for rail transit. In our research, we find that, unlike other modes of transportation, rail transit has stations at fixed positions, vehicles running at uniform speed, and a regular schedule. Because of these characteristics, the spatial correlation between stations can be transformed into a time cost. Based on this analysis, this paper proposes a new method to capture spatio-temporal features from rail transit data and feeds these features into a new model named the spatio-temporal long short-term memory network (ST-LSTM), which is based on the LSTM network. Compared with most existing methods, the proposed model achieves better accuracy and meets real-time requirements.

Related Work
Many methods have been proposed to improve traffic forecasting, including historical average and smoothing [4,5], dynamic linear methods [6,7], traffic theory-based methods [8,9], and machine learning methods [10,11]. These approaches can be divided into two categories: parametric and nonparametric. The autoregressive integrated moving average (ARIMA) model is widely recognized as a classic parametric approach. As early as the 1970s, Levin and Tsao found that the ARIMA model was the most statistically significant in traffic forecasting [12]. Parametric approaches have favorable properties and capture regular variations very well; however, traffic data usually show irregular variations. To address this, researchers also turned to nonparametric approaches, such as nonparametric regression models [13], support vector machines (SVM) [14,15], and recurrent neural networks [16,17]. Afterwards, recurrent neural networks (RNNs) [18] were proposed to process temporal features, along with models such as the evolutionary neural network (ENN) [19], the dynamic neural network (DNN) [20], and nonlinear autoregressive models with exogenous inputs (NARX) [21]. Among them, the RNN is widely recognized as a suitable method for capturing the temporal features of passenger flow. However, previous studies showed that RNNs fail to capture long-term features because of vanishing and exploding gradients. To solve these problems, the long short-term memory neural network (LSTM NN) [22] was applied to traffic forecasting. In recent years, several approaches have been proposed to deal with the spatio-temporal complexity of traffic data, as mentioned in Section 1.
Different from these methods, this paper proposes a new method to capture spatio-temporal features and a new LSTM-based network to forecast the exit passenger flow of rail transit. The remainder of this paper is organized as follows. Section 3 introduces the architecture of the ST-LSTM network. Experiments based on data from Chongqing rail transit are presented in Section 4. Section 5 analyzes the experimental results, and conclusions and future work are given at the end of the paper.

Methodology
Short-term forecasting for rail transit is a problem with spatio-temporal complexity. Suppose the exit passenger flow of station $j$ needs to be predicted. The temporal features are the correlation between historical data and current data, i.e., the previous exit passenger flow of station $j$. These features can be extracted directly because rail transit data are collected along the temporal dimension. The spatial features describe the movement of passenger flow across geographic positions, i.e., the sum of the estimated passenger flows arriving from the other stations. For every two stations, the spatial features include the volume and the cost of transportation between them. The volume of transportation is reflected by the passenger flow between the two stations; in the proposed model, the spatial correlation matrix (SCM) is used to calculate it. The cost of transportation is reflected by several factors, such as time cost, economic cost, and distance. Among them, time cost is the most suitable factor to reflect the spatial correlation, as discussed in Section 1. Therefore, the time cost matrix (TCM) is introduced to calculate the cost of transportation. The proposed model is built on three components: the passenger information system (PIS), the feature extraction method, and the ST-LSTM network. Each is explained in this section.
Passenger Information System. Sufficient data are the basis of an accurate forecast, and the PIS provides comprehensive data. The PIS is a large and complex network: various rail transit data can be collected in real time through the gate system, the ticketing system, and the vehicle scheduling system. With the development of data acquisition technology, the PIS can provide sufficient support for short-term forecasting. Based on the card records, the entrance and exit passenger flows of stations are calculated at a frequency of 10 min, which can be denoted by

$$r = (r.id,\ r.ot,\ r.os,\ r.dt,\ r.ds,\ r.d) \in R \tag{1}$$
$$x^{\mathrm{in}}_{i,t} = \sum_{r \in R} \{1 \mid r.ot = t,\ r.os = i\}$$
$$x^{\mathrm{out}}_{i,t} = \sum_{r \in R} \{1 \mid r.dt = t,\ r.ds = i\}$$

where $r$ is a card record; $r.id$, $r.ot$, $r.os$, $r.dt$, $r.ds$, and $r.d$ are the attributes of $r$, representing card identification, origin time, origin station, destination time, destination station, and date, respectively; $R$ is the database of card records; $x^{\mathrm{in}}_{i,t}$ is the entrance passenger flow of station $i$ in time $t$; and $x^{\mathrm{out}}_{i,t}$ is the exit passenger flow of station $i$ in time $t$.
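The counting behind the entrance and exit flows can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the `CardRecord` field names, the encoding of times as minutes after midnight, and the helper names `interval_of` and `station_flows` are all hypothetical, and trips that cross midnight are not handled.

```python
from collections import Counter
from dataclasses import dataclass

INTERVAL_MIN = 10  # aggregation frequency stated in the text


@dataclass(frozen=True)
class CardRecord:
    """One fare-card trip. Field names are hypothetical stand-ins for the
    attributes in Eq. (1): card ID, origin time, origin station,
    destination time, destination station, and date."""
    card_id: str
    origin_time: int      # minutes after midnight (assumed encoding)
    origin_station: int
    dest_time: int        # minutes after midnight (assumed encoding)
    dest_station: int
    date: str             # e.g. "2017-03-01"


def interval_of(minutes: int) -> int:
    """Map a time of day in minutes to its 10-minute interval index."""
    return minutes // INTERVAL_MIN


def station_flows(records):
    """Count entrance and exit flow per (date, station, interval) key."""
    entrance, exit_flow = Counter(), Counter()
    for r in records:
        entrance[(r.date, r.origin_station, interval_of(r.origin_time))] += 1
        exit_flow[(r.date, r.dest_station, interval_of(r.dest_time))] += 1
    return entrance, exit_flow


# Usage: three trips on 1 March 2017.
records = [
    CardRecord("a", 480, 1, 505, 2, "2017-03-01"),  # 08:00 -> 08:25
    CardRecord("b", 485, 1, 512, 3, "2017-03-01"),  # 08:05 -> 08:32
    CardRecord("c", 500, 2, 520, 3, "2017-03-01"),  # 08:20 -> 08:40
]
ent, ext = station_flows(records)
# ent[("2017-03-01", 1, 48)] == 2 because two trips entered station 1 in 08:00-08:10
```

A `Counter` returns 0 for unseen keys, so intervals with no trips need no special handling.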
Feature Extraction Method. The extraction of spatio-temporal features is one of the core problems of the proposed model. The model extracts temporal and spatial features separately and then feeds them together into the ST-LSTM network. The temporal features can be extracted directly, because rail transit data are recorded along the temporal dimension. To extract the spatial features, the TCM and the SCM are integrated into the method.
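Time Cost Matrix. Time cost is the most suitable factor to reflect the spatial correlation between stations, so the time costs between all pairs of stations constitute the TCM. Because the schedule and passenger flow change, the TCM is dynamic over time. Suppose there are $N$ stations in the rail transit system; then the TCM is an $N \times N$ matrix, which can be denoted by

$$\mathrm{TCM}_t = \begin{pmatrix} \hat\tau_{1,1} & \hat\tau_{1,2} & \cdots & \hat\tau_{1,N} \\ \hat\tau_{2,1} & \hat\tau_{2,2} & \cdots & \hat\tau_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat\tau_{N,1} & \hat\tau_{N,2} & \cdots & \hat\tau_{N,N} \end{pmatrix}$$
$$\hat\tau_{i,j} = \frac{1}{|R_{i,j}|} \sum_{r \in R_{i,j}} \tau_r, \qquad R_{i,j} = \{ r \in R \mid r.ot \in T^*,\ r.os = i,\ r.ds = j \}, \qquad \tau_r = r.dt - r.ot$$
$$T^* = \{\, t - k\,w \mid k = 1, \dots, n \,\} \tag{6}$$

where $\mathrm{TCM}_t$ is the TCM in time $t$; $\hat\tau_{i,j}$ is the average time cost between $i$ and $j$ in the historical time series, where $i$ is the origin station and $j$ is the destination station; $\tau_r$ is the time cost between the two stations in record $r$; $r$ is a card record in database $R$, as defined in Eq. (1); $T^*$ is the historical time series of time $t$; $w$ is a week; and $n$ is the number of weeks.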
In our analysis, we found that passenger flow varies with people's weekly routine, i.e., from Monday to Sunday. For example, the passenger flow on a Thursday is more similar to that of the previous Thursday than to that of the day before. Therefore, to improve the extraction, $\hat\tau_{i,j}$ in time $t$ is the average time cost over the historical time series of time $t$. In Eq. (6), $T^*$ is this historical time series, which is anchored at time $t$ and consists of the same period in each of the preceding $n$ weeks. The same method is also used to calculate the spatial correlation matrix.
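The TCM entries with the weekly historical window can be sketched as below. This is a hedged illustration under stated assumptions: trips are hypothetical flattened card records `(date, origin_minute, origin_station, dest_minute, dest_station)`, $T^*$ is read as the same 10-minute interval on the same weekday in each of the previous `n_weeks` weeks, and `historical_slots` and `time_cost_matrix` are illustrative names rather than the authors' code.

```python
from datetime import date, timedelta
from statistics import mean

INTERVAL_MIN = 10  # 10-minute aggregation, as in the text


def historical_slots(day, interval, n_weeks):
    """T*: the same interval on the same weekday in each of the previous
    n_weeks weeks (an assumed reading of the historical time series)."""
    return {(day - timedelta(weeks=k), interval) for k in range(1, n_weeks + 1)}


def time_cost_matrix(trips, stations, day, interval, n_weeks):
    """Return TCM_t as {(origin, destination): mean travel time in minutes}.

    Each trip is a hypothetical flattened card record:
    (date, origin_minute, origin_station, dest_minute, dest_station).
    Station pairs with no historical trips map to None instead of a guess.
    """
    window = historical_slots(day, interval, n_weeks)
    costs = {}
    for d, ot, o_st, dt, d_st in trips:
        if (d, ot // INTERVAL_MIN) in window:
            costs.setdefault((o_st, d_st), []).append(dt - ot)
    return {(i, j): mean(costs[(i, j)]) if (i, j) in costs else None
            for i in stations for j in stations}


# Usage: forecast slot is Thursday 9 March 2017, interval 48 (08:00-08:10).
trips = [
    (date(2017, 3, 2), 481, 1, 501, 2),   # previous Thursday: 20 min
    (date(2017, 2, 23), 485, 1, 507, 2),  # two Thursdays back: 22 min
    (date(2017, 3, 8), 482, 1, 530, 2),   # Wednesday: outside T*
]
tcm = time_cost_matrix(trips, [1, 2], date(2017, 3, 9), 48, n_weeks=2)
# tcm[(1, 2)] is the mean of 20 and 22 minutes; tcm[(2, 1)] is None
```

Because the forecast index $t - \hat\tau_{i,j}$ counts 10-minute intervals, the minute values would still need to be converted to whole intervals (for example by rounding $\hat\tau_{i,j}/10$) before they are used as offsets.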

Spatial Correlation Matrix.
To forecast the exit passenger flow at station $j$ in time $t$, passengers who entered station $i$ in time $t - \hat\tau_{i,j}$ have to be considered. The entrance passenger flow of station $i$ in time $t - \hat\tau_{i,j}$, $x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$, is available. However, because time $t$ has not yet happened, the proportion of passengers in $x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$ who are travelling to station $j$ is unavailable. To resolve this contradiction, a spatial factor is introduced. The spatial factor $s_{i,j}$ is the historical average probability that a passenger in the entrance passenger flow of station $i$ is travelling to station $j$. When forecasting the exit passenger flow at station $j$ in time $t$, the spatial influence of station $i$ is calculated by multiplying the spatial factor $s_{i,j}$ by the entrance passenger flow $x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$. The spatial factors between all stations constitute the SCM. Because the factors vary with time, there is one SCM for each time $t$. Suppose there are $N$ stations in the rail transit system; then the SCM is an $N \times N$ matrix. The temporal features $x^{T}_{j,t}$ of station $j$ in time $t$ are the exit passenger flow $x^{\mathrm{out}}_{j,t-1}$ of station $j$ in time $t-1$. The spatial features $x^{S}_{j,t}$ of station $j$ in time $t$ are obtained by summing, over every station $i$ in the set of stations $S$, the product of the spatial factor $s_{i,j}$, defined in Eq. (9), and the entrance passenger flow $x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$ of station $i$ in time $t - \hat\tau_{i,j}$.
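The spatial factor and the resulting spatial feature can be sketched as follows. This is a minimal sketch under assumptions not stated in the text: time slots are global integer indices, a week spans `slots_per_week` slots (700 below, assuming 100 ten-minute slots per operating day), $\hat\tau_{i,j}$ has already been rounded to whole slots, the historical average is an average of per-week ratios, and `spatial_factor` and `spatial_feature` are hypothetical names.

```python
def spatial_factor(od, entrance, i, j, t, tau_ij, slots_per_week, n_weeks):
    """Historical average share of station i's entrance flow bound for j.

    od[(i, j, s)]    -- passengers entering i in slot s whose destination is j
    entrance[(i, s)] -- entrance passenger flow of station i in slot s
    tau_ij           -- TCM entry for (i, j), rounded to whole slots (assumed)
    The ratio is averaged over the same slot in each of the previous
    n_weeks weeks; weeks with zero entrance flow are skipped.
    """
    ratios = []
    for k in range(1, n_weeks + 1):
        origin_slot = t - k * slots_per_week - tau_ij
        total = entrance.get((i, origin_slot), 0)
        if total > 0:
            ratios.append(od.get((i, j, origin_slot), 0) / total)
    return sum(ratios) / len(ratios) if ratios else 0.0


def spatial_feature(j, t, stations, scm, entrance, tau):
    """x^S_{j,t} = sum over i of s_{i,j} * x^in_{i, t - tau_{i,j}}."""
    return sum(scm[(i, j)] * entrance.get((i, t - tau[(i, j)]), 0)
               for i in stations if (i, j) in scm)


# Usage with one origin station (1) and one destination station (2).
W = 7 * 100  # assumed: 100 ten-minute slots per operating day
entrance = {(1, 2000 - W - 2): 50, (1, 2000 - 2 * W - 2): 40, (1, 1998): 60}
od = {(1, 2, 2000 - W - 2): 10, (1, 2, 2000 - 2 * W - 2): 12}
s_12 = spatial_factor(od, entrance, i=1, j=2, t=2000, tau_ij=2,
                      slots_per_week=W, n_weeks=2)
x_s = spatial_feature(2, 2000, [1], {(1, 2): s_12}, entrance, {(1, 2): 2})
print(s_12, x_s)  # 0.25 15.0
```

Here the two historical weeks give shares of 10/50 and 12/40, whose average 0.25 is applied to the 60 passengers who entered station 1 two slots before the forecast slot.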
Structure of ST-LSTM Network. The ST-LSTM network adds a fully connected layer to the LSTM network to combine the temporal and spatial features. The model learns the best way to combine them during training.
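A forward pass through a fully connected combiner followed by an LSTM layer can be sketched in plain Python. This is an illustrative sketch, not the authors' network: the gate equations are the standard LSTM formulation (sigmoid gates, tanh candidate state), assumed here rather than taken from the paper; the fully connected and input layers are merged into one linear map to a scalar input; the hidden size of 10 mirrors the 10 units used in the experiments; and `STLSTMSketch` and its attributes are hypothetical names with random, untrained weights.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class STLSTMSketch:
    """Fully connected feature combiner -> one LSTM layer -> linear output."""

    GATES = ("i", "f", "o", "c")  # input, forget, output gates; candidate state

    def __init__(self, hidden: int = 10, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        # Fully connected layer: [x^T, x^S] -> one combined input value.
        self.w_fc = [rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]
        self.b_fc = 0.0
        # Per-gate input weights, recurrent weights, and biases.
        self.W = {g: mat(hidden, 1) for g in self.GATES}
        self.U = {g: mat(hidden, hidden) for g in self.GATES}
        self.b = {g: [0.0] * hidden for g in self.GATES}
        # Output layer: hidden output h_t -> forecast y_t.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def forward(self, temporal, spatial):
        """Run the (x^T, x^S) sequence and return the final forecast."""
        h = [0.0] * self.hidden  # previous hidden output h_{t-1}
        c = [0.0] * self.hidden  # cell state c_t
        for x_temp, x_spat in zip(temporal, spatial):
            x = [self.w_fc[0] * x_temp + self.w_fc[1] * x_spat + self.b_fc]
            pre = {
                g: [a + r + bias for a, r, bias in
                    zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]
                for g in self.GATES
            }
            i_g = [sigmoid(z) for z in pre["i"]]
            f_g = [sigmoid(z) for z in pre["f"]]
            o_g = [sigmoid(z) for z in pre["o"]]
            c_tilde = [math.tanh(z) for z in pre["c"]]
            c = [f * cp + i * ct for f, cp, i, ct in zip(f_g, c, i_g, c_tilde)]
            h = [o * math.tanh(cs) for o, cs in zip(o_g, c)]
        return sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out


model = STLSTMSketch(hidden=10, seed=0)
y = model.forward([30.0, 32.0, 35.0], [12.0, 14.0, 15.0])
```

The fully connected weights `w_fc` are the parameters that would let training learn how to weigh temporal against spatial features; in practice a framework with automatic differentiation would replace these hand-written loops.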
The structure of the ST-LSTM network is shown in Figure 2. The inputs of the model are the spatio-temporal features $x^{T}_{j,t}$ and $x^{S}_{j,t}$, and the output is the forecast result $y_t$. There are four layers in this model, namely the fully connected layer, the input layer, the hidden layer, and the output layer. The fully connected layer first combines the features and passes the result to the input layer. The input of the hidden layer, $x_t$, is calculated by the input layer. The hidden layer has three gates, namely the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, and its state is denoted by $c_t$. The inputs of every gate are $x_t$ and the previous output of the hidden layer, $h_{t-1}$. The blue points in Figure 2 are confluences, which stand for multiplications, and the dashed lines carry the previous state.

Step: Obtaining the Inputs and Labels. Capture the temporal and spatial features in each time $t$, which are the inputs of the model, and collect the exit passenger flow in each time $t+1$ as the labels.
Step: Fine-tuning the Whole Network. Fine-tune the whole network by adjusting the weight matrices and bias vectors to minimize the output of the cost function. Training stops when the output meets the required criterion or the training time reaches its limit.
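The two training steps can be sketched as plain Python helpers. This is an illustrative sketch, not the authors' training code: `make_samples` and `train_until` are hypothetical names, `time_step` stands in for the window of 100 slots used in the experiments, and `step` is any callable that performs one round of weight and bias updates and returns the current cost.

```python
def make_samples(temporal, spatial, exit_flow, time_step):
    """Obtaining the inputs and labels: pair the feature window ending at
    slot t with the exit passenger flow at slot t + 1 as its label."""
    samples = []
    for t in range(time_step - 1, len(exit_flow) - 1):
        window = slice(t - time_step + 1, t + 1)
        samples.append((temporal[window], spatial[window], exit_flow[t + 1]))
    return samples


def train_until(step, cost_threshold, max_epochs):
    """Fine-tuning: call step() until the cost meets the threshold or the
    epoch limit is reached. Returns (epochs_run, last_cost)."""
    cost = float("inf")
    for epoch in range(1, max_epochs + 1):
        cost = step()
        if cost <= cost_threshold:
            return epoch, cost
    return max_epochs, cost


samples = make_samples([1, 2, 3, 4, 5], [10, 20, 30, 40, 50],
                       [100, 101, 102, 103, 104], time_step=3)
# Two samples: windows ending at slots 2 and 3, labelled 103 and 104.
```

Keeping the stopping rule separate from the update step lets the same loop drive any optimizer; the two stopping conditions correspond to the cost criterion and the training-time limit described above.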

Experiment
Based on data from Chongqing rail transit, four models are compared in the experiment, namely Seasonal ARIMA (SARIMA) [23], Support Vector Regression combined with Particle Swarm Optimization (PSO-SVR) [24], the LSTM network [25], and the proposed ST-LSTM network. The forecast target is the exit passenger flow at a frequency of 10 min.
The four models are trained and tested on 100 stations.
The details of each model are as follows.
(2) PSO-SVR: The time period is 100 (100×10 min per day), and the search range of the SVR parameter combination is from [10, 0.08] to [500, 0.3]. The final parameter combination is selected by PSO during training.
(3) LSTM network: The number of units is 10 and the time step is 100 (100×10 min per day).
(4) ST-LSTM network: The number of units is 10 and the time step is 100 (100×10 min per day).
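$$\mathrm{ME} = \max_{k} |\varphi_k - x_k|, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m} |\varphi_k - x_k|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m} (\varphi_k - x_k)^2}, \qquad \mathrm{MRE} = \frac{1}{m}\sum_{k=1}^{m} \frac{|\varphi_k - x_k|}{x_k}$$

where $\varphi_k$ is the forecast data, $x_k$ is the measured data, and $m$ is the number of forecasts.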
Training and Testing. 5-fold cross validation is used to evaluate the models: the data are divided into 5 subsets, each subset serves once as the testing set, and the remaining data form the training set. The experiments are therefore repeated 5 times for each station, after which the performance of the four models is collected. The experiments are conducted on a desktop computer with an Intel i7 3.20 GHz CPU and 16 GB of memory.
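The 5-fold protocol can be sketched as an index splitter. The text does not say whether the subsets are contiguous in time or shuffled; this sketch assumes contiguous, unshuffled folds, so each test subset is an unbroken block of time slots, and `k_fold_splits` is a hypothetical name.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k-fold cross validation.

    Folds are contiguous blocks in time order (an assumption). When
    n_samples is not divisible by k, the first n_samples % k folds get
    one extra sample.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_splits(12, k=5))
# Test-set sizes are 3, 3, 2, 2, 2; every index is tested exactly once.
```

Each model is then fitted on `train` and scored on `test` once per fold, which gives the five repetitions per station described above.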
Experiment Result. The experimental results of the different algorithms are shown in Table 1, where the operation time is averaged over all stations. Compared with SARIMA, PSO-SVR, and the LSTM network, the proposed ST-LSTM network achieves better performance. In terms of ME and MAE, the ST-LSTM network is more accurate than the other models; in terms of RMSE, it is more stable. Therefore, the proposed ST-LSTM network is more suitable for short-term forecasting in rail transit.

Analysis of Result
After testing the models on 100 stations of Chongqing rail transit, we find that the ST-LSTM network achieves higher accuracy than the other models. However, its performance fluctuates across stations, as shown in Table 2. Therefore, we analyze the experimental results based on a field investigation. The stations in Table 2 are sorted in descending order of passenger volume; for brevity, Table 2 lists only the stations with the top-10 and bottom-10 passenger volumes.
Base Volume. In our research, we find that base volume is one of the factors influencing the forecast. Figure 3 shows the performance on two stations of Chongqing rail transit, station No. 321 and station No. 334. Both are located in residential districts of Chongqing; however, the monthly base volume of station No. 334 is only 5% of that of station No. 321. In the test, the MRE on station No. 321 is 13.52%, while the MRE on station No. 334 is 26.58%. We use these two contrasting results to illustrate the influence of base volume. Our research suggests that stations with higher base volume usually have more prominent regional features. As a result, their passenger flows are more regular and less sensitive to emergent factors. Therefore, short-term forecasting at stations with low base volume is one of the difficulties in rail transit forecasting.

Randomness. Besides base volume, we find that randomness is another factor influencing the forecast. As shown in Figure 4, the monthly base volumes of station No. 323 and station No. 123 are both around 900 thousand. However, the forecast performance on the two stations is quite different: in the test, the MRE on station No. 323 is 16.73%, while the MRE on station No. 123 is 38.37%. This phenomenon occurs at a few special stations, such as station No. 123, which is located in the university town of Chongqing. Compared with commuters, undergraduates have more freedom in choosing when to travel, so the passenger flow of stations next to universities is more random than that of other stations. Similarly, the passenger flow of stations next to railway stations or airports depends on train and flight schedules. Therefore, randomness from the environment cannot be neglected at several stations, and short-term forecasting at these stations is one of the difficulties in rail transit forecasting.

Conclusion and Future Work
Short-term forecasting for rail transit is an essential issue in ITS. We propose the ST-LSTM network, which combines temporal and spatial features. To extract the spatial features, the TCM and the SCM are integrated into the method. Compared with the other models tested, the proposed model is more suitable for rail transit forecasting.
This study focuses on predicting exit passenger flow, but a model that also covers entrance passenger flow would be more valuable for management. In addition, besides rail transit, ITS also includes bus and taxi systems, and the correlation between different modes of public transportation is worth considering. In the future, we will try to forecast other targets of rail transit and then consider the relations among different modes of transportation. Finally, a comprehensive system for rail transit will be built to produce more accurate short-term forecasts.

Figure 1: Design of the extraction of spatio-temporal features.
where $\hat{x}_{i,t}$ is the forecast of station $i$ in time $t$ and $x_{i,t}$ is the actual output.

Training Algorithm. The training algorithm has two parts: the extraction of spatio-temporal features and the training of the ST-LSTM network. The key point of training is minimizing the output of the cost function by adjusting the weight matrices and bias vectors. The training procedure is as follows.

Figure 3: Influence of base volume in forecast.

Figure 4: Influence of randomness in forecast.
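$$\mathrm{SCM}_t = \begin{pmatrix} s_{1,1} & s_{1,2} & \cdots & s_{1,N} \\ s_{2,1} & s_{2,2} & \cdots & s_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N,1} & s_{N,2} & \cdots & s_{N,N} \end{pmatrix}$$
$$n_{i,j,t-\hat\tau_{i,j}} = \sum_{r \in R} \{1 \mid r.ot = t - \hat\tau_{i,j},\ r.os = i,\ r.ds = j\}$$
$$s_{i,j} = \frac{1}{n} \sum_{t' \in T^*} \frac{n_{i,j,t'-\hat\tau_{i,j}}}{x^{\mathrm{in}}_{i,t'-\hat\tau_{i,j}}} \tag{9}$$

where $\mathrm{SCM}_t$ is the SCM of time $t$; $s_{i,j}$ is the spatial factor, where $i$ is the origin station and $j$ is the destination station; $\hat\tau_{i,j}$ is the time cost from $i$ to $j$; $n_{i,j,t-\hat\tau_{i,j}}$ is the number of passengers from $i$ to $j$ whose origin time is $t - \hat\tau_{i,j}$; $x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$ is the entrance passenger flow of station $i$ in time $t - \hat\tau_{i,j}$; and $T^*$, $w$, and $n$ have been defined in Eq. (6).

Extraction of Spatio-Temporal Features. The structure of the extraction method is shown in Figure 1. To forecast the exit passenger flow at station $j$ in time $t$, the temporal features are the exit passenger flow at station $j$ in time $t-1$. The spatial features are gathered by counting the passengers who will arrive at station $j$ in time $t$ after departing from other stations. The extraction function can be set as

$$x^{T}_{j,t} = x^{\mathrm{out}}_{j,t-1}$$
$$x^{S}_{j,t} = \sum_{i \in S} s_{i,j} \cdot x^{\mathrm{in}}_{i,t-\hat\tau_{i,j}}$$

In the ST-LSTM network, $x_t$, $h_t$, and $y_t$ are the outputs of the different layers; $i_t$, $f_t$, $o_t$, and $\tilde{c}_t$ are intermediate variables of the hidden layer; $c_t$ is the state of the hidden layer; the $W$ terms are weight matrices; the $b$ terms are bias vectors; and $\sigma$ is the sigmoid function. During training, the cost function is computed after each forecast, and the proposed model is improved by reducing its output.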

Table 1: Performance of four models.
Data Description. The card-record data are provided by Chongqing City Transportation Development & Investment Group Co., Ltd. Compared with other targets, such as Origin-Destination (OD) volume, exit passenger flow is more accurate and has less missing data. Therefore, we calculate the exit passenger flow from 1 March 2017 to 31 March 2017 based on the dataset, which contains more than 46 million card records. After processing, 600 thousand data points are obtained.

Evaluation. We use several criteria to compare the performance of the four models. Maximum error (ME) and mean absolute error (MAE) measure the accuracy of the models, and root mean square error (RMSE) is sensitive to their stability. Mean relative error (MRE), which normalizes each error by the measured flow, is the most suitable criterion for comparing the four models.
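The four criteria can be computed with a short helper. This sketch assumes that ME is the maximum absolute error and that MRE skips intervals whose measured flow is zero, since the text specifies neither; `evaluate` is a hypothetical name.

```python
import math


def evaluate(forecast, measured):
    """Return ME, MAE, RMSE and MRE for paired forecast/measured series.

    ME is taken as the maximum absolute error (assumption). MRE is a
    fraction and skips slots whose measured flow is zero (assumption),
    returning NaN if every measured value is zero.
    """
    if len(forecast) != len(measured) or not forecast:
        raise ValueError("series must be non-empty and of equal length")
    errors = [f - m for f, m in zip(forecast, measured)]
    me = max(abs(e) for e in errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    rel = [abs(e) / m for e, m in zip(errors, measured) if m != 0]
    mre = sum(rel) / len(rel) if rel else float("nan")
    return {"ME": me, "MAE": mae, "RMSE": rmse, "MRE": mre}


scores = evaluate([110, 90, 50], [100, 100, 0])
# ME 50, MAE 70/3, RMSE 30.0, MRE 0.1 (the zero-flow slot is excluded from MRE)
```

Because MRE divides by the measured flow, it stays comparable between busy and quiet stations, which is why it is the criterion used for the per-station comparisons in the analysis.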