Analysis of Bus Trip Characteristics and Demand Forecasting Based on NARX Neural Network Model

In recent years, there has been increased interest in the use of bus IC card data to analyze bus transit time characteristics, and the prediction is no longer confined to rail traffic passenger flow prediction and traditional traffic flow prediction. Research on passenger flow forecast for the bus IC card has been increasing year by year. Based on the bus IC card data of Qingdao City, this paper first analyzes the characteristics of one-day passenger flow and passenger flow during subperiods and conducts a separate study on the characteristics of the elderly. The results show that the travel of the elderly is also affected by the weekday and the weekend. Then, based on the ARIMA model and the NARX neural network model, the passenger flow forecasting (10-minute interval) is carried out using the IC card data of No. 1 bus for 5 weekdays. The prediction results show that the NARX neural network model is effective in the short-term prediction of bus passenger flow, and especially, it is more accurate in the peak hour and large-scale data prediction.


Introduction
Automated fare collection (AFC) system, also known as the transit smart card (SC) system, has gained more and more popularity among transit agencies worldwide. Compared with the conventional manual fare collection system, an AFC system has its inherent advantages in low labor cost and high efficiency for fare collection and transaction data archival. Although it is possible to collect highly valuable data from transit SC transactions, substantial efforts and methodologies are needed for extracting such data because most AFC systems are not initially designed for data collection [1]. People are also paying more and more attention to how to use these data to dig deeper into the characteristics of bus passengers and provide support for urban public traffic management and planning. At present, research on bus IC card data can be broadly divided into three categories: (1) analysis of characteristics of bus passenger flow, (2) evaluation of bus operating efficiency, and (3) forecast of bus travel demand.
In terms of the analysis of the bus passenger flow characteristics, Morency et al. used smart card data from Ottawa, Canada, to study the travel characteristics of different card types in time and space [2]. Páez et al. combined AFC data with personal passenger information and used cluster analysis to obtain passengers' spatial travel characteristics [3]. Briand et al. used a Gaussian mixture generation model to cluster the travel time characteristics of passengers and revealed the travel patterns of different groups [4]. Li and Deng adopted the method based on the travel chain to obtain the passengers' drop-off point and classify the bus passengers' travel according to the passengers' travel time and space characteristics [5]. He et al. took Guangzhou residents as the research object and proposed a cyclic analysis method based on mode division [6].
On the evaluation of bus operation efficiency, Ma developed a data-driven platform for online transit performance monitoring by using the smart card and GPS data to monitor transit service quality [7].
In the aspect of bus passenger flow forecasting, Yang et al. extracted data from the IC card database of Dalian public transport, used the Fisher clustering algorithm to classify the bus peak intervals to establish a regression equation for predicting passenger flow under different peak conditions, and realized the forecast of total passenger flow in different peak intervals, but the forecast period is longer [8]. Liu et al. used the number of passengers getting off and on at Bus No. 8 in Harbin City to present a prediction model for the number of people entering and leaving the bus station based on the improved BP neural network, but the forecasting time interval was longer [9]. Tsai et al. used multiple temporal units neural network (MTUNN) and parallel ensemble neural network (PENN) to predict the short-term subway passenger flow based on sales data from Kaohsiung to Taipei of a specific train (Train No. 1008) over four years, which had higher prediction accuracy than the traditional multilayer perceptron model (MLP) [10]. Wei and Chen used a hybrid EMD-BPN forecasting approach by combining empirical mode decomposition (EMD) and backpropagation neural networks (BPN) in the short-term passenger flow forecasting of the metro systems. The prediction results showed that this method performs well and was stable in predicting short-term subway passenger flow [11]. Jiang et al. used dynamic regression neural network (Elman) and BP neural network methods to predict the bus passenger flow in Hefei City. Experiments showed that the dynamic regression neural network (Elman) method had higher prediction accuracy [12]. Yang et al. used the time-series model and the fuzzy neural network model to predict the short-term bus passenger flow. Because of the nonstationary bus passenger flow, it failed to achieve good prediction result [13]. Zhang et al. proposed the Kalman filter as a short-term passenger flow forecasting model for public transit stations and presented the solution process of the model [14].
In summary, the current analysis of bus trip characteristics is mainly based on the long-term forecast of one-day passenger flow and one-week passenger flow. However, there are few studies on the short-term prediction of passenger flow during subperiods, and in addition, the lack of exploration of the travel characteristics of the elderly has led to the neglect of this important group in passenger flow forecasting. In terms of passenger flow prediction methods, with the gradual maturity of neural networks and other deep learning forecasting technologies, they have become the mainstream traffic prediction methods at present. Therefore, in this paper, the preprocessing and cluster analysis of bus IC card data are used to explore the bus trip characteristics of passenger flow during subperiods, and the bus travel characteristics of the elderly are analyzed in particular. Then, the NARX neural network model is used to forecast the bus passenger flow during subperiods, in order to provide basic data for the real-time scheduling and management of bus operations.
The remainder of this paper is organized as follows. Section 2 introduces the data structure of the bus IC card in Qingdao and preprocesses the original data. Section 3 analyzes the characteristics of passenger flow in a single day and passenger flow during subperiods and, in particular, the bus trip characteristics of the elderly. Section 4 introduces the construction of the bus passenger flow forecast models, namely the ARIMA model and the NARX neural network model, and evaluates them on the bus IC card data of 5 weekdays in Qingdao City. In Section 5, we summarize the research results of this paper and look forward to future research.

Data Preprocessing.
In order to guarantee the quality of data analysis, it is necessary to preprocess the bus IC card data, extract fields that have a significant impact on the data analysis, and filter some invalid data which will adversely affect the results of data analysis. This article will clean up and filter Qingdao bus IC card data based on Microsoft SQL Server 2017, to deal with the wrong data, redundant data, and other issues in the original data and improve data mining efficiency and quality. Combining the research content of this article and the original data of Qingdao bus IC card, this data preprocessing is mainly divided into data cleaning, data transformation, and data reduction.

Data Cleaning.
Duplicate and erroneous records in the bus IC card data are deleted. Records with the same card number, transaction date, and transaction time are duplicates and are deleted. Records with a null value in the card number, transaction time, transaction date, etc., are missing data and also need to be deleted. Through screening, we also found some erroneous records in the bus IC card data, such as a transaction time of "20000", that is, a swipe time of 2 am, which is obviously not realistic. Such erroneous data are deleted as well.
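The three cleaning rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper performs the cleaning in Microsoft SQL Server, the record layout (card number, date as YYYYMMDD, time as zero-padded HHMMSS) is hypothetical, and treating every swipe outside the 5:00 to 23:00 window as erroneous is an assumption based on the analysis window used later in the paper.

```python
from datetime import datetime

# Hypothetical records: (card_no, date "YYYYMMDD", time "HHMMSS"); the field
# layout is an assumption for illustration, not the actual Qingdao schema.
records = [
    ("A001", "20170904", "073015"),
    ("A001", "20170904", "073015"),   # duplicate swipe
    ("A002", "20170904", None),       # missing transaction time
    ("A003", "20170904", "020000"),   # 2 am, outside the assumed service window
    ("A004", "20170904", "181245"),
]

def clean(rows, first_hour=5, last_hour=23):
    """Drop records with nulls, exact duplicates, and out-of-window times."""
    seen, kept = set(), []
    for card, date, time in rows:
        if not (card and date and time):
            continue                      # missing data
        key = (card, date, time)
        if key in seen:
            continue                      # duplicate record
        seen.add(key)
        try:
            stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
        except ValueError:
            continue                      # malformed timestamp
        if first_hour <= stamp.hour < last_hour:
            kept.append((card, stamp))
    return kept

cleaned = clean(records)
```

Only the first A001 record and the A004 record survive in this toy input.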

Data Transformation.
In order to meet our data analysis needs, we need to transform the original data. Specifically, the transformation is divided into two parts: one for the analysis of passenger flow characteristics and one for the forecast of bus passenger flow. For the analysis of passenger flow characteristics in Section 3, we group the cleaned data into 1-hour intervals and, at the same time, extract the bus IC card data of the elderly and group them into 1-hour intervals as well. For the bus passenger flow forecasting, we group the data from 5:00 to 23:00 on weekdays into 10-minute intervals to serve as the data basis for the demand forecast study.
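The grouping step can be illustrated with a small Python sketch that counts swipes per fixed-width interval. The function name `bin_counts` and the sample timestamps are hypothetical; the 5:00 to 23:00 window and the 10-minute and 1-hour widths come from the text.

```python
from collections import Counter
from datetime import datetime

def bin_counts(stamps, minutes=10, first_hour=5, last_hour=23):
    """Count swipes per fixed-width interval between first_hour and last_hour."""
    n_bins = (last_hour - first_hour) * 60 // minutes
    counts = Counter()
    for s in stamps:
        offset = (s.hour - first_hour) * 60 + s.minute
        if 0 <= offset < n_bins * minutes:
            counts[offset // minutes] += 1
    return [counts.get(i, 0) for i in range(n_bins)]

# Hypothetical swipe times, not data from the paper.
stamps = [
    datetime(2017, 9, 4, 5, 3),
    datetime(2017, 9, 4, 5, 9),
    datetime(2017, 9, 4, 5, 10),
    datetime(2017, 9, 4, 22, 59),
]
series = bin_counts(stamps)               # 10-minute series for forecasting
hourly = bin_counts(stamps, minutes=60)   # 1-hour series for feature analysis
```

With these settings one day yields 108 ten-minute intervals or 18 one-hour intervals.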

Data Reduction.
The original data obtained from the IC card of Qingdao bus includes the fields shown in Table 1. Some of the fields are useless for the analysis of the passenger flow characteristics and passenger flow forecasts, such as the unit number, unit name, and posttransaction card balance. These data take up storage space and reduce the speed of data filtering. For this reason, we remove these fields.

Analysis of Bus Passenger Flow Characteristics
This section will analyze the bus IC card data from September 4th to September 10th, 2017 (Monday to Sunday), focusing mainly on the travel time characteristics and the travel characteristics of different types of bus IC cards.

One-Day Passenger Flow.
The total number of bus IC card swipes on each day from September 4th to September 10th is shown in Table 2.
Figure 1 shows that there is a significant difference in the number of passengers between the weekend and weekdays, and the number of bus IC card swipes on the weekend is significantly lower than that on weekdays. From Monday to Thursday, the number of bus IC card swipes remains steady at about 1.54 million. The passenger flow on Friday increases by 1.6 million times over the previous four days. The passenger flow on Saturday and Sunday decreases continuously. There are two reasons for this: (1) The majority of the passengers are students and commuters. The travel of these two groups has obvious periodicity and is affected by the weekend. (2) On Saturdays and Sundays, travelers choose to go out for entertainment and shopping activities. However, due to the reduction in the number of students and commuters, the overall number of card swipes is on a downward trend. Therefore, the results obtained through statistical analysis of a large amount of data are consistent with actual conditions.

Time Interval Passenger Flow.
Bus passenger flow has very obvious time-varying characteristics. There are obvious morning and evening peaks. Some small cities will also have afternoon peaks. Through the analysis of bus IC card data, we can grasp the time distribution characteristics of bus passenger flow, make reasonable arrangements of the frequency of shifts, and improve the operating efficiency of the entire bus system. This section will conduct a statistical analysis of the data from 05:00 to 23:00 at one-hour intervals, based on the bus IC card data from September 4th to September 10th. The following conclusions can be obtained by comparing the time periods and the daily passenger flow: (1) The passenger flow of the bus IC card in Qingdao has obvious peaks in the morning and evening. The intensity of the evening peak is weaker than that of the morning peak, but the duration is longer. (2) The start of the weekend morning peak is postponed, and the duration of the evening peak is shortened. The total number of passengers during the weekend is significantly lower than on weekdays. (3) Figure 3 intuitively shows that there are obvious differences between the travel characteristics of the weekdays and the weekend. The number of card swipes on September 9th and September 10th fluctuates less, and the peak number of card swipes declines markedly compared with the weekdays. The above conclusions are consistent with the actual travel situation. Travel on the weekend is relatively dispersed, and on weekdays, it is relatively concentrated. On the weekend, the morning peak starts later and the evening peak ends earlier.

Analysis of the Travel Characteristics of Seniors.
With the ageing of population becoming increasingly prominent, the issue of seniors' travel has received extensive attention. In terms of bus travel, the city of Qingdao started to implement a free travel policy for people aged 65 and older in 2014. In this context, the seniors' bus travel demand has increased significantly, and bus travel has gradually become the primary choice for seniors to travel. This section will study the travel patterns of the elderly, based on the data from the bus IC card in Qingdao, with the purpose of avoiding the disadvantages of inaccurate and incomplete data of traditional survey and statistical methods, providing decision support for urban public transport planning, and improving the quality of travel for the seniors.
This section will study the travel patterns of the seniors from the amount of the bus IC card and time distribution characteristics.
Figure 4 shows that the travel of the seniors has the following characteristics: (1) The number of trips over the weekend is less than that of the weekdays. According to the statistics of the seniors' cards in one week, the number of bus IC card swipes is the largest on Friday, followed by Wednesday, and the smallest on Sunday. The traditional experience is that the travel of the seniors is little affected by the weekdays and the weekend. However, actual statistics show that the amount of seniors' bus travel is also clearly affected by the weekdays and the weekend. (3) Seniors travel later in the morning peak and earlier in the evening peak. The previous analysis found that the morning peak and evening peak of passengers other than the seniors are 7:00 to 8:00 and 17:00 to 18:00, respectively. Through the analysis of the seniors' card data, it is found that their morning peak is between 8:00 and 9:00 and their evening peak is between 16:00 and 17:00. This shows that the seniors avoid the general morning and evening peaks when choosing their travel time. These periods are favorable for the travel of the elderly because they avoid the crowded commuting hours. (4) The trend of the seniors' travel volume is relatively flat, and the gap between the peak and off-peak travel volume is small. Comparing the slopes of the passenger flow curves after the evening peak in Figures 2 and 4 shows that the number of senior passengers falls faster than that of other types of passengers after the evening peak.

ARIMA Model.
The ARIMA (Autoregressive Integrated Moving Average) model is a time-series model used for short-term forecasting. In general, the ARIMA model is given by the following equation:
\[ \Phi(L)(1 - L)^{d} R_t = \Theta(L) e_t, \]
where $\Phi(L)$ is the autoregressive coefficient polynomial of the stationary and invertible ARMA(p, q) model, $\Theta(L)$ is the moving average coefficient polynomial of the same model, $R_t$ is the time-series data, $L$ is the backward shift operator, $d$ is the differencing order, and $e_t$ is a white noise series.
In essence, the ARIMA model combines the differencing operation with the ARMA model and offers high short-term prediction accuracy.

Stationarity Test.
The passenger flow trend of a workday is shown in Figure 5. It can be seen that the passenger flow data are not stationary.
Figure 6 is an autocorrelation diagram of the original data, from which it can be seen that the data have the characteristics of a typical nonstationary sequence, so the sequence is made stationary by differencing before further prediction. Figures 7-9 show that the differenced sequence is stationary. As can be seen from the autocorrelation diagram and the partial autocorrelation diagram, both functions of the time series tail off. The ADF unit root test is performed on the time series, and the p value is 0.0069, which is less than the significance level of 0.05, so the sequence after the first-order difference is a stationary time series.
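First-order differencing and the sample autocorrelation used in such diagrams can be sketched as follows. The series `flow` is a hypothetical trending sequence, not data from the paper, and the ADF test itself is not implemented here (in Python it is available as `adfuller` in `statsmodels.tsa.stattools`).

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Hypothetical trending counts, not data from the paper.
flow = [10, 14, 19, 23, 30, 34, 41, 45, 52, 58]
diffed = difference(flow)
```

For this toy series the lag-1 autocorrelation of `flow` is strongly positive, as expected for a trending sequence, while that of `diffed` is not.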

Model Establishment.
According to the AIC criterion, candidate ARIMA models are fitted and the AIC value of each is calculated. The AIC value reaches its minimum when the ARIMA(p, d, q) model has orders p = 4 and q = 5. Since the sequence has been subjected to first-order differencing (d = 1), the ARIMA(4, 1, 5) model is used.
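For reference, the AIC is commonly defined as below, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood of the fitted model. The parameter count shown for the selected orders is an assumption: it presumes a model without a constant term and counts the innovation variance as one parameter. Other conventions shift $k$ by the same constant for every candidate and therefore do not change the ranking of candidates fitted to the same differenced series.

\[ \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad k = p + q + 1 = 4 + 5 + 1 = 10. \]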

Model Prediction.
After the model is established, we use it to predict the passenger flow in the working day, and the forecast results are shown in Figure 10.It can be seen that there is a large fluctuation in some time periods, and the actual value has a large error with the predicted value, so the prediction effect of the ARIMA model on the nonlinear data needs to be improved.

NARX Neural Network Model.
The NARX (Nonlinear Autoregressive with External (Exogenous) Input) neural network is a kind of dynamic neural network. Its output value can be fed back into the model as an input value, which makes the model more sensitive to historical data, so that it better reflects the dynamic features of the passenger flow and improves the prediction accuracy.
The NARX neural network structure is shown in Figure 11.
The expression of the NARX neural network model is
\[ y(t) = f\big(y(t-1), \ldots, y(t-d),\; x(t-1), \ldots, x(t-d)\big), \]
where x(t) represents the input of the neural network, y(t) represents the output of the neural network, f represents the nonlinear function, and d represents the feedback delay.
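To make the role of the delay concrete, the sketch below builds the regressor rows that a NARX model would consume, assuming the common form in which y(t) is predicted from the previous d outputs and the previous d inputs. The helper name `narx_regressors` and the toy series are hypothetical.

```python
def narx_regressors(x, y, d=2):
    """Build rows [y(t-1..t-d), x(t-1..t-d)] and targets y(t) for t >= d."""
    rows, targets = [], []
    for t in range(d, len(y)):
        past_y = [y[t - k] for k in range(1, d + 1)]   # fed-back outputs
        past_x = [x[t - k] for k in range(1, d + 1)]   # delayed exogenous input
        rows.append(past_y + past_x)
        targets.append(y[t])
    return rows, targets

x = [1, 2, 3, 4, 5]       # hypothetical exogenous input
y = [10, 20, 30, 40, 50]  # hypothetical output series
rows, targets = narx_regressors(x, y, d=2)
```

Each row pairs with one target, so a series of length n yields n - d training samples.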

Determination of NARX Neural Network Parameters
(1) Determination of the number of input layer nodes: when using the neural network model for prediction research, we first need to determine the number of nodes in the input layer. In general, the number of nodes in the input layer is determined by the input of the dynamic system equations. If the dynamic system equations are not clear, step-by-step testing can be used to determine the number of input layer nodes in the network. We select the number of card swipes as the input value and then adjust the configuration step by step, which determines the number of input layer nodes to be 4.
(2) Determination of the number of hidden layer neurons and hidden layers: the number of neurons in the hidden layer is extremely important for the performance of the network. When the number of neurons in the hidden layer is too small, the characteristics of the data cannot be well simulated. Too many neurons will increase the training time of the network, and overfitting may occur. The function of the hidden layer is to extract features from the input data. An appropriate number of hidden layers gives the neural network better data processing capabilities. An excessive number of hidden layers will increase the training error and lengthen the network training time. We collected data from the bus IC card for 5 weekdays and counted the number of card swipes at intervals of 10 minutes. According to the data size of this study, we determined that a network with one hidden layer would be used for training. In the final network structure, the number of input layer nodes is 4, the number of hidden layer neurons is 22, the number of hidden layers is 1, and the delay number is 2. The network structure is shown in Figure 12.
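The size of such a network can be illustrated with a small NumPy forward pass. Several choices here are assumptions rather than details from the paper: the regressor is taken to contain 2 delayed values of each of the 4 input series plus 2 delayed feedback outputs, the hidden layer uses tanh, the output neuron is linear, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_delays, n_hidden = 4, 2, 22
# Regressor size: delayed exogenous inputs plus delayed feedback outputs.
n_features = n_inputs * n_delays + n_delays

# Randomly initialised weights; a trained network would learn these.
W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(1, n_hidden))
b2 = np.zeros(1)

def forward(z):
    """One hidden tanh layer followed by a linear output neuron."""
    hidden = np.tanh(W1 @ z + b1)
    return (W2 @ hidden + b2)[0]

z = rng.normal(size=n_features)   # hypothetical regressor vector
prediction = forward(z)
n_params = W1.size + b1.size + W2.size + b2.size
```

Under these assumptions the network has 10 regressor inputs and 265 trainable weights and thresholds, which gives a sense of model size relative to the number of 10-minute samples.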

Training Method Selection.
The number of research samples, the number of hidden layers, and the number of hidden layer nodes all play an important role in the convergence of the training algorithm. Based on the data size of this study, we compare the effects of two different training algorithms and finally select the LM (Levenberg-Marquardt) training algorithm.
(1) SCG (Scaled Conjugate Gradient) Algorithm. The SCG algorithm is an improved conjugate gradient algorithm. It does not need to calculate the Hessian matrix, and it changes the line search used by the conjugate gradient algorithm when calculating the search step length. It does not need a line search to determine the optimal search path, and its convergence speed is faster. The conjugate gradient method improves on the traditional gradient descent method, reducing network oscillation and improving the convergence speed of the network. Based on the data from this study, we used the SCG algorithm to train the network. The training effect is shown in Figure 13.
As shown in Figure 13, the errors of training, validation, and testing tend to be stable after the 45th epoch, which reflects that the network converges quickly and performs well.
(2) LM (Levenberg-Marquardt) Algorithm. The LM algorithm does not need to calculate the Hessian matrix when correcting the weights. When the error performance function has the form of the sum of squared errors, the Hessian matrix can be approximated as
\[ H = J^{T} J, \]
where $J$ is the Jacobian matrix containing the first derivatives of the network errors with respect to the weights and thresholds. The LM training algorithm updates the weights and thresholds as follows:
\[ x_{k+1} = x_k - \left[ J^{T} J + \mu I \right]^{-1} J^{T} e, \]
where $x_k$ is the vector of weights and thresholds at iteration $k$, $\mu$ is the adjustment coefficient, and $e$ is the error vector. When $\mu$ is close to or equal to 0, the LM algorithm becomes the Newton method with the approximated Hessian matrix. When the value of the coefficient $\mu$ is large, the LM algorithm becomes the gradient descent method with small steps. Since the Jacobian matrix is easier to compute than the Hessian matrix, the training speed of the LM algorithm is very fast. The LM algorithm training effect is shown in Figure 14. The validation and test errors of the LM algorithm tend to be stable after the 4th epoch, and the training speed is greatly improved compared with the SCG algorithm. In order to judge the performance of the training algorithms more reasonably, we compared the two algorithms by the mean squared error (MSE).
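The LM update can be illustrated on a tiny curve-fitting problem. The following sketch is not the network training used in the paper: the model y = a exp(b t), the starting point, and the rule of scaling the adjustment coefficient by a factor of 10 after each accepted or rejected step are all assumptions chosen for illustration.

```python
import numpy as np

def lm_step(params, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: p - (J^T J + mu I)^-1 J^T e."""
    e = residual_fn(params)
    J = jacobian_fn(params)
    H = J.T @ J                      # Gauss-Newton approximation of the Hessian
    g = J.T @ e                      # gradient of 0.5 * sum(e**2)
    return params - np.linalg.solve(H + mu * np.eye(len(params)), g)

# Hypothetical problem: fit y = a * exp(b * t) to noiseless samples.
t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(0.5 * t)

def residual(p):
    return p[0] * np.exp(p[1] * t) - y

def jacobian(p):
    return np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])

p, mu = np.array([1.0, 0.0]), 1e-2
for _ in range(50):
    candidate = lm_step(p, residual, jacobian, mu)
    if np.sum(residual(candidate) ** 2) < np.sum(residual(p) ** 2):
        p, mu = candidate, mu * 0.1  # accept step, move toward Newton
    else:
        mu *= 10.0                   # reject step, move toward gradient descent
```

Raising the coefficient after a rejected step shortens the next step, while lowering it after an accepted step recovers the fast Newton-like convergence near the minimum.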
The magnitude of the mean square error reflects the dispersion of the error. When the mean square error is large, it indicates that the dispersion of the error distribution is high and the prediction effect is poor. The formula for calculating the mean square error is as follows:
\[ \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \big( y(t) - y'(t) \big)^{2}, \]
where $n$ represents the number of time intervals, $y(t)$ represents the actual value, and $y'(t)$ represents the predicted value. By comparison, we find that the mean square error of the LM algorithm is 564.32, and the mean square error of the SCG algorithm is 788.66. Therefore, the LM algorithm is finally used to train the neural network and predict with the actual data.
This section uses the bus IC card data from September 4th to September 8th of the No. 1 bus in Qingdao as the database for the prediction of the NARX neural network. We select the data from 5:00 to 23:00 and divide them into 10-minute intervals.
The number of input layer nodes of the NARX neural network is set to 4 because the bus IC card data from September 4th to September 7th are used as input data, forming a matrix with 4 columns; the number of hidden layers is 1. During training, the training effect is judged after each run by observing the error autocorrelation function, the input-output correlation function, and the R value, and the number of hidden layer neurons is adjusted accordingly. With the LM (Levenberg-Marquardt) algorithm used to train the network, the number of hidden layer neurons is determined to be 22 and the delay number to be 2. Of all sample data, 70% are selected as training data, 15% as validation data, and the remaining 15% as test data, and training is repeated until the network training is effective.
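The 70/15/15 division can be sketched as below. Whether the paper divides the samples randomly or in contiguous blocks is not stated, so the random shuffle here is an assumption, as are the function name and the seed. The value 108 is simply the number of 10-minute intervals between 5:00 and 23:00.

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly partition range(n) into training, validation, and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 108 ten-minute intervals between 5:00 and 23:00.
train_idx, val_idx, test_idx = split_indices(108)
```

The validation subset is used to stop training before overfitting, and the test subset gives an independent check of the trained network.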
After each training, we need to judge the training effect based on the error autocorrelation function and the input-output correlation function.
The error autocorrelation function reflects the correlation between the prediction errors at different lags. In the ideal state, the function is nonzero only at zero lag; that is, the errors are completely uncorrelated. In general, when the error autocorrelation function falls within the confidence interval, the network training effect is good and the network can be used for prediction. The input-output correlation function in Figure 15 and the error autocorrelation function in Figure 16 both stay within the confidence interval (within the red lines), indicating that the training results are good and the network can be used for prediction.

Prediction Process and Accuracy Evaluation.
After the NARX neural network model was trained, the passenger flow on a single weekday (in 10-minute intervals) was predicted. We used Matlab to output the prediction effect error (Figure 17) and the fitting effect (Figure 18) and generated a comparison chart of the actual value and the predicted value, and the relative error between the actual value and the predicted value was calculated.
Through the distribution of the error line (solid yellow line) in Figure 17, we can see that there is a large error in only a few time periods, indicating that the model has a good effect on passenger flow forecasting. Figure 18 shows that the training data have an R value of 0.99803, the validation data have an R value of 0.91471, the test data have an R value of 0.90578, and the overall data have an R value of 0.9691. This also shows that the NARX neural network has a good effect on the time-division forecasting of bus passenger flow. Figure 19 more intuitively shows that the model's performance in bus passenger flow forecasting meets the prediction requirements.

Error Analysis.
After the forecasting process is over, the relative error $E_r$ between the predicted value and the actual value is used to evaluate the forecasting effect. Its expression is as follows:
\[ E_r = \frac{\left| y(t) - y'(t) \right|}{y(t)}, \]
where $y(t)$ represents the actual value and $y'(t)$ represents the predicted value. The error distribution curve is shown in Figure 20. We can see that the error during the peak hours is below 0.2, while the error during the off-peak hours is larger, with the maximum relative error reaching 0.35. The reason is that the passenger flow is large and concentrated during peak hours but smaller and relatively dispersed during off-peak hours. The NARX neural network model therefore extracts the data features well during peak hours and achieves high accuracy. Due to the poor regularity of passenger flow during the off-peak hours, the forecasting model's ability to extract data features is lower than that during peak hours, and the prediction accuracy during off-peak hours remains to be improved.
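The two accuracy measures used in this evaluation are simple to compute. The sketch below assumes the relative error is the absolute difference divided by the actual value, and the counts are hypothetical, not results from the paper.

```python
def relative_errors(actual, predicted):
    """Per-interval relative error |y - y'| / y (actual values must be nonzero)."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]

def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical counts for four intervals, not results from the paper.
actual = [100, 200, 50, 400]
predicted = [90, 210, 60, 400]
errors = relative_errors(actual, predicted)
mse = mean_squared_error(actual, predicted)
```

The same absolute error of 10 swipes gives a relative error of 0.05 on an interval with 200 swipes but 0.2 on an interval with 50, which is one reason low-volume off-peak intervals show larger relative errors.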

Nonparametric Tests.
We use the Wilcoxon signed-rank test to test the difference between the actual and predicted values. Table 3 shows that there is no significant difference between the actual value and the predicted value. The significance value in the table is 0.941, which is far greater than the significance level of 0.05, so the null hypothesis is retained; that is, there is no statistically significant difference between the actual value and the predicted value.
Figure 21 shows the more detailed statistical results of this analysis, and the conclusion is unchanged.
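For readers who want to reproduce this kind of test, a minimal sketch of the two-sided Wilcoxon signed-rank test is given below. It uses the normal approximation without continuity or tie corrections to the variance, so its p value can differ slightly from that of a statistics package (SciPy provides `scipy.stats.wilcoxon`). The sample values are hypothetical, and at least one nonzero difference is required.

```python
import math

def wilcoxon_signed_rank(actual, predicted):
    """Two-sided Wilcoxon signed-rank test using the normal approximation."""
    diffs = [a - p for a, p in zip(actual, predicted) if a != p]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1            # average rank for tied magnitudes
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value

# Hypothetical paired values, not data from the paper.
actual = [12, 15, 9, 20, 18, 11, 14, 16, 13, 17]
predicted = [11, 17, 12, 16, 23, 5, 21, 8, 22, 7]
w_plus, p_value = wilcoxon_signed_rank(actual, predicted)
```

A p value above the chosen significance level means the null hypothesis of no systematic difference between the paired values is retained.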

Conclusions
Based on the IC card data of Qingdao bus, this paper first studies the distribution characteristics and regularity of bus passenger flow in time, analyzes it in terms of single-day passenger flow and subperiod passenger flow, and compares the features of weekdays and the weekend. In addition, seniors are analyzed separately, and their bus travel characteristics are obtained. In the aspect of bus passenger flow forecasting, the ARIMA model and the NARX neural network prediction model were used according to the characteristics of the IC card data and the forecasting demand. The prediction results were compared with the actual values, and it was found that the NARX neural network prediction model had good prediction accuracy and achieved the expected results. In future research, we will also study the characteristics of time-division trips for multiple groups. At the same time, we noticed that Qingdao implemented different fare policies

Figures 2 and 3 are the time-varying diagram of passenger flow and the passenger flow box plot of the Qingdao bus IC card in one-hour intervals.

Figure 5: Passenger flow in different periods of a workday.

Figure 8: Autocorrelation diagram after the first-order differential processing.

Table 1: Data structure of Qingdao bus IC card system.

Table 2: Total number of the bus IC card on a single day from September 4th to September 10th.