A Hybridly Optimized LSTM-Based Data Flow Prediction Model for Dependable Online Ticketing

Fifth-generation (5G) communication technologies and artificial intelligence enable the design and deployment of sophisticated solutions for enhanced user experience and superior network-based service delivery. However, the performance of systems offering 5G-based services depends on various factors. In this paper, we consider the online railway ticketing system in China, which serves hundreds of millions of people daily. The system's online access rates fluctuate over time, affecting its overall dependability and service quality. We combine a long short-term memory (LSTM) network with particle swarm optimization (PSO) and differential evolution (DE) to construct DP-LSTM, a hybridly optimized model that predicts network flow for dependable, quality-enhanced service delivery. We evaluate the proposed model on real data collected over six months from the “12306 online ticketing” system and compare it with mainstream network traffic prediction models, using mean absolute percentage error, mean absolute error, and root mean square error for performance evaluation. Experimental results show the superiority of the proposed model.


Introduction
Fifth-generation (5G) communication technologies and artificial intelligence (AI) enable the design and deployment of sophisticated solutions for enhanced user experience and superior mobile service delivery that meet diverse critical requirements [1][2][3][4]. However, the performance of 5G-based services depends on many factors [5,6]. 5G-based mobile services face challenges such as high data rates, rapid response requirements, dynamic coupling and decoupling of new devices, and remote device configuration [7]. Consequently, the overall data traffic associated with service delivery systems grows tremendously [8]. Likewise, while 5G infrastructures support high data throughput and AI-based, knowledge-driven data layers try to address the corresponding resource allocation and optimization challenges, the Internet is a limited resource, and new situations can affect its performance (https://www.nytimes.com/2020/03/26/business/coronavirus-internet-traffic-speed.html), which can lead to deteriorated service delivery.
The performance of Internet-based services does not depend entirely on the telecommunication infrastructure; the scalability of the servers handling service requests also plays a critical role [9]. Network congestion can slow down a service. Unexpected fluctuations in service request rates and the corresponding changes in Internet traffic can directly affect the availability of web-based systems, and a substantial change in traffic may even crash application services (https://techcrunch.com/2018/12/26/alexa-crashed-on-christmasday/). Consequently, special measures and strategic approaches are used for load balancing, enhanced stability, and detecting and mitigating malicious actors that affect the availability of Internet-based services and systems [9][10][11].
Moreover, the requirements and challenges of Internet-based services vary across application scenarios [12][13][14]. As a result, recent research has focused on application-specific issues, investigating the underlying factors that affect the performance of the corresponding Internet-based services and proposing optimization methods for improved availability and Quality of Service (QoS) [15,16]. In this paper, we focus on the Chinese railway system, which serves hundreds of millions of people daily. A key component of this system is the "12306 ticketing system (https://www.highspeed.mtr.com.hk/en/ticket/buy-ticket-12306.html)," which serves as the main channel for passengers to check ticket availability and make online bookings. This system can experience occasional incoming traffic fluctuations [17]. In particular, with the wide-scale availability of high-speed Internet over 5G infrastructures, the number of people using the "12306 ticketing system" has grown significantly, and ensuring its scalability and dependability is a significant challenge in its own right.
This research observes and predicts the Internet traffic generated by incoming service requests in the "12306 ticketing system" and proposes a model that enables scalability and resilience in the event of traffic surges. The main goal of the model is to minimize the impact of sudden fluctuations in online traffic and thereby improve the dependability and overall QoS of the 12306 ticketing system. Since AI and data analytics can play an important role in controlling and operating networks efficiently and in delivering services efficiently [18][19][20][21], we use AI-based data analytics to achieve our goals.
These service requests and the related network access flows are encountered at successive time intervals. Therefore, they can be treated as time series data. In other words, the problem of predicting network access flow for the "12306 ticketing system" resembles traffic flow forecasting and stock forecasting [22][23][24]. Conventionally, three main approaches are used to solve these problems [25,26]: traditional algorithms such as exponential smoothing and autoregressive integrated moving average (ARIMA) [27,28]; traditional machine learning algorithms such as support vector regression (SVR) [28], eXtreme Gradient Boosting (XGBoost) [29,30], and Random Forest; and deep learning algorithms such as Deep Autoregressive Networks (DARNs) [31] and long short-term memory (LSTM) [32,33]. Among these algorithms, ARIMA requires stationary data. It predicts network traffic flows from the variations in the historical data. Therefore, it cannot capture nonlinear patterns, and its generalization ability is weak [34,35].
Likewise, because of the complex nature of network flow and its lack of typical behavior [36], traditional models cannot handle such data well. The LSTM network [32] has achieved good results in nonperiodic event detection [37], traffic load balancing [38], and other fields [8,39]. It has shown promising performance in time series trend prediction [34]. With its nonlinear approximation capability and its self-learning and self-adaptive features, LSTM can better describe the characteristics of time series data and achieve high prediction accuracy and strong generalization [33]. It models the relationship between the current data and previous inputs: its memory retains state information from before the current data is fed into the network, and this earlier state influences the predicted value and development trend of subsequent data. In LSTM, the number of layers and the number of hidden neurons in the feedforward network layer play a key role in performance. Research shows that increasing the number of network layers does not necessarily improve results, and selecting an appropriate number of layers can yield a highly accurate model [40,41].
However, in practical applications of LSTM networks, determining the network structure and selecting the parameters are challenging tasks. They are generally done by trial and error or based on experience. This is also a bottleneck in the development of neural networks [41]. Particle swarm optimization (PSO) is a heuristic random search algorithm that has few parameters to set, involves no crossover or mutation operations, and can find the extreme values of a function quickly. Some LSTM models use PSO to find the optimal hyperparameters and achieve good results [34,38,39]. However, PSO converges quickly in the early stage of the optimization process and easily falls into a local optimum in the later stage. To establish an optimal model, we select the parameters of the LSTM traffic prediction model by fusing PSO with the differential evolution (DE) algorithm [42]. Using DE to guide the evolution of PSO can improve the results of PSO [43] and greatly reduce the probability of obtaining a locally optimal solution.
In this paper, we use LSTM, optimized through a fusion of PSO and DE, to construct the DP-LSTM model for access flow prediction in the 12306 ticketing system. The purpose of this model is to minimize the impact of sudden fluctuations in online traffic and thereby improve the dependability and overall QoS of the 12306 ticketing system.
Specifically, the contributions of this research can be listed as follows.
(1) We focus on "the 12306 ticketing system" to learn its flow data patterns and predict future traffic, minimizing the impact of sudden fluctuations to improve its dependability and overall QoS.
(2) We use a long short-term memory (LSTM) network and a hybrid optimization algorithm to construct the DP-LSTM model to predict network access traffic.
(3) We evaluate the proposed model using real data collected over six months from the "12306 ticketing system" and compare its performance with mainstream time series forecasting methods. We use mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) for performance evaluation. Experimental results show the superiority of the proposed model over the benchmarks.

Wireless Communications and Mobile Computing

Access Flow Data in "12306 Ticketing System"
As mentioned in the previous section, service requests and the related network access flows are encountered at successive time intervals. Such data exhibit a secular trend, cyclical fluctuations, and irregular variations [44]. Therefore, the operations of the "12306 ticketing system" can be expected to be affected by time-varying cyclic and irregular variations. The system serves from 7:00 to 23:00, and we focused on the hourly peak flows, collecting data from March 1, 2020, to August 31, 2020. We conducted an exploratory analysis of the collected data to examine the timing sequence and identify outliers. The results of this analysis are provided below.
2.1. Timing Sequence.
We selected one month of data, from March 1st, 2020, to April 1st, 2020, and observed how the request flow varies; the resulting timing sequence diagram is shown in Figure 1. The diagram indicates that the flow fluctuations have evident periodicity and trend. This is broadly consistent with the Internet ticketing system's daily operation schedule, which suspends ticket booking from 0:00 am to 5:00 am and opens from 6:00 am to 11:00 pm. On March 5th, the flow showed an upward trend because tickets could be prebooked for April 4th, the Tomb Sweeping Day holiday. Every morning, the system becomes available for ticket booking at 6 o'clock, resulting in a daily peak at 6 am. The stationarity of the access flow data was further tested with an autocorrelogram and a unit root test. The P value of the unit root test was 0.91, so the hypothesis of a unit root cannot be rejected. The autocorrelogram in Figure 2 shows no truncation (cut-off). Hence, the request flow is not a stationary sequence.
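The autocorrelogram check can be illustrated with a minimal sketch. It computes the sample autocorrelation of a synthetic hourly series (a 24-hour cycle on a slow trend); the series, the function name `autocorrelation`, and the lag range are illustrative assumptions, not the data or tooling used in this study.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Hypothetical hourly series: a 24-hour cycle on top of a slow upward trend.
hours = range(24 * 14)
flow = [100 + 0.5 * h + 40 * math.sin(2 * math.pi * h / 24) for h in hours]

# acf[k - 1] holds the autocorrelation at lag k, for lags 1..48.
acf = [autocorrelation(flow, k) for k in range(1, 49)]
```

For such a series the autocorrelation does not cut off after a few lags; it stays high and peaks again near lag 24, which is the qualitative pattern described for Figure 2.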

Outlier Detection.
We detected abnormal points in the data flow using boxplots. The flows during the ticketing period (6:00 am to 11:00 pm) and the nonticketing period (0:00 am to 5:00 am) were quite different. Therefore, we tested the data for the ticketing period and the nonticketing period separately. The test results are shown in Figure 3. The red points represent abnormal data, i.e., values greater than Q3 + 2IQR, where Q1 is the first quartile, Q3 is the third quartile, and IQR = Q3 − Q1. The abnormal data during the nonticketing period were the data at 5:00 am every day, which reflect the normal flow fluctuation before ticket release. The abnormal data during the ticketing period appeared during ticket prebooking for the Qingming Festival (from March 4 to March 9) and at 6:00 am. These data were also normal business traffic fluctuations. Outlier detection therefore found no extreme abnormal data in the collected data. Because traffic differs greatly between the ticketing and nonticketing periods, the prediction models need to be built separately.
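The boxplot rule above can be sketched as follows. The flow values are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile convention of the plotting tool used for Figure 3.

```python
import statistics

def upper_outliers(values, k=2.0):
    """Return the values above Q3 + k * IQR and the threshold (k = 2 follows the text)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    threshold = q3 + k * (q3 - q1)
    return [v for v in values if v > threshold], threshold

# Hypothetical hourly peak flows for one period, with a single surge.
ticketing_flow = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46, 50, 120]
outliers, threshold = upper_outliers(ticketing_flow)
```

Running the same function separately on the ticketing and nonticketing samples mirrors the split described in the text.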
The analysis results indicate that the access flow of the Internet ticketing system is not only nonstationary and periodic but also affected by holidays and ticket release timing, which makes it random and complicated. Traditional time series models have difficulty fitting such data.

High-Level Design of DP-LSTM
In this section, we introduce the components that we use to construct the DP-LSTM model for predicting flow data for the "12306 ticketing system."

3.1. LSTM.
The LSTM is a recurrent neural network (RNN) that effectively learns long-term dependency relationships through well-designed "gated" architectures. It consists of memory cells and gate units. An LSTM neuron has an input gate (multiplicative input), an output gate (multiplicative output), and a forget gate. As the name suggests, the input gate handles the input data stored in a given memory cell and protects it from perturbation by irrelevant inputs. The output gate controls the output representations, and the forget gate handles the retention of historical information or, in other words, decides when to forget it. A typical LSTM network has at least one input layer, one output layer, and one hidden layer; the memory cells and gate units are located in the hidden layer. When using LSTM, determining the network structure is a challenge, and the choice is often based on experience. Since single-layer LSTM is limited by the number of convolution kernels, multilayer LSTM can be a better choice [34]. Therefore, we use a multilayer LSTM in the network structure of the access flow prediction model of the railway ticketing system. A dropout layer is also added to improve the generalization ability of the model. Moreover, different studies have used different optimization algorithms to find the optimal LSTM prediction model [45][46][47][48]. We use a fusion of particle swarm optimization (PSO) and differential evolution (DE). The basic network structure is shown in Figure 4.
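For reference, the gate mechanism described above is commonly written as follows. This is the widely used textbook formulation of an LSTM cell, not necessarily the exact variant implemented here; the symbols $W$, $U$, $b$ (weights and biases), $x_t$ (input), $h_t$ (hidden state), and $c_t$ (cell state) are notation introduced for this illustration.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is the element-wise product, so the forget gate scales the retained history while the input gate scales what is newly written to the memory cell.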

Particle Swarm Optimization (PSO).
The PSO is a population-based stochastic optimization algorithm. It searches for the optimal region through continuous interactions among population members (data), leading to an iterative improvement in the algorithm's performance [49]. Each particle moves through the search space on its own and tests different parameters. It uses the group's optimal fitness to change the direction and distance of its movement, which completes the optimization over the global search space. Consider a group $X = \{x_1, x_2, \ldots, x_m\}$ consisting of m particles in a d-dimensional search space. At time t, particle i has location $x_i^t = [x_{i1}^t, x_{i2}^t, \ldots, x_{id}^t]^T$, velocity $v_i^t = [v_{i1}^t, v_{i2}^t, \ldots, v_{id}^t]^T$, and individual optimal location $p_i^t = [p_{i1}^t, p_{i2}^t, \ldots, p_{id}^t]^T$, and the swarm has the global optimal location $p_g^t = [p_{g1}^t, p_{g2}^t, \ldots, p_{gd}^t]^T$; the global optimal location is the optimal parameter combination of the current training model. The velocity and location of the particle at time t + 1 are then updated as

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 \left(p_{id}^{t} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd}^{t} - x_{id}^{t}\right), \qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{1}$$

where w is the inertia weight, which controls the balance between global exploration and local exploitation by the particle; $c_1$ and $c_2$ are learning factors that adjust the step size of flying toward the particle's own optimal location and the global optimal location, respectively; and $r_1$ and $r_2$ are random numbers uniformly distributed within [0,1]. While PSO has its advantages, it is possible that, after the optimal parameter combination has been updated iteratively with Equation (1), subsequent particle updates become stuck in a local optimum. A differential evolution (DE) algorithm [20,21] can be used to optimize the location update of the particle swarm with the optimal individual from the differential evolution swarm: in Equation (1), $p_{gd}^{t}$ takes the optimal value among the differential individuals and the particle swarm. Sharing the global optimum of the two swarms can accelerate the optimization of the particle swarm, reduce the risk of falling into a local optimum, and output the optimal parameter combination.
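A minimal sketch of one PSO update in the standard inertia-weight form follows. The function name `pso_step`, the default values of `w`, `c1`, and `c2`, and the two-particle example are illustrative assumptions rather than the settings used in this study.

```python
import random

def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one velocity and position update to every particle, in place."""
    for i, x in enumerate(positions):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (personal_best[i][d] - x[d])
                                + c2 * r2 * (global_best[d] - x[d]))
            x[d] += velocities[i][d]

# Two particles in a 2-dimensional space, both initially at rest.
positions = [[0.0, 0.0], [4.0, 4.0]]
velocities = [[0.0, 0.0], [0.0, 0.0]]
personal_best = [p[:] for p in positions]
global_best = [2.0, 2.0]  # in the hybrid scheme this may come from the DE swarm
pso_step(positions, velocities, personal_best, global_best, rng=random.Random(0))
```

Because both particles start at their personal best with zero velocity, the first step moves each of them only along the direction of the shared global best.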

Differential Evolution.
The DE algorithm is a parallel direct search method that uses NP d-dimensional parameter vectors for optimization [42]. It is a simple yet effective technique based on random search over a population, designed to solve global optimization problems. Initially, the population of search vectors is chosen randomly and should cover the whole parameter space. New parameter vectors are generated through the mutation operation, which adds the weighted difference of two parameter vectors to a third vector; the result is called a mutated vector. The parameters of the mutated vector "cross over" with those of another vector, called the target vector, and the resulting vector is called a trial vector. If the trial vector yields a lower cost function value than the target vector, it replaces the target vector in the next generation. The steps are as follows.
(1) Swarm Initialization. The initial swarm consists of N vectors $P_i(t) = [P_{i,1}(t), P_{i,2}(t), \ldots, P_{i,d}(t)]$ whose parameters are randomly selected from the overall search space. Here, d is the dimension of an individual vector, and $i \in \{1, 2, \ldots, N\}$ denotes the i-th chromosome.
(2) Mutation Operation. Different strategies can be used for the mutation operation. To share the current optimal parameter combination with the particle swarm and accelerate the optimization, Equation (2) defines the mutation of an individual,
where i is the index of the current individual in the swarm; $r_1$ and $r_2$ are two distinct indices randomly selected from 1 to N ($r_1 \neq r_2 \neq i$); $P_{best}(t)$ is the optimal individual across the generation-t particle swarm and differential swarm; and F is the scaling factor.
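(3) Crossover Operation. The trial vector is generated from the mutated and target vectors using the crossover selection in Equation (3):

$$U_{i,j}(t+1) = \begin{cases} V_{i,j}(t+1), & \text{if } \mathrm{rand}(0,1) \le CR \text{ or } j = \mathrm{rand}(j) \\ P_{i,j}(t), & \text{otherwise} \end{cases} \tag{3}$$

where $V_{i,j}(t+1)$ and $U_{i,j}(t+1)$ denote the j-th components of the mutated vector and the trial vector, respectively; CR is the crossover probability; rand(0, 1) is a random number uniformly distributed within [0,1]; and rand(j) is a randomly selected dimension.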
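(4) Selection Operation. The parameter vector with higher fitness (lower cost function value) is selected using Equation (4), which yields the parameters for the next-generation swarm:

$$P_i(t+1) = \begin{cases} U_i(t+1), & \text{if } f\left(U_i(t+1)\right) < f\left(P_i(t)\right) \\ P_i(t), & \text{otherwise} \end{cases} \tag{4}$$

where f is the cost function and $U_i(t+1)$ is the trial vector.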
The mixed optimization selects, after each iteration, the optimal value for $p_{gd}^{t}$ in Equation (1) and $P_{best}(t)$ in Equation (2). The training objective is to find the minimum loss value. Therefore, the optimal individual is selected as $\min\left(f(p_{gd}^{t}), f(P_{best}(t))\right)$ and serves as the basis of both the particle swarm and the differential swarm in the next generation of evolution.
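The following sketch shows one DE generation covering mutation, crossover, and selection. The mutation line assumes the DE/best/1 strategy, $P_{best}(t) + F\,(P_{r_1}(t) - P_{r_2}(t))$, chosen only because it is consistent with the symbols defined for Equation (2); the exact mutation form, the `sphere` toy cost, and all parameter values are assumptions for illustration.

```python
import random

def de_generation(population, cost, best, F=0.5, CR=0.9, rng=random):
    """Run one DE generation (mutation, crossover, selection); return the new population."""
    n, dim = len(population), len(population[0])
    new_population = []
    for i, target in enumerate(population):
        # Mutation (assumed DE/best/1): best + F * (P_r1 - P_r2), with r1 != r2 != i.
        r1, r2 = rng.sample([k for k in range(n) if k != i], 2)
        mutant = [best[j] + F * (population[r1][j] - population[r2][j]) for j in range(dim)]
        # Binomial crossover: at least one dimension (j_rand) comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [mutant[j] if (rng.random() <= CR or j == j_rand) else target[j]
                 for j in range(dim)]
        # Greedy selection: keep whichever of trial and target has the lower cost.
        new_population.append(trial if cost(trial) < cost(target) else target)
    return new_population

def sphere(x):
    """Toy cost function standing in for the validation loss."""
    return sum(v * v for v in x)

rng = random.Random(1)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
best = min(population, key=sphere)  # in the hybrid scheme, shared with the PSO swarm
next_population = de_generation(population, sphere, best, rng=rng)
```

Greedy selection guarantees that no individual gets worse from one generation to the next, which is why passing in a better `best` from the particle swarm can only help the differential swarm.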

Implementation Details
In this section, we present the implementation details of the proposed DP-LSTM model. The model is constructed using LSTM, optimized through a fusion of PSO and DE, and is used for access flow prediction in the "12306 ticketing system." Its purpose is to minimize the impact of sudden fluctuations in online traffic and thereby improve the dependability and overall QoS of the system. As mentioned in the previous sections, the flow data under consideration is time recurrent. We select several time points during the ticketing and nonticketing periods; Figure 5 shows the flow trend. A cyclic flow variation can be noted, with a significant difference in access traffic depending on whether ticketing is open. Therefore, we train separate models for the ticketing period and the nonticketing period. The overall structure of the network used for training is shown in Figure 4.

Input and Output.
The real flow data for the past 24 hours is used as the model input because the flow data has a 24-hour period. If i is the current hour, the flow peaks for hours [i − 24, i − 23, ..., i − 3, i − 2, i − 1] are the input, and the model outputs the predicted peak traffic flow for the next hour. We also tried training the model with input covering more than 24 hours (multiple days). For example, Figure 6 shows some results generated using the traffic flow data for the past seven days. We found that using longer-term data captures the trend of traffic changes better. However, it introduces a serious lag and lowers prediction accuracy. In addition, extending the input to seven days increases the input dimension sevenfold, which significantly increases the time cost of model training. Therefore, using a longer period of data to predict the access traffic in the next hour is costly and does not yield optimal results.
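The input and output construction described above amounts to a sliding window over the hourly series. The sketch below uses a placeholder series and the hypothetical helper `make_windows`; it is an illustration, not the preprocessing code of this study.

```python
def make_windows(flow, window=24):
    """Pair each run of `window` hourly peaks with the peak of the following hour."""
    inputs, targets = [], []
    for i in range(window, len(flow)):
        inputs.append(flow[i - window:i])  # hours i-24 ... i-1
        targets.append(flow[i])            # hour i, the value to predict
    return inputs, targets

# Hypothetical series of 72 hourly peaks (three days).
flow = list(range(72))
inputs, targets = make_windows(flow)
```

Calling `make_windows(flow, window=168)` on a longer series would give the seven-day variant, whose input vectors are seven times longer.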

(a) Fitness function
The MAE between the forecasted flow and the real flow is used as the fitness indicator. In our experiments, the validation loss of the trained model inevitably fluctuates but declines overall. Therefore, the average of the final three validation loss values during training is used as the fitness measure.

(b) Boundary treatment
Boundary treatment is performed on each dimension of an individual: maximum and minimum limits are set for each dimension, and a value that exceeds the maximum or falls below the minimum is treated accordingly. Equation (5) shows the treatment function,
where j is the dimension of an individual, $x_j^H$ and $x_j^L$ are the maximum and minimum values of dimension j, respectively, and n is the number of dimensions.
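The two rules above can be sketched as follows. The fitness helper follows the text directly; the boundary helper assumes simple clamping to $[x_j^L, x_j^H]$, which is only one possible treatment (re-sampling inside the bounds is another) and may differ from the form of Equation (5). All names and values are hypothetical.

```python
def fitness_from_history(val_losses, last_k=3):
    """Average the final `last_k` validation losses to smooth out fluctuations."""
    tail = val_losses[-last_k:]
    return sum(tail) / len(tail)

def clip_to_bounds(individual, lower, upper):
    """Assumed boundary treatment: clamp each dimension j to [lower[j], upper[j]]."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(individual, lower, upper)]

# Hypothetical validation MAE per epoch and a hypothetical 3-dimensional individual.
val_mae = [0.30, 0.22, 0.18, 0.20, 0.17]
fitness = fitness_from_history(val_mae)
bounded = clip_to_bounds([300.0, -0.2, 0.35], lower=[16, 0.0, 0.0], upper=[256, 0.5, 0.5])
```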

Hybrid Optimization Algorithm Flow.
We fuse the evolution processes of PSO and DE to optimize the parameters of the model. This fusion helps identify the optimal network structure of the flow prediction model for the ticketing and nonticketing periods. The optimal parameters found in this process are used for training to obtain the final flow prediction model. The detailed steps of the algorithm flow are described below.
Step 1: construct the sample sets for the nonticketing period and ticketing period, respectively. Generate the training sample set, validation sample set, and test sample set according to the ratio of 6 : 2 : 2. Then, perform model training for the nonticketing period flow prediction model DP-LSTM (N).
Step 2: initialize the basic PSO and DE parameters (as given in Table 1), and obtain the initial population. Generate a particle/individual randomly. Then, generate the corresponding flow prediction model according to the value of the particle, and calculate the fitness value of the model using the fitness function. Finally, repeat the above operation to generate $NP_{DE}$ individuals and $NP_{PSO}$ particles.
Step 3: update the velocity and position of the $NP_{PSO}$ particles using Equation (1), and record the optimal fitness value experienced by each particle. Also, update the overall optimal fitness value and optimal particle position of the PSO. Furthermore, perform the mutation, crossover, and selection operations using Equations (2)-(4) on all individuals in the DE population, and update the overall optimal fitness value and optimal individual of the DE population.
Step 4: select the overall optimum of the DE and PSO outputs as the evolutionary basis for the next epoch. If the optimal fitness value of the DE output is less than that of PSO, update the optimal fitness value and particle position of PSO with the DE optimum. Otherwise, update the optimal value and individual of DE with the PSO optimum.
Step 5: test whether the algorithm has reached the maximum number of iterations $iter_{max}$ or the loss value is less than $loss_{min}$. If yes, obtain the optimal parameter combination and execute step 6; if not, update the iteration counter t = t + 1 and execute step 3.
Step 6: perform training according to the optimal parameter combination to obtain the optimal nonticketing period flow prediction model DP-LSTM (N).
Step 7: use the flow sample data set during the ticketing period to repeat step 2-step 6, obtain the optimal parameter combination model DP-LSTM (S) of the ticketing period, and combine the two optimal models to be the final flow prediction model DP-LSTM.
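Steps 2 to 5 can be sketched end to end on a toy objective. In the sketch below, the cost function `sphere` stands in for "train an LSTM with these parameters and return its fitness"; the DE/best/1 mutation, the clamping of values to the bounds, and every parameter value are assumptions made for illustration, not the configuration of Table 1.

```python
import random

def hybrid_pso_de(cost, dim, lower, upper, n_particles=10, n_individuals=10,
                  max_iter=50, loss_min=1e-8, w=0.7, c1=1.5, c2=1.5,
                  F=0.5, CR=0.9, seed=0):
    """Minimise `cost` with a PSO swarm and a DE population that share their best."""
    rng = random.Random(seed)
    clip = lambda x: [min(max(v, lower), upper) for v in x]
    rand_vec = lambda: [rng.uniform(lower, upper) for _ in range(dim)]

    # Step 2: random initial particles and individuals.
    pos = [rand_vec() for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pop = [rand_vec() for _ in range(n_individuals)]
    best = min(pos + pop, key=cost)[:]

    for _ in range(max_iter):
        # Step 3a: PSO velocity and position update toward the shared best.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (best[d] - pos[i][d]))
            pos[i] = clip([x + v for x, v in zip(pos[i], vel[i])])
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
        # Step 3b: DE mutation (assumed DE/best/1), crossover, and selection.
        for i in range(n_individuals):
            r1, r2 = rng.sample([k for k in range(n_individuals) if k != i], 2)
            j_rand = rng.randrange(dim)
            trial = clip([best[j] + F * (pop[r1][j] - pop[r2][j])
                          if (rng.random() <= CR or j == j_rand) else pop[i][j]
                          for j in range(dim)])
            if cost(trial) < cost(pop[i]):
                pop[i] = trial
        # Step 4: the better of the two optima becomes the shared basis.
        best = min([best] + pbest + pop, key=cost)[:]
        # Step 5: stop early once the loss threshold is reached.
        if cost(best) < loss_min:
            break
    return best, cost(best)

def sphere(x):
    """Toy objective standing in for the validation loss of a candidate LSTM."""
    return sum(v * v for v in x)

best, best_cost = hybrid_pso_de(sphere, dim=3, lower=-5.0, upper=5.0)
```

In the real setting each call to `cost` is a full model training run, so the number of evaluations per iteration (one per particle and one per individual) dominates the running time; caching fitness values instead of recomputing them, as this sketch does for brevity, would be the first optimization to make.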

Experimental Evaluation
In this section, we provide the details of the experimental evaluation of the DP-LSTM model, which predicts the flow data in the "12306 ticketing system" to warn of sudden fluctuations in online traffic and minimize their impact, thereby improving the dependability and overall QoS of the system. We describe the experimental setup, explain the data preprocessing procedures, measure the effect of the fused optimization used in our design, compare the proposed model's performance with mainstream methods, and present the results of the error analysis.

Data Set.
We collected real data from the "12306 ticketing system" for this research. The peak flow per hour from 1 March to 31 August 2020 (4393 samples in total) was used to generate the sample set. The data set is a time series with a period of 24 hours. The analysis in the section on access flow data shows that the collected traffic is a nonstationary time series and that access traffic fluctuates strongly under the influence of holidays and business changes. The data set was divided in the ratio training set : verification set : test set = 6 : 2 : 2. The training set comprised the data collected from March to June 2020. The verification and test sets comprised the data collected in June and August 2020.
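A 6 : 2 : 2 split of time-ordered samples can be sketched as below. Splitting chronologically (without shuffling) is an assumption that matches the month ranges given above; the helper name and the placeholder samples are hypothetical.

```python
def chronological_split(samples, ratios=(6, 2, 2)):
    """Split time-ordered samples into train/validation/test without shuffling."""
    total = sum(ratios)
    n = len(samples)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

# 4393 placeholder samples, matching the sample count reported above.
samples = list(range(4393))
train, val, test = chronological_split(samples)
```

Keeping the order intact ensures that the validation and test sets contain only data that comes after the training period, as in a real forecasting deployment.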

Performance Measures.
We used mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) to measure the model's performance. MAPE expresses forecasting accuracy as a percentage; the smaller the value, the better the forecast, as shown in Equation (6). MAE directly gives the average deviation between the model output and the real data; the larger the error, the larger this value, as shown in Equation (7). RMSE is the square root of the mean squared forecast error, i.e., the standard deviation of the residuals, as shown in Equation (8); the smaller the value, the better the model performance. MAPE, MAE, and RMSE are widely used evaluation indexes for neural network-based models [50].

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{6}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{7}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{8}$$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the number of samples.
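The three measures can be computed with a few lines of plain Python. The sketch below uses the standard definitions of MAPE, MAE, and RMSE and hypothetical values; MAPE assumes that no real value is zero.

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical real and predicted hourly peaks.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
scores = (mape(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the errors before averaging, it penalizes a few large misses (such as a missed holiday surge) more heavily than MAE does.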

Data Preprocessing.
The internal covariate shift, i.e., a change in the distribution of the input variables used for training and testing, obstructs the training of a neural network-based model because of its nonlinearities. Normalizing the input data offers an easier starting condition for training and reduces the overall training time [51]. This is particularly the case for data sets collected over an extended period of time. Normalizing the input data also reduces the risk of overfitting [52]. We tested the input data to choose a suitable normalization technique. The Kolmogorov-Smirnov test (K-S test) on the collected data gave statistic = 0.156 and P value = 6.31e-94; the P value is far lower than 0.05. Therefore, the data do not follow a normal distribution, and normalization based on the mean and variance is not suitable for the sample data. Moreover, the maximum and minimum flows differ between periods, and the extreme values of the training set cannot be assumed to hold for future data. Therefore, min-max normalization is not suitable for the training data either. The data are distributed within the range [0, 2 × 1e7]. Therefore, normalization is conducted by dividing by 1e7.

Table 2. It can be seen

To further compare the influence of the parameters on the model, the optimal model LightGBM (N) was built using LightGBM for benchmarking. The test set was forecasted with RN-LSTM (N), PSO-LSTM (N), DP-LSTM (N), and LightGBM (N). The MAE, MAPE, and RMSE evaluation indexes were calculated with Equations (6)-(8). Table 3 shows the results. It can be seen that the MAE, MAPE, and RMSE of the DP-LSTM (N) model are lower than those of RN-LSTM (N) and PSO-LSTM (N). Therefore, the mixed optimization based on DE and PSO can find better model parameters.
In contrast, the model with randomly selected parameters performs worse for the same network structure, and some of its indexes are worse than those of LightGBM (N). Consequently, parameter optimization is crucial for the results of the LSTM model, and the proposed mixed optimization method achieves better results.

Comparison with Mainstream Approach.
We compare the DP-LSTM model with mainstream time series models to verify its forecasting effect on the access flow of the "12306 ticketing system." The Light Gradient Boosting Machine (LightGBM), XGBoost, Random Forest, support vector regression (SVR), and Seasonal Autoregressive Integrated Moving Average (SARIMA) algorithms are used to build the benchmark models. These methods have been widely used in recent studies [55][56][57][58][59]. Parameter optimization is conducted for each algorithm to build its optimal model. For example, LightGBM and XGBoost use Bayesian optimization for parameter optimization, and SARIMA uses the Akaike Information Criterion (AIC) for parameter selection. Table 4 presents the related parameters for the different models.
Each model makes predictions for the same test set, and the forecasting results are then compared. Table 5 presents the evaluation results of each model for the given indexes. DP-LSTM achieved the best value for the MAPE index. In contrast, SVR and SARIMA have the worst values. Among the machine learning algorithms, the indexes of LightGBM are close to those of DP-LSTM. Figure 7 shows the overall forecasting effect of DP-LSTM. It can be seen that the model fits the data fluctuations well.
The forecasts generated using LightGBM, SARIMA, and DP-LSTM are shown in Figure 8. From the evaluation indexes in Table 5 and from Figure 8, it can be observed that the prediction of DP-LSTM is better than that of the traditional algorithms. DP-LSTM needs to train both the structure and the weight parameters. Its training time is about 3 hours, which is longer than that of LightGBM but shorter than that of SARIMA. The training speed of DP-LSTM needs to be further optimized.

5.7. Residual Sequence Analysis.
Residual sequence analysis (error analysis) plays a crucial role in time series prediction and analysis. If a time series is white noise, there is zero correlation among its values, and prediction models do not work well on such series. Conversely, when forecasting a time series, the series of a model's forecasting errors should ideally be white noise. If the forecast errors form a series that is not white noise, the predictive model can be further optimized. We obtain the residual sequence by subtracting the forecasted value of DP-LSTM from the real value of the test set, and we use a white noise test function in Python for testing. The residual is not a white noise sequence because the P value is lower than 0.05 for lags between one and forty. Although the accuracy of the trained model is high, the residuals still contain extractable information that could be used to further improve the forecasting accuracy. Figure 9 shows the distribution of the residual sequence. Although there is related information in the residual sequence, the error varies within 0.1. Therefore, the error must be considered when the model is applied for decision-making.

Figure 9: Residual sequence (x-axis: test-set timestamps from 08-01-13 to 08-31-13; y-axis: residual).
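A white noise check of this kind is commonly done with the Ljung-Box test; the text does not name the specific Python function used, so the sketch below is an assumption. It computes only the Q statistic in plain Python and compares it with the 95% chi-square quantile for 10 degrees of freedom (18.307); the two residual series are synthetic.

```python
import random

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

CHI2_95_DF10 = 18.307  # 95% quantile of the chi-square distribution, 10 degrees of freedom

rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Hypothetical autocorrelated residuals: an AR(1) process with coefficient 0.8.
ar1 = [0.0]
for e in white[1:]:
    ar1.append(0.8 * ar1[-1] + e)

q_white = ljung_box_q(white, 10)
q_ar1 = ljung_box_q(ar1, 10)
```

A Q value above the critical value rejects the white noise hypothesis, which corresponds to the P value below 0.05 reported for the DP-LSTM residuals; the autocorrelated series here behaves that way, while independent noise usually does not.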

Conclusion
5G communication technologies and their AI-embedded infrastructures enable the design and deployment of sophisticated network-based services. However, the hosts deploying these services can experience dependability and QoS issues. This paper constructed DP-LSTM, a deep learning-based model that uses LSTM and hybrid optimization, to address such problems in the "12306 ticketing system." This system serves hundreds of millions of railway passengers daily and can experience network flow fluctuations due to demand variation. The proposed model forecasts the peak network flow for the next hour based on the access data of the most recent day. The LSTM network structure is optimized through a fusion of the DE and PSO optimization algorithms. We used MAPE, MAE, and RMSE to evaluate the proposed model's performance experimentally on real data. A comparison with mainstream time series forecasting algorithms demonstrated the superiority of the proposed model. However, residual sequence analysis (error analysis) showed that the proposed model could be further optimized.
The proposed system can help with resource planning for the "12306 ticketing system," thereby improving its dependability and QoS. Such solutions can also reduce overall costs, particularly in cloud-based environments driven by a pay-per-use model.

Data Availability
Data can be obtained by contacting the first author (fan.chun.mei@163.com).

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.