Probabilistic En Route Sector Traffic Demand Prediction Based upon Statistical Analysis of Error Distribution Characteristics



Introduction
As an important part of the Next Generation Air Transportation System (NextGen) program, traffic flow management based on probabilistic traffic demand prediction, which explicitly accounts for uncertainty, has received much attention in recent years because of the stochastic nature of the predictions. Compared with deterministic prediction, it provides more accurate prediction results that support more realistic decisions.
Since the 1990s, the air traffic environment has become increasingly intelligent, dynamic, and complex, and the original static, deterministic methods of traffic demand prediction have gradually become unable to meet the requirements of optimized air traffic management. Many scholars therefore began to study the uncertainties in the air traffic system from the perspectives of both trajectory uncertainty and departure/arrival time uncertainty [1][2][3][4][5][6][7][8][9][10]. In 2002, Larry A. Meyn [11] established a probabilistic approach to predicting traffic demand and estimated airport demand through Monte-Carlo simulation. In the same year, K. T. Muller et al. [12] studied uncertainty models of the factors in strategic trajectory prediction and simulated the future operating environment by controlling the distributions of these factors. In 2003, Sandip Roy [13] established an aggregate dynamic stochastic model based on the Poisson distribution to quantify the uncertainty of traffic demand. To address the shortcomings of that method, Craig R. Wanke et al. [14] extracted and measured the characteristics of uncertainty in demand predictions by analyzing the deviations between predicted and actual values. In 2004, Craig R. Wanke et al. [15] improved the algorithm and established a statistical model of the error distribution, applied to Monte-Carlo simulation on specific traffic samples. In 2007, E. Gilbo and S. Smith [16] proposed a new model that improved the aggregate air traffic demand predictions of the ETMS. In 2009, E. Gilbo and S. Smith [17] analyzed probabilistic predictions of aggregate traffic flow in airport terminal areas using the uncertainty in individual flight predictions. In 2011, E. Gilbo et al. [18] proposed a new method for probabilistic traffic demand prediction for en route sectors based on uncertain predictions of individual flight events. Also in 2011, Wen Tian and Minghua Hu [19] of Nanjing University of Aeronautics and Astronautics explored the random variation of traffic demand in airspace sectors and established a probabilistic traffic demand prediction model for tactical operations, which contributed to further study of uncertainty and its measurement in air traffic demand prediction. In the same year, Chao Wang and Le Yang [20] of Civil Aviation University of China proposed a congestion prediction method based on Monte-Carlo simulation, establishing probability distribution models of aircraft sector entry time, sector exit time, and dwell time in airspace sectors. In 2013, from the perspective of demand uncertainty, Shanmei Li et al. [21] analyzed the influence of aircraft arrival and departure times on airport traffic demand prediction and established a multiperiod probabilistic distribution model.
In the same year, Wen Tian [22] further optimized the probabilistic traffic demand prediction method for airspace sectors and applied it to the risk management of airspace congestion. In addition, a growing body of ATFM research is based on probability and uncertainty, contributing to network optimization and reduced congestion. F. Gonze et al. [23] proposed a probabilistic framework for modeling air traffic occupancy counts to provide ATC with more precise and valuable information. Dan Chen et al. [24] established a model to reduce en route delays by characterizing the realistic dynamics and uncertainty of the en route airspace system. Prediction uncertainty is explicitly used in demand and capacity balancing tools to develop effective and efficient congestion resolution actions [25]. YH Chang et al. [26] focused on the problem of reduced sector volume caused by weather uncertainty and proposed effective solutions to avoid sector congestion. As mentioned, the majority of traditional research has focused on measuring uncertainty in traffic demand prediction. As more studies focus on individual aircraft, these uncertainties need to be characterized from actual operational data. From the deviation between the predicted and actual times at which aircraft cross en route sector boundaries, we can obtain the time randomness of each aircraft entering and leaving a sector, and from it compute the probabilistic traffic demand of the en route sector. This requires large amounts of aircraft data for particular periods and fixes, which places high demands on the accumulation and organization of historical data (including radar data and prediction data).

In our traffic prediction and management system, the latitude $\varphi$ and longitude $\lambda$ of any waypoint are readily available. Let fix $A(\varphi_A, \lambda_A)$ be the origin and fix $B(\varphi_B, \lambda_B)$ the destination; spherical trigonometry is applied to derive the corresponding parameters as (1) and (2). The true course $C$ depends on the quadrant of the track (Figure 1). Here, $\theta$ is the angle between the rhumb-line track and the meridian; $\varphi_m$ is the average latitude of fixes A and B, i.e., $\varphi_m = (\varphi_A + \varphi_B)/2$; and $D$ is the distance from fix A to fix B in radians. The difference $\lambda_B - \lambda_A$ is also expressed in radians, $\theta$ takes the absolute value of the calculated result, and $D$ must be converted from radians to nautical miles. Because cruise speed varies with aircraft type, the flight time from A to B, and hence the estimated time of arrival at fix B, is calculated from $D$ and the cruise speed of the aircraft type. During trajectory prediction, the system scans radar data and flight messages (such as DEP, departure message; ARR, arrival message; and OVFLY, overfly message) at regular intervals to obtain the latitude, longitude, and speed of each aircraft. The total number of aircraft predicted within a time interval is the result of the deterministic prediction.
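The definitions above (a mean latitude $\varphi_m$, an absolute-valued angle $\theta$ to the meridian, a distance $D$ in radians, and a quadrant-dependent true course) match the standard mid-latitude sailing formulas $\tan\theta = |(\lambda_B - \lambda_A)\cos\varphi_m / (\varphi_B - \varphi_A)|$ and $D = |\varphi_B - \varphi_A| / \cos\theta$. Assuming that (1) and (2) take this form, the course, distance, and flight-time computation can be sketched in Python as follows. This is an illustration rather than the authors' implementation; the function names `mid_latitude_leg` and `eta_minutes` are hypothetical, and antimeridian crossings are not handled.

```python
import math

# One arc-minute of a great circle is one nautical mile, so 1 rad = 180*60/pi NM.
NM_PER_RADIAN = 180 * 60 / math.pi


def mid_latitude_leg(lat_a, lon_a, lat_b, lon_b):
    """Return (true course in degrees, distance in NM) from fix A to fix B.

    Inputs are decimal degrees, north and east positive. Assumes the standard
    mid-latitude sailing formulas; antimeridian crossings are not handled.
    """
    phi_a, lam_a, phi_b, lam_b = map(math.radians, (lat_a, lon_a, lat_b, lon_b))
    d_phi = phi_b - phi_a
    d_lam = lam_b - lam_a                    # longitude difference in radians
    phi_m = (phi_a + phi_b) / 2              # mean latitude of A and B
    departure = d_lam * math.cos(phi_m)      # east-west component, radians of arc

    if d_phi == 0:                           # due east or due west
        theta = math.pi / 2
        dist = abs(departure)
    else:
        theta = math.atan(abs(departure / d_phi))   # angle to the meridian (absolute value)
        dist = abs(d_phi) / math.cos(theta)          # D in radians

    t = math.degrees(theta)
    # The true course C depends on the quadrant of the track.
    if d_phi >= 0 and d_lam >= 0:
        course = t                 # north-east quadrant
    elif d_phi < 0 and d_lam >= 0:
        course = 180 - t           # south-east quadrant
    elif d_phi < 0:
        course = 180 + t           # south-west quadrant
    else:
        course = 360 - t           # north-west quadrant
    return course % 360, dist * NM_PER_RADIAN


def eta_minutes(dist_nm, cruise_speed_kt):
    """Flight time in minutes from A to B at a type-specific cruise speed in knots."""
    return dist_nm / cruise_speed_kt * 60
```

For example, `course, dist = mid_latitude_leg(30.0, 114.0, 31.0, 115.0)` followed by `eta_minutes(dist, 450)` gives the flight time that would be added to the time over fix A. The explicit quadrant branches mirror the description in the text; `math.atan2(departure, d_phi)` would yield the same course in a single call.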

Data Preparation
An aircraft flying over an en route boundary fix is defined as a random event. To obtain the probability density function of this event, the usual approach assumes a given distribution and estimates its parameters from historical data, which is a subjective method [19,21]. In this paper, a direct statistical method is instead applied to the time-prediction errors of a large number of aircraft crossing the en route boundary within a given period, so that the resulting discrete distribution function reflects the regularity of these random events more realistically and objectively.

Analysis of the Prediction Error Characteristics
The relative error of the predicted time at which an aircraft crosses the sector boundary is calculated by comparing historical predicted times with the corresponding historical actual times. Statistical analysis of a large number of predicted and actual values reveals some regularities, which may vary with the busy level of the predicted period and the location of the targeted sector, as detailed below.
(1) The Busy Level of the Predicted Periods. As shown in Figure 2 (for details of this analysis see Supplemental Material [S1]), most of the data for sector AR05 are concentrated in the interval 09:00-24:00, where regularities are easier to find. By contrast, sector AR05 is relatively idle during 01:00-08:00.
(2) The Location of Targeted Sectors. Figure 3 (for details of this analysis see Supplemental Material [S1]) shows that the number of aircraft varies significantly among sectors. In general, the more aircraft a sector handles, the easier it is to find regularities in the error of the predicted time.

Influencing Factors of Prediction Error and Model Description.
It is convenient to describe the problem with a simplified network model composed of four elements: airspace, en routes, sectors, and sector boundary fixes. In addition, the four-dimensional description of aircraft flight (one time dimension and three spatial dimensions) is simplified to one time dimension and two spatial dimensions. The entire airspace is divided into two parts: the targeted airspace and the nontargeted part. As illustrated in Figure 4, the targeted airspace consists of the airspace, en routes, sectors, and sector boundary fixes within the predicted space; everything else belongs to the nontargeted part. An aircraft departs from its take-off airport, enters the targeted airspace through a sector boundary fix, and then travels along the en route, flying over several sector boundary fixes until it leaves the targeted airspace.
During the predicted period, assume that the total number of aircraft passing through a targeted sector is $N$. The predicted time error of aircraft $i$ ($1 \le i \le N$) over the sector boundary fixes is expressed in terms of its departure time, its estimated time over the fixes, and its actual time over the fixes. The samples of estimated times over the sector boundary fixes are divided into two subsets: all data up to a chosen day belong to subset I, and data from the following day to the last day belong to subset II. Subset I is used to find the distribution characteristics of the prediction deviation, and subset II is used to verify the validity of the resulting statistical regularity. After successful verification, the sector entry time of any aircraft on any future day can be predicted probabilistically. The analysis in this paper is based on typical route sectors under typical operations. When the sector structure changes or emergencies occur (such as bad weather or major events), the statistics and analysis must be repeated for the changed situation using the method above to guarantee the accuracy and validity of the error distribution regularity. The location of the targeted sector, the busy level of the predicted period, and differences in the predicted time scale are considered the main causes of time prediction deviation. A two-dimensional probability distribution $P(T_j, X)$ is used to describe how the prediction error characteristics vary over time.
(1) $T_j$ represents the time zone. The prediction error often depends on how busy the sector is in different periods. Therefore, following the working practice of air traffic flow management controllers, a day is divided into 15-minute time zones for analyzing the typical predicted time curve trend and error statistics; $T_j$ ($j = 1, 2, \ldots, M$) denotes the $j$-th error-statistics period, and $M$ is the total number of periods.
(2) $X$ is a feature quantity that describes the aircraft's time of passage through the boundary fixes. In this paper, a bivariate cumulative distribution function is proposed to describe the predicted time of the aircraft passing through these fixes.
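To make the data organization concrete, the Python sketch below assigns each error sample to its 15-minute time zone and splits the samples by day into subset I (statistics) and subset II (verification). It assumes, hypothetically, that each record is a `(day, predicted_time, actual_time)` tuple, that the error is the actual minus the predicted time in minutes, and that a sample belongs to the zone containing its predicted time; none of these choices is taken from the paper, and the function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def zone_index(t: datetime) -> int:
    """1-based index j of the 15-minute time zone T_j containing t (j = 1..96 for a full day)."""
    return (t.hour * 60 + t.minute) // 15 + 1


def build_subsets(records, last_training_day):
    """Split error samples into subset I (day <= last_training_day) and subset II (later days).

    records: iterable of (day, predicted_time, actual_time) tuples (hypothetical layout).
    Returns two dicts mapping zone index j -> list of errors in minutes.
    """
    subset_1, subset_2 = defaultdict(list), defaultdict(list)
    for day, predicted, actual in records:
        error = (actual - predicted).total_seconds() / 60   # assumed sign: actual - predicted
        target = subset_1 if day <= last_training_day else subset_2
        target[zone_index(predicted)].append(error)
    return subset_1, subset_2
```

With May 1 to May 24 as the historical sample, for instance, `last_training_day` would be 24 and the records for later days would land in subset II.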

Statistical Methods for Predicting Error Distribution
According to probability and statistics theory [13], in large-sample analysis the empirical frequency function (empirical distribution) can be used as the probability density, because its error relative to the underlying distribution is small. The general procedure is as follows.
(1) For a typical operating day of an en route sector, a 15-minute time window is taken as the interval unit, and the aircraft's time of passage through the boundary fixes is predicted. Each unit is denoted $T_j$ ($j = 1, 2, \ldots, M$).
(2) The prediction error sample in interval $T_j$ is partitioned twice, so that the prediction errors are distributed as evenly as possible across partitions in both number and magnitude. The steps are as follows.
(a) The first division is based on the prediction error values in interval $T_j$. A coordinate system is established with the time $t_j + m$ ($m = 0, 1, \ldots, 59$) corresponding to each prediction error as the abscissa and the prediction error value as the ordinate. The vertical axis is divided into bins of width $\Delta$ (e.g., 2 min).
(b) The second division prevents the deviation trend of the prediction error from being lost when the extreme (maximum and minimum) error bins contain too few samples. The reference interval for the sample size is set to $[n - \delta, n + \delta]$, where $n$ is the desired number of samples per partition (e.g., $n = 8$) and $\delta$ is the allowed fluctuation (approximately 12% of $n$). Starting from both sides of the predicted value, the initial bins are merged so that the number of prediction error samples in each partition falls within the reference interval as far as possible, and the boundary values of each partition are recorded. If the number of samples still cannot meet the requirement after merging, a smaller reference interval should replace the original one.
Following these steps, a total of $L$ layer partitions corresponding to interval $T_j$ can be calculated. Each partition is denoted $S_{j,l}$ ($l = 1, 2, \ldots, L$), and the sample size of partition $S_{j,l}$ is $n_{j,l}$. As shown in Figures 5 and 6, the historical data can be classified from the horizontal and vertical perspectives, respectively; the results reflect the distribution of the predicted time error at the boundary fixes in different intervals.
(3) The prediction errors $V_v$ ($v = 1, 2, \ldots, n_{j,l}$) of each partition $S_{j,l}$ ($j = 1, 2, \ldots, M$; $l = 1, 2, \ldots, L$) are calculated one by one; their distribution is shown in Figure 7, where the abscissa is the percentage prediction error.
The number of samples in each prediction error interval is counted as $n_r$ ($r = 1, 2, \ldots, R$), with $\sum_{r=1}^{R} n_r = n_{j,l}$. Moreover, $p_r = n_r / n_{j,l}$; when $n_{j,l}$ is sufficiently large, $p_r$ is regarded as the exact discrete probability distribution of the prediction error of layer partition $S_{j,l}$ in interval $T_j$.
(4) Finally, from all the historical data, we obtain the cumulative distribution functions of the prediction errors for all partitions, which together form the table of error statistics. An efficiency test should then be conducted to ensure that the statistical method for the prediction error distribution is valid. Taking the prediction data for the same time interval on the first day after the historical sample as the test sample, the proposed method is used to calculate the discrete probability distribution of the predicted time error at the boundary fixes, and a distribution test is carried out.
(5) Aircraft boundary-crossing times are divided into two categories: the time at which an aircraft enters the sector through a boundary fix and the time at which it leaves the sector. Probabilistic en route sector traffic demand prediction is based on the error distribution rules of these two types of time.
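Steps (2) to (4) leave some details open, so the Python sketch below implements one hypothetical reading of them. `first_division` bins the errors of one interval into fixed-width bins; `second_division` merges sparse bins from both tails toward the modal bin until each partition holds at least $n - \delta$ samples (the upper bound $n + \delta$ and the fallback to a smaller reference interval are not implemented); and `empirical_distribution` turns counts into the relative frequencies $p_r = n_r / n_{j,l}$ and their cumulative sum. All function names are illustrative.

```python
import bisect
import math
from itertools import accumulate


def first_division(errors, width=2.0):
    """Step (a): split the error axis into contiguous bins of `width` minutes.

    Returns a list of [lower_edge, upper_edge, count] in ascending order.
    """
    lo = math.floor(min(errors) / width) * width
    nbins = int(math.floor((max(errors) - lo) / width)) + 1
    counts = [0] * nbins
    for e in errors:
        counts[int(math.floor((e - lo) / width))] += 1
    return [[lo + i * width, lo + (i + 1) * width, c] for i, c in enumerate(counts)]


def _merge_run(bins, target):
    """Merge adjacent bins in the given order until each group holds >= target samples.

    A trailing group that never reaches the target is folded into the previous group.
    """
    groups, cur = [], None
    for lo, hi, c in bins:
        cur = [lo, hi, c] if cur is None else [min(cur[0], lo), max(cur[1], hi), cur[2] + c]
        if cur[2] >= target:
            groups.append(cur)
            cur = None
    if cur is not None:
        if groups:
            g = groups[-1]
            groups[-1] = [min(g[0], cur[0]), max(g[1], cur[1]), g[2] + cur[2]]
        else:
            groups.append(cur)
    return groups


def second_division(bins, n=8, delta=1):
    """Step (b), hypothetical reading: merge from both tails toward the modal bin."""
    target = n - delta
    mode = max(range(len(bins)), key=lambda i: bins[i][2])
    lower = _merge_run(bins[:mode], target)              # lower tail -> centre
    upper = _merge_run(bins[mode:][::-1], target)[::-1]  # upper tail -> centre
    return lower + upper


def empirical_distribution(errors, edges):
    """Steps (3)-(4): p_r = n_r / n over the bins defined by `edges`, plus the cumulative sum.

    Errors outside [edges[0], edges[-1]] are ignored; the last bin includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    for e in errors:
        if edges[0] <= e <= edges[-1]:
            r = min(bisect.bisect_right(edges, e) - 1, len(counts) - 1)
            counts[r] += 1
    total = sum(counts)
    pmf = [c / total for c in counts]
    return pmf, list(accumulate(pmf))
```

With the example values from step (b), `second_division(first_division(errors, 2.0), n=8, delta=1)` uses a minimum of 7 samples per partition, since 12% of 8 rounds to 1. The partition edges it returns can then be passed to `empirical_distribution` to build one row of the error-statistics table.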

Probabilistic Methods for En Route Sector Traffic Demand Prediction
The probabilistic traffic demand prediction is derived from the conventional deterministic traffic demand prediction. Once the discrete probability distribution of the prediction error $\Delta t$ of the aircraft's time over the boundary fixes has been verified, the error statistics can be used to analyze the possible distribution of aircraft entering and leaving the en route sector. The probable number of aircraft in a sector during a given period, that is, the probabilistic traffic demand of the sector, can then be computed. This result reflects the risk implicit in traffic demand prediction and provides the prerequisite and basis for studying sector congestion risk. Assume that $N_j$ aircraft pass through the sector within the targeted prediction interval $[t_j, t_j + 14]$ (in minutes). The predicted time error $\Delta t_i^{\mathrm{in}}$ of each of these aircraft at the boundary fixes when entering the sector can be obtained by the method described above, with probability density function $f(\Delta t_i^{\mathrm{in}})$ and cumulative distribution function $F(\Delta t_i^{\mathrm{in}})$. Similarly, $\Delta t_i^{\mathrm{out}}$ denotes the predicted time error when the aircraft leaves the sector through the boundary fixes. Based on (4) and (5), the probability that aircraft $i$ is in the sector during $[t_j, t_j + 14]$ can be obtained. If the deterministic demand prediction for the sector in interval $[t_j, t_j + 14]$ on a given day is $N$, then $P_N[k]$ ($0 \le k \le N$) denotes the probability that $k$ aircraft are in the sector during that interval. The computation is given in Pseudocode 1.
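The Python sketch below shows one way to compute $P_N[k]$, under assumptions that are ours rather than the paper's: the entry and exit errors follow normal distributions with the fitted means and standard deviations; the actual time equals the predicted time plus the error; an aircraft counts as present in the window when it enters no later than the window's end and leaves no earlier than its start, with entry and exit errors treated as independent; and aircraft are mutually independent, so $P_N[k]$ is a Poisson-binomial distribution computed by dynamic programming. `presence_probability` and `count_distribution` are hypothetical names.

```python
from statistics import NormalDist


def presence_probability(pred_in, pred_out, window_start, window_len=14,
                         err_in=(0.0, 1.0), err_out=(0.0, 1.0)):
    """Probability that an aircraft is in the sector at some time in
    [window_start, window_start + window_len] (all times in minutes).

    err_in / err_out are (mean, std) of the entry/exit errors, assuming
    actual = predicted + error and independent entry/exit errors (std > 0).
    """
    entry = NormalDist(pred_in + err_in[0], err_in[1])
    exit_ = NormalDist(pred_out + err_out[0], err_out[1])
    p_entered = entry.cdf(window_start + window_len)   # entered before the window closes
    p_not_left = 1.0 - exit_.cdf(window_start)          # still inside when the window opens
    return p_entered * p_not_left


def count_distribution(probs):
    """P_N[k] for k = 0..N: probability that exactly k of the N aircraft are present,
    assuming independent aircraft (Poisson-binomial distribution)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)      # this aircraft absent
            nxt[k + 1] += q * p          # this aircraft present
        dist = nxt
    return dist
```

Given the per-aircraft probabilities for the $N$ deterministically predicted aircraft in a list `probs`, `count_distribution(probs)[k]` gives $P_N[k]$. The dynamic program costs $O(N^2)$ operations, which is negligible for sector-sized $N$, and it avoids the sampling noise of a Monte-Carlo estimate.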

Efficiency Test.
To further verify the validity of the proposed method, we conduct an efficiency test from the perspective of cumulative probability, building on the distribution test in the third section. Let $P(i)$ denote the probability that aircraft $i$ is located within the sector during the interval $[t_j, t_j + 14]$, which fully describes the error distribution regularity.
Assume that a day is divided into $M$ intervals, that the probable demand prediction in interval $j$ is $x_j$, and that the corresponding actual demand calculated from historical data is $y_j$. The correlation coefficient is then
$$r = \frac{\sum_{j=1}^{M}(x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{M}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{M}(y_j - \bar{y})^2}},$$
where $\bar{x} = \sum_{j=1}^{M} x_j / M$ and $\bar{y} = \sum_{j=1}^{M} y_j / M$ are the means of $x_j$ and $y_j$, respectively. The larger $r$ is, the better the prediction reproduces the fluctuation of demand over the day. In addition, the prediction error regularities are equally applicable to the same interval on another day, which gives the method great practical value.
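Assuming the coefficient described here is the standard Pearson correlation coefficient (consistent with the means of the predicted and actual demand series defined in the text), it can be computed directly in Python as follows; `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient between per-interval predicted demand x and actual demand y.

    Undefined (raises ZeroDivisionError) if either series is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    m = len(x)
    x_bar = sum(x) / m                       # mean predicted demand
    y_bar = sum(y) / m                       # mean actual demand
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For the case study, `x` and `y` would hold the 20 interval values from 07:00 to 11:59 for one sector and one day.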

Numerical Examples.
In this section, we present a case study based on operational data for sectors AR01-AR08 in central and southern China from May 1, 2014, 00:00:00, to May 24, 2014, 23:59:59. A total of 19624 samples of times over these sectors' boundary fixes between 07:00 (a relatively idle period) and 11:59 (a relatively busy period) each day were obtained as the historical sample for statistical analysis. The Central Limit Theorem (CLT) establishes that, in most situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed; a common rule of thumb is that this normal approximation is adequate when the sample size is 30 or more [15]. On this basis, we assume that the prediction errors follow a normal distribution; the calculated results are shown in Tables 1 and 2 (for details of the analysis of 07:00-07:14 in sectors AR01-AR08 see Supplemental Material [S3]; the same applies to the remaining 19 intervals).
In this paper, the actual error sample distributions are tested with quantile-quantile (Q-Q) plots, i.e., the normal probability paper test. The test compares the quantiles of the sample data with those of the specified distribution: if the resulting points lie on a straight line, the data are consistent with a normal distribution; otherwise, they are not. A total of 320 hypothesis tests were performed on the above samples, and the results were consistent with the normal distribution hypothesis. Due to space limitations, only the 16 test results for one 15-minute interval in sectors AR01-AR08 are shown in Figure 8 (for details of this analysis see Supplemental Material [S2]).
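The fitting and normality check can also be expressed numerically. The Python sketch below fits the mean and variance of an error sample (the quantities reported in Tables 1 and 2), builds the Q-Q points using standard-normal quantiles at plotting positions $(i - 0.5)/n$, and reports the correlation of those points, where values close to 1 correspond to a nearly straight Q-Q plot. Using this probability-plot correlation as a numeric stand-in for visual inspection is our substitution, not the paper's procedure; the sample (n - 1) variance is also our choice, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def fit_normal(errors):
    """Mean (min) and sample variance (min^2) of a prediction-error sample."""
    return fmean(errors), variance(errors)


def qq_points(errors):
    """Pairs (theoretical standard-normal quantile, ordered sample value) for a Q-Q plot."""
    n = len(errors)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, sorted(errors)))


def qq_correlation(errors):
    """Correlation of the Q-Q points; near 1 means the points lie close to a straight line."""
    pts = qq_points(errors)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)
```

Plotting `qq_points(errors)` reproduces the visual test, while `qq_correlation` condenses it into one number per sample, which is convenient when checking all 320 samples.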

Results Analysis
Probability distributions for every typical operating day from 07:00 to 11:59 in sectors AR01-AR08 in central and southern China can be computed from the prediction error distribution characteristics, the probabilistic en route sector traffic demand prediction method, and the time prediction error parameters. Taking the predicted results for interval 07:00-07:14 of sector AR01 from May 1, 2014, to May 24, 2014, as an example, with the maximum number of aircraft in sector AR01 denoted $N$, the probability of $k$ aircraft being in the sector is shown in Table 3 (for details of this analysis see Supplemental Material [S3]).
In Table 3, the abscissa $N$ represents the number of aircraft predicted for the same interval in the future, i.e., the demand predicted by the deterministic method described in Section 4.1. The ordinate $k$ represents the possible number of aircraft, and each value in the table is the probability that $k$ aircraft are present in sector AR01 in interval 07:00-07:14, given that $N$ aircraft have been predicted for that interval. In the same way as Table 3, a total of 160 en route sector traffic demand probability distribution tables (8 sectors × 20 intervals) can be obtained. Due to space limitations, only the 8 results for interval 07:00-07:14 of sectors AR01-AR08 are shown in Figure 9 (for details of this analysis see Supplemental Material [S3]).
Based on the sample, the average flight time in the central and southern region is 12 minutes. Accordingly, each interval from 07:00 to 11:59 is enlarged by 12 minutes both forward and backward, and the corresponding traffic demand and its probability for each sector on May 25, 2014, are then obtained. The maximum predicted traffic demand is taken as the final value, as shown in Table 4 (for details of this analysis see Supplemental Material [S4]).
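The interval enlargement is simple arithmetic. The Python sketch below, which uses a hypothetical function name and counts minutes from midnight, lists the twenty 15-minute intervals from 07:00 to 11:59 together with their windows enlarged by the 12-minute average flight time on each side; for example, 07:00-07:14 becomes 06:48-07:26.

```python
def expanded_windows(start="07:00", end="11:59", step=15, margin=12):
    """Return (original, enlarged) windows as 'HH:MM-HH:MM' strings.

    Each interval [t, t + step - 1] is enlarged by `margin` minutes on both sides.
    Does not wrap around midnight.
    """
    def to_min(s):
        h, m = map(int, s.split(":"))
        return h * 60 + m

    def fmt(t):
        return f"{t // 60:02d}:{t % 60:02d}"

    windows = []
    for t in range(to_min(start), to_min(end) + 1, step):
        lo, hi = t, t + step - 1
        windows.append((f"{fmt(lo)}-{fmt(hi)}", f"{fmt(lo - margin)}-{fmt(hi + margin)}"))
    return windows
```

Because neighboring enlarged windows overlap, an aircraft can contribute to the demand of more than one interval.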
The results of the correlation test between these predicted results and the actual values are shown in Table 5 (for details of this analysis see Supplemental Material [S4]). The correlation coefficient exceeds 0.70 in busy periods (such as 09:00-11:59) and in sectors with heavy traffic (such as AR01-AR05), reaching as high as 0.99, which indicates that the statistical regularity is quite effective. By contrast, the correlation coefficients for idle sectors or sectors with less traffic are less satisfactory because of the smaller sample sizes, but they are acceptable overall.
Based on the predicted results above, take sector AR05 (capacity: 15 per minute) as an example. The deterministic and probabilistic traffic demand predictions and the actual values for each interval of the period 07:00-11:59 are shown in Figure 10 (for details of this analysis see Supplemental Material [S4]). We also found the following.
(1) Because the deterministic prediction approach is limited by the prediction scale, its results are more accurate when the scale is small. The probabilistic prediction approach is more accurate in peak periods, when the sample size is larger than usual.
(2) In the period 08:45-08:59, the probabilistic demand prediction correctly indicated that traffic demand exceeded capacity, which would make it easier for controllers to anticipate sector congestion. From 09:15 to 09:29, the probabilistic results showed that demand did not exceed capacity, thus avoiding the false alarm raised by the deterministic results.
Overall, the accuracy of probabilistic traffic demand prediction is 80%, which is 23.1% higher than that of deterministic traffic demand prediction.

Conclusions
Because most mature probabilistic traffic demand methods cannot be applied to the data of the current air traffic management system in China, this paper proposed an approach for predicting the time of flying over sector boundary fixes and the probabilistic en route sector traffic demand, based on an analysis of the time prediction error distribution characteristics and their influencing factors. The traffic demand probability distributions of the sectors, and their variation over given periods, were obtained from actual operational data. With this approach, the accuracy of probabilistic demand prediction is estimated to be 23.1% higher than that of traditional deterministic traffic demand prediction, which shows that the method can provide more appropriate and accurate traffic demand predictions for air traffic flow management.

Figure 2: Bar chart of aircraft in Sector AR05 in central and southern China in different periods in May.
Notation: $t_i^{D}$ is the departure time of aircraft $i$; $t_i^{E}$ is the estimated time of aircraft $i$ ($1 \le i \le N$) over the sector boundary fixes; $t_i^{A}$ is the actual time of aircraft $i$ ($1 \le i \le N$) over the sector boundary fixes.

Figure 4: Diagram of en route network.

Figure 7: Diagram of prediction error distribution.

Figure 10: Comparison of the deterministic and probabilistic demand prediction results and comparison of the actual traffic flow value and capacity (Sector AR05).
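The predicted time $\hat{t}_i^{\mathrm{in}}$ at which aircraft $i$ ($1 \le i \le N_j$) enters the sector is determined by its take-off time, route length, and flight performance. The probability that aircraft $i$ enters the sector in interval $[t_j, t_j + 14]$ is then calculated as follows (the probability of leaving the sector is obtained analogously):
$$P\left(t_j \le t_i^{\mathrm{in}} \le t_j + 14\right) = P\left(t_i^{\mathrm{in}} \le t_j + 14\right) - P\left(t_i^{\mathrm{in}} \le t_j\right) = \int_{t_j}^{t_j + 14} g_i(t)\,\mathrm{d}t,$$
where $g_i$ is the probability density of the actual entry time $t_i^{\mathrm{in}}$.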

Table 1: Mean and variance of the aircraft's predicted time error $\Delta t^{\mathrm{in}}$ when entering sectors AR01-AR08 (min).

Table 2: Mean and variance of the aircraft's predicted time error Δ