A method for estimating missing flight path data

Air-traffic optimization is an essential part of airspace operation reengineering, as the number of flights and the use of routes increase worldwide. The NextGen and SESAR projects are important initiatives to make air traffic more scalable and safer. One element of these projects is the Automatic Dependent Surveillance-Broadcast (ADS-B) system, which allows aircraft to share their position and speed. ADS-B antenna coverage is limited over less economically developed regions and oceanic areas, which leaves gaps in flight path data. This paper proposes a method based on artificial neural networks (ANN), interpolation and average computation to fill in flight paths that were only partially tracked by ADS-B antennas. While other methods focus on one or two dimensions of the flight path, this work infills all four dimensions present in the ADS-B data (latitude, longitude, altitude and ground speed). The method is useful for analyzing the performance of historical flights in areas with limited coverage and for predicting flight paths in air-traffic management (ATM) systems. A comparison between the real and estimated trajectories of 517 flights showed an accuracy above 92% for the distance flown, estimated fuel burn and trajectory correlation metrics.


INTRODUCTION
Optimizing air traffic control and route management is a constant concern of airlines and airport managers as air traffic increases over Europe (EUROCONTROL, 2015) and the US (FEDERAL AVIATION ADMINISTRATION, 2015). This growing demand will eventually render the current system unsustainable. Since the efficiency of the system depends on the efficiency of flight operations, optimizing route design and flight standards is an essential part of airspace operation reengineering.
The NextGen project was proposed to guide this reengineering process in the US. The project includes changes to the current mode of operation, many of which are intended to give pilots and aircraft more autonomy in flight and thus reduce the intermediation of air traffic controllers. New technologies have been developed to provide the system with information for decision making (THEUNISSEN et al., 2011). In Europe, a similar program named SESAR, co-founded by Eurocontrol, aims to increase the efficiency of European airspace operational procedures (COOK et al., 2009).
An important technology in the NextGen project is the Automatic Dependent Surveillance-Broadcast (ADS-B) system, which aircraft use to broadcast their position to other aircraft and/or ground stations (STROHMEIER et al., 2014). Receiving aircraft use these data to avoid collisions with the sender, while ground stations can use them for other purposes. Airspace authorities collect data through their ground antennas to control and improve the airspace. Enthusiasts have been using simple ADS-B antennas to monitor aircraft flying near their locations and share these data with companies such as FlightAware (2016a), FlightStats (2016) and Planefinder (2018). These companies offer services such as websites that provide flight information to internet users.
For several reasons, some flights do not have their path fully tracked by ground antennas. Among these reasons are the limited range of the antennas, about 320 km, and their limited number (FLIGHTAWARE, 2016a). This can be seen in Figure 1, which illustrates FlightAware coverage. The coverage limitation is evident in developing regions such as Africa, Asia and South America, and over the oceans. The lack of data can affect studies that analyze the performance of historical flights and propose improvements in flight operations. This paper proposes a method to fill in flight paths composed of positions (latitude, longitude and altitude) and ground speed, known as 4-dimensional (4D) paths. The method can complete flight paths partially tracked by ADS-B antennas, allowing performance analysis of historical flights in areas with limited coverage. It could also serve air-traffic management (ATM) systems that need to predict parts of a flight path.
The process of finding appropriate algorithms and parameters is itself a contribution that can support researchers and professionals in solving the missing flight path data problem. The proposed method was built by analyzing and combining different algorithms and parameters, and the resulting method uses artificial neural network (ANN), interpolation and average computation algorithms.
Methods using ANNs to fill in missing data are not new in the literature. Coulibaly and Evora (2007) used a similar method to infill missing data in precipitation records. ANNs were also used to fill in medical data that had not been collected from patients while still supporting the diagnosis of appendicitis (PRABHUDESAI et al., 2008). In the aeronautical field, ANNs have been used to predict flight parameters (DONGARE; MOHAMED, 2015; RAISINGHANI; GHOSH, 2000) such as landing speed (DIALLO, 2012) and the vertical speed of an aircraft (SHARMA; CHATURVEDI, 2011). Diallo (2012) reported an error of less than 12.6% in 95% of the cases, while Sharma and Chaturvedi (2011) showed that the error increases with the distance predicted. These methods focus on one or two dimensions of the flight path, whereas the method proposed in this work infills all four dimensions present in the ADS-B data (COULIBALY; EVORA, 2007).
The remainder of this paper is organized as follows: Section 2 describes the proposed methodology, algorithms and metrics; Section 3 presents the results; and Section 4 presents comments and conclusions.

METHODOLOGY
The methodology used in this work combines ANNs and statistical tools (interpolation and averaging) to predict the cruise and descent flight path (4D) from a set of waypoints of the ascent phase. The waypoint dataset was made available by the company PlaneFinder.net. To build the method, many algorithm parameters were tested on flights with complete paths over six routes: four in Europe, one in the United States and one in Brazil. The best-performing method is presented in Section 2.2. Three metrics were used to assess its performance: distance flown, estimated fuel burn and coordinate correlation. These metrics indicate how close the generated flight paths are to the real ones.

Data
The data used in this work were collected through ADS-B antennas by PlaneFinder.net feeders throughout September 2013. Initially, 612 flights over six routes were selected from the dataset provided by PlaneFinder.net. After filtering out outlier flights, 517 flights were used for parameter definition and neural network training. Each flight has a set of waypoints (latitude, longitude, altitude and ground speed), an airline code, origin and destination airport codes, and departure and arrival dates and times. To guarantee complete flight paths, a selected flight must have no more than 20 km between consecutive waypoints; otherwise, it is considered an outlier. Table 1 summarizes the routes' details (Source: elaborated by the authors using a dataset provided by the company PlaneFinder.net).
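A minimal Python sketch of the outlier filter described above, assuming each flight is a list of (latitude, longitude, altitude in km, ground speed in km/h) tuples and that the gap between consecutive waypoints is measured as a haversine great-circle distance on a spherical Earth of radius 6371 km. The function names are hypothetical, and the exact distance formula the authors used is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_complete_path(waypoints, max_gap_km=20.0):
    """True if no two consecutive waypoints are more than max_gap_km apart.

    Each waypoint is a (lat_deg, lon_deg, alt_km, ground_speed_kmh) tuple.
    """
    return all(
        great_circle_km(p[0], p[1], q[0], q[1]) <= max_gap_km
        for p, q in zip(waypoints, waypoints[1:])
    )


def filter_outliers(flights, max_gap_km=20.0):
    """Keep only flights whose paths contain no gap larger than max_gap_km."""
    return [f for f in flights if is_complete_path(f, max_gap_km)]
```

A path sampled every 0.1 degree of latitude (about 11 km) passes, while a single 0.5 degree jump (about 56 km) marks the flight as an outlier.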

The flight path data-filling method
The proposed flight path data-filling method receives as input n sequential waypoints from the ascent stage (typically n = 2) and then completes the remaining stages (cruise and descent) using artificial neural networks, interpolation and averaging techniques. Figure 2 illustrates the algorithm's data input and the technique adopted in each flight phase. The method applies the ANN functions and complementary statistical techniques through a sliding window. The sliding window takes n sequential values as input and returns a predicted value; the new value is then used with the n-1 previous values to generate the next value, until the end of the iteration. This procedure is illustrated in Figure 3. The sliding window and ANNs were initially used to predict latitude, longitude, altitude and ground speed for each waypoint. However, the neural networks performed unsatisfactorily in predicting altitude in the cruise phase and ground speed in the cruise and descent stages. To overcome this problem, cruise altitudes were estimated with a simple interpolation method, and ground speeds with a method that assigns the average speed observed at each percentage of the total waypoint distance (see Section 2.4).
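The sliding-window procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' MATLAB code: `predict` is a hypothetical stand-in for a trained ANN that maps the n most recent values (scalars or tuples such as latitude-longitude pairs) to the next one, and the optional `stop` predicate is an assumed hook for a termination test.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")  # a scalar or a tuple such as (lat, lon)


def sliding_window_forecast(
    seed: Sequence[T],
    predict: Callable[[List[T]], T],
    max_steps: int,
    stop: Optional[Callable[[T], bool]] = None,
) -> List[T]:
    """Extend a series one value at a time with a one-step-ahead predictor.

    seed:      the n most recent known values (e.g. n = 2 ascent waypoints)
    predict:   maps a window of n values to the next value (stand-in for an ANN)
    max_steps: upper bound on the number of generated values
    stop:      optional predicate; generation ends once it returns True
    """
    window = deque(seed, maxlen=len(seed))  # holds only the n latest values
    generated: List[T] = []
    for _ in range(max_steps):
        nxt = predict(list(window))
        generated.append(nxt)
        window.append(nxt)  # oldest value drops out; nxt joins the n-1 others
        if stop is not None and stop(nxt):
            break
    return generated
```

For example, with the linear extrapolator `lambda w: 2 * w[-1] - w[-2]` and seed `[1.0, 2.0]`, three steps yield `[3.0, 4.0, 5.0]`.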
As the stop condition of the data-filling algorithm, the method creates a virtual waypoint with the coordinates of the destination airport and a ground speed of zero. This waypoint is connected to the closest generated waypoint that lies less than 10 km from the airport.
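One possible reading of this stop condition is sketched below: the generated path is cut at its waypoint closest to the destination airport, provided that waypoint lies within 10 km, and the virtual airport waypoint (ground speed 0) is appended. The waypoint layout, the haversine distance and the `airport_alt_km` parameter are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def close_path_at_airport(path, airport_lat, airport_lon, airport_alt_km=0.0, radius_km=10.0):
    """Attach a virtual destination waypoint to a generated path.

    path: generated (lat_deg, lon_deg, alt_km, ground_speed_kmh) waypoints.
    The path is cut after its waypoint closest to the airport, provided that
    waypoint lies within radius_km; a virtual waypoint at the airport with zero
    ground speed is then appended. Returns None if no waypoint is close enough.
    """
    if not path:
        return None
    dists = [great_circle_km(wp[0], wp[1], airport_lat, airport_lon) for wp in path]
    best = min(range(len(dists)), key=dists.__getitem__)
    if dists[best] >= radius_km:
        return None  # stop condition not reached yet
    virtual = (airport_lat, airport_lon, airport_alt_km, 0.0)
    return list(path[: best + 1]) + [virtual]
```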

Artificial Neural Networks (ANNs)
An artificial neural network is a computing model that approximates functions after being trained on historical values. In this paper, the networks were trained with ADS-B data. The MATLAB app nftool (MATLAB, 2016) was used together with the MATLAB programming language to create, train and deploy the networks. The tool generates a MATLAB function that implements the trained ANN with its optimized weights and biases.
Many architectures and parameters were tested, varying the number of inputs, outputs and hidden-layer neurons and the learning algorithm. The number of inputs and outputs caused considerable differences in the final results. The search for the most suitable ANNs for the data infilling problem proceeded as follows:
i) Initial setup: a feed-forward neural network was created for each route with latitude, longitude and altitude as inputs. The inputs correspond to a pair of waypoints, totaling six input values. The idea was to evaluate the training convergence and then add or remove columns as needed. The ANN had two layers (input and hidden neurons) and sigmoid activation functions. The data used in the learning process (error minimization) were divided into training, validation and test sets with 70%, 15% and 15% of the data, respectively. The mean squared error (MSE) was used to compare the results.
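The sketch below shows how six-value inputs can be built from pairs of consecutive (latitude, longitude, altitude) waypoints, split 70/15/15 into training, validation and test sets, and scored with MSE. Using the next waypoint as the target and splitting samples at random are assumptions; the helper names are hypothetical and do not reproduce nftool's internals.

```python
import random


def make_samples(path, n=2):
    """Turn a path of (lat, lon, alt) waypoints into (inputs, target) pairs.

    Each input concatenates n consecutive waypoints (6 values when n = 2);
    the target is the waypoint that follows them (an assumed output choice).
    """
    samples = []
    for i in range(len(path) - n):
        x = [v for wp in path[i:i + n] for v in wp]
        y = list(path[i + n])
        samples.append((x, y))
    return samples


def split_70_15_15(samples, seed=0):
    """Shuffle and split into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100  # integer arithmetic avoids rounding surprises
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def mse(predictions, targets):
    """Mean squared error over every output value of every sample."""
    errors = [(p - t) ** 2
              for pred, targ in zip(predictions, targets)
              for p, t in zip(pred, targ)]
    return sum(errors) / len(errors)
```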
ii) Disjoint dimension patterns: combining altitude and ground speed with latitude and longitude performed poorly. This initial setup showed an error of up to 20% in predicting the routes. Varying the number of hidden-layer neurons from 2 to 20 and testing different learning algorithms did not improve the results. For this reason, latitude and longitude were separated into a different ANN, and altitude and ground speed were tested as inputs of distinct ANNs. These experiments indicate that the association among the trajectory dimensions is weak, except between latitude and longitude.
iii) Altitude and ground speed: these dimensions also failed as separate ANN inputs, except for altitude in the descent stage, which showed an error below 10% in the test and validation tasks. Therefore, another method was evaluated to predict ground speed.
Two ANNs were created for each route: one for latitude-longitude prediction and the other for altitude prediction in the descent phase. Both were feed-forward ANNs trained with the Levenberg-Marquardt algorithm (LEVENBERG, 1944). The final setups with the smallest errors are illustrated in Figure 4. Note that the test and validation results obtained while searching for suitable ANNs are not the final results used to assess the efficiency of the proposed method (Section 2.2). They supported the construction of the data infilling method, but the conclusive results for the proposed method are presented in Section 3 (Results).
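For readers without MATLAB, the following NumPy sketch trains a small feed-forward network (sigmoid hidden layer, linear output) with a basic Levenberg-Marquardt loop: each step solves (JᵀJ + μI)δ = -Jᵀr, then decreases μ after an accepted step or increases it after a rejected one. It uses a finite-difference Jacobian for brevity; MATLAB's `trainlm` differs in details such as Jacobian computation and stopping rules, so this illustrates the technique rather than reproducing the authors' setup.

```python
import numpy as np


def unpack(theta, n_in, n_hidden):
    """Split a flat parameter vector into layer weights and biases."""
    i = n_hidden * n_in
    W1 = theta[:i].reshape(n_hidden, n_in)
    b1 = theta[i:i + n_hidden]
    W2 = theta[i + n_hidden:i + 2 * n_hidden]
    b2 = theta[i + 2 * n_hidden]
    return W1, b1, W2, b2


def forward(theta, X, n_hidden):
    """Sigmoid hidden layer followed by a linear scalar output."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))
    return H @ W2 + b2


def train_lm(X, y, n_hidden=5, iters=100, mu=1e-2, seed=0):
    """Fit the network by Levenberg-Marquardt on the sum of squared residuals."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.5, size=n_hidden * X.shape[1] + 2 * n_hidden + 1)

    def residuals(t):
        return forward(t, X, n_hidden) - y

    def jacobian(t, eps=1e-6):
        # Forward-difference Jacobian of the residual vector w.r.t. parameters.
        r0 = residuals(t)
        J = np.empty((r0.size, t.size))
        for k in range(t.size):
            tk = t.copy()
            tk[k] += eps
            J[:, k] = (residuals(tk) - r0) / eps
        return J

    r = residuals(theta)
    cost = r @ r
    for _ in range(iters):
        J = jacobian(theta)
        A = J.T @ J + mu * np.eye(theta.size)  # damped Gauss-Newton matrix
        step = np.linalg.solve(A, -(J.T @ r))
        r_new = residuals(theta + step)
        cost_new = r_new @ r_new
        if cost_new < cost:  # accepted: behave more like Gauss-Newton
            theta, r, cost = theta + step, r_new, cost_new
            mu = max(mu * 0.1, 1e-12)
        else:                # rejected: behave more like gradient descent
            mu = min(mu * 10.0, 1e12)
    return theta
```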

Estimating ground speeds and cruise altitudes
Since the ANNs failed to predict ground speeds and cruise altitudes, complementary methods were used in the flight path data-filling algorithm. To predict ground speed, an average-based method computes the average ground speed at each percentage of flown distance relative to the total waypoint distance. The average is considered acceptable because this dimension of the flight data follows a weak operational pattern. Thus, each percentile has an average speed value:
$$\overline{GS}_k = \frac{1}{n}\sum_{i=1}^{n} GS_{ik} \tag{1}$$
where $\overline{GS}_k$ is the average ground speed at the k-th percentile of the flown distance, $GS_{ik}$ is the ground speed of flight i at the k-th percentile of the flown distance, and n is the number of flights performed on the route. The cruise altitude in the generated flight path is the last altitude of the input waypoints, typically the top-of-climb waypoint; in most flights, the top-of-climb altitude corresponds to the average cruise altitude. A waypoint is considered to be in the cruise phase if its distance from the departure airport is larger than the average distance at which the route's flights start the cruise phase, or if its altitude is within a margin that varies for each route. The end of the cruise phase is taken as the average distance from the origin airport at which the flights start the descent phase.
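The percentile averaging described above can be computed as sketched below, assuming each flight is given as cumulative waypoint distance (km) and ground speed (km/h) at each waypoint, and that the speed at the k-th percentile of distance is obtained by linear interpolation between the bracketing waypoints (the text does not state how speeds between waypoints are resolved). Function names are hypothetical.

```python
from bisect import bisect_left


def speeds_at_percentiles(cum_dist_km, speeds_kmh, percentiles=range(101)):
    """Ground speed of one flight at each percentile k of its total distance.

    cum_dist_km: cumulative waypoint distance from the origin (non-decreasing,
                 starting at 0); speeds_kmh: ground speed at each waypoint.
    Speeds between waypoints are linearly interpolated (an assumption).
    """
    total = cum_dist_km[-1]
    out = []
    for k in percentiles:
        d = total * k / 100.0
        j = bisect_left(cum_dist_km, d)  # first waypoint at or beyond d
        if j == 0:
            out.append(speeds_kmh[0])
        elif j >= len(cum_dist_km):  # guard against rounding past the end
            out.append(speeds_kmh[-1])
        else:
            d0, d1 = cum_dist_km[j - 1], cum_dist_km[j]
            s0, s1 = speeds_kmh[j - 1], speeds_kmh[j]
            out.append(s0 + (d - d0) / (d1 - d0) * (s1 - s0))
    return out


def average_speed_profile(profiles):
    """Average, over the n flights of a route, of the speed at each percentile."""
    n = len(profiles)
    return [sum(p[k] for p in profiles) / n for k in range(len(profiles[0]))]
```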

Grouping flights
Some flights had discrepancies in their data, due either to multiple paths or to varying altitude levels in the cruise phase. Where the discrepancy was evident (especially on heavy-traffic routes such as DUB-STN), grouping was necessary for the ANNs to converge. The k-means algorithm (WU et al., 2008) was used to group flights according to maximum altitude, latitude or longitude, with the squared Euclidean distance as the distance measure. The number of groups (parameter k) was chosen based on the number of typical paths (generally k = 2).
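A minimal k-means sketch with the squared Euclidean distance is shown below. The feature vectors (for example, a flight's maximum altitude together with a latitude or longitude value) and the deterministic farthest-point initialization are assumptions chosen for reproducibility; the authors' initialization is not stated. The hypothetical helper `largest_cluster` keeps only the most populated group.

```python
from collections import Counter


def sq_euclidean(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100):
    """Lloyd's k-means with squared Euclidean distance.

    Initialisation (an assumption): start from the first point, then repeatedly
    add the point farthest from the centres chosen so far.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_euclidean(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_euclidean(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def largest_cluster(items, labels):
    """Keep only the items belonging to the most populated cluster."""
    top, _ = Counter(labels).most_common(1)[0]
    return [it for it, lab in zip(items, labels) if lab == top]
```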

Metrics
The metrics adopted to assess the accuracy of the generated flight paths are detailed below. The first is the waypoint distance, proposed by Almeida (2017), which measures the difference between the distances flown in the generated and the real flights; the second is the correlation, which measures how strongly two vectors are correlated; and the last is the estimated fuel burn, which reflects the aircraft performance along the generated flight paths (ALMEIDA, 2017).

Waypoint Distance
The Waypoint Distance (WPD) measures the distance flown through a sequence of waypoints. It is calculated with the following equations:
$$W_S = \sum_{p=1}^{n-1} W_{p,p+1} \tag{2}$$
$$W_{p,q} = \sqrt{G_{p,q}^2 + \Delta h^2} \tag{3}$$
where $W_S$ is the estimated distance flown through a set of waypoints S (km); n is the size of the set S; $W_{p,q}$ is the estimated distance flown between a pair of sequential waypoints p and q (km); $G_{p,q}$ is the great-circle distance between p and q (km); and $\Delta h$ is the altitude variation between p and q (km). The waypoint distance was used to compare generated and real flight paths. An error is calculated by dividing the difference between the total WPD of the real and generated paths by the total WPD of the real flight and multiplying by 100. The final result for this metric is the mean of all errors per route.
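A sketch of the waypoint distance and its percentage error follows, assuming the per-pair distance combines the great-circle distance G and the altitude variation h as sqrt(G² + h²), with G computed by the haversine formula on a spherical Earth. The sign convention (real minus generated) follows the Results section, where positive values mean the generated path is shorter. Function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def waypoint_distance(path):
    """Distance flown (km) through (lat_deg, lon_deg, alt_km) waypoints.

    Each sequential pair contributes sqrt(G**2 + dh**2), combining the
    great-circle distance G and the altitude change dh (assumed form).
    """
    total = 0.0
    for (lat1, lon1, h1), (lat2, lon2, h2) in zip(path, path[1:]):
        g = great_circle_km(lat1, lon1, lat2, lon2)
        total += math.hypot(g, h2 - h1)
    return total


def wpd_error_pct(real_path, generated_path):
    """Percentage difference of total WPD; positive if the generated path is shorter."""
    w_real = waypoint_distance(real_path)
    return (w_real - waypoint_distance(generated_path)) / w_real * 100.0
```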

Fuel Consumption
A complementary metric used to compare generated and real flight paths was the estimated fuel burn. Fuel consumption was estimated with the Base of Aircraft Data (BADA) model proposed by Eurocontrol (POLES et al., 2010). According to the BADA User Manual (version 3.12), released in 2014, the BADA equations are mainly designed to calculate aircraft performance in trajectory simulations and air traffic management (ATM) systems (EUROCONTROL, 2014). The model provides equations that can be used to generate trajectories and calculate their fuel consumption. Several studies have proposed improvements to the BADA model; this work adopted the adaptations of the BADA equations proposed by Oaks et al. (2010), which enable fuel burn estimation from actual flight path data (e.g., ADS-B data).
The BADA manual provides equations to calculate the aircraft fuel consumption rate (kg/min). Essentially, Oaks et al. (2010) and Belle and Sherry (2013) suggest multiplying the BADA fuel flow by a time interval Δt to obtain the fuel burned during that interval. In this work, Δt is the flight time elapsed between a pair of waypoints. The overall fuel burned is obtained by repeating this process for all sequential pairs of waypoints and summing the results (OAKS et al., 2010; BELLE; SHERRY, 2013).
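The accumulation of fuel over waypoint pairs can be sketched as follows. `fuel_flow_kg_per_min` is a hypothetical placeholder for the BADA-based fuel-flow calculation, which is not reproduced here; waypoints are assumed to carry a timestamp in seconds as their first field.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical waypoint layout: (time_s, lat_deg, lon_deg, alt_km, ground_speed_kmh)
Waypoint = Tuple[float, float, float, float, float]


def total_fuel_kg(
    path: Sequence[Waypoint],
    fuel_flow_kg_per_min: Callable[[Waypoint, Waypoint], float],
) -> float:
    """Sum fuel flow x elapsed time over every pair of sequential waypoints.

    fuel_flow_kg_per_min is a placeholder for a BADA-based fuel-flow model,
    evaluated here for the segment between two waypoints.
    """
    total = 0.0
    for p, q in zip(path, path[1:]):
        dt_min = (q[0] - p[0]) / 60.0  # elapsed time between the pair (min)
        total += fuel_flow_kg_per_min(p, q) * dt_min
    return total
```

With a constant 40 kg/min and waypoints at 0 s, 60 s and 180 s, the sketch returns 40 × 1 + 40 × 2 = 120 kg.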

Correlation
Correlation compares two curves and returns a number between -1 and 1 that indicates how correlated they are. This paper uses the Pearson product-moment correlation coefficient (EDWARDS, 1976). The method requires the points of both curves to be aligned on a common parameter (time, in this paper), so waypoints had to be interpolated in the generated flight path before the method could be applied. The final correlation result is the average of the correlations of each flight.
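A plain-Python sketch of the Pearson coefficient and of the per-route average over flights, assuming the real and generated series have already been aligned in time. The function names are hypothetical; Python 3.10+ also offers `statistics.correlation` for the coefficient itself.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long series.

    Undefined (ZeroDivisionError) if either series is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(flight_pairs):
    """Average the per-flight correlations of (real, generated) series."""
    rs = [pearson(real, gen) for real, gen in flight_pairs]
    return sum(rs) / len(rs)
```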
The interpolation that creates new waypoints uses physical and mathematical equations to produce a waypoint with latitude, longitude, altitude and ground speed at a specific required instant t. Equations 4 to 7 are used to create the interpolated point, and Figure 6 illustrates the interpolation of waypoints.
Where: a: acceleration (m/s²); v: velocity (km/h); d_ij: distance from the initial point i to the final point j (km); d_if: distance from the initial point i to the interpolated point f (km); t: time (s); i: initial point; f: interpolated point; j: final point; C: latitude, longitude (degrees) or altitude (km).
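Because the interpolation equations themselves are not reproduced here, the following is only a plausible sketch consistent with the listed variables: it assumes constant acceleration a between points i and j, computes d_if and d_ij from that kinematic model (in meters internally; only their ratio is used), and moves each coordinate C from i towards j by the fraction d_if/d_ij. It is not necessarily the form of the authors' equations, and the function name is hypothetical.

```python
def interpolate_waypoint(wp_i, wp_j, t):
    """Waypoint at time t between wp_i and wp_j, assuming constant acceleration.

    Waypoints are (time_s, lat_deg, lon_deg, alt_km, ground_speed_kmh).
    """
    t_i, *c_i, v_i_kmh = wp_i
    t_j, *c_j, v_j_kmh = wp_j
    if t_j <= t_i or not (t_i <= t <= t_j):
        raise ValueError("t must lie in [t_i, t_j] with t_j > t_i")
    v_i, v_j = v_i_kmh / 3.6, v_j_kmh / 3.6   # km/h -> m/s
    span, tau = t_j - t_i, t - t_i
    a = (v_j - v_i) / span                     # acceleration (m/s^2)
    d_ij = v_i * span + 0.5 * a * span ** 2    # distance i -> j (m)
    d_if = v_i * tau + 0.5 * a * tau ** 2      # distance i -> f (m)
    frac = d_if / d_ij if d_ij > 0 else tau / span
    coords = [ci + (cj - ci) * frac for ci, cj in zip(c_i, c_j)]
    v_f_kmh = (v_i + a * tau) * 3.6            # speed at f, back in km/h
    return (t, *coords, v_f_kmh)
```

At constant speed the fraction reduces to elapsed time over total time; starting from rest, a point halfway in time lies a quarter of the way along the segment.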

RESULTS
This section presents the results obtained with the method described above. Section 3.1 shows a single actual flight path and its generated path. Section 3.2 presents the correlation between generated and real flight paths. Section 3.3 shows, through the distance and fuel burn metrics, that the method approximates the real path data satisfactorily.

Visual Analysis
To illustrate the potential performance of the proposed method, Figures 7, 8, 9 and 10 show a selected flight on the LHR-HAM route (the route with the best results for the proposed algorithm). The blue dots are the generated data and the orange line is the real data. Visually, the generated waypoints are very close to the actual flight path in terms of latitude, longitude and ground speed. The cruise altitude was less accurate for this flight because it was set to the average top-of-climb altitude of all flights on this route: 8.7 km, versus 11 km for this real flight.
For all flights, the generated flight paths were visually close to a typical flight. This is due to the generalization capability of ANNs: the network fits all of its training data rather than any single flight. An advantage of this behavior is that the method could be used to generate a typical route that airlines could adopt as a flight standard. It could also serve as a comparison tool to assess how close real flights are to the typical route and provide information about the efficiency of the trips.

Correlation Analysis
Correlation is a more significant metric for this work, since it compares each real waypoint with its corresponding generated/interpolated point and thus better estimates how much the two flight paths differ. Each real flight path was compared with the corresponding generated flight path in terms of correlation, and the percentage of difference (error) between them was obtained. For each route, the average percentage of difference (error) between the real and generated flight paths was calculated, as shown in Table 2. All routes showed correlations above 92%, with smaller errors in latitude and longitude. Some outliers are very evident, such as latitude on the CNF-VIX route. In this case, the data presented two ways to approach VIX airport and the ANN predicts only one, which lowered the correlation of flights that landed differently from the usual approach. In the ground speed column, all results are relatively high (compared to altitude), and the EDI-CDG route shows an even greater discrepancy. This route varies greatly in speed during the cruise phase: the 95th percentile of the difference between the average maximum speed and all the flights is 154 km/h. Weather conditions and the pilots' flying styles impose nonstandard ground speeds on the aircraft. Since the ground speed prediction uses average ground speeds, it shows a larger error in this case.

Distance and fuel consumption analysis
The distance difference metric is less significant than the correlation because of the large standard deviation of the differences between flight paths. As shown in Table 3, none of the distance differences computed for the flights exceeded +10%. Positive values mean that the generated flight path is shorter than the corresponding real one. High values of this metric can be explained by circumstances such as air traffic, weather conditions and other factors that affect the distance flown. The fuel burn analysis is similar to the distance analysis; the two metrics tend to be very close, since fuel burn is proportional to the distance flown. The standard deviation ranges from 1.36 to 3.78 percentage points for distance and from 2.27 to 6.50 percentage points for fuel. In most cases, the algorithm predicted shorter flights and less fuel burned.
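Per-route means and standard deviations like those in Table 3 could be computed as in the sketch below, assuming the percentage difference is (real - generated) / real × 100 so that positive values mean a shorter generated path, as stated above. The record layout and route labels are hypothetical, and whether the authors used the sample or population standard deviation is not stated (the sketch uses the sample one).

```python
import statistics
from collections import defaultdict


def summarize_by_route(records):
    """Mean and sample standard deviation of percentage differences per route.

    records: iterable of (route, real_value, generated_value), where the values
    are either distance flown (km) or fuel burned (kg). The percentage is
    (real - generated) / real * 100, so positive means the generated value is lower.
    """
    by_route = defaultdict(list)
    for route, real, generated in records:
        by_route[route].append((real - generated) / real * 100.0)
    return {
        route: (statistics.mean(diffs), statistics.stdev(diffs) if len(diffs) > 1 else 0.0)
        for route, diffs in by_route.items()
    }
```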
Figures 11 and 12 show scatter plots of the 517 analyzed flights and routes in terms of distance flown and fuel burned. The generated flight path tends to be longer in terms of distance and fuel consumption because of the smoothing (nonlinear) behavior of the proposed method in the ascent and descent phases. Real flight paths tend to be shorter because they are optimized by aircraft computers and airspace authorities. A post-processing method could be used to optimize the generated route in the same way pilots do after receiving the flight plan. In conclusion, since the generated flight paths are not underestimated relative to the real ones, the proposed method was shown to be robust.

Figure 2 - Illustration of the flight path data-filling algorithm and its adopted techniques (columns: technique, flight phase, dimension)

Figure 5 - Flights grouped by the k-means algorithm

Figure 5 illustrates flights on the DUB-STN route grouped by k-means (k = 2). Group 1 has 210 flights, while Group 2 has 7 atypical flights. Only the flights in Group 1 were used in the ANN training task; in general, only the group with the most flights on each route was considered in ANN training.

Figure 10 - Real vs generated ground speed

Table 1 - Dataset adopted in the analysis

Table 2 - Error and standard deviation between the real and generated routes. Source: Elaborated by the authors.

Table 3 - Mean and standard deviation for distance and fuel burn. Source: Elaborated by the authors.