Predicting Airplane Go-Arounds Using Machine Learning and Open-Source Data

Go-arounds (GAs) are standard air traffic control procedures during which aircraft approach a runway but do not land. The incidence of a GA can subsequently affect the workload of flight crews and air traffic controllers, and might impact an airport runway’s throughput capacity. In this study, two different modeling methods for predicting the occurrence of GAs based on open-source Automatic Dependent Surveillance–Broadcast (ADS-B) and meteorological data are presented. A macroscopic model quantifies the probability of a GA within the next hour for an airport by applying a generalized additive model. A microscopic model employs a number of machine learning classifiers on trajectories of aircraft on approach in order to predict if a GA will be performed. Even though the results of the macroscopic model are promising, the information currently available to predict the probability of a GA is not detailed enough to achieve satisfactory predictions. Similarly, the microscopic model is capable of predicting 50% of all GAs with a false positive rate below 7%, which is still too high for practical use. Despite the limited quality of the results, the authors are convinced that both modeling methods can be inspiring to other researchers and provide useful insights into the airport system under scrutiny.


Introduction
Go-arounds (GAs) and missed approaches are air traffic control (ATC) procedures where an aircraft approaches a runway but does not land [1,2]. Common causes for GAs are runways occupied by other aircraft (e.g., due to slow departures) or unstable approaches (UAs), for which the pilots deem the approach not to be within safety margins. UAs are often caused by poor visibility, by unfavorable winds (e.g., crosswinds and tailwinds), or due to approaches flown "too fast, too low, and too close" [3,4].
Internal research at Airservices Australia (the Australian air navigation services provider) has shown that the probability of one GA is 18 times higher within ten minutes of a previous GA. Additionally, the probability of a UA can be accurately measured from energy loss variation [5] and glide slope deviation. Airservices' exploratory work shows the potential of predicting GAs. Furthermore, it highlights the need for further research to determine prediction methods that can be of practical use and that have the potential to be operationalized.
In this paper, methods to predict the probability of GAs are explored. As such, GAs on runway 14 of the Zurich International Airport, Switzerland are predicted. GAs on runway 14 are of particular interest, since the published GA procedure interacts with the standard instrument departure procedures for runway 16, as illustrated in Figure 1. This interaction, which produced an aircraft proximity event in 2003 [6], is well known to the relevant stakeholders. Subsequently, mitigation procedures have been put in place, which ensure the separation between GAs on runway 14 and departures on runway 16. However, these procedures limit the capacity of the airport. Consequently, predicting GAs on runway 14 could help increase capacity. In this paper, two different approaches to predict GAs are presented: one on the macroscopic level and one on the microscopic level. Both methods presented are entirely based on open-source data, namely Automatic Dependent Surveillance-Broadcast (ADS-B) data and meteorological data. The macroscopic model estimates the probability of a GA in the next hour based on information that is available at the time of prediction. To the best knowledge of the authors, such a model has not been presented in the literature. The microscopic model predicts for each landing aircraft at a given cut-off point on the approach whether a GA will be performed after this point or not. A microscopic approach was pursued by Searidge using neural networks to continuously calculate the probability of a GA for each landing aircraft. However, no details about the methods or the performance of the system have been published [7,8].

Pre-Processing and Go-Around Detection
In this study, ADS-B trajectories provided by the OpenSky Network [9] were used. The data cover the period between 1 January 2017 and 30 June 2020 and were re-sampled at 5 s intervals. Outliers and noise in the trajectories were removed with a Savitzky-Golay filter [10]. Because the barometric altitude of the observed flights is affected by the atmospheric pressure at the time of observation, this deviation was compensated for by computing the geometric altitude for each landing trajectory using the known elevation of runway 14. Furthermore, the trajectory data were augmented with aircraft data from [11] and meteorological data from [12].
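The re-sampling step can be illustrated with a minimal sketch. The paper does not state how the 5 s grid was produced, so linear interpolation is an assumption here, and the function name `resample_linear` and the sample values are purely illustrative.

```python
from bisect import bisect_right


def resample_linear(times, values, step=5.0):
    """Linearly interpolate (times, values) onto a regular grid.

    Assumption: `times` is strictly increasing (seconds); the grid starts
    at times[0] and uses a spacing of `step` seconds.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    grid, out = [], []
    n_steps = int((times[-1] - times[0]) // step)
    for k in range(n_steps + 1):
        t = times[0] + k * step
        # Index of the last raw sample at or before t, clamped so i + 1 exists.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[i] + w * (values[i + 1] - values[i]))
    return grid, out


# Irregularly sampled altitude (seconds, metres); values are made up.
t_raw = [0.0, 3.0, 7.0, 12.0, 20.0]
alt_raw = [1000.0, 985.0, 965.0, 940.0, 900.0]
t_5s, alt_5s = resample_linear(t_raw, alt_raw)
```

A smoothing filter such as the Savitzky-Golay filter mentioned above would then be applied to the regularly sampled series.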
To classify trajectories, a simple, case-specific rule-based approach was used because of its robustness:

1. Landings are defined as trajectories whose last five observations are (i) within a polygon demarcating Zurich Airport's limits and (ii) below 600 m above mean sea level.

2. Landings on runway 14 are defined as a subset of landings observed at Zurich Airport. To be classified as a landing on runway 14, trajectories must (i) stay for at least 5 min within a specifically defined approach corridor (see Figure 2) and (ii) have a heading between 126 and 146 degrees during this time.

3. GAs are defined as trajectories that (i) first perform an approach, (ii) leave the approach corridor, and (iii) subsequently fly for more than 6 min at an altitude above 800 m above mean sea level.

For alternative GA classification methods, the reader is referred to Proud [13].
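The three rules above can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: each observation is assumed to carry pre-computed flags `in_airport` and `in_corridor` (the polygon tests themselves are omitted), all function names are hypothetical, and "leaving the corridor" is interpreted as the first exit after at least one in-corridor observation.

```python
def longest_run_seconds(points, pred):
    """Duration (s) of the longest consecutive stretch of points satisfying pred."""
    best, start = 0.0, None
    for p in points:
        if pred(p):
            if start is None:
                start = p["t"]
            best = max(best, p["t"] - start)
        else:
            start = None
    return best


def is_landing(points):
    """Rule 1: last five observations inside the airport polygon and below 600 m."""
    last5 = points[-5:]
    return len(last5) == 5 and all(p["in_airport"] and p["alt"] < 600.0 for p in last5)


def on_rwy14_approach(points):
    """Rule 2: at least 5 min in the corridor with a heading of 126 to 146 degrees."""
    ok = lambda p: p["in_corridor"] and 126.0 <= p["heading"] <= 146.0
    return longest_run_seconds(points, ok) >= 300.0


def is_go_around(points):
    """Rule 3: approach, corridor exit, then more than 6 min above 800 m."""
    if not on_rwy14_approach(points):
        return False
    seen_corridor = False
    for i, p in enumerate(points):
        if p["in_corridor"]:
            seen_corridor = True
        elif seen_corridor:
            # First exit from the corridor after the approach (assumed reading).
            return longest_run_seconds(points[i:], lambda q: q["alt"] > 800.0) > 360.0
    return False


def make_track(segments, dt=5.0):
    """Synthetic track from (n_points, alt_m, heading_deg, in_corridor, in_airport)."""
    pts, t = [], 0.0
    for n, alt, hdg, corr, apt in segments:
        for _ in range(n):
            pts.append({"t": t, "alt": alt, "heading": hdg,
                        "in_corridor": corr, "in_airport": apt})
            t += dt
    return pts


# Made-up tracks: a normal landing and an approach followed by a climb-out.
landing = make_track([(70, 900.0, 136.0, True, False), (5, 430.0, 136.0, False, True)])
go_around = make_track([(70, 900.0, 136.0, True, False), (80, 1500.0, 300.0, False, False)])
```

Rules of this kind are easy to audit, which fits the visual inspection step that follows; the thresholds are the ones quoted in the list, while everything else is a simplification.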
With these rules, almost 250,000 landings on runway 14 were identified, of which 850 were classified as GAs. By visually inspecting the GA data, false positives and outliers (e.g., instrument landing system (ILS) calibration flights, training flights, waiting loops on approach, and noisy measurements) were eliminated, which led to 715 GAs being selected for this analysis (i.e., the GA rate is approximately 3 GAs per 1000 landings).

Macroscopic Model: Prediction of GAs in the Next Hour
The aims of the macroscopic model are twofold. First, it allows valuable insights into the airport system under scrutiny by quantifying the probability of having a GA within the next hour. Second, the model is a helpful tool to facilitate short-term capacity planning for an airport system. For example, a decreased probability of having a GA in the next hour might justify a smaller temporal separation between landing and departing traffic (clearly, a lower probability alone is not enough to decrease the separation between landing and departing traffic; other factors, such as the risk of a wake vortex encounter, must be taken into account as well). The macroscopic model relies exclusively on data that are available at the time of prediction. As such, inputs into the model are the expected traffic of the next hour (e.g., number of landing aircraft, fleet mix, etc.) and meteorological observations derived from METAR data (METAR is a standardized format for reporting meteorological observations at airports) [12].

Modeling
The probability of having at least one GA in the next hour is modeled with a generalized additive model (GAM). The proposed GAM uses a Bernoulli distribution to model the binary response $y_i \in \{0, 1\}$ or, expressed in practical terms, $y_i \in \{\text{no go-around}, \text{go-around}\}$. The probability of "success" (i.e., a GA) is typically expressed as being conditional on the predictors as

$$p_i = P(y_i = 1 \mid x_i) = E[y_i \mid x_i],$$

where the subscript $i$ indicates the $i$-th observation, $p_i$ is the probability of success of the $i$-th observation, and $E[\cdot]$ is the expected value. For example, a non-linear logistic regression model can be expressed as

$$\log\left(\frac{p_i}{1 - p_i}\right) = \eta_i = \sum_j \beta_j x_{ij} + \sum_j f_j(x_{ij}). \qquad (1)$$

In Equation (1), the left-hand-side term is called the logit or log-odds function of $p_i$, and links the predictor $\eta_i$ to the conditional expectation $p_i$. The predictor $\eta_i$ is an additive model that can contain both linear and non-linear terms (with the subscript $j$, where the two sums run over the linear and the non-linear terms, respectively). In Equation (1), the linear terms use parameters $\beta_j$, while the non-linear terms are expressed with smooth functions $f_j(x_{ij})$. The flexibility to model a predictor's effect on the dependent variable as an additive model makes GAMs ideal for data exploration and for finding suitable model structures. Similarly to linear regression models, GAMs provide statistics on the significance of predictors and other metrics. GAMs are fitted by penalized maximum likelihood. For further information on GAMs, the interested reader is referred to [14,15].
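How an additive predictor on the logit scale translates into a probability can be sketched as follows. Only the thunderstorm coefficient of 0.95 is taken from the Results; the intercept, the predictor names, and the stand-in smooth function for crosswind are hypothetical and chosen for illustration only. The sketch evaluates a given model and does not fit one.

```python
import math


def inverse_logit(eta):
    """Map a log-odds value eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def gam_probability(x, betas, smooths, intercept=0.0):
    """Evaluate p = logit^-1(intercept + sum_j beta_j * x_j + sum_j f_j(x_j)).

    The explicit intercept is an assumption of this sketch.
    """
    eta = intercept
    eta += sum(beta * x[name] for name, beta in betas.items())
    eta += sum(f(x[name]) for name, f in smooths.items())
    return inverse_logit(eta)


# Linear term: coefficient quoted in the Results section.
betas = {"thunderstorm": 0.95}
# Non-linear term: hypothetical stand-in for a fitted smooth of crosswind (kt).
smooths = {"crosswind_kt": lambda w: 0.01 * max(w - 5.0, 0.0) ** 2}

p_clear = gam_probability({"thunderstorm": 0, "crosswind_kt": 10.0},
                          betas, smooths, intercept=-3.0)
p_storm = gam_probability({"thunderstorm": 1, "crosswind_kt": 10.0},
                          betas, smooths, intercept=-3.0)
```

Because the terms add on the logit scale, switching the thunderstorm indicator on multiplies the odds by the same factor regardless of the other predictor values.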
It is worth mentioning that maximum likelihood logistic regression yields asymptotically unbiased estimates. With imbalanced data, however, the estimates can be biased in finite samples because of the "finite sample and rare events bias" [16]. This is particularly problematic when only a few observations of the rarer class are available. In this study, at least one GA was performed in 819 of the 15,791 h during which landings on runway 14 were conducted. Even though only about 5% of the observed operational hours contain GAs, the absolute number is assumed to be sufficient according to the simulation studies in [16].
The macroscopic model is based on environmental and aircraft-related predictors that are assumed to be knowable in advance (see Table 1). The GA prediction was made every full hour for the periods where at least one landing was observed on runway 14. The environmental predictors used in the model are based on the last METAR observation available before the prediction, usually at 10 min before the hour. Aircraft-related predictors consist of the number of landings in the next hour as well as an indicator describing the observed probability of a GA for a group of aircraft types (see Table 1).
A suitable model for predicting the probability of having at least one GA in the next hour is: (2)

Results
The estimated coefficients for the linear terms, which can be interpreted in the same way as in a linear regression model, are shown in Table 2. For example, the model predicts, all else being equal, that the presence of a thunderstorm in the vicinity of the airport increases the odds of having a GA by a factor of exp(0.95) = 2.6. The non-linear terms can also provide useful insights into the system and are shown in Figure 3. For example, the model predicts that the effect of headwind on the probability of having a GA is highly non-linear. It can be seen that the probability of a GA increases substantially for tailwinds (negative headwind) exceeding 4 knots. Similarly, a crosswind of up to about 5 knots does not affect the probability of a GA, but that probability increases non-linearly after that. The interpretation of the effect of the number of landings is less clear, since the confidence intervals are quite large. These large intervals, particularly towards the end of the scale, indicate that the fitted smooth might not be reliable there, because only a few observations exist with such high numbers of landings on runway 14.

Figure 4 shows the predictions of the model for both GAs and landings. Unfortunately, the predicted probability distributions do not differentiate the landings from the GAs well enough for practical applications. This result was expected to some extent, as the input data used are highly aggregated and do not contain enough information. For example, the model does not know if the weather reported in the METAR will change during the next hour. It would be possible to increase the fidelity of the model by introducing additional predictors; for example, as suggested in [4], delays and the amount of ground traffic, or forecast information such as the Terminal Aerodrome Forecast (TAF), which indicates the future trends of meteorological conditions. However, the authors do not expect a significant improvement to the model.
Instead, in the next section, a different approach is proposed.
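As a worked illustration of the odds factor quoted for thunderstorms, the following lines convert it into a probability. The baseline probability of 5% is an assumption chosen for illustration (it is close to the share of observed hours with at least one GA); it is not a model output.

```latex
\begin{align*}
\text{baseline odds} &= \frac{0.05}{1 - 0.05} \approx 0.0526, \\
\text{odds with thunderstorm} &= 0.0526 \times e^{0.95} \approx 0.0526 \times 2.586 \approx 0.136, \\
p_{\text{thunderstorm}} &= \frac{0.136}{1 + 0.136} \approx 0.12 .
\end{align*}
```

Under this assumed baseline, a factor of 2.6 on the odds therefore corresponds to a rise in probability from 5% to roughly 12%, which is less than a factor of 2.6 on the probability itself.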

Microscopic Model: Prediction of the GA Probability for an Approaching Aircraft
The aim of the microscopic model is to estimate, for each approaching aircraft, the probability of a GA before it is initiated. This information is highly valuable to air traffic control, as potential conflicts can be better anticipated and operations can be managed more efficiently.
Following a method presented by Dai, the probability of a GA is estimated at a 'cut-off point' [17], which, in this paper, is selected to be a point on the extended centerline of runway 14, located 10 km in front of the threshold. By applying the defined cut-off point, 97% of all observed GAs (695 trajectories) are selected for use in the microscopic model (see Figure 5). Subsequently, for each observed trajectory labeled as an approach to runway 14, the last 10 observations (at a sample rate of 5 s, 10 observations correspond to 50 s of trajectory data) in the approach corridor prior to the cut-off point are used in the microscopic model.
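The selection of the observation window can be sketched as follows, assuming that the distance to the threshold along the extended centerline has already been computed for every observation. The function name `window_before_cutoff` and the synthetic track are illustrative only.

```python
def window_before_cutoff(track, cutoff_km=10.0, n_obs=10):
    """Return the last n_obs observations before the aircraft passes the cut-off.

    `track` is a time-ordered list of (time_s, dist_to_threshold_km) tuples
    for an aircraft flying towards the threshold. Returns None if fewer than
    n_obs observations exist before the cut-off point.
    """
    before = []
    for obs in track:
        if obs[1] < cutoff_km:
            break  # cut-off point passed; later observations are not used
        before.append(obs)
    return before[-n_obs:] if len(before) >= n_obs else None


# Synthetic approach: 5 s sampling, 0.35 km closer to the threshold per sample.
track = [(5.0 * k, 20.0 - 0.35 * k) for k in range(60)]
window = window_before_cutoff(track)
```

Trajectories that start inside the cut-off distance yield `None` here, so in this sketch they would be excluded from the feature extraction.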

Feature Engineering
The microscopic model requires additional features, which have to be engineered from the data. These features are (i) a number of stability metrics relating to the stability of an aircraft on the approach, (ii) information on the lead-trail relationship of two aircraft that are approaching simultaneously, and (iii) environmental and aircraft-related information.

Stability Metrics
To describe the stability of an approach, three different metrics (see Table 3) are used: (i) the flown glideslope angle (the glideslope refers to the vertical guidance of the instrument landing system (ILS); for instance, runway 14 at Zurich Airport is equipped with an ILS with a 3° glideslope angle), (ii) the deviation from the localizer (the localizer refers to the lateral guidance of the ILS), and (iii) the specific potential and kinetic energy of the aircraft. The flown glideslope and localizer deviation are depicted in Figure 6. The metrics listed in Table 3 are defined as follows:

Glideslope angle: $\alpha_{gs} = \arctan(h_{ar} / d_{thr})$, with the aircraft at a height above the runway $h_{ar}$ and at a distance $d_{thr}$ from the threshold.

Localizer deviation: $\delta_{cl} = \mathrm{dist}(\text{airplane}, \text{centerline})$, the 2D distance between the airplane and the centerline.

Aircraft specific energy (also known as energy height): $E_s = h + V^2 / (2g)$, with the aircraft at a height $h$, ground speed $V$, and the gravity constant $g$ [3].

Figure 6 shows the profiles and the mean value distributions of the used stability metrics $\alpha_{gs}$, $\delta_{cl}$, and $E_s$ for a subset of 1000 randomly chosen landings and the 695 identified GAs. Despite an overlap of the distributions of GAs and nominal landings, some GA observations tend to have higher values. This indicates that the proposed stability metrics might be useful predictors for unstable approaches.
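The three stability metrics can be computed along the following lines. The sketch assumes the usual definitions, namely a glideslope angle of arctan(h_ar / d_thr) and an energy height of h + V^2 / (2g), and it assumes positions given as (east, north) coordinates in metres in a local plane. The runway track of 136 degrees is an illustrative value (the middle of the heading band used for classification), and all function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def glideslope_angle_deg(h_ar_m, d_thr_m):
    """Flown glideslope angle from height above runway and distance to threshold."""
    return math.degrees(math.atan2(h_ar_m, d_thr_m))


def localizer_deviation_m(pos, thr, track_deg):
    """2D perpendicular distance between aircraft and extended centerline.

    pos, thr: (east, north) in metres; track_deg: runway direction, measured
    clockwise from north.
    """
    dx, dy = pos[0] - thr[0], pos[1] - thr[1]
    ux = math.sin(math.radians(track_deg))  # east component of runway direction
    uy = math.cos(math.radians(track_deg))  # north component of runway direction
    return abs(dx * uy - dy * ux)


def specific_energy_m(h_m, v_ms, g=G):
    """Energy height: potential plus kinetic energy per unit weight (metres)."""
    return h_m + v_ms ** 2 / (2.0 * g)


# Aircraft exactly on a 3 degree path, 10 km from the threshold.
d_thr = 10_000.0
h_ar = d_thr * math.tan(math.radians(3.0))
alpha = glideslope_angle_deg(h_ar, d_thr)

# Aircraft displaced 100 m sideways from the centerline (made-up geometry).
ux, uy = math.sin(math.radians(136.0)), math.cos(math.radians(136.0))
pos = (-d_thr * ux + 100.0 * uy, -d_thr * uy - 100.0 * ux)
delta = localizer_deviation_m(pos, (0.0, 0.0), 136.0)

e_s = specific_energy_m(500.0, 70.0)
```

Following the text, ground speed is used in the energy term; with ADS-B data this is the directly available speed, although it differs from airspeed in the presence of wind.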

Lead-Trail Relationship
When two aircraft are simultaneously approaching the same runway, ATC must ensure spatial separation between the airplanes at all times. Dai [17] suggests that the probability of a GA is substantially increased if (i) there is a loss of separation, and (ii) the closing rate between the leading and the trailing aircraft is "too high". To model the lead-trail relationship, the time to leader $t_{TL}$ is used in this paper, which is defined as $t_{TL} = \Delta_{TL} / \Delta V_{TL}$, where $\Delta_{TL}$ is the relative distance and $\Delta V_{TL}$ is the relative speed between the leader and the trailer. Table 4 shows that the minimum time to leader observed during the approach has a positive impact on GA rates. Additionally, the authors expect the wake category of the leading and trailing aircraft to have an impact on GA rates. Consequently, both the minimum time to leader and the wake categories are used as features in the microscopic model.

Table 4. Time to leader and observed GAs.
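A minimal sketch of the time-to-leader feature follows. The sign convention is an assumption: the relative speed is taken as trailer speed minus leader speed, and a pair that is not closing is assigned an infinite time to leader. The function names and sample values are illustrative.

```python
import math


def time_to_leader(gap_m, v_lead_ms, v_trail_ms):
    """Time (s) until the trailer would reach the leader at the current speeds."""
    closing_speed = v_trail_ms - v_lead_ms  # assumed convention: positive when closing
    if closing_speed <= 0.0:
        return math.inf  # trailer is not catching up
    return gap_m / closing_speed


def min_time_to_leader(samples):
    """Minimum time to leader over a sequence of (gap_m, v_lead, v_trail) samples."""
    return min(time_to_leader(*s) for s in samples)


# Made-up samples along one approach: gap in metres, speeds in m/s.
samples = [(9000.0, 70.0, 80.0), (8500.0, 68.0, 78.0), (8200.0, 66.0, 70.0)]
t_min = min_time_to_leader(samples)
```

Taking the minimum over the approach yields a single scalar per trajectory, which matches the way the feature is described in the text.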

Environmental and Aircraft-Related Information
The results of the macroscopic model suggest that weather-related features such as visibility, wind, or the presence of thunderstorms in the vicinity of the airport affect the probability of a GA. Additionally, aircraft-related information, such as the aircraft type, the GA category (the GA category was derived from the data by estimating the probability of a GA for each aircraft type), and a binary factor identifying whether the airline operating the flight is a home carrier (i.e., has Zurich as its home base), are used in the microscopic model.

Modeling
For the microscopic model, a number of different classical machine learning (ML) classifiers were trained and the corresponding results were compared with each other. The heavily imbalanced data (approximately 3 GAs per 1000 landings on runway 14) constitute a major challenge for the application of classical ML classifiers. Chawla [18] presents several sampling strategies and ensemble-based methods to deal with imbalance. In this project, the majority class (i.e., the landings) was downsampled: the data used to train the ML classifiers consist of 1000 random samples of the non-GA set and all 695 observed GA samples. This approach introduces a bias into the predicted probabilities. However, since the microscopic model is used solely for classification of trajectories into the sets $y_i \in \{\text{no go-around}, \text{go-around}\}$, a correction of the bias (e.g., as presented in [19]) is not applied. In order to decorrelate the extracted model features (see Table 5), a principal component analysis (PCA) transformation was applied (see [20]) to the data used for the ML classifiers. In this paper, the following ML classifiers provided in the Scikit-learn package [21] were used: Linear Support Vector Machine, Radial Basis Function Support Vector Machine, Random Forest, AdaBoost, Naive Bayes, Quadratic Discriminant Analysis, and Gradient Boosting Classifier. The classifiers were trained with 50% of the downsampled data, while the other 50% was used for testing and validation purposes.

Figure 7 shows the receiver operating characteristic for each trained model applied to all of the data for runway 14, covering 242,959 approaches. The true positive rate is indicated on the vertical axis, while the false positive rate is displayed on the horizontal axis for different classification thresholds. It can be seen that the Random Forest classifier performs best, since it manages to detect half of the GAs with a false positive rate below 7%.
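The downsampling and the 50/50 split described above can be sketched with the standard library alone. The integer identifiers, the fixed seed, and the plain (non-stratified) shuffle are assumptions of this sketch; the PCA transformation and the classifiers themselves are not reproduced here.

```python
import random


def downsample_and_split(ga_ids, landing_ids, n_majority=1000, train_frac=0.5, seed=0):
    """Keep all GA samples, draw n_majority landings at random, then split.

    Returns (train, test), each a list of (sample_id, label) with label 1 for
    a GA and 0 for a nominal landing.
    """
    rng = random.Random(seed)
    sampled_landings = rng.sample(landing_ids, n_majority)
    data = [(i, 1) for i in ga_ids] + [(i, 0) for i in sampled_landings]
    rng.shuffle(data)
    n_train = int(len(data) * train_frac)
    return data[:n_train], data[n_train:]


# Hypothetical identifiers: 695 GAs and the remaining approaches as landings.
ga_ids = range(695)
landing_ids = range(695, 242_959)
train, test = downsample_and_split(ga_ids, landing_ids)
```

After downsampling, GAs make up about 41% of the samples (695 of 1695) instead of roughly 0.3%, which is the source of the bias in the predicted probabilities mentioned in the text.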
While this result seems to be rather good, the number of nominal landings is much higher than the number of GAs, so that only 2% of the predicted GAs are actual GAs observed in the data. Figure 8 depicts a comparison of the GA initiation distance with respect to the runway threshold (a) for GAs the model manages to predict (true positives, displayed as yellow-green) and (b) for the ones that have not been predicted by the classifier (false negatives, displayed as red). It can be inferred that the model does not perform well at predicting GAs that are initiated close to the runway threshold, while it performs better at predicting GAs that are initiated further away than 3 km from the threshold. False negatives can be explained by information not present in the model, such as ATC intervention or runway occupancy. Additionally, Figure 9 shows the stability metric profiles and mean distributions for false positives and true negatives. The false positives can be explained by the following factors. First, there is a high chance that the flight crew is capable of stabilizing an unstable approach identified at a cut-off point located 10 km from the runway threshold. Second, noisy or inaccurate data might contribute to abnormally high approach metric values.
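The figure of about 2% can be checked with a short calculation from the numbers quoted in the text. The sketch assumes that the operating point is exactly a true positive rate of 50% and a false positive rate of 7%, and that all 695 GAs are among the 242,959 approaches; the function name is illustrative.

```python
def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Share of positive predictions that are true positives."""
    true_positives = tpr * n_pos
    false_positives = fpr * n_neg
    return true_positives / (true_positives + false_positives)


n_total = 242_959  # approaches to runway 14 used for the ROC evaluation
n_ga = 695         # GAs retained at the cut-off point
precision = precision_from_rates(0.5, 0.07, n_ga, n_total - n_ga)
```

With roughly 350 detected GAs against roughly 17,000 false alarms, the precision is close to 0.02, which illustrates why a seemingly low false positive rate is still too high when the positive class is this rare.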

Conclusions
In this paper, a microscopic and a macroscopic approach for determining the probability of a GA are presented. The macroscopic model aimed at estimating the probability of a GA within the next hour based on information that is available at the time of prediction, while the microscopic model predicted the probability of a GA for each landing aircraft at a cut-off point on the extended runway centerline, located 10 km in front of the threshold. Even though neither of the models performed well enough for real-world applications, they both provide valuable insight into the system. The macroscopic model can be used to understand the non-linear relation between environmental and operational features and GA occurrences, while the microscopic model could be used as a basis for the development of a tool to assist ATC in identifying potential GAs. A possible extension of this work is the modeling of the probability of a GA not for a single cut-off point, but rather for a large number of points covering an entire section of the approach corridor. Even though the results presented in this paper are somewhat inconclusive, the proposed methods might inspire other researchers in the area of GA prediction.