Machine Learning in Orbit Estimation: a Survey

Since the late 1950s, when the first artificial satellite was launched, the number of Resident Space Objects has steadily increased. It is estimated that around one million objects larger than one cm are currently orbiting the Earth, with only thirty thousand larger than ten cm being tracked. To avert a chain reaction of collisions, known as Kessler Syndrome, it is essential to accurately track and predict debris and satellites' orbits. Current approximate physics-based methods have errors in the order of kilometers for seven-day predictions, which is insufficient when considering space debris, typically with less than one meter. This failure is usually due to uncertainty around the state of the space object at the beginning of the trajectory, forecasting errors in environmental conditions such as atmospheric drag, and unknown characteristics such as the mass or geometry of the space object. Operators can enhance Orbit Prediction accuracy by deriving unmeasured objects' characteristics and improving non-conservative forces' effects by leveraging data-driven techniques, such as Machine Learning. In this survey, we provide an overview of the work in applying Machine Learning for Orbit Determination, Orbit Prediction, and atmospheric density modeling.


I. Introduction
It is estimated that more than 36,000 objects larger than 10 centimetres, and millions of smaller pieces, exist in Earth's orbit [2]. To safeguard active spacecraft, it is necessary to accurately determine where each Resident Space Object (RSO) is, and where it will be, at all times. To create such a complex body of knowledge, the larger problem of orbit estimation is divided into smaller problems, each one largely complex but with a specific goal in mind. In this review we observe three main sub-problems: Orbit Determination, Orbit Prediction and Thermospheric Mass Density.
• Orbit Determination (OD): the OD is the determination of the orbit of the object based on observations. The Extended Kalman Filter [3] (EKF) is the de facto standard for orbit determination in real-world scenarios. The accuracy of this process depends on the amount of sequential observations used to determine the orbit, and the type of observation, e.g., laser ranging and GPS tracking is far more precise than optical observations. The output of this method is commonly a Gaussian probability distribution of the orbit of the object, usually represented through a vector consisting of the object's estimated position and velocity and a covariance matrix reflecting the uncertainty. Currently, this process is limited by the assumptions of the EKF, by lack of knowledge in RSO's shape and attitude and dynamic model simplifications, which we will further examine in Section III.
• Orbit Prediction (OP): Orbit Prediction is the process of predicting the future position and associated uncertainty of any given RSO. Two method families exist for OP: one pursuing analytical solutions, with the other exploiting numerical approximations. Numerical methods are time-consuming but precise, while the analytical methodologies are simpler and faster. To be tractable, these algorithms hold simplifying assumptions that hinder accuracy [4].
Each method is limited by OD in the sense that a orbital state with high uncertainty will necessarily evolve to have an inaccurate orbit prediction. When used in a Collision Avoidance scheme, this process is used to propagate the state of the RSO until the time of closest approach (TCA) to any actively monitored satellite. The current limitations in orbit prediction are the Gaussian assumption -which does not hold against the true distribution [5] -the simplified modelling of perturbation forces, the unknown information regarding RSO characteristics and the uncertainty over space weather forecast data.
• Thermospheric Density Mass: A set of force models determine the acceleration of any RSO. Earth's gravity potential is the most meaningful, but solar and lunar gravitational attraction and Earth/ocean tides all have an effect on the RSO [6]. Of the non-conservative forces, the most relevant for objects in LEO is the satellite drag.
Being applied in the opposite direction of the RSO's velocity, this force is the largest source of uncertainty for most RSO's in LEO [7] and correctly determining atmospheric density is key to calculate satellite drag. All state-of-the-art models currently used for density estimation are empirical (data-driven) models that have been consistently updated since the last century. The lack of predictive capability and uncertainty estimation are the two main drawbacks of these methods.  Table 1 we have a summary of the input/output for each task. In the following sections, we will do a thorough review of published work using machine learning techniques to help increase space safety, protecting spacecraft, while working to diminish the amount of debris orbiting the Earth. Each section will start with a brief description of the task, followed by a state-of-the-art review of the classical methods currently in operation. The machine learning body of work will be presented in semi-chronological order, from the initial use of historical data to improve physical models to the most recent developments in the area, that use highly complex models. The next section will briefly define the physical forces that determine the path, or orbit, of a satellite or space debris.

II. Forces Model
The equation of motion of a space object in a Cartesian ECI (Earth-Centered Inertial) coordinate system [5] can be written in the form where is the position vector, [ ] = , and ⊕ is the gravitational constant multiplied by the masses of the Earth and the RSO, with the perturbing forces being For a given initial condition ( 0 ) = 0 , the position of a satellite at any point in time can be implicitly written as the solution of the equation flow : The perturbation of Earth's gravity potential, a , is due to the fact that Earth is not a perfect sphere. This force is modeled using spherical harmonic potential equations, with varying level of precision depending on degree. The most recently published model, the Earth Gravity Model (EGM2012), has coefficients up to degree 2190 and it is accurate to 0.1 m [8]. The -Body perturbations, a / , affect the trajectories of RSOs in orbit, with the Solar and Lunar perturbations being the two most relevant forces on the space object's body. Other planets in the solar system, such as Jupiter, have a very minimal impact on satellites. The atmospheric drag, a , especially in the Thermosphere layer (85/125 − 600/1000 km), can lead to significant effects, and it is responsible for the decay of orbits in Low Earth Orbit (LEO). Using the canon-ball model, the aerodynamic drag can be represented by [9]: where is the local atmospheric mass density, is the drag coefficient, is the cross-sectional area, the mass of the space object, and is the relative velocity of the object with respect to the rotating atmosphere. For most satellites, the mass and cross-sectional area is known in advance, and is assumed to be constant for the lifespan of the satellite, despite slight changes that might occur, due to, for example, fuel consumption. This is however not the case for the majority of RSOs, especially space debris that originated from breakups. Taking that into account, it is common to define a combined parameter, the Ballistic Coefficient ( = ), which integrates mass, area and drag coefficient to be calculated as an extra feature during Orbit Determination. The local mass density is the variable obtained from the Thermospheric Density Mass models, which by itself is particularly challenging to model, as we will see in Section V.
The Solar Radiation Pressure, a , is the force that arises from the absorption and reflection of light from the Sun.
It depends on the cross-sectional area and mass, but unlike atmospheric drag it does not depend on altitude. It is the dominant non-gravitational perturbation on higher altitudes (MEO and GEO) and for High Area to Mass Ratio (HAMR) objects [5]. Of particular importance to the representation of this force is the determination of the entry and exit times of a given RSO from the shadow of the Earth, along with the determination of the force orientation, as the object-Sun vector is always changing and with it the cross-sectional area.
Tidal effects, a , are also taken into account in orbital determination, with the small periodic deformations of the solid body of the Earth referred to as solid Earth tides, which, alongside ocean tides, impact Earth's gravity potential [6].
Others perturbations include General Relativity theory adjustments, geomagnetic pulls, Earth's albedo and atomic clock corrections.

III. Orbit Determination
This is the process of using observations to obtain a state vector ∈ R of a given space object [11]. In this survey space objects will be all objects, either satellites or space debris, currently orbiting the Earth. A state vector of = 6 is sufficient to represent the velocity and position of the satellite at a given time, but others variables, such as the Ballistic

Orbital Determination
Orbital Propagation Fig. 1 The two main stages of orbital estimation necessary for collision avoidance and debris and satellite tracking. The information obtained from OD is used as input for Orbit Prediction (OP). Figure adapted from [4] and [10].
Coefficient (BC), or clock bias can also be calculated as part of the state vector [12]. This state vector can also be represented by the traditional Keplerian elements ( , , , Ω, , ) or by equinoctial elements ( , ℎ, , , , ℓ). The type of observations used for this purpose are radio, radar, laser or image; with observations usually divided between cooperative or non-cooperative observations [10]. As the name suggests cooperative observation is when the object in question, e.g., a satellite, has a built-in capability to respond to tracking. This gives the Ground Station (GS) the ability to know with detail the position and ID of the object. In this case we have a significant amount of observations across the period in which the GS, or GS's, are communicating with the satellite. The non-cooperative observations are necessary to track defunct satellites or space debris. In this case the identification of the satellite is a necessary part of the process, and the number of observations, usually radar, are relatively small for each space object.
The state-of-the-art approach for this problem is the Extended Kalman Filter (EKF) [4]. This method is an approximation of the Kalman Filter to non-linear systems, such as the equations of motion of a space object. Its limitations include the use of only the first order Taylor series expansion to approximate the system to a linear system, as well as the assumption that the noise, measurement and process, is Gaussian [13]. Despite that, the applicability of the filter for sequential data, alongside providing a probability distribution of the estimate state, proves to very useful and justifies the current use. Because of the relevance of the EKF method to orbit determination, we will briefly explain the algorithm.
The intended goal of this method is to combine knowledge of the dynamical system with observations, therefore the dynamical and observation equations can be represented as [13,14]: where ( ) ∼ (0, ( )) is the process noise, is the measurement, ℎ( ) the measurement function and the measurement noise following a zero-mean Gaussian with covariance , ∼ (0, ). Through this general formula we can observe that the formula combines discrete observations with continuous dynamical equations. When ( , ) and ℎ( ) are linear functions, and the initial distribution is Gaussian, with ( ) and expressing white noise (uncorrelated Gaussian distributions with zero-mean), the Kalman Filter is the optimal filter [15]. A known and much studied filter in the automatic control and time-series research fields, a great deal of books [16][17][18] present in depth explanations of the Kalman filter, which can be summarized as a two-step algorithm (predict and filter) as illustrated in Figure 2.
The Extended Kalman Filter is applied when the linearity assumption does not hold, in particular the linearity of ( , ) and/or ℎ( ). To do so, and before the KF algorithm in Figure 2 ought to be used, both equations are linearized using the first order of Taylor's approximation, at each observational moment : Note in particular in Figure 2 that the prediction step for the a priori mean (ˆ| −1 ) does not need a linear function, it is the covariance matrix and the Kalman Gain matrix that require the simplification of the dynamical and measurement equations. This is exactly the intuition behind improvements over the base method such as the Unscented Kalman Filter (UKF) [3], the Second Order Kalman Filter [19] and others methods referred to in Lee [20]. These methods more closely represent the non-linearity of the problem, but are, nonetheless, Gaussian or linear approximations. Sequential Monte Carlo methods such as the Unscented Particle Filter [21] or the Divided Difference Particle Filter [20] are also popular, and shown to be superior to the EKF by [22] and [23], with the disadvantage of high computational costs [23]. The UKF in particular sits on the border between Monte Carlo and linear methods, with deterministically chosen sigma-points (X) [3], made to represent the Gaussian distribution, being independently propagated across time through

Fig. 2 General Kalman Filter Diagram.
is the state transition matrix at time , is the Kalman Gain at each observational moment, is the linear measurement mode at time step , and the output is the estimated mean (ˆ) and covariance (Σ ) at time step .ˆ| −1 and Σ − are internal components representing the a priori prediction before the observation is introduced. +1 = ( +1 ; , ) and then used to derive the mean and covariance at the + 1 time step.
More recently, the non-linearity of the problem as been tackled through the use of machine learning and data-driven techniques. Hartikainen et al. [24] used Latent Force Models (LFMs) to combine the dynamic system with a more freely defined measurement component, implementing Gaussian Processes to model measurement noise, and using a machine learning approach to define the covariance structure of the Gaussian Process [25,26]. This method improves upon the Kalman Filter by removing the uncorrelated white noise assumption and replacing it by a time-varying GP. In [27], it is presented a general approach to OD using distribution regression. It is shown that under a known and continuous dynamic system, with known spacecraft characteristics and observable orbital parameters, there is a possible mapping from the probability distribution of the observed samples to the orbital parameters. This method can be used either to estimate the posterior distribution of the orbital parameters based on full batch observations, or to identify the ID of the spacecraft associated with each sample. In [28], Sharma extends this work and presents a method for collaborative and non-collaborative orbit classification, by estimating the probability of each sample to represent the distribution associated with its ID. This technique is particularly useful to to identify satellites with identical Radio Frequency (RF) transmissions or very close orbits, but was not extended to more than 2 spacecrafts. The distribution regression approach was also followed in [29], extending the process to observations from more than one Ground Station (GS).
Different authors [30][31][32] independently presented an improvement over the current methods for OD by using Gaussian Mixture Models (GMMs) to model the estimated distribution. As a GMM is known to approximate any distribution given a sufficient number of Gaussian distributions [33,34], it is used to approximate the real distribution of the state during the OD period. In particular, this allows for a more accurate characterization of the distribution of , being the state at any given time during OD, while also being particular useful for the step of Orbit Propagation (OP), as all current methods are capable of propagating a Gaussian distribution across time, and are trivially scalable to a finite number of independent Gaussian distributions. This work was extended by [35] and [36] by adapting the GMM weights across time, and further enhanced by Vittaldev [37,38] who presented an univariate splitting library along arbitrary directions that is used to generate a Multidirectional Gaussian Mixture Model. In [39], Horwood et al.
also proposed a new distribution to improve OD -the Gauss von Mises (GVM) -using this distribution in tandem with a particular set of orbital elements, the 2 equinoctial orbital elements, to reduce the non-linearity of the process.
The GVM is the counterpart distribution of the Gaussian distribution for circular support. In a series of publications

IV. Orbit Prediction
Orbit Prediction can usually be seen as the process of propagating a known orbital state to a future orbital state. The current physics-based methodology is to solve the differential equations of motion either analytically or numerically to predict the mean state at a future time. The numerical methods are time consuming and the sheer number of space debris objects orbiting the Earth makes them only usable for academic or non-real time applications [42]. In general, the non-linear stochastic dynamic system can be defined using the Itô stochastic differential equation [43]: where ( , ) is the deterministic part of system, as seen in Equation (1) where ∇ is the gradient in viewed as a column operator. Solving this equation would provide a complete description of the position and uncertainty of the orbit at any time > 0. However, there is no analytical solution to most FPKE, with orbital mechanics being particularly hard to solve due to the non-linearity of the perturbed dynamics in a state space with six dimensions. The orbit prediction numerical approximations of this solution usually entail a set of assumptions to allow the problem to be solvable. In this review, before jumping to the advances in the field, we are going to present the more common approach for Orbit Prediction, where the output from the EKF, a Gaussian Distribution with mean ( 0 ) = 0 and covariance Σ 0 is propagated according to a first-order systems of SDEs. Using the SDE in Equation (8), assuming the noiseless case ( ( ) = 0) and the flow solution (3), the mean and covariance at time is: where the flow solution ( ; 0 , 0 ) and State Transition Matrix (STT) Φ( ; 0 , 0 ) are solved by: where Φ( 0 ; 0 , 0 ) = and, in the case of the EKF, the linearly propagated covariance is obtained with ( ( ; 0 , 0 ), ) as the partial derivative of the function ( ( ), ) in Equation (1): To solve the equations in (10) there are many numerical methods, with some numerical integration methods specialized for orbit propagation, as seen in [6] and [45]. The most relevant numerical integrators used for orbit propagation found in the literature include explicit Runge-Kutta methods such as Dormand-Prince 8(7) (DP8) or Runge-Kutta-Nystrom 12(10) (RKN12) and predictor-corrector methods, namely Adams-Bashforth-Moulton (ABM) and Gauss-Jackson (GJ) [5]. In particular, implicit Runge-Kutta methods such as the ones developed by Bradley et al.
[46] (GL-IRK) and Bai and Junkins [47] (GC-IRK) are quite interesting algorithms for practical applications as they allow for parallelization, are A-stable [48] at all orders and are more accurate than DP8 [49]. Analytical methods are more common, and of those, the widely used method is the SGP4 (Simplified General Perturbations Model 4) [50].
Made available to the public in 2006 [51], it has since them become the most used propagator for orbit prediction, despite the lack of accuracy for long-time predictions. The input data for SGP4 is a Two Line Element (TLE) that includes information about the object and its orbit, such as satellite number, mean orbital elements, drag, ballistic coefficient, revolution number and a time stamp. One information it does not provide, crucially, is the initial uncertainty, so we can deduce the fact that the information on Resident Space Objects made publicly available through TLEs at space-track.org is one of the reasons for the ample use of SGP4 [52].
Using the current methodology two types of errors exist when doing Orbit Propagation: satellite force model's errors and measurement errors. The first error type relates to simplifications in the model, such as Earth's Gravitational harmonics model, or unknown information specific to the RSO, such as shape, attitude or cross-sectional area.
Measurement errors are all errors associated with the difference between the real state of the satellite (such as position and velocity) and the measured state, including numerical truncation errors, as well as other navigation errors caused by clock accuracy or coordinate systems precision. In Figure 3 we can see a summary of the error sources in orbit prediction, as presented by [10] and [53].
In recent years, data-driven techniques were used to tackle both sides of the problem, either employing machine learning to improve the model's representation of reality or to mitigate the errors caused by numerical solutions and linear assumptions. Levit and Marshall [54] showed that when using a sufficient number of TLE's, they could be used as pseudo-observations to fit a high-precision special perturbations numerical propagator. This method was applied to a set of satellites, and the OP error was reduced from 1.5 km/day to 0.1 km/day. This work shows that past information can be leveraged to improve orbit prediction, which is a key factor for any machine learning setup. This work was closely followed by [55], in which the same method was applied, alongside a bias correction function. Similarly, other authors [56,57] have shown improvement of the SGP4 prediction using error correction functions, based on prediction error periodicity.
A second approach is to remove the physics-based propagator altogether. In [58], it was proposed a data-driven model that was able to approximate the SGP4 algorithm using a polynomial fit. A different method was introduced in [24] using Latent Force Models (LFMs) to combine the physical models principles with non-parametric data-driven components. By assuming Gaussian process (GP) priors on unknown forces, this method allows to determine future orbit positions and corresponding uncertainty. This work was by extended Rautalin et al. [59] who obtained positive results for a set of satellite constellations in Medium Earth Orbit (MEO) and Geo-Synchronous Orbit (GEO).
Peng and Bai [60] have an extensive number of publications on this field. In their first paper within the error correction framework, a Support Vector Machine (SVM) is used to learn the structural part of the historical error and to improve the SGP4-based prediction. In a simulated catalogue, they have shown that the error correcting model can be deployed to improve RSO's orbit prediction. This duo of researchers also examined the capabilities of three ML models, specifically, Artificial Neural Networks (ANN), Gaussian Processes (GP) and Support Vector Machines (SVM) in [60][61][62], coming to the conclusion that the ANN had the best results, despite being more prone to overfitting. In [63] the authors evaluated the SVM model on a real dataset and more recently proposed a data fusion approach to combine uncertainty information from the EKF OD process with the one obtained from the GP model [64].
This hybrid approach, that uses a ML algorithm to error correct a physics-based model is by far the most common, and various neural network architectures have been leveraged for this purpose. Under the same hybrid SGP4 approach, four independent works show that Convolutional Neural Networks (CNN) [65], Feed-Forward Neural Networks (FNN) [66], Recurrent Neural Networks (RNN) [67] and Long-Short Term Memory (LSTM) [68] can be used to improve SGP4 orbit prediction. Using a Gradient Boosting Decision Tree (GBDT) and a CNN, Li et al. [69,70] improved by more than 75% the along-track direction prediction for five satellites in LEO and GEO. All of this work has been done under the same data regimen, in particular using publicly available satellite data from the International Laser Ranging System (ILRS) as the ground truth and using a set of TLEs for a limited number of satellites as the training data. Two  yet to achieve goals in this line of research is a generalisation of the machine learning models to unseen RSOs (which Peng and Bai refers as a type III generalisation [71]), so that a machine learning model is not developed for each RSO, and the use of exogenous variables outside of the TLE information.
As mentioned before, two error sources exist in Orbit Prediction, and therefore two lines of work could be developed to enhance a physics-based model. The previously shown work focus is on correcting numerical errors and observational errors that occur when using simplifying assumptions, particularly the limited SGP4 propagator. A different approach is on the use of ML to improve our knowledge of the Earth, space and RSOs and thus obtaining a more exact model of reality.

V. Thermospheric Density Mass Models
In any orbit prediction process of LEO objects, knowledge of thermospheric mass density (the in Equation (4)) is essential to accurately predict future states, since it is known that air drag is the main non-conservative force affecting satellites and debris alike in LEO [72]. The state-of-the-art model series currently used are the Jacchia-Bowman 2008 (JB2008) [73], that has a corrective model in the form of the High Accuracy Satellite Drag Model (HASDM) [7], the Drag Temperature Model 2020 (DTM) [74]

and the Naval Research Laboratory for Mass Spectrometer and Incoherent
Scatter radar 2.0 model (NRLMSIS 2.0) [9]. These empirical models are derived combining information from mass spectrometer data, inferred temperature from ground stations, solar UV measurements and on-board drag-derived density data. They are, however, far from exact when used in Orbit Prediction. Vallado and Finkleman [75] revealed an average one-sigma accuracy of 10% to 15% for empirical density models, depending on model, solar activity and location [76]. These empirical models lack predictive capacity, and can be further compromised by erroneous forecast of the driving features of the model. In the work by Emmert et al. [77] it is shown that a 10% error in Extreme Ultraviolet (EUV) light prediction will result in more than 200 km uncertainty for a given satellite after 7 days.
The use of machine learning for space weather indexes, such as the 10.7 , which is a proxy for EUV light, dates back to the 90's, when a ANN was used to predict solar flux [78] or Gleisner et al. [79] predicted geomagnetic storms using Time delay Neural Networks (TDNN). On the whole, due to the public availability of space weather indexes, and their natural periodic features, this area has an ample array of scientific publications [80][81][82][83][84] and for an extensive review of all machine learning applications in this particular field we recommend the excellent survey done by Camporeale [85].
The initial approach for improving atmospheric mass density models is to use calibration data to fine tune the parameters of the original model. This was the approach taken in [7] to develop the HASDM, and other authors showed also superior results with calibrated models: Doornbos et al. [86] used TLE data to calibrate a complex physical model, halving orbit prediction error; with the same approach later used [87] to calibrate the NRLMSISE-00 model during periods of increased solar activiy. In [88] a process was suggested to calibrate any empirical model using the least squares method, which [89] found to be the most effective calibration method for reducing orbit prediction error.
Using more than one model was found to be helpful in reducing forecasting errors, when a Multi-model Ensemble [90] combining three well-known physical and empirical models was able to reduce forecasting error in the training and test set. A similar approach was presented by [91] in which a NN was used to combine empirical models in order to reduce density error. [92] leveraged a Neural Network to calibrate models during magnetic storms, a period that is known to be particularly challenging for empirical models, while [93] and [94] used LSTM and Gaussian Processes (GP), respectively, to calibrate the original models.
A different approach has appeared recently, using Reduced Order Models (ROM) as a quasi-physical model. In [95] a ROM was used to approximate the TIE-GCM complex physical model, with simulated orbit ephemerides used for model calibration. Reduced Order Models are models that take the high dimension space to a lower dimensional while maintaining most of the information. In this case, the order reduction is done using Proper Orthogonal Decomposition (POD), which is usually referred to in the machine learning world as Principal Component Analysis (PCA). This work was extended using accelerometer-derived density estimates from the Challenging Minisatellite Payload (CHAMP) and Gravity Field and Steady-State Ocean Circulation Explorer (GOCE) [96] satellites and TLE data [97] for model calibration. Gondelach and Linares [98] showed that a ROM model with TLE-estimated densities provides more precise Orbit Determination than the NRLMSISE-00 and JB2008 models, especially in the along-track coordinate. This ROM approach was improved upon using an anto-encoder for dimensionality reduction [99]. This work was further refined by using Sparse Identification of Nonlinear Dynamical systems techniques (SINDy) to derive an explicit non-linear differential equation that defined the atmospheric density in the encoded (low-dimension) space [100].
In 2017, Mehta et al. [101] published a global density estimate derived from the GRACE and CHAMP satellites, using a physical model with high-fidelity drag coefficient modelling. This data was then used to develop an LSTM-based Neural Network for global density prediction, with promising results [102]. Similarly, Bonasera et al. [103] used two different machine learning techniques to derive uncertainty over the estimated density, Monte-Carlo Dropout and Deep Ensemble, with both methods surpassing JB2008 and NRLMSISE-00; Both techniques use a sampling method to obtain a probability distribution of the estimated quantity. Monte-Carlo Dropout "turns-off" a percentage of the neurons in a Neural Network for each sample, so as to obtain a slightly different result [34]. Deep Ensemble is a technique that combines the output of a series of Neural Networks to obtain an estimated mean and uncertainty of the prediction [104].
The same group of researchers also developed a method to forecast with uncertainty estimation the input features for empirical models, combining solar data images with time-series information [105]. In 2020, SET (Space Enviroment Techonologies) made public 20 years of density data derived from HASDM, the model used by the United States Air Force [106]. With this data it was possible to train a NN to replicate the HASDM model [107] using the same input, so as to add uncertainty information through the use of Monte-Carlo Dropout. Further advancements were accomplished by the use of Bayesian Neural Networks on the global HASDM dataset and the local CHAMP dataset [108]. Bayesian Neural Networks have recently garnered more attention in the machine learning world, as they allow for a more principled understanding of uncertainty, when compared to Monte-Carlo Dropout or Ensembles. Bayesian Neural Networks replace the neurons in a network, which are unknown parameters, or weights, with random variables, with the training process learning the probability distribution function of each neuron [109].

VI. Future Challenges and Conclusion
In this Section, we will discuss some of the current and future lines of research. Globally, we see a shift from linear and physical models to more data-driven non-linear models across the entire field. Some of these linear models started being used in the 1960s [110], and many of the computational limitations that justify the linear approach have been surpassed. We now know that the Gaussian assumption used in the Extended Kalman Filter (EKF) is not verified [5], and the new approaches presented using Gaussian Mixture Models (GMM) [37] and the Gauss Von Mises (GVM) distribution [40] are highly promising, however two main drawbacks are yet to be addressed: the GMM necessarily needs a set of Gaussian distribution that is not constant for all RSOs, depending on shape, orbit, altitude and solar activity. Determining the exact number of Gaussian distributions necessary for sufficiently accurate orbit determination is an open area of investigation. A second hurdle regarding this area is the Probability of Collision (PoC). One of the key motivations for precise Orbit Determination is the protection of satellites and the space environment, and calculating the PoC is necessary to determine orbital avoidance manoeuvres. In [37] the author is the first to evaluate the GMM for PoC using Monte Carlo simulation. However, in most cases, the probability of collision is closer to 10 −30 , which makes Monte Carlo too time consuming for general use. A PoC formula that can be applied for all distributions is a very interesting path of research, with some initial work done by [111], who generalized the PoC equation to any distribution and also hypothesises the use of GMMs. The authors of this survey found that, in this particular area, most machine learning is focused on distribution regression, the fitting of a distribution to a set of observations, with very limited use of Deep Learning techniques, with the exception of [112], who used a ANN-EKF hybrid. On the contrary, many Deep Learning architectures have been experimented with in Orbit Prediction, most of then with the focus on improving upon the SGP4 propagator. While this work can be relevant to improve publicly available catalogues, a necessary development for industrial use is the introduction of uncertainty in the ML-OP scheme. Most Neural Network have Bayesian counterparts, meaning Neural Networks that predict a distribution, however limited work has been done in this front. The most recent works in this area validate this trend, with [64] combining Gaussian Process (GP) uncertainty with EKF-derived uncertainty, and [67] using a Deep Ensemble to perform variance estimation.
Most research relies on the learning of historical error patterns, and it is possible to obtain improvements by introducing variables that improve the representation of the physical forces being applied on the RSOs. Some exploration in the estimation of unknown physical properties of RSO has also been performed recently by Guthrie et al. [113] who performed attitude estimation of unknown space debris using siamese CNNs, or with the use of CNNs combined with a 3D image generator to determine the pose of a series of satellites [114,115]. Using light curves data, the work in [116] leveraged deep learning techniques to determine RSOs shape. The retrieval of physical characteristics is particularly relevant for space debris, however it suffers from a lack of labeled data, with real information on space debris' shape seldom available. Some preliminary work on Meta-Learning Models [117] indicates that this area is primed for the use of transfer learning (using models trained on some type of data to solve a different problem) and few-shot learning techniques (learning with very limited data). Currently, the shape and orientation of RSOs is not yet used in Orbit Determination or Prediction, meanwhile, atmospheric density and air drag forces are considered in any current force model, thus it is relevant to have accurate density prediction models. The classical models (e.g., JB2008, DTM) referenced before are empirical models, which in practice means that they are, in a sense, machine learning algorithms: physics-aware equations, mostly linear or sinusoidal, with coefficients regressed from data. When we observe the limitations of the current algorithms, with 10 to 15% 1-sigma accuracy [75], it becomes a question if the empirical models can be completely replaced by more modern deep learning models, and that is exactly what recent papers [100,103,107] have proposed, using different methods. During this review we observed that most machine learning papers in this subject where published since 2019, as can be seen in Figure 4, which speaks to the excitement around this area and to the results currently being achieved. As with everything else in this field, model uncertainty and forecast confidence intervals need to be a key feature in all models designed in the future, so to integrate atmospheric density model uncertainty in Orbit Propagation. Furthermore, the introduction of grey-box models, such as the one proposed by [100], will become more common, since these types of models are easier to evaluate and can implement physics aware restrictions within them.
In conclusion, we see a field where classical algorithms can eventually be replaced, but that will only occur if we use machine learning techniques to better represent and understand the underlying physical forces. Across the board, the predictions will become more focused on probability distributions than point-wise estimations, and the push to create grey-box models, simpler physics-aware algorithms that can be understood by physicists and machine learning specialists alike will gain prominence, as that is the real problem -How can we use machine learning to gain a better understanding of a system, or, in this case, the physical forces that govern space?

Funding Sources
This work was partially supported by FCT/PT Space under the PhD grant I-2021-03954 and the strategic project NOVA LINCS (UIDB/04516/2020).