A Robust Terrain Aided Navigation Using the Rao-Blackwellized Particle Filter Trained by Long Short-Term Memory Networks

Terrain-aided navigation (TAN) is a technology that estimates the position of the vehicle by comparing the altitude measured by an altimeter and height from the digital elevation model (DEM). The particle filter (PF)-based TAN has been commonly used to obtain stable real-time navigation solutions in cases where the unmanned aerial vehicle (UAV) operates at a high altitude. Even though TAN performs well on rough and unique terrains, its performance degrades in flat and repetitive terrains. In particular, in the case of PF-based TAN, there has been no verified technique for deciding its terrain validity. Therefore, this study designed a Rao-Blackwellized PF (RBPF)-based TAN, used long short-term memory (LSTM) networks to endure flat and repetitive terrains, and trained the noise covariances and measurement model of RBPF. LSTM is a modified recurrent neural network (RNN), which is an artificial neural network that recognizes patterns from time series data. Using this, this study tuned the noise covariances and measurement model of RBPF to minimize the navigation errors in various flight trajectories. This paper designed a TAN algorithm based on combining RBPF and LSTM and confirmed that it can enable a more precise navigation performance than conventional RBPF based TAN through simulations.


Introduction
Aircraft safety requires highly reliable navigation information. Traditionally, the inertial navigation system and global positioning system (INS/GPS) integrated navigation algorithm has been widely used [1]. However, GPS cannot operate independently and is also vulnerable to jamming. To overcome such weakness, the terrain-aided navigation (TAN) techniques can be used. TAN is a navigation technology that estimates the aircraft's precise position by comparing the altitude measured by an altimeter with the uploaded digital elevation data (DEM). To acquire precise position information using TAN, nonlinear estimation problems must be solved in real-time. The extended Kalman filter (EKF)-based TAN algorithms have solved these problems through regional linearization [2]. However, because of the highly nonlinear characteristics of the terrain, the EKF-based TAN algorithm can diverge due to linearization. Recent studies have suggested that the TAN techniques that use the Bayesian estimate method, such as particle filter (PF) and point mass filter (PMF), can prevent the problem [3][4][5][6][7]. The techniques can be directly applied to nonlinear problems without having to perform linearization, like EKF. When using the Bayesian approach, integration terms are included when the measurements are updated. It's difficult to calculate the integration terms in real time. PF uses the Monte-Carlo study adopted an improved recurrent neural networks (RNN) method called long short-term memory (LSTM). RNN is an artificial neural network that recognizes patterns from the time series data, which is one of the deep learning techniques that considers current and past input data via inner memory [20]. However, as for the RNN, its error gradient decreases along with back propagation when going back in time, so it is inappropriate to analyzing long time series data patterns. LSTM networks update intermediate memory cells with the sum of values that pass through the input and output gates, so they can deal with longer sequences than RNN, which is composed of only multiplication. Recently, there have been active studies that use the expectation-maximization (EM) algorithm or Bayes rules to efficiently conduct RNN or LSTM training or studies that use LSTM to improve the performance of KF or PF [21][22][23][24][25][26]. These studies are not suitable for providing real-time solutions or they are mostly limited to image recognition. This paper used LSTM networks to train noise covariances and measurement model of RBPF based TAN for improving the navigation performance in flat and repetitive terrains.
Section 2 summarizes the design of the conventional RBPF based TAN. Section 3 introduces the terrain and measurement validity check logic and its application to the designed RBPF-based TAN. Next, the LSTM modules are designed, and an LSTM-RBPF-based TAN is proposed, of which the noise covariances and measurement model are trained by the LSTM modules. Finally, this study determines the model parameters for the proposed LSTM modules using training data and performs Monte Carlo simulations that use evaluation data to verify the proposed design.

Conventional RBPF Based TAN
As for the EKF-based TAN, there is a high probability of divergence if the nonlinearity of the system or measurement model is too great. Therefore, this study considered PF-based TAN. PF is one of the general Bayesian filters that use global approximation instead of regional linearization [3,8,27]. In this study, the two-dimension PF was composed of latitude and longitude errors. The Bayesian filter was applied to the following TAN system and measurement model: x k = δφ δλ T is a two-dimensional state vector composed of latitude and longitude errors at the k-th time. u k = v n,k v e,k T and h k (x k ) denote the velocity vector composed of velocities in a northward and eastward direction and terrain elevation from the DEM evaluated at the position, x k . y k is the terrain height calculated by the measurements of the IRA and barometer. In the system above, w k is the system white noise that meets E(w k ) = 0 and E(w k )E(w k ) T = Q k ∆t. Here, Q k is the system process noise covariance, and ∆t is the sampling time. In the measurement above, v k is the white measurement noise that meets E(v k ) = 0 and E(v k )E(v k ) T = R k . Here, R k is the measurement noise covariance. The prior pdf is as follows. The prediction step uses the TAN system model (1) to obtain the prior pdf of the state at time step k via the Chapman-Kolmogorov Equation (6): By using the same process above from Equation (2), the likelihood is as follows: At time step k, measurement y k becomes available, and this may be used to update the prior pdf via Bayes' rule [28]. The posterior pdf that uses this is as follows: Here, α k = p v k (y k − h k (x k ))p(x k |Y k )dx k and Y k = {y 1 , y 2 , . . . , y k }. α k is the parameter that normalizes the posterior pdf. The state variable estimate and covariance that minimize the mean square error are as follows:x k|k = x k p(x k |Y k )dx k , (6) The computational load due to the integral calculation included in the above conditional pdf is large. To alleviate the computational load, the sequential importance sampling-PF (SIS-PF) is generally used. SIS-PF is a technique that implements a sequential Bayesian filter using Monte Carlo sampling.
If x i 1:k , i = 1, . . . , N s is the i-th weighted particle with the i-th weight, w i k|k , i = 1, . . . , N s , the time propagation equation of the PF is as follows: N s is the number of the sampled particles and x i 1:k = x i 1 , x i 2 , · · · , x i k . If x i k ∼ q(x 1:k |Y k−1 ), i = 1, · · · , N s is a sample generated from the target probability density, q(x 1:k |Y k ), the above weight is as follows by the importance sampling principle: The process of PF can be expressed as follows by separating the time propagation equation and the measurement update equation: In this study, the current value of the state vector is determined by the one previous value that uses a Markov process, and the Gaussian distribution is used as the target distribution, q( x i k x i k−1 , y k ). By using the Dirac delta function, the posterior pdf, p( x k |Y k ) at the k step can be approximated as (12): Here, δ denotes the Dirac delta function. The state variable estimate and covariance that minimize the mean square error are as follows [8] The SIS-PF updates the weights and particles when the measurements are put sequentially. When the method runs several steps, a degeneracy problem occurs in which the weights of all particles are too small, except for a few particles. To solve this problem, the number of valid samples, N e f f , should be maintained [28]. N e f f is determined by the user and is set to 2 3 N s in this study: . It is hard to calculate the target weight, w * i k+1|k , as the target distribution is unknown exactly. So, the estimate is used as shown in the above equation. That is, to maximize N e f f , important sampling is performed so that Var(w * i k ) becomes the minimum. The simplest way to implement this is to perform resampling, and this PF is called sequential importance resampling-PF (SIR-PF). Among various resampling methods, this study employed the stratified sampling method with simulation. This method is as follows [28]: When the number of particles after resampling is N, the weights are recalculated as follows: Resampling can resolve the degeneracy problem, but since the particles with large weights are replicated when the filter is updated, a sampling impoverishment problem occurs where the diversity disappears over time. To alleviate this problem, the Markov Chain Monte Carlo (MCMC)-step was added to PF by replacing only particles that satisfy the diversity judgement condition through Metropolis-Hasting sampling [27]: Here, µ and R are determined by considering the move step size to a new set of particles that use the following random walk model where it was set to 0 and 0.002, respectively, in this study. This MCMC-step is performed after the resampling step. The corresponding acceptance probability is expressed as: To increase the accuracy of the posterior pdf estimate, the number of particles must increase. As the dimension of the state variable increases, the amount of computation increases rapidly [28]. To solve this problem, there have been studies that used the marginalization method for efficient computation in the positioning, navigation, and tracking problems [10,11,29]. This is a method that separates the state variables into linear and nonlinear parts. It also applies nonlinear parts to PF and linear parts to construct one KF for each particle. The most general model about RBPF is as follows [9]: Here, following includes a general formula that consists of the linear state variables of RBPF, x l k and nonlinear state variables, x n k [9,10]. f n k−1 (x n k−1 ) is the dynamic function of the nonlinear state variables and is equal to x k−1 + u k in Equation (1). F n k−1 (x n k−1 ) is the dynamic function of the nonlinear state variables and determined by the linear state variable in one previous time. It was set to zero matrix in this study. This means that the prediction of the nonlinear state variables is not affected by the linear state variable [9]. f l k−1 (x n k−1 ) is the dynamic model of the linear state variable determined by the nonlinear state variables. F l k−1 (x n k−1 ) is the dynamic function of the nonlinear state variable by the linear state variables in one previous time. H k (x n k ) is the measurement model determined by the linear state variable. Assume that x l k follows the normal distribution in the condition given x n k in the above model, the model can be expressed as: That is, (24) and (26) are the measurement models and (25) is the system model from the viewpoint of x l k . In Equation (24) (1). Therefore, it is possible to interpret z k−1 as a measurement and w n k−1 as the corresponding measurement noise from the viewpoint of x l k . From the viewpoint of x n k , F n k−1 x n k−1 x l k−1 of (25) and H k x n k x l k of (26) are regarded as additional process and measurement noise, respectively. First, x k|k−1 is calculated through (9) and (10) for x n k . By Bayes' rule, the joint pdf x l k and x n k in the condition of given Y k = {y 1 , . . . , y k } is as follows [9]: Here, p( x l k x n k , Y k ) is analytically tractable and given by the optimal KF. p( x n k Y k ) can be estimated by PF.x l k|k−1 is calculated by the time propagation of x l k−1 . Then the conditional pdf forx l k|k−1 is given by applying two-step measurement updates using z k−1 = x n k − f n k−1 x n k−1 , where x n k is the value in the time propagation step of PF in the condition of given x l k|k−1 and y k − h k x n k , where x n k is the value in the measurement update step of PF in the condition of given x l k|k : k|k is calculated by performing a measurement update through (12) for x n k .
When N e f f is smaller than the threshold value, resampling is performed. Afterwards, Equation (28) is calculated by the measurement update for x l k . Finally, the posterior pdf is obtained as follows: x l k is a one-dimensional state vector that is given in terms of altitude error. The time propagation of the linear component of the states and covariance is as follows: The likelihood of the nonlinear part of RBPF is compensated by this estimated altitude error state and covariance: Kalman gain is updated as follows: If the measurement is available, the update state and covariance are performed as follows:

Validity Check Logic of Terrain for RBPF Based TAN
In TAN, the validity check technique of measurements by terrain roughness and uniqueness is an important technique that determines the navigation performance. In this study, the interferometric radar altimeter (IRA) is used to measure the angle of the direction of flight, the angle perpendicular to the direction of flight, and the range from the aircraft to the nearest terrain point. It then converts these measurements to a three-dimensional position information on an earth-centered earth-fixed (ECEF) coordinate system [30,31]. Moreover, it can acquire precise position estimates and maintains a very small margin of error, even at high altitudes. Despite these advantages, IRA has many uncertainties, including environmental factors and IRA inherent measurement errors. Generally, the uncertainties are large in flat and repetitive terrains. In particular, it is difficult to estimate the ambiguity errors generated through the signal processing and the glint errors caused by the target fluctuation or clutter, making it challenging to find appropriate compensation techniques. Accordingly, as the TAN that uses the raw data of IRA is likely to be diverted due to uncertain measurements, only the measurements that are useful for TAN should be used selectively. This study describes the RBPF-based TAN, including the validity check logic of the terrain and IRA measurements, as shown in Figure 1. The difference between altitude from the aircraft to the mean sea level (MSL) measured by the barometer and distance from the aircraft to the nearest terrain point measured by IRA was matched with the terrain height on DEM. If the IRA measurement errors are large, PF may not converge. Therefore, this study designed a system that only updates RBPF when it decides the measurement is valid, and if not, the system only conducts time propagation. The INS/TAN integrated navigation uses the estimated position by RBPF-based TAN as measurement and only updates in terrains that seem to be rough and unique through a terrain validity check, as in Figure 1. In this study, we designed an RBPF composed of two-dimensional PF and one-dimensional KF. Two-dimensional PF estimates the latitude and longitude errors. One-dimensional KF estimates the altitude error and compensates the errors in the likelihood step, as shown in Figure 1. If the posterior pdf of the RBPF satisfies the IRA validity check conditions, the IRA measurements are updated. Also, if the posterior pdf is more informative than the prior pdf, the TAN output is judged as satisfying the TAN validity check condition and can be used as the measurements of the INS/TAN integrated navigation.

Measurement Validity Check Logic
As previously stated, the IRA measurements can be converted into three-dimensional position information. As shown in Figure 2., the relative position vector, δ , of the nearest point from the aircraft is given in Equation (42) Here, and are the range and look angle output of IRA, respectively. The virtual pitch angle, α, and azimuth angle, , of the zero Doppler line are determined by the velocity of the aircraft as follows [30]: Here, [ ] is the velocity of the aircraft in the navigation frame. So, the nearest point, [̂̂] , is determined by the summation of the estimated aircraft position calculated by Equation (13) and the relative position, [ ] . As shown in Figure 2b, the nearest points acquired by the raw data of IRA measurements without an IRA validity check is very unstable. Therefore, to implement the robust TAN, we must use the beneficial measurements selectively. We could not find the references about the IRA validity check logic for the Bayesian filters. So, we developed a validity check logic through the simulations and captive flight tests. We need navigation information, including the position, velocity, and attitude. To acquire all the information, increasing the dimensions of RBPF causes computation complexity. There have been studies about the INS/TAN integrated navigation algorithms [32]. In this study, the loosely-coupled INS/TAN integrated navigation was designed to reduce the computing load. The INS/TAN integrated navigation is designed with the EKF and uses the 13th state variables composed with the error of latitude, δφ, longitude, δλ, velocity, { δV e δV n }, attitude, { δΨ e δΨ n δΨ u }, accelerometer bias, { δB a x δB a y δB a z }, and gyro bias, Since the TAN filter can be unstable in flat and repetitive terrains, in this study, a feedforward structure was designed to prevent this problem. The state variables can be expressed as: x(k) = δφ δλ δV e δV n δΨ e δΨ n δΨ u δB a x δB a y δB a z δB w The system and measurement matrix of the discretized state equation are as follows. The system matrix is derived as an error model of INS, and the measurements are acquired from the estimates of the latitude and longitude of the RBPF-based TAN: The system matrix, A is described in Appendix A.

Measurement Validity Check Logic
As previously stated, the IRA measurements can be converted into three-dimensional position information. As shown in Figure 2., the relative position vector, δx IRA , of the nearest point from the aircraft is given in Equation (42): Here, ρ and ξ are the range and look angle output of IRA, respectively. The virtual pitch angle, α, and azimuth angle, β, of the zero Doppler line are determined by the velocity of the aircraft as follows [30]: Here, [ V e V n V u ] is the velocity of the aircraft in the navigation frame. So, the nearest point, x φxλ T , is determined by the summation of the estimated aircraft position calculated by Equation (13) and the relative position, δφ res δλ res T . As shown in Figure 2b, the nearest points acquired by the raw data of IRA measurements without an IRA validity check is very unstable. Therefore, to implement the robust TAN, we must use the beneficial measurements selectively. We could not find the references about the IRA validity check logic for the Bayesian filters. So, we developed a validity check logic through the simulations and captive flight tests. In this study, the IRA validity check technology was applied using residual check logic in the following equations: Here, ĥ is the MSL altitude measured by the barometer. ℎ is the relative height calculated by the IRA. ̂ℎ is the height error estimated by the KF part of RBPF and is equal to the ̂| calculated in Equation (37) in Equation (37) and means the estimate of the height error state assigned to the i-th particle.
is the estimate of the latitude and longitude of the i-th particle. ℎ ̅ is the mean of the terrain DEM data of the particles and ℎ 2 is the mean of the terrain DEM data squares of the particles.
is the variance of the measurement noise, and ℎ is the covariance of the height error state estimated by the KF. ĥ − ℎ means the terrain height by the measurements, and ̂ℎ − ℎ ̅ means the estimate of the terrain height by RBPF. In this study, the IRA validity check technology was applied using residual check logic in the following equations: Here,ĥ is the MSL altitude measured by the barometer. h res is the relative height calculated by the IRA.x h is the height error estimated by the KF part of RBPF and is equal to thex l k|k calculated in Equation (37).
k|k−1 in Equation (37) and means the estimate of the height error state assigned to the i-th particle. x is the estimate of the latitude and longitude of the i-th particle. h dem is the mean of the terrain DEM data of the particles and h 2 dem is the mean of the terrain DEM data squares of the particles. R is the variance of the measurement noise, and P h is the covariance of the height error state estimated by the KF.ĥ − h res means the terrain height by the measurements, andx h − h dem means the estimate of the terrain height by RBPF. So, the difference between both terms is the residual. R + P h is the acceptable range of the height error square that considers the measurement noise and height error state covariance. h 2 dem − h 2 dem is the additional range of the height error square that considers the terrain roughness and uniqueness in the distributed area of particles, so the IRA measurements are valid, but only if the residual is more than one sigma of the acceptable error range, and the minimum residual among the residuals of all the particles is more than 0.1 sigma of the estimated error range by RBPF. These logics were designed through simulations and verified by the captive flight tests.

Terrain Validity Check
As mentioned above, unlike contour matching or BKF based TAN, we could not find a well-established terrain validity check logic in PF based TAN. In this study, a technique was performed that uses mutual information, which is a measure of the mutual dependence between the entropy of a prior distribution and a posterior distribution [33]. Entropy is an index that displays the uncertainties of random variables. If the random variables are in a uniform distribution, the value of entropy is at its maximum. The entropy of the prior pdf, The entropy of the posterior pdf, H( x k |Y k ), is defined as follows [33]: Mutual information indicates the amount of entropy of x k reduced by measuring y k . The validity check index, V IE(k), which uses the mutual information of the estimate, can be determined using Equations (8) and (12): V IE(k) is the amount of reduced uncertainty after the measurement update. In other words, if V IE(k) is positive, the position information estimated by TAN is valid and used as the measurement of the INS/TAN integrated navigation. To verify the method, this study conducted Monte Carlo simulation 100 times based on the simulation condition and RBPF design parameter indicated in Table 1 and Figure 3, respectively. of the INS/TAN integrated navigation. To verify the method, this study conducted Monte Carlo simulation 100 times based on the simulation condition and RBPF design parameter indicated in Table 1 and Figure 3, respectively.   Figure 3 shows the simulation trajectories used to observe the performance in various terrain conditions. Figure 3b shows a trajectory that starts from the rough terrain to sea, with Figure 3c showing a trajectory that includes only rough terrains, Figure 3d showing a trajectory that starts from flat land to rough terrain, and Figure 3a showing a trajectory that includes all rough and flat terrains. Through these simulations in various terrains, we want to draw the most optimal design of the validity check logic. This study conducted simulations in various trajectories with and without , and the results are as shown in Table 2. Also, this study compared the navigation error between cases that are decided by ( ) in the current point of view and cases that are decided by accumulated from previous times.  Figure 3 shows the simulation trajectories used to observe the performance in various terrain conditions. Figure 3b shows a trajectory that starts from the rough terrain to sea, with Figure 3c showing a trajectory that includes only rough terrains, Figure 3d showing a trajectory that starts from flat land to rough terrain, and Figure 3a showing a trajectory that includes all rough and flat terrains. Through these simulations in various terrains, we want to draw the most optimal design of the validity check logic. This study conducted simulations in various trajectories with V IE and without V IE, and the results are as shown in Table 2. Also, this study compared the navigation error between cases that are decided by V IE(k) in the current point of view and cases that are decided by V IEs accumulated from previous times. The cases in Table 2  As Table 2 indicates, all trajectories had a smaller position error with the IRA and terrain validity check logics than without the check logics. Even if the validity check logic was used, its performance was better on average when the measurement update was conducted with either one of the positive current or one-step previous V IE than if the current V IE was positive, or if either one among the current V IE and the previous two-step V IEs were positive. Table 2 indicates that the IRA validity check logic provide great improvement. Although the terrain validity check logic is not perfect, it is helpful to improve the performance in some ways. As the IRA validity check logic is a technique to filter out the uncertain IRA measurements caused by the flat and repetitive terrains, a wide sense of the terrain validity check logic is a must for RBPF based TAN. Also, Table 2 shows that the previous data pattern rather than the current data must be considered. Also, these simulation results represent that it is difficult to numerically model the logic helpful to all the common trajectories. So, this study aimed to suggest a design with more improved performance than the conventional RBPF based TAN in various terrains by utilizing deep learning techniques that use time series data, which are called LSTM networks.

The RBPF Trained by LSTM Networks
In the previous section, the two-step validity check logic was designed. The terrain validity check logic was not robust in all the trajectories and was affected by the previous time data of the validity check index. Also, although the IRA validity check logic provided great improvement in those simulations, the incorrect estimates caused the validity check logic to malfunction as the logic was based on the residual between the measurement and the estimated position. Therefore, we proposed the validity check logic using RNN, which is robust in all the terrains and can operate normally in the incorrected estimates. RNN is an artificial neural network that recognizes patterns from time series data. It can memorize patterns from time series data that can consider the current and past input data at the same time. RNN can process various lengths of sequence information, but actually, it only effectively processes comparatively short sequences and cannot remember the incidences from the far past. This is due to the vanishing gradient problem, in which the gradient of the output errors cannot be delivered to the initial layer [34]. In other words, it is inappropriate to make simple RNN to learn using back propagation through time (BPTT) for dealing with longer time series data [31,34]. Among the various tricks for solving the vanishing gradient problem, there are advanced RNN designs, such as the LSTM network, gated recurrent unit (GRU) network and recurrent highway network (RHN) [34]. The LSTM network is composed of cells attached to 3 gates, as in Figure 4, and each gate decides which input value to apply and how much among the current values to forget or output. As shown in Figure 5, the cell state update includes the addition operation. The RNNs' units are only composed of multiplication operation, but as the LSTM includes addition, it can alleviate the gradient vanish problem [34].

The RBPF Trained by LSTM Networks
In the previous section, the two-step validity check logic was designed. The terrain validity check logic was not robust in all the trajectories and was affected by the previous time data of the validity check index. Also, although the IRA validity check logic provided great improvement in those simulations, the incorrect estimates caused the validity check logic to malfunction as the logic was based on the residual between the measurement and the estimated position. Therefore, we proposed the validity check logic using RNN, which is robust in all the terrains and can operate normally in the incorrected estimates. RNN is an artificial neural network that recognizes patterns from time series data. It can memorize patterns from time series data that can consider the current and past input data at the same time. RNN can process various lengths of sequence information, but actually, it only effectively processes comparatively short sequences and cannot remember the incidences from the far past. This is due to the vanishing gradient problem, in which the gradient of the output errors cannot be delivered to the initial layer [34]. In other words, it is inappropriate to make simple RNN to learn using back propagation through time (BPTT) for dealing with longer time series data [31,34]. Among the various tricks for solving the vanishing gradient problem, there are advanced RNN designs, such as the LSTM network, gated recurrent unit (GRU) network and recurrent highway network (RHN) [34]. The LSTM network is composed of cells attached to 3 gates, as in Figure 4, and each gate decides which input value to apply and how much among the current values to forget or output. As shown in Figure 5, the cell state update includes the addition operation. The RNNs' units are only composed of multiplication operation, but as the LSTM includes addition, it can alleviate the gradient vanish problem. [34].   The LSTM network was used with only one hidden layer. The input gate, , forget gate, , output gate, , cell input, , cell state, , hidden state, ℎ , and output state, ̂, are as follows [24,26]: = tanh( ℎ ℎ −1 + + ) , The LSTM network was used with only one hidden layer. The input gate, i t , forget gate, f t , output gate, o t , cell input, g t , cell state, c t , hidden state, h t , and output state,ŷ t , are as follows [24,26]: Here represents the model parameters, including the weighting and bias matrices. σ(·) is an element-wise sigmoid function, and denotes the element-wise multiplication of the vectors. The bias of the forget gate was initiated to 1, and the rest of the bias was initiated to 0. All of the initial values of weights were sampled in Gaussian distribution, N(0, 0.1). Figure 6 shows a flow chart of the LSTM module composed of the LSTM layer, rectified linear unit (ReLU) layer, and fully connected linear output layer in Equation (57). ReLU is an activation function defined as the positive part of its argument in the artificial neural networks and is as follows: Here, x is the input to a neuron. As it leads to better training results of deeper networks than the logistic sigmoid [35], the ReLU is currently the best used activation function for deep neural networks. In the IRA usability check module, the conditions are the minimum standards that decide whether TAN can be performed in the current flight state, and the range and look angles that are the outputs of the IRA sensor are received normally. Those conditions are also applied to the conventional RBPF based TAN in the previous section. The usability check conditions are as follows: • The range output of IRA, ρ > 10 m, and the look angle of IRA, ζ < 10° • The roll angle of aircraft, γ < 10°, and pitch angle of aircraft, φ < 5° • The difference in look angles measured from the left and right antenna, |∆ | < 1°  The LSTM module is composed of 4 networks: two LSTM h s, LSTM R , and LSTM Q , as show in Figure 5. Two LSTM h s, LSTM R , and LSTM Q are the modules to train the x k = δφ δλ T of the measurement model in Equation (2), the measurement noise covariance of the PF part, and the process noise covariance of the PF part, respectively. Both LSTM h s have the same structure, which are separately applied to the latitude and longitude errors. At first, this study tried to directly estimate the terrain height, h k (x k ) in the measurement model of Equation (2), but it was impossible to find the patterns of the terrain height. As for the layer composition, the architecture of the proposed LSTM module was designed with the review of Ref. [24], and the number of nodes was tuned by checking the learning performance. The number following the linear, ReLU, and LSTM layers means the number of neurons (or node). Dropout is a regularization technique that alleviates the overfitting problem in various neural networks. The main idea is to randomly drop the connections from the neural network during training [36]. Fully connect means that all the values of the neurons in the current layer are calculated by using all the neurons of the previous layer. Also, unlike the pose estimate problem from images, as the means and variances for normalization the input are unpredictable in the case of the navigation system, this study used variables through the following equations: Here, V and D represent, respectively, the eigen value and vector of the estimate of covariance, p k|k−1 in Equation (7). SIC j is a similarity index that uses cosine similarities between the prior probability, P i 1 , or the posterior probability, P i 2 , and the uniform probability, U i , of the i-th particle [33]. V IE is the validity check index that uses entropy in Equation (50). When these variables are used as input, there is no need for the normalization step, and as the scales of all input terms are similar, it contributes to stable learning. The inputs for each network are as below. The input states of networks for process and measurement noise covariance were considered for the values of the one previous and current step. This is because, as shown in Table 2 from the previous section, when the influence of the validation check logic for RBPF based TAN was considered with the V IEs of the current and one previous step, it had the best navigation performance. Also, as the input of LSTM h , the values of the states of PF part from the 7-step previous time to the current time were used. The number for the input data was determined as the value to maximize the training accuracy. Also, after tuning so that the output of the network was between 0 to 2, this study scaled the range below for use. In the regression problem that uses the neural networks, when the range of outputs is between -1 to 1, or 0 to 2, we can acquire stability and high training accuracy [34]. As for the value over 2 or below 0, this study limited them to the maximum and minimum value of the noise covariance for the stability of the filter: Output of LSTM Q after postprocessing: Output of LSTM R after postprocessing: 10.0 ≤ √ R k ≤ 40.0 LSTM h consists of 2 stacked LSTM layers with 512 nodes each, followed by 2 fully connected layers with 512 (ReLU layer) and 1 (linear regression layer) nodes. The input elements of the LSTM layer were randomly dropped in 0.8 probability. This structure uses the same weights, but as learning does not depend on certain neurons or connections, it helps to prevent overfitting. LSTM R and LSTM Q consist of a single layer with 28 hidden states each, followed by 2 fully connected layers with 28 (ReLU layer) and 1 (linear regression layer) nodes. The ReLU function is applied to activate a fully connected layer, except for the last regression layer.

Design of the LSTM-RBPF Based TAN
In Figure 6a, the LSTM module composed of four networks explained in the previous section and the IRA usability check module were used instead of the terrain validity check that uses mutual information and the IRA validity check that uses the residual check method, as shown in Figure 1.
In the IRA usability check module, the conditions are the minimum standards that decide whether TAN can be performed in the current flight state, and the range and look angles that are the outputs of the IRA sensor are received normally. Those conditions are also applied to the conventional RBPF based TAN in the previous section. The usability check conditions are as follows: • The range output of IRA, ρ > 10 m, and the look angle of IRA, ζ < 10 The roll angle of aircraft, γ < 10 • , and pitch angle of aircraft, ϕ < 5 The difference in look angles measured from the left and right antenna, |∆ζ| < 1 • As shown in Figure 6, the proposed method uses more measurement information than the method introduced in the previous section by using the LSTM module that learns the process and measurement noise covariances and the state variation at the measurement update stage. For convenience, the proposed method will be called LSTM-RBPF hereafter. The INS/TAN integrated navigation was designed with the same architecture as the 13th feedforward EKF introduced in the previous section. The block diagram in Figure 6b shows a more detailed flow of the LSTM-RBPF-based TAN. The state estimated in the propagation step, x k|k−1 , and the estimated state in the previous step that was already stored, x k−1|k−2 are converted to input data for LSTM Q through Equations (59) and (60) in the 'EIG & SIC 1 ' module, as shown in Figure 6b. The outputs of LSTM Q and LSTM R were learned to minimize the loss function, L 1 , calculated by the scaled navigation rms error as shown below: Here, the estimate of state by LSTM-RBPF isx k|k = [δφ k|kδ λ k|k ] T . R ns and R ew represent the radius of the curvature of the Earth's ellipsoid in the north-south and east-west, respectively. L N is a scaling factor that limits the extent of the output layer to a specified range and is set to 50 in this study. The output of LSTM h is learned to minimize the loss function, L 2 , as shown below: Here, α reg is the regularization constant and is set to 0.74. · is the Euclidean distance between the input, x n k|k−1 , and output, x n k|k−1 . Here, x n k|k−1 = δφ k|k−1 δλ k|k T is the true value of latitude and longitude that is known. The regularized constant can reduce the overfitting problem. As the result, the proposed LSTM h can provide stable solutions in not only the training data, but also the new test data.
For the training process, there is a need for true noise covariances. But as the true data that can be known represent true position information, it replaced the rms error between the estimated and true noise covariance. The important thing is to tune the range of loss function in order to limit the network output within the desired range. This requires a scaling factor, L N , which is set to 50 through numerical simulations. When L N is set to 50, the highest training accuracy is represented. LSTM R estimates the measurement noise covariance in a similar manner. It uses the same loss function, but there is a difference in using V IE and SIC 2 calculated through Equations (50) and (60) as input data. As for the LSTM h , it requires the estimate of state in eight time-steps as the input data, and a buffer was added to store this. Also, the LSTM Q added L2 regularization terms that added the squared difference between the learned state and the estimate of state to loss function.

Training Accuracy of the LSTM-RBPF
This study used trajectory 1 from Figure 3a for the training data. This trajectory starts from an island and includes sea (flat terrain) and mountains (rough terrain) for 1660 s. We wanted to design a robust validity check logic for any circumstances. So, we used trajectory 1, including all the rough, flat, and repetitive terrains, as a training set. It is important to select the training data properly, as it determines the training and test accuracy. The design and simulation conditions of RBPF were the same as those shown in Table 1. However, the process and measurement noise covariance were used as the initial value until the 8th time-step, and then the learned values in the LSTM module were used. The IRA outputs for the simulation were separate simulator output values, including the signal processing and error model provided by the developer, which the model verified with its similarity with the real outputs through several captive flight tests. The learning parameters and conditions are as shown in Table 3. The maximum epoch was set to 150, and Figure 7a indicates the mean value of the loss function per epoch. One epoch means the duration that all the training data are used to train the model parameters of the LSTM module. One iteration means the duration that the model parameters are updated. In this study, one iteration was performed per 2 Hz. This study used the adaptive moment estimation (Adam) optimizer. As for the optimizer, there are many methods like stochastic gradient descent (SGD) with momentum, adaptive gradient (Adagrad), root mean square propagation (RMSProp), Adam, and more. To speed up the process of training, SGD with momentum memorizes the previously moved direction to the current gradient. Adagrad controls the step size in accordance with the variation of model parameters for the same reason. RMSProp can maintain the difference between the model parameters while preventing the infinite increase of gradient by adding the squared value of the gradient to Adagrad [37]. Adam is the optimizer that builds on the strengths of RMSProp and SGD with momentum, which is defined as below [37]: Here, θ is a model parameter such as bias and weight. L(θ) and ∇ θ L(θ) are the values of the loss function, L 1 or L 2 , and their gradients, respectively. β 1 is the gradient decay factor, and β 2 is the squared gradient decay factor. η is the learning rate, and is the constant for preventing a divide by zero error and is set to 1 × 10 −8 . Generally, when the Adam optimizer is used, the stable training accuracy is acquired quickly. When the magnitude of the gradient of loss function exponentially increase, it is likely that the training becomes instable and diverse in several iterations. The gradient explosion easily occurs in areas where the uncertainties of the measurement output increase or the flat terrain continues. To prevent this, this study used the L2 norm-based gradient clipping method. The gradient threshold was set to 3.4 after considering the error range of the TAN system. When the epoch is over 120, the value of the loss function of LSTM h becomes rather unstable due to the L2 regularization term. The graph about the value of loss function per iteration in the 1st, 52nd, and 120th epochs is shown in Figure 7b. The significant error due to the uncertainties of measurement and flat terrain in the 1st epoch was confirmed to be significantly improved in the 52nd and 120th epoch. However, when applying the model parameters that passed over 100 epochs to the new test data, there was no improvement effectiveness, which may have been caused by overfitting. In other words, the trained model meets with high training accuracy in the current training data, but doesn't help to improve the accuracy in the new test data. To prevent this overfitting problem, we stopped the training process in the 52nd epoch, as shown in Figure 7b. In all the simulations below, the learning was performed in an Intel ® Xeon ® , two of CPU @2.10GHz, 64.0GB DDR3 RAM computing environment of thinkstation P900 model (Lenovo, Beijing, China). Table 3. Learning parameters and conditions.

Maximum epoch 150
Initial The gradient explosion easily occurs in areas where the uncertainties of the measurement output increase or the flat terrain continues. To prevent this, this study used the L2 norm-based gradient clipping method. The gradient threshold was set to 3.4 after considering the error range of the TAN system. When the epoch is over 120, the value of the loss function of LSTM ℎ becomes rather unstable due to the L2 regularization term. The graph about the value of loss function per iteration in the 1st, 52nd, and 120th epochs is shown in Figure 7b. The significant error due to the uncertainties of measurement and flat terrain in the 1st epoch was confirmed to be significantly improved in the 52nd and 120th epoch. However, when applying the model parameters that passed over 100 epochs to the new test data, there was no improvement effectiveness, which may have been caused by overfitting. In other words, the trained model meets with high training accuracy in the current training data, but doesn't help to improve the accuracy in the new test data. To prevent this overfitting problem, we stopped the training process in the 52nd epoch, as shown in Figure 7b. In all the simulations below, the learning was performed in an Intel ® Xeon ® , two of CPU @2.10GHz, 64.0GB DDR3 RAM computing environment of thinkstation P900 model (Lenovo, Beijing, China).

Evaluation Accuracy of the LSTM-RBPF
To verify the design of the learned LSTM-RBPF, this study applied the design to new test data, not the training data. Figure 9 shows the results of the TAN and INS/TAN integrated navigation errors when the proposed LSTM-RBPF-based TAN performs the Monte Carlo simulation 100 times. When compared with the conventional RBPF-based TAN, it was confirmed that its navigation performance was excellent in all trajectories. In the case of trajectory 2, as it passes to the sea after 350 s, the RBPF-based TAN rms error was significantly diverse. Of course, when the sea continued, the LSTM-RBPF-based TAN eventually becomes diverse, as shown in Figure 9b, but its degree was much less than the RBPF-based TAN. In Table 2, the IRA and terrain validity check logic applied to the conventional RBPF-based TAN was essential to prevent filter divergence in flat and repetitive terrains, but it had an inverse effectiveness in trajectory 3, including only rough terrains. On the other hand, as shown in Figure 9c,d, the performance improved, even in only rough terrain. As for trajectory 4, which started from the sea and ended in rough terrain, the conventional RBPF-based TAN did not perfectly converge, but the proposed method converged quickly when it entered the rough terrain, as shown in Figure 9e,f.

Evaluation Accuracy of the LSTM-RBPF
To verify the design of the learned LSTM-RBPF, this study applied the design to new test data, not the training data. Figure 9 shows the results of the TAN and INS/TAN integrated navigation errors when the proposed LSTM-RBPF-based TAN performs the Monte Carlo simulation 100 times. When compared with the conventional RBPF-based TAN, it was confirmed that its navigation performance was excellent in all trajectories. In the case of trajectory 2, as it passes to the sea after 350 s, the RBPF-based TAN rms error was significantly diverse. Of course, when the sea continued, the LSTM-RBPF-based TAN eventually becomes diverse, as shown in Figure 9b, but its degree was much less than the RBPF-based TAN. In Table 2, the IRA and terrain validity check logic applied to the conventional RBPF-based TAN was essential to prevent filter divergence in flat and repetitive terrains, but it had an inverse effectiveness in trajectory 3, including only rough terrains. On the other hand, as shown in Figure 9c,d, the performance improved, even in only rough terrain. As for trajectory 4, which started from the sea and ended in rough terrain, the conventional RBPF-based TAN did not perfectly converge, but the proposed method converged quickly when it entered the rough terrain, as shown in Figure 9e,f.

Evaluation Accuracy of the LSTM-RBPF
To verify the design of the learned LSTM-RBPF, this study applied the design to new test data, not the training data. Figure 9 shows the results of the TAN and INS/TAN integrated navigation errors when the proposed LSTM-RBPF-based TAN performs the Monte Carlo simulation 100 times. When compared with the conventional RBPF-based TAN, it was confirmed that its navigation performance was excellent in all trajectories. In the case of trajectory 2, as it passes to the sea after 350 s, the RBPF-based TAN rms error was significantly diverse. Of course, when the sea continued, the LSTM-RBPF-based TAN eventually becomes diverse, as shown in Figure 9b, but its degree was much less than the RBPF-based TAN. In Table 2, the IRA and terrain validity check logic applied to the conventional RBPF-based TAN was essential to prevent filter divergence in flat and repetitive terrains, but it had an inverse effectiveness in trajectory 3, including only rough terrains. On the other hand, as shown in Figure 9c,d, the performance improved, even in only rough terrain. As for trajectory 4, which started from the sea and ended in rough terrain, the conventional RBPF-based TAN did not perfectly converge, but the proposed method converged quickly when it entered the rough terrain, as shown in Figure 9e,f. To understand the characteristics of the learned process and measurement noise covariances, the standard deviations of the noise covariances per iteration are shown in Figure 10a,c,e,g. The learned process noise covariance, , was 3-dimensional data composed of two and one . To understand the characteristics of the learned process and measurement noise covariances, the standard deviations of the noise covariances per iteration are shown in Figure 10a,c,e,g. The learned process noise covariance, Q k , was 3-dimensional data composed of two Q n k and one Q l k .  To understand the characteristics of the learned process and measurement noise covariances, the standard deviations of the noise covariances per iteration are shown in Figure 10a,c,e,g. The learned process noise covariance, , was 3-dimensional data composed of two and one .
(a) (b) In Figure 10, either of √ is shown as the others represent the same characteristics. The process noise in the system model means the reliability of the system, and the measurement noise in the measurement model represents the reliability of the measurement sensor. In the most adaptive filters, the two terms were generally predicted to be in an inverse proportion. This is shown in Figure  10e, which is the result of trajectory 2 with only the rough terrain. However, in the rest of the trajectories, including the sea, there is no similar pattern. As shown in Figure 10a, the process noise gradually decreases in the sea (1200~1500 s), but the measurement noise is maintained at a small value. On the other hand, Figure 10c confirms to have a smaller process and measurement noise after 350 s. Meanwhile, Figure 10g has a smaller measurement noise in the sea (20~60 s), while the process noise is maintained in large values. This is the learning result from reducing navigation errors that cannot be modeled numerically. Also, Figure 10b,d,f,h represent the differences between the terrain In Figure 10, either of Q n k is shown as the others represent the same characteristics. The process noise in the system model means the reliability of the system, and the measurement noise in the measurement model represents the reliability of the measurement sensor. In the most adaptive filters, the two terms were generally predicted to be in an inverse proportion. This is shown in Figure 10e, which is the result of trajectory 2 with only the rough terrain. However, in the rest of the trajectories, including the sea, there is no similar pattern. As shown in Figure 10a, the process noise gradually decreases in the sea (1200~1500 s), but the measurement noise is maintained at a small value. On the other hand, Figure 10c confirms to have a smaller process and measurement noise after 350 s. Meanwhile, Figure 10g has a smaller measurement noise in the sea (20~60 s), while the process noise is maintained in large values. This is the learning result from reducing navigation errors that cannot be modeled numerically. Also, Figure 10b,d,f,h represent the differences between the terrain height of the new position learned by LSTM-RBPF-based TAN and the height of the position estimated by time propagation in the conventional RBPF-based TAN. As we noted in the previous Section 4, it was impossible to directly estimate the terrain height, h k (x k ), in the measurement model of Equation (2), so two LSTM h s were used to train x k = δφ δλ T in Equation (2). To verify the measurement model learned by LSTM-RBPF-based TAN, Figure 10b,d,f,h represent how large is the difference between the terrain height of the conventional RBPF-based TAN and the terrain height at the state, x k , learned by the LSTM h module. In the rough terrain, there was an overall 2 m difference, as shown in Figure 10h, but there was also a difference in the maximum 8 m, as shown in Figure 10b,d. As shown in Figure 10d,h, unlike the prediction, there was no difference in the terrain height between the RBPF and the LSTM-RBPF-based TAN in the sea, but it was due to a low altitude of sea terrain. It does not mean there was an insignificant difference between learned and estimated positions. Table 4 below is the result of comparison between the conventional and proposed methods. According to Table 2, Case 4 shows the most stable performance among the conventional RBPF based TAN with various validity check logics. But the navigation performance of the proposed LSTM-RBPF based TAN is better than Case 4 for all trajectories. It was verified that the proposed method showed excellent navigation performance in all trajectories. As for the analysis of the average navigation result, the TAN error of the proposed method was about 37.0% of the conventional RBPF based TAN, and the TAN/INS error of the proposed method was about 47.2% of the RBPF based TAN. The results verified that the proposed method is robust to flat and repetitive terrains and the uncertainties of measurement outputs than the conventional RBPF based TAN.

Conclusions
This study applied a deep learning method based on LSTM network to improve the performance of TAN that could be replaced in environments where GPS is unavailable. In the case of TAN, it has advantages as it is not affected by external jamming or climate, but its navigation performance degrades when the roughness and uniqueness of the terrain are not secured. Thus, for a highly precise TAN navigation performance, a terrain validity check logic is needed. However, most studies on the TAN technique focused on rough and unique terrains or introduced the method of avoiding flat and repetitive terrains by using the path planning and SLAM techniques. In particular, for the PF-based TAN, there is no verified validity check technique, so, in this study, the terrain and IRA validity check logic by using MI and the residual check method were designed to improve the conventional RBPF-based TAN. However, this study demonstrated that the validity check logic of the conventional RBPF-based TAN for improving navigation performance in flat or repetitive terrains occasionally has an inverse effectiveness in rough terrains through Monte Carlo simulations.
Next, this study proposed the LSTM-RBPF-based TAN that trains the measurement model with strong non-linearity and the process and measurement noise covariances of RBPF to minimize navigation errors. There have been studies that estimated the process and measurement noise. However, the method cannot guarantee stable navigation performance in flat and repetitive terrains. Otherwise, the proposed LSTM-RBPF-based TAN was verified as being able to improve the performance of TAN and INS/TAN integrated navigation in all trajectories, including rough and flat terrains, through Monte Carlo simulations.
We will apply the proposed LSTM-RBPF-based TAN on embedded computing boards and conduct captive flight tests on aircraft in the future. We are currently doing studies about the real-time implementation of the proposed design.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The system matrix, A is as follows: Here,   . C n b is the coordinate transformation matrix from the body frame to the navigation frame. F 1 is as follows: F 32 = 2U 0 sin φV u + 2U 0 cos φV n − δR ew R ew V n V u R ns + V e V n R ew + V n V e R ew sec 2 φ, F 33 = tan φV n − V u R ew , (A5) Here, R ew and R ns represent the radius of curvature of the Earth ellipsoid in the east-west and north-south, respectively. The perturbation of these are δR ew and δR ns . V e V n V u is the velocity of the aircraft in the navigation frame. φ and λ is the latitude and longitude of the aircraft, respectively.
U 0 is the rotation velocity of the Earth and f e f n f u is the specific force of the accelerometer.