Research on intelligent forecasting of flight actions based on bi-LSTM

Rapid identification of flight actions from flight data makes it practical to assess the quality of flight training objectively. The bidirectional long short-term memory (bi-LSTM) algorithm is implemented to forecast the flight actions of aircraft. A dataset of flight actions is constructed by collecting tagged flight data from real flight-training exercises; the dataset is then preprocessed and annotated with expert rules. The bi-LSTM algorithm, a deep learning (DL) method, is used for training and testing, and its pivotal parameters are optimized. Finally, the constructed model is applied to forecast the flight actions of aircraft. The training accuracy and loss rates are computed, with each training session kept between 1 and 3 h. Training of the model continues until an accuracy rate above 85% is achieved, and single-run inference time is kept under 2 s. The proposed algorithm thus achieves its specific characteristics, short training time and high recognition accuracy, even when complex rules and large sample sizes exist.


INTRODUCTION
Rapid forecasting of the flight actions of aircraft based on flight records is a new, data-driven way of objectively assessing the quality of flight training operations. In the conventional approach, however, three issues arise when flight movements are forecasted. First, the constructed rules are so complex that the personnel in charge of recognition need a high level of expertise; otherwise, it is difficult to identify actions accurately, resulting in erroneous recognitions, for example, of the movements of jack flights within a vast quantity of flight data. Second, the parameters used in the judgment rules for flight actions are not accurate enough to interpret all actions; only a few of them can be interpreted accurately. Third, the interpretation process takes a long time; the speed of interpretation is generally slow. Therefore, it is necessary to construct a model that intelligently and automatically forecasts several types of flight actions, with parameters estimated as closely as possible, to overcome the problems faced in human screening and evaluation. Since improving pilots' flight techniques is important, the objective is to develop better training modules for pilot training. In this way, hidden dangers are eliminated and the factors causing accidents can be reduced.

NEURAL NETWORKS
The selection process of suitable networks

Deep learning (DL) algorithms represent the forefront of AI architectures, enabling knowledge acquisition from datasets. They are applied in domains such as autonomous driving, robotics, facial recognition, natural language understanding, and other areas. However, contemporary AI models rely on extensive datasets for comprehension and struggle to transfer their domain-specific knowledge to new contexts. The essential DL algorithms include convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and generative adversarial networks (GANs). For example, the CNN excels in image recognition by emulating the human brain's visual organization through a sophisticated DL architecture. The RNN operates across multiple time steps and shares weights over time; however, the gradient-vanishing issue hinders its ability to learn effectively from longer sequences during backpropagation. Alternatively, the LSTM addresses short-term memory limitations by introducing internal mechanisms called gates that regulate the flow of information within long chains to facilitate accurate predictions.
The research employs a bi-directional RNN algorithm tailored to the characteristics of the research objects to recognize flight actions. The bi-LSTM, an extension of RNNs and LSTMs, excels at capturing long-term dependencies in sequential data. Given that aircraft flight-action data is a typical time-series dataset, the bi-LSTM is a natural choice because it can classify actions based on multiple parameters whose distinct characteristics vary across different periods.

The bi-LSTM processes data sequentially, effectively capturing contextual information within the sequence and exhibiting enhanced generalization capabilities. Compared to the LSTM, the bi-LSTM incorporates reverse temporal details, allowing it to encode both past and future context simultaneously. This enables a more comprehensive grasp of the sequence's feature dynamics and facilitates a more precise prediction of its subsequent states.
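As a minimal illustration of this bidirectional reading (a sketch, not the paper's implementation), the snippet below runs one recurrent cell forward and backward over a sequence and concatenates the two hidden states at every time step; a vanilla-RNN cell stands in for an LSTM cell to keep the example short, and all weights are random placeholders.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    # Single vanilla-RNN step; stands in for an LSTM cell in this sketch.
    return np.tanh(x @ Wx + h @ Wh + b)

def bidirectional(seq, Wx, Wh, b):
    """Run the cell forward and backward over `seq` and concatenate
    both hidden states at every time step (the bi-LSTM idea)."""
    T = seq.shape[0]
    h_dim = Wh.shape[0]
    fwd, bwd = [], []
    h = np.zeros(h_dim)
    for t in range(T):              # forward pass: past -> future
        h = rnn_step(seq[t], h, Wx, Wh, b)
        fwd.append(h)
    h = np.zeros(h_dim)
    for t in reversed(range(T)):    # backward pass: future -> past
        h = rnn_step(seq[t], h, Wx, Wh, b)
        bwd.append(h)
    bwd.reverse()
    # Each step now carries both past and future context: (T, 2 * h_dim)
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)

rng = np.random.default_rng(0)
T, d, h_dim = 5, 3, 4
out = bidirectional(rng.normal(size=(T, d)),
                    rng.normal(size=(d, h_dim)),
                    rng.normal(size=(h_dim, h_dim)),
                    np.zeros(h_dim))
print(out.shape)  # (5, 8)
```

In a real bi-LSTM the forward and backward directions use separate weight matrices; sharing them here only shortens the sketch.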

The bi-directional long short-term memory neural network
The bi-LSTM algorithm comprises input, forward, backward, and output layers, as shown in Fig. 1.
The LSTM algorithm comprises three control gates: forget, input, and output. The forget gate regulates the retention or deletion of information in the cell state, computing a score from the previous hidden state and the current input at each step to determine which information is forgotten or preserved. The input gate determines the relevance of incoming information for updating the current cell state, while the output gate controls the generation of the next hidden state (Yang & Xie, 2005; Chuan et al., 2004).
The bi-LSTM algorithm, utilizing the interplay among neurons, forget gates, memory gates, output gates, and hidden states, performs better in processing sequential data, as it captures both long- and short-term dependencies within the sequence (Jia et al., 2018).

Deep learning
RNNs generally hold internal state: prior input information is utilized to estimate subsequent information. The LSTM extends this with a memory-cell structure and the three gates named above. In the usual diagram, element-wise operations are shown in red, network layers in yellow, and the cell state by the horizontal arrow, along which information flows largely unchanged. The input gate layer determines which scores will be updated, a tanh layer constructs a new candidate vector that supplements the state, the cell state is then updated, and finally the output gate determines the output from the cell state; Eq. (1) presents this flow.

Equation (2) presents the forget gate, which takes $x_i$ and the previous hidden state $h_{i-1}$ and assigns each number in the cell state $C_{i-1}$ a score between 1 ("fully kept") and 0 ("fully discarded"):

$f_i = \sigma(W_f \cdot [h_{i-1}, x_i] + b_f)$.

In Eqs. (3) and (4), $\tilde{C}_i$ denotes the new candidate vector of scores and $I_i$ determines which scores will be updated:

$I_i = \sigma(W_I \cdot [h_{i-1}, x_i] + b_I)$, $\tilde{C}_i = \tanh(W_C \cdot [h_{i-1}, x_i] + b_C)$.

Equation (5) updates the cell state from $C_{i-1}$ to $C_i$, where the term $f_i \ast C_{i-1}$ discards the information selected by the forget gate:

$C_i = f_i \ast C_{i-1} + I_i \ast \tilde{C}_i$.

Equation (6) selects the state to be outputted, which is then passed through tanh to attain a score in [−1, 1] and produce the output:

$o_i = \sigma(W_o \cdot [h_{i-1}, x_i] + b_o)$, $h_i = o_i \ast \tanh(C_i)$.
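The gate equations above can be condensed into a single NumPy step; this is a minimal sketch (weight shapes, stacking of the four gate matrices, and the random initialization are illustrative assumptions, not the paper's configuration).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, W, b):
    """One LSTM step following Eqs. (1)-(6): forget, input, and output
    gates plus the candidate state, all computed from [h_{i-1}, x_i].
    W stacks the four gate weights: shape (4, hidden+input, hidden)."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(z @ W[0] + b[0])        # forget gate: keep/discard C_{i-1}
    i = sigmoid(z @ W[1] + b[1])        # input gate: how much new info enters
    C_tilde = np.tanh(z @ W[2] + b[2])  # candidate cell-state vector
    C = f * C_prev + i * C_tilde        # Eq. (5): updated cell state
    o = sigmoid(z @ W[3] + b[3])        # output gate
    h = o * np.tanh(C)                  # Eq. (6): hidden state in [-1, 1]
    return h, C

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4, n_hid + n_in, n_hid))
b = np.zeros((4, n_hid))
h, C = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, C.shape)
```

Iterating this step over a sequence, once left-to-right and once right-to-left, yields the two halves of the bi-LSTM's hidden representation.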

LSTM with attention mechanism
NNs in the standard memory form can no longer extract emotions from sentences on their own. The attention mechanism applied in DL resembles human selective attention: as distinct emotions appear, the significant sections of a sentence are grasped. The objective of the attention mechanism is to pick up the more substantial information and ignore the irrelevant parts when dealing with a problem; the computational burden of DL models is thereby also reduced. A modified LSTM with an attention mechanism that grasps the pivotal sections of sentences is suggested to resolve the issue. Equations (7) through (10) are as follows:

$M = \tanh(H)$, $\alpha = \mathrm{softmax}(w^{T} M)$, $r = H \alpha^{T}$, $h^{*} = \tanh(W_p r + W_x h_N)$,

where $M \in \mathbb{R}^{(d+d_a) \times N}$, $N$ is the sentence's sequence length, $H$ denotes the hidden states of the input sentence, $\alpha$ gives the attention weights, $W_p$ and $W_x$ are the learned parameters of the model, $h^{*}$ represents the attribute representation of the given sentence as the input, and $h_N$ is the hidden vector of the last hidden layer. Equation (12) is then employed to predict the result.
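The attention pooling described above can be sketched in a few lines; this simplified version scores each hidden state with a single learned vector `w` (an illustrative assumption, standing in for the paper's full parameterization) and returns the attention-weighted summary of the sequence.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def attention_pool(H, w):
    """Score every hidden state, normalize with softmax, and return the
    attention-weighted sum r = H^T alpha (the sequence representation).
    H: (N, d) hidden states; w: (d,) learned scoring vector (assumed)."""
    scores = np.tanh(H) @ w   # M = tanh(H); one scalar score per time step
    alpha = softmax(scores)   # attention weights over the N steps, sum to 1
    return alpha @ H, alpha

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 4))   # 6 time steps, hidden size 4
w = rng.normal(size=4)
r, alpha = attention_pool(H, w)
print(r.shape, float(alpha.sum()))
```

The weights `alpha` make explicit which time steps the model attends to, which is exactly the "significant sections" intuition in the paragraph above.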

Deep neuro factorization machine
The number of dimensions is reduced by the factorization machine (FM). A linear model can be employed for low-order data, while NNs are nonlinear models suited to high-order data. However, estimating the parameters requires complex operations when the data is sparse. Thus, the deep neuro factorization machine (DeepFM) is suggested, combining a DNN with the FM so that both low- and high-order features interact. The DeepFM can be trained end to end with no feature engineering. Equation (11) presents the prediction:

$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$,

where $y_{FM}$ is the FM component's output, $y_{DNN}$ is the deep component's output, and $\hat{y} \in (0, 1)$ is the DeepFM's prediction. $y_{FM}$ is expressed in Eq. (12):

$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle x_i x_j$,

where the inner product $\langle w, x \rangle$ gives the first-order features and the double sum gives the second-order cross features. $y_{DNN}$ is expressed in Eq. (14),

$a^{(l+1)} = \sigma(W^{(l)} a^{(l)} + b^{(l)})$,

where $\sigma$ is the activation function, $l$ is the DNN layer index, $W^{(l)}$ denotes the layer's weights, $a^{(l)}$ denotes the output of layer $l$, $b^{(l)}$ represents the bias term, and $|H|$ denotes the number of hidden layers. A prediction is attained when Eq. (14) is applied through the last layer. An activation function computes each neuron's output, representing the relationship between the neuron's input and output.
The DeepFM takes the data as input and passes it from the input layer through the hidden layers. If errors occur, an error range is utilized to tune the parameters, and the distinct layers are continuously adjusted to attain the output. The training stage is completed when the parameters converge and no longer change under the stopping criteria, and the average loss function is minimized. Equation (15) represents the loss function,

where $D$ represents the training set and $r_{u,i}(x)$ denotes user $u$'s satisfaction score with item $i$.
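The FM second-order term in Eq. (12) is usually computed with the well-known $O(kn)$ reformulation rather than the naive pairwise double sum. The sketch below (hypothetical variable names, random data) implements the FM score and checks the identity against the naive loop.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """FM prediction: bias + first-order <w, x> + pairwise interactions.
    Uses the O(kn) identity:
      sum_{i<j} <V_i, V_j> x_i x_j
        = 0.5 * sum_k [ (sum_i V_ik x_i)^2 - sum_i V_ik^2 x_i^2 ]."""
    linear = w0 + w @ x
    s = V.T @ x                  # (k,): sum_i V_i x_i
    s2 = (V ** 2).T @ (x ** 2)   # (k,): sum_i V_i^2 x_i^2
    pairwise = 0.5 * np.sum(s ** 2 - s2)
    return linear + pairwise

# Check the identity against the naive double loop on random data.
rng = np.random.default_rng(3)
n, k = 5, 3
x, w, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=(n, k))
naive = w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                    for i in range(n) for j in range(i + 1, n))
print(np.isclose(fm_score(x, 0.0, w, V), naive))
```

The reformulation is what makes the FM component cheap even for large, sparse feature vectors.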

The input vector representation of the decomposer
The root mean squared error (RMSE) is computed in Eq. (16) and the mean absolute error (MAE) in Eq. (17):

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - q_i)^2}$, $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - q_i|$.

Small RMSE and MAE scores indicate more accurate results. Precision and recall metrics are also utilized to measure model performance; Eqs. (18) and (19) present them:

$\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$,

where precision is the ratio of true positives over the sum of true positives and false positives in the prediction stage; the larger the precision, the better the performance and the better the algorithm's impact. Recall is the ratio of true positives over the sum of true positives and false negatives in the prediction stage; the larger the recall, the larger the coverage ratio and the better the impact. Prediction accuracy is the most broadly utilized assessment indicator in the literature, presented by Eqs. (20) and (21),

where $\{p_1, p_2, \ldots, p_n\}$ is the set of predicted recommendations and $\{q_1, q_2, \ldots, q_n\}$ is the set of real users' ratings. The smaller the RMSE and MAE, the smaller the error between the predicted and real scores, and the higher the recommendation quality; conversely, larger errors mean lower recommendation quality.
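A minimal NumPy implementation of these four metrics, with small hand-checkable inputs (the sample values are illustrative, not from the paper's dataset):

```python
import numpy as np

def rmse(p, q):
    # Eq. (16): root mean squared error between predictions p and targets q
    return float(np.sqrt(np.mean((np.asarray(p) - np.asarray(q)) ** 2)))

def mae(p, q):
    # Eq. (17): mean absolute error
    return float(np.mean(np.abs(np.asarray(p) - np.asarray(q))))

def precision_recall(y_true, y_pred):
    # Eqs. (18)-(19): TP/(TP+FP) and TP/(TP+FN) for binary labels
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

p, q = [3.0, 4.0, 5.0], [2.0, 4.0, 7.0]
print(round(rmse(p, q), 4), round(mae(p, q), 4))  # errors 1, 0, 2
prec, rec = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(prec, rec)  # tp=2, fp=1, fn=1 -> precision 2/3, recall 2/3
```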

IMPLEMENTATION

The construction of the dataset
Flight data is typically time-series data that records flight actions chronologically. The entire flight-training process, from takeoff to engine shutdown, includes operations such as engine monitoring, trajectory (flight path), attitude information (flight attitude), air conditions (air-related data), and so on. The original flight-training data is provided in CSV format with timestamps and includes headers and corresponding data items. After running statistical analyses on the dataset, such as collinearity checks, correlation analyses, and cross-correlations, 21 usable and valid parameters are detected for each flight. The present research utilizes 105 CSV files, each with an approximate flight duration of 1.5 h.

1) The first key choice concerns optimization techniques. The Adam optimization algorithm integrates the advantages of both SGD and RMSProp while allowing different parameters to adapt to varying learning rates.
2) The activation function is also crucial: it enriches the hypothesis space, showcases the advantages of multi-layer representation, and introduces nonlinearity into each neuron, enabling the model to flexibly approximate any nonlinear function and form diverse nonlinear models with enhanced fitting capabilities. Commonly implemented activation functions include sigmoid, tanh, ReLU, and softmax. The ReLU activation function is selected in this research.
3) The selection of the loss function plays a crucial role in optimizing the constructed model and is an integral parameter. The appropriate choice depends on the problem characteristics and can be guided as follows. For regression models producing continuous-valued outputs, MSE is suggested. For binary classification problems, the binary cross-entropy loss function is advisable.

For multi-class classification problems, categorical cross-entropy should be employed with one-hot encoded outputs, whereas if integer values are employed for output representation, sparse categorical cross-entropy should be used instead. Since this research involves multi-class classification with integer outputs, sparse categorical cross-entropy is chosen to assess performance.
4) Evaluation metrics or indicators are selected to gauge the performance of the constructed model. Like loss functions, they aim to minimize the error between true and predicted scores, but their application differs: loss functions guide gradient descent during training, while evaluation metrics are applied to the validation and test sets to assess the constructed model's accuracy. Accuracy, representing the proportion of correctly classified samples, serves as the default evaluation metric for classification problems; therefore, the accuracy rate is adopted as the evaluation index.
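The two classification-specific choices above, sparse categorical cross-entropy and accuracy, can be illustrated in a few lines of NumPy; this is a simplified sketch of what DL frameworks compute internally, with made-up probabilities for three classes.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Loss for multi-class problems with integer labels: the mean
    negative log-probability assigned to each sample's true class.
    No one-hot encoding of y_true is needed."""
    probs, y_true = np.asarray(probs), np.asarray(y_true)
    return float(np.mean(-np.log(probs[np.arange(len(y_true)), y_true])))

def accuracy(y_true, probs):
    # Proportion of samples whose argmax class matches the integer label.
    return float(np.mean(np.argmax(probs, axis=1) == np.asarray(y_true)))

probs = np.array([[0.7, 0.2, 0.1],   # predicted class 0
                  [0.1, 0.8, 0.1],   # predicted class 1
                  [0.3, 0.3, 0.4]])  # predicted class 2
y = [0, 1, 1]                        # integer labels
print(round(sparse_categorical_crossentropy(y, probs), 4))
print(accuracy(y, probs))  # 2 of 3 correct
```

Note how the loss penalizes the third sample heavily (its true class got only probability 0.3) even though accuracy simply counts it as one miss.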

Analysis of the training stage and results
A GEFORCE RTX 4080 GPU computer is utilized to conduct the research. The training set is used to construct the model, and parameters are iteratively adjusted through repeated sampling until an initial model is attained. To validate the accuracy of the initial model, the whole flight-action dataset is classified; Table 1 suggests that the intelligent detection effect does not satisfy the indexing requirement. By rule judgement, there should be two instances of Action 1; however, the intelligent recognition identifies none. For Action 2, rule judgement indicates three movements, but the model identifies only two. For Action 3, rule judgement indicates four actions, and the intelligent recognition finds only two that are correct. The accuracy of the current intelligent recognition is just 30 percent.
Action recognition is faulty for several possible reasons: the bi-LSTM does not master the original data sufficiently; the allocated training set consists of segmented individual actions, so the model's compatibility is poor when full data is employed as input; the model does not segment the data adequately, resulting in worse action recognition; and the amount of data is limited. Therefore, the algorithm needs further improvement.

MODEL IMPROVEMENT AND OPTIMIZATION
The algorithm must be optimized to enhance accuracy and simultaneously reduce learning time.

The enhancement of the dataset
The training set is augmented to make the algorithm more sensitive to actions. Several methods exist to enhance the capability of signals. Commonly and efficiently implemented data-augmentation methods include time translation, data rotation, time scaling, and data truncation, which can effectively enhance the diversity and robustness of the dataset and the performance of constructed models when the data is a time series. Here, the AddNoise() and Pool() methods are employed to add noise to, and enrich, the time-series data as needed.
The AddNoise() method supplements time-series data with noise to enhance the robustness and generalization capability of the data; it helps the constructed model cope with noise and uncertainty in real-world actions.
The code is presented as follows:

    def test_tsaug():
        import numpy as np
        import tsaug
        from tsaug.visualization import plot

        # Example input: one multi-channel time series with per-step labels.
        # (The original listing is truncated after "X = np.array([[[1,2]";
        # the values below are placeholders.)
        X = np.array([[[1, 2], [2, 3], [3, 4], [4, 5]]], dtype=float)
        Y = np.array([[0, 0, 1, 1]])

        X_aug, Y_aug = tsaug.AddNoise(scale=0.01).augment(X, Y)
        X_aug, Y_aug = tsaug.Quantize(n_levels=10).augment(X_aug, Y_aug)
        X_aug, Y_aug = tsaug.Pool(size=2).augment(X_aug, Y_aug)
        plot(X_aug, Y_aug)

Pooling is also commonly employed to reduce data dimensions, extract crucial features, and improve data efficacy. It involves scanning a window across the tensor and reducing the number of elements in each window by selecting either the maximum or the average score. This technique gives feature extraction "translation invariance," meaning a stable combination of features can still be obtained even with slight pixel displacement in an image. In models, the pooling function typically resides one layer below the convolution function and diminishes dimensionality. Popular pooling functions include average and max pooling: max pooling selects the largest score within a window as output, while average pooling outputs the mean of all scores within the window. These operations effectively decrease the number of model parameters while retaining essential features, thereby simplifying the structure of the constructed model.
The tsaug augmentation library is implemented in Python 3.2. In time-series augmentation, Pool(size=2) refers to the pooling operation in which every two adjacent data points of the time series are merged into one, thus reducing the number of data points. This operation can effectively lower the size of the dataset, and pooling can also be employed to downsample the dataset so that large-scale datasets are managed better. AddNoise(scale=0.01) supplements the data with noise: noisy data is generated at the specified noise ratio (scale = 0.01) and added to the original data to produce the augmented data. By adding noise, possible perturbations and uncertainties in the actual data are better simulated, increasing the diversity and generalization power of the constructed algorithm.
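The two operations can also be mimicked without the library; the NumPy sketch below follows the description above (every two adjacent points averaged into one; Gaussian noise at the given scale) rather than tsaug's exact internals, and the sample series is illustrative.

```python
import numpy as np

def add_noise(x, scale=0.01, rng=None):
    """Gaussian-noise augmentation: perturb each point by N(0, scale)."""
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.normal(scale=scale, size=x.shape)

def pool(x, size=2):
    """Average pooling with window `size`: every `size` adjacent points
    are merged into their mean, reducing the number of data points
    (a trailing remainder that does not fill a window is dropped)."""
    n = len(x) - len(x) % size
    return x[:n].reshape(-1, size).mean(axis=1)

series = np.array([1.0, 3.0, 2.0, 4.0, 5.0, 7.0])
print(pool(series, 2))  # [2. 3. 6.]
noisy = add_noise(series, scale=0.01, rng=np.random.default_rng(0))
print(noisy.shape)      # same length; values perturbed only slightly
```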

The data splitting process
In the previous training stage, a training set of segmented individual actions was sent to the model for learning, which improves learning efficiency; still, the generalization capability of the algorithm was not strong. If the data is segmented in advance and then sent to the constructed model for learning, inference time is lowered and the model's classification precision is enhanced.
The article segments the data by employing the sliding-window method. Methods for splitting time-series data are generally based on sliding windows, which split the dataset into multiple subsequences: a fixed-length window is set and slid over the dataset. The steps are as follows: (1) Determine the sliding window's width and step size. The width can be assigned according to the requirements and characteristics of the dataset, and the step size can be set according to the frequency and sampling interval of the data.
(2) Flatten the dataset. The dataset is flattened chronologically to form a one-dimensional data sequence.

(3) Use a sliding window to segment the sequence data. A sliding window is moved over the sequence data, splitting it into multiple subsequences.

Each subsequence is then processed and can be fed into the model for training, prediction, or further processing. When splitting time-series data with a sliding window, note the following: the width and step size need to be set according to the characteristics of the dataset to obtain good segmentation results, and the segmented subsequences should be of sufficient length and stability for further processing and analysis.

The application of sliding windows can be combined with other data-processing methods, such as feature extraction and normalization, further improving segmentation effects and processing efficiency. Data segmentation is performed based on the regular aspects of the data features.
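The windowing steps above can be sketched as follows; window width and step values are illustrative, and the integer series stands in for one flattened flight parameter.

```python
import numpy as np

def sliding_windows(data, width, step):
    """Split a 1-D series into fixed-width, possibly overlapping windows.
    step <= width gives overlap; trailing samples that do not fill a
    complete window are dropped."""
    return np.stack([data[i:i + width]
                     for i in range(0, len(data) - width + 1, step)])

series = np.arange(10)  # stand-in for one flattened flight parameter
wins = sliding_windows(series, width=4, step=2)
print(wins)
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]
#  [6 7 8 9]]
```

Each row can then be fed to the model as one training subsequence; choosing the step smaller than the width ensures no action boundary falls entirely between windows.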
To avoid missing any aircraft action, the goal is to detect the action whenever the aircraft's state changes. The research applies qualitative and quantitative analysis methods by plotting line graphs, extracting data extremes, and computing variances, maxima, and minima. An action can also be decomposed into combinations of different flight parameters when the data features are analyzed further. This research focuses on the flight parameters with the largest impact on the identification of flight actions, namely two features: pitch angle and tilt angle. Through repeated observations and trials, the boundary conditions of the sliding window are set based on these two characteristics.
(1) A cut is started when, among ten consecutive samples, the number of absolute pitch-angle values greater than 20 is at least 2, or when the pitch angle exceeds 10. The cut is terminated when, among six consecutive samples, the number of absolute tilt-angle values below 3.5 is at least 4 and the number of absolute pitch-angle values below 6 is at least 4.

Numerical results
Previously, only the internal parameters of the model were tuned; after the data processing described above, we continued to adjust the parameters of the bi-LSTM.

The input_size denotes the input dimension, namely, how many attributes exist in each row. Input sizes of 11, 5, and 3 are used to compute the loss and accuracy rates for the training and test sets, respectively; the results are presented in Table 1. The bi-LSTM-related parameters are then tuned after the previously conducted data processing, as shown in Table 2.
Table 3 presents drop_outs that refer to randomly dropping some neurons during the training stage to prevent overfitting.Table 4 shows the results of utilizing different epochs.
Table 5 summarizes the batch sizes, i.e., the number of samples used in each training iteration.

CONCLUSION
After the constructed model is optimized, loss rates of 0.0053 and 0.208 and accuracy rates of 0.98 and 0.903 are obtained for the training and test sets, respectively. To verify the generalizability of the constructed model, five additional flight-data files with the same parameters are selected, and the results of the rule-based method and the bi-LSTM model are presented in Table 6.

In conclusion, the bi-LSTM can forecast not only all the actions identified by the rules but also irregular movements. Meanwhile, the segmentation approach can enhance recognition precision and distinguish segmented actions.
Xuejie Yang conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Table 1
Predictions (20030501). The whole flight-action dataset is run through the constructed model to generate classification results; Table 1 presents the outcomes. Yang (2024), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.2153. For the constructed algorithm, if the tilt angle's absolute score is greater than 2 or 4 for five consecutive values, the cut is terminated. When, among six consecutive samples, the number of absolute tilt-angle values below 3.5 is at least 4 and the number of absolute pitch-angle values below 6 is at least 4, the cut is terminated.

Table 2
Loss and accuracy scores.The bold numbers are the best values attained when Input size, Drop_out, and epoch change.

Table 3
Dropout scores and loss scores.The bold numbers are the best values attained when Input size, Drop_out, and epoch change.

Table 4
Results when different epochs are utilized.
Note: The bold numbers are the best values attained when Input size, Drop_out, and epoch change.

Table 5
Batch_sizes and loss scores.
Note: The bold numbers are the best values attained when Input size, Drop_out, and epoch change.