Association rule mining of aircraft event causes based on the Apriori algorithm

To reveal complex causes of aircraft events, this paper aims to mine association rules between the trigger probability and relative strength via a modified Apriori algorithm. Clustering is adopted for data preprocessing and TF–IDF value calculation. Causative item sets of aircraft events are obtained based on the accident causation 2–4 model and are coded to establish code indicators. Because resolving not-a-number (NaN) values with statistical methodologies would alter the interrelations among causes, an enhancement to the Apriori algorithm that considers only frequent items is proposed. By extracting frequent patterns, all the association rules that satisfy three criteria (support, confidence and lift) are determined by repeatedly generating and pruning candidate item sets. A network graph is used to visualize the association rules between different unsafe events and all types of causes. Finally, 9835 representative pieces of data, including general unsafe events, general incidents and serious incidents from the Southwest Air Traffic Management Bureau, are selected for analysis. The results show that improper energy allocation, poor conflict resolution ability, inadequate onsite management duties, adoption of a luck mentality, and occurrence of controller oversight are highly correlated with general unsafe events, and failure to rectify incorrect recitation is notably correlated with general incidents, while inadequate manual promotion, lack of conflict judgement and insufficient safety management are strongly correlated with serious incidents. This study quantitatively reveals the potential patterns and characteristics of mutual interactions among various types of historical aircraft events and highlights directions for controllable prevention and prediction of aircraft events.


Abbreviations
NaN: Not a number
TF-IDF: Term frequency-inverse document frequency
HFACS: Human factor analysis and classification system

Association analysis of aircraft accident causation involves deriving probabilities of event types based on historical event causes after learning and training by considering past incidents. This method serves as a crucial means for preventing and predicting unsafe events or accidents [1][2][3]. With the rapid development of civil aviation in China, the number of flights has significantly increased, and air transport has become a major link in international communication and domestic economic development. However, airline networks are complex, organizational structures are vast, regional differences are notable, and unsafe events frequently occur. According to statistics of the Civil Aviation Administration of China Safety, the accident occurrence ratio reached 0.29%, the incident occurrence ratio reached 11.39%, and general unsafe events accounted for 88.32% of the total aircraft events from 2013 to 2022, as shown in Fig. 1.
Figure 1 indicates that the occurrence of accidents and incidents linearly increased from 2013 to 2019. Due to the COVID-19 pandemic, the number of flight accidents and incidents plummeted from 2020 to 2022. However, the number of general unsafe events has increased to approximately 10,000 per year. Aircraft event investigations revealed that the number of causes underlying unsafe events and incidents exceeded 20, and the proportions are shown in Fig. 2.

1 Air Traffic Management Department, Civil Aviation Flight University of China, Guanghan, Sichuan, China.
2 Sichuan Highway Planning Survey Design Institute Co., Ltd., Sichuan, Chengdu, China.
* email: chqtx@126.com

According to the results of these investigations, many unsafe events are typically caused by multiple factors, including human factors, equipment usage, management systems, and internal/external environments. However, the characteristic indicators leading to aircraft incidents exhibit discreteness, constrained by dynamically extracted factors. This constraint prevents precise quantification, leaving the determined causes restricted to a qualitative level, thereby affecting the reliability of the analysis and prediction results. Hence, it is necessary to reveal meaningful connections hidden within investigation data of aircraft incidents by employing machine learning to establish association rules. Association analysis aims to assess the correlations among various incident factors and event types comprehensively and systematically. Subsequently, preventive measures to reduce the occurrence of similar events or accidents can be adopted.
This paper aimed to introduce a data-mining technique. A substantial amount of incident investigation data was analysed in depth. Comprehensive and accurate correlational information was derived, and inherent correlations leading to event causes were revealed. By establishing various association rules, greater insights were obtained. A pattern of potential correlations leading to unsafe aircraft incidents was revealed. Finally, the advantages of real-time high-speed data streaming were exploited, and unsupervised and supervised learning [...]

The determination of the causes of aircraft incidents adheres to the principles of completeness, continuity, and consistency. These principles conform with industry regulations and operational manuals. By consulting numerous unsafe event investigation experts and comparing common data cleaning methods, clustering was used to identify outliers by dividing the data of general unsafe events, general incidents and serious incidents into three groups. The 56 retained items are assigned a value of 1, while the deleted items are assigned a value of 0. The specific cleaning process is shown in Fig. 3.
A processed dataset is generated as output in the data format required by the data analysis and processing modules (e.g., numpy, matplotlib, and pandas).

Feature extraction
The initial feature set obtained after data cleaning based on the expert system model typically consists of high-dimensional data, and not all features are equally important. Irrelevant information can reduce the algorithm performance and lead to the curse of dimensionality, ultimately affecting the outcome of data analysis. Introducing feature reduction facilitates the elimination of redundant dimensions (weakly correlated dimensions) or the extraction of more valuable features, thereby increasing the computation speed, enhancing the efficiency, and ensuring the accuracy of data analysis.

The term frequency-inverse document frequency (TF-IDF) method is a classic weighting calculation technique widely used in recent years for data analysis and information processing. The term frequency (TF) represents the frequency of occurrence of a particular keyword within a document. The inverse document frequency (IDF) denotes the inverse of the document frequency and is primarily employed to reduce the weight of words that are common across all documents and therefore say little about any single document. The TF-IDF model can be expressed as follows:

$$tfidf_{i,j} = tf_{i,j} \times idf_i$$

where $tfidf_{i,j}$ denotes the product of the term frequency $tf_{i,j}$ and the inverse document frequency $idf_i$. In the TF-IDF method, the weight increases with the frequency of occurrence of a given feature in a document and decreases with the number of documents containing this feature in the entire corpus. A higher value of $tfidf_{i,j}$ indicates greater importance of the feature word within the text.
The causative factors obtained by feature extraction were sequentially numbered, constituting the current causative factor set $t = \{t_1, t_2, \ldots, t_i\}$. Simultaneously, the collected incident investigation reports were sequentially numbered, constituting the collection of incident investigation report texts $D = \{D_1, D_2, \ldots, D_j\}$.
The TF value can be calculated as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

where the numerator $n_{i,j}$ denotes the number of occurrences of causative factor $t_i$ in incident investigation report $D_j$, and the denominator $\sum_k n_{k,j}$ denotes the total count of all causative factors in report $D_j$. The resultant $tf_{i,j}$ provides the frequency of a specific feature word.
The IDF value can be obtained as:

$$idf_i = \log \frac{|D|}{|\{j : t_i \in D_j\}|}$$

where the numerator $|D|$ denotes the total number of incident investigation reports and the denominator $|\{j : t_i \in D_j\}|$ denotes the number of reports containing causative factor $t_i$. If the considered factor is absent from the corpus, the denominator is zero. In general, this is avoided by adding 1 to the denominator, namely, $1 + |\{j : t_i \in D_j\}|$.
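Substituting the TF value and the IDF value with the adjusted denominator, the TF-IDF model can be expressed as follows:

$$tfidf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log \frac{|D|}{1 + |\{j : t_i \in D_j\}|}$$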
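The weighting above can be illustrated with a minimal Python sketch. It assumes that each report has already been reduced to a list of causative-factor keywords and that the natural logarithm with the "1 +" denominator is used; the function name `tf_idf` and the keyword lists are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def tf_idf(reports):
    """Return one dict per report mapping factor -> TF-IDF weight.

    reports: list of keyword lists, one list per investigation report D_j.
    The IDF uses the smoothed denominator 1 + (number of reports with the factor).
    """
    n_reports = len(reports)
    # Document frequency: number of reports in which each factor appears.
    doc_freq = Counter()
    for report in reports:
        doc_freq.update(set(report))

    weights = []
    for report in reports:
        counts = Counter(report)          # n_{i,j} for every factor in D_j
        total = sum(counts.values())      # sum_k n_{k,j}
        weights.append({
            factor: (n / total) * math.log(n_reports / (1 + doc_freq[factor]))
            for factor, n in counts.items()
        })
    return weights


# Hypothetical keyword lists, invented for illustration only.
reports = [
    ["fatigue", "mishearing", "fatigue"],
    ["mishearing", "inadequate monitoring"],
    ["mishearing", "weak safety awareness"],
    ["mishearing"],
]
weights = tf_idf(reports)
```

In this toy corpus, "mishearing" occurs in every report, so its smoothed IDF is log(4/5) and its weight is negative, whereas "fatigue" is concentrated in a single report and receives a positive weight. This is a side effect of the "1 +" adjustment that should be kept in mind when ranking features.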

Determination of the aircraft incident causal factor set
After data preprocessing, text mining was conducted on the historical aircraft incident investigation reports (Fig. 3). Initially, according to regulations such as the Event Information Reporting and Processing Standard and Event Samples, causative factors were decomposed into relevant causative keywords. With the use of the TF-IDF method for feature word extraction from text, preliminary causative factors of aircraft incidents were identified by matching within specified classified texts. However, the causes of aircraft incidents are multifaceted, and many people are involved. In this paper, the accident causation 2-4 model was introduced. Factors contributing to aircraft incidents in the form of human factors, equipment factors, and management factors were attributed to internal organizational reasons, while the environment was considered an external factor. The specific model is shown in Fig. 4.
The top 30 TF-IDF ranked feature words from each level were selected as causative factors.After deduplication, all the causal factors of each event were obtained.Similar or identical conditions were integrated, and a set of aircraft accident cause factors was finally extracted.

Aircraft incident causal rule mining
In this paper, the Apriori algorithm for association rule mining was used to discover relationships or patterns in datasets. Through the downward closure property of frequent item sets (every subset of a frequent item set is itself frequent), candidate item sets were continuously generated and pruned, and all rules satisfying the minimum support and minimum confidence levels were obtained. The greater the support and confidence are, the stronger the rule. In addition, the lift index was employed to filter the obtained association rules. A lift higher than 1 indicates that the antecedent and consequent are positively correlated, a lift lower than 1 indicates that they are negatively correlated, and a lift equal to 1 indicates that they are independent.

Relevant definitions
The association rule problem based on events can be expressed as follows: let $X = \{X_1, X_2, \ldots, X_m\}$ represent the set of causative factors obtained after data preprocessing, where $m$ is the number of causative factors. $S = \{S_1, S_2, \ldots, S_n\}$ denotes the original set of association rules for events, with $n$ denoting the total number of association rules, each $S_i$ ($1 \le i \le n$) denoting a subset of item sets, and $S_i \subseteq X$.
Event association rules can be expressed as $X_a \Rightarrow X_b$, where $X_a$ and $X_b$ denote the antecedent and consequent, respectively, of the rules. For each rule, $X_a \subseteq X$, $X_b \subseteq X$ and $X_a \cap X_b = \emptyset$, and the rule must meet the minimum support and confidence thresholds while yielding a lift value greater than 1.
Definition 1 Support: The probability of the simultaneous occurrence of event causative factor items $X_a$ and $X_b$ is referred to as the event causative rule support, represented by:

$$\mathrm{Support}(X_a \Rightarrow X_b) = \frac{|X_a \cup X_b|}{|S|}$$

where the numerator $|X_a \cup X_b|$ denotes the count of the simultaneous occurrence of event causative factor items $X_a$ and $X_b$, and $|S|$ denotes the total count of all association rules.

Definition 2 Confidence: If an event causative factor item $X_a$ occurs, the probability of another event causative factor item $X_b$ occurring is referred to as the event causative rule confidence, represented by:

$$\mathrm{Confidence}(X_a \Rightarrow X_b) = \frac{|X_a \cup X_b|}{|X_a|}$$

where the numerator $|X_a \cup X_b|$ denotes the count of association rules containing both causative factor items $X_a$ and $X_b$, and the denominator $|X_a|$ denotes the count of association rules containing event causative factor item $X_a$.
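Definition 3 Lift: The measure of improvement in the probability of the occurrence of one event causative factor item $X_a$ in the presence of another event causative factor item $X_b$ can be expressed as:

$$\mathrm{Lift}(X_a \Rightarrow X_b) = \frac{\mathrm{Confidence}(X_a \Rightarrow X_b)}{\mathrm{Support}(X_b)} = \frac{|X_a \cup X_b| \, |S|}{|X_a| \, |X_b|}$$

where $\mathrm{Support}(X_b) = |X_b| / |S|$ and $|X_b|$ denotes the count of association rules containing event causative factor item $X_b$.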
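The three definitions can be checked numerically with a short Python sketch. It assumes that every event record is represented as a set of cause codes and that the confidence is the share of records containing the antecedent that also contain the consequent. The function name `rule_metrics` and the five toy records are invented for illustration and are not data from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent.

    transactions: list of sets of codes, one set per event record.
    Confidence and lift are returned as None when they are undefined
    (no record contains the antecedent or the consequent).
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    count_a = sum(1 for t in transactions if antecedent <= t)
    count_b = sum(1 for t in transactions if consequent <= t)
    count_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_ab / n
    confidence = count_ab / count_a if count_a else None
    lift = confidence / (count_b / n) if confidence is not None and count_b else None
    return support, confidence, lift


# Invented toy records; the codes follow the naming pattern of Table 3.
transactions = [
    {"H02", "A01"},
    {"H02", "H12", "A01"},
    {"H02", "A02"},
    {"M14", "A03"},
    {"H12", "A01"},
]
support, confidence, lift = rule_metrics(transactions, {"H02"}, {"A01"})
```

Returning `None` instead of dividing by zero keeps undefined values explicit, which is one simple way to see where NaN-like results would otherwise appear.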

Antecedents and consequents of the association rule
The core of association rules is to reveal the relationships between items in a dataset, helping to better understand the frequency at which one item set may occur given another.By identifying these relationships, potential patterns and regularities can be determined, providing support for decision-making and prediction.Association rules consist of two parts: the antecedent and the consequent.The antecedent is the condition, while the consequent is the result.The relationship between these parts indicates a trend where certain items may occur in the presence of other items.Association rules were mined based on the support and confidence.The support is a measure of the frequency of simultaneous occurrence of item sets, while the confidence is a measure of the probability of consequent occurrence given the antecedent.
In association rules, the order of the antecedent and consequent is a key concept but can also lead to confusion. The rules $X_a \Rightarrow X_b$ and $X_b \Rightarrow X_a$ involve the same item sets, so both point to the same underlying association between the two item sets. The support and lift are calculated from the joint occurrence of the two item sets and are therefore identical for both directions. The confidence, however, is conditioned on the antecedent and generally changes when the antecedent and consequent are exchanged. Therefore, even if two rules express the same association relationship, their confidence values may differ due to the actual occurrences in the dataset.
In conclusion, the meaning of the obtained association rules depends on the content of their antecedents and consequents, while the metrics reflect the performance and degree of association of these rules in the actual datasets.
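As a worked illustration of how the metrics respond to exchanging the antecedent and consequent, consider hypothetical counts (invented for this example, not taken from the dataset): $|S| = 10$, $|X_a| = 4$, $|X_b| = 5$ and $|X_a \cup X_b| = 3$, where $|X_a \cup X_b|$ counts the records containing both item sets and the confidence is taken as $|X_a \cup X_b| / |X_a|$.

```latex
\begin{aligned}
\mathrm{Support}(X_a \Rightarrow X_b) &= \mathrm{Support}(X_b \Rightarrow X_a) = \frac{3}{10} = 0.3 \\
\mathrm{Confidence}(X_a \Rightarrow X_b) &= \frac{3}{4} = 0.75, \qquad
\mathrm{Confidence}(X_b \Rightarrow X_a) = \frac{3}{5} = 0.6 \\
\mathrm{Lift}(X_a \Rightarrow X_b) &= \frac{0.75}{5/10} = 1.5, \qquad
\mathrm{Lift}(X_b \Rightarrow X_a) = \frac{0.6}{4/10} = 1.5
\end{aligned}
```

Under these assumed counts, only the confidence changes with the direction of the rule.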

Algorithmic improvements
In the application of the Apriori algorithm for data mining, despite the implementation of conditional checks to avoid division-by-zero errors, not-a-number (NaN) values can still occur. Such instances likely stem from either small data samples or inadequacies in meeting the threshold requirements for metrics such as the support and confidence within certain item sets, thereby resulting in NaN computations. Notably, the emergence of NaN values does not necessarily indicate code errors but may reflect the inherent data characteristics.
In practice, addressing NaN values typically involves employing statistical techniques such as mean or median imputation or adjusting thresholds to mitigate their occurrence. However, due to the distinct nature of aircraft incident data and their difference from conventional datasets, applying statistical methodologies to resolve NaN values, such as using alternative incident cause codes for filling or removing specific cause codes, could alter the interrelations among incident causes, thus compromising the accuracy of the final analysis.
To address this issue, in this paper, an enhancement to the Apriori algorithm was proposed. In contrast to conventional NaN resolution methods, the proposed approach focuses on preprocessing, specifically on filtering and tallying frequent items, aiming to enhance the efficiency and precision of the algorithm. Initially, by traversing each transaction in the dataset, the support of each item can be computed and stored in a header table. Subsequently, items with a support value below the minimum threshold can be removed from the header table, ensuring the retention of only frequent high-support items. Finally, these retained frequent items constitute the item sets. By exclusively considering frequent items, the refined algorithm aims to efficiently extract frequent patterns, thereby augmenting its performance and accuracy. This preprocessing step reduces the processing time and resource overhead associated with infrequent item handling, consequently lowering the computational complexity and enhancing the algorithmic efficiency and precision. Such a strategy plays a pivotal role in data mining, enabling the algorithm to maintain effectiveness when managing large-scale datasets. Additionally, due to the substantial volume of aircraft incident data employed, an iterative approach was adopted during coding to generate candidate and frequent item sets, avoiding recursive calls and minimizing the recursion depth. This could ensure more effective processing of large datasets while mitigating potential stack overflow issues.
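The preprocessing step described above (tally item supports into a header table, drop infrequent items, keep only the retained items) might look roughly like the following Python sketch. The function names `build_header_table` and `prune_transactions`, the threshold and the toy records are assumptions made for illustration; the paper does not publish its implementation.

```python
from collections import Counter


def build_header_table(transactions, min_support):
    """Map each frequent item to its support; infrequent items are dropped."""
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))   # count each item once per record
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


def prune_transactions(transactions, header_table):
    """Keep only the items that survived in the header table."""
    return [
        frozenset(item for item in transaction if item in header_table)
        for transaction in transactions
    ]


# Invented toy records; the codes follow the naming pattern of Table 3.
transactions = [
    ["H02", "A01"],
    ["H02", "H12", "A01"],
    ["H02", "A02"],
    ["M14", "A03"],
    ["H12", "A01"],
]
header = build_header_table(transactions, min_support=0.4)
pruned = prune_transactions(transactions, header)
```

With this toy input only H02, A01 and H12 remain in the header table, so later candidate generation never touches the rare items A02, A03 and M14.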
According to the association rule mining method, the steps for mining accident causation association rules are as follows:
Step 1: Input the dataset of accident causation factors.
Step 2: Set the minimum support threshold, minimum confidence threshold, and minimum lift threshold.
Step 3: Utilize the Apriori algorithm to generate the frequent item sets that meet the minimum support threshold.
Step 4: Derive candidate rules from the obtained frequent item sets and filter them based on the minimum confidence and lift thresholds; the rules that meet these criteria are considered strong association rules.
Step 5: Eliminate association rules where the antecedent or consequent is empty and store the remaining association rules as aircraft incident causal rules.
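The steps above can be sketched in Python as a level-wise, non-recursive procedure. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy records and the thresholds are invented, and the lift filter uses `>=` (replace it with `>` to require strictly positive correlation).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Iterative Apriori search; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-item sets that have k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent


def mine_rules(transactions, min_support, min_confidence, min_lift):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, sup in frequent.items():
        # r runs from 1 to len-1, so neither side of a rule is ever empty.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((antecedent, consequent, sup, confidence, lift))
    return rules


# Invented toy records; the codes follow the naming pattern of Table 3.
transactions = [
    {"H02", "A01"},
    {"H02", "H12", "A01"},
    {"H02", "A02"},
    {"M14", "A03"},
    {"H12", "A01"},
]
rules = mine_rules(transactions, min_support=0.4, min_confidence=0.5, min_lift=1.0)
```

Because every subset of a frequent item set is itself frequent, the supports of the antecedent and consequent are always available in the dictionary, so no division by a missing or zero support can occur in the rule stage.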

Data collection and cleansing
The dataset utilized in this study was compiled from investigation reports of unsafe events from 2019 to 2022. There are 9835 pieces of data from the Southwest Air Traffic Management Bureau. Due to space limitations, the authors selected only 22 representative data points, which are distributed among different years, different flight stages, different causes and different levels of unsafe aircraft events. The resulting dataset (after data preprocessing) is detailed in Table 2.
Text mining was applied to the collected event investigation reports.Following existing regulations such as the Event Information Reporting and Processing Standard and Event Samples, causative factors were decomposed into relevant causative keywords.The TF-IDF method was employed for text feature extraction, and preliminary aircraft incident causative factors were obtained by matching within specified classified texts.The accident causation 2-4 model was applied for further screening of the aircraft incident causative factors, categorizing all factors during aircraft operation into human, equipment, management, and environmental layers.Finally, the top 30 TF-IDF-based ranked feature words from each layer were extracted as the causative factors for that layer.

Encoding of the causal factor set
The obtained 56 items are all data sources from the unsafe incident investigation reports listed in Table 1. After data cleaning with the expert system and factor screening by the accident cause 2-4 model, the 4th, 5th, 6th, 8th, 11th, 14th, 18th, 19th, 20th, 24th, 31st, 32nd, 34th, 45th, 46th, 48th, 49th, 50th, 52nd and 55th items were selected. These items include the unsafe event type and all types of causes, including relevant personnel, aircraft, equipment conditions, management and environment. The set of causative factors of aircraft incidents was extracted and encoded, as summarized in Table 3.
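A small Python sketch of the encoding step is given below. It uses five entries from Table 3 and writes the codes without the space (e.g. "H02"). The helper names `encode_event` and `to_binary_row` are hypothetical, and the binary row layout is an assumption about how coded events could be passed to numpy or pandas.

```python
# Excerpt of the code table (Table 3); the remaining entries are omitted here.
FACTOR_CODES = {
    "General unsafe event": "A01",
    "Serious incident": "A03",
    "Controller": "H02",
    "Occurrence of luck mentality": "H12",
    "Inadequate manual promotion": "M14",
}


def encode_event(factor_names):
    """Translate the factor names of one event into a set of codes."""
    unknown = [name for name in factor_names if name not in FACTOR_CODES]
    if unknown:
        raise KeyError(f"no code defined for: {unknown}")
    return frozenset(FACTOR_CODES[name] for name in factor_names)


def to_binary_row(transaction, columns):
    """One-hot row: 1 if the code occurs in the event, 0 otherwise."""
    return [1 if code in transaction else 0 for code in columns]


columns = sorted(FACTOR_CODES.values())
event = encode_event(["Controller", "General unsafe event"])
row = to_binary_row(event, columns)
```

Raising an error for unknown factor names, rather than silently skipping them, avoids dropping a cause from an event without notice.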

Unsupervised learning of causal analysis
The set of causal factors used to train the machine model lacks labels, requiring it to autonomously explore, obtain, and summarize knowledge to annotate the training data.This facilitates the discovery of inherent patterns and features among these elements.In this study, an initial correlation network for aircraft accident causation was established, as shown in Fig. 5. Figure 5 clearly shows that node A 02 occupies a central position and exhibits connections with numerous factors.However, due to the considerable number of nodes and edges, deriving precise connections between causal factors remains challenging, thereby hindering quantitative analysis of their relationships.To reveal valuable yet hidden associations, the subsequent step involved employing the Apriori algorithm for data mining.This approach aimed to reveal the latent value within the dataset, resulting in the determination of association rules meeting specific conditions.The network graph of these rules is shown in Fig. 6.
In Fig. 6, node A02 remains at the network centre, demonstrating connections with multiple nodes such as M05, E02, and H67, among others.Nevertheless, it remains challenging to quantitatively analyse these association rules.Therefore, the introduction of quantitative evaluation through the support, confidence, and lift was necessary, as shown in Fig. 7.
An analysis of the support depicted in Fig. 7 reveals frequently occurring risk factors in accidents, indicating their propensity to cause the risk state of an aircraft incident.Moreover, the analysis of high-confidence association rules reflects reliable cause-effect relationships.Association rules with high lift indicate positive or negative combinations of factors.However, in the scatter plot, numerous association rules exhibit a confidence level of 1, suggesting a high-confidence association between factors.Some feature value pairs are detailed in Table 4.
Table 4, which is based on the definition of confidence, indicates that inadequate mastery of specific content during flight training H 37 , inadequate personnel qualification management M 25 , and poor conflict resolution ability H 10 are likely to cause the occurrence of a general incident A 02 .To more comprehensively visualize the association rules causing the occurrence of a general incident A 02 , a high-confidence network graph was generated, as shown in Fig. 8.
Figure 8 shows that A 02 remained at the network centre and was interconnected with multiple nodes, indicating that many factors potentially cause A 02 .However, the association rules are overly idealized, relying too heavily on individual factors while disregarding other factors that might contribute to its occurrence, thereby potentially impacting the accuracy of the final analysis.

Initial association
To enhance the analysis accuracy, supervised learning was applied to prelabel the original training set, thereby adjusting or removing association rules with a confidence value of 1. A new set of associations was then established after this step, leading to changes in confidence values. The resulting causal correlation network of aircraft incident causes after data intervention is shown in Fig. 9. In the causality network graph, each node represents the causal factors extracted from the selected practice survey reports, including 90 causal factors and 3 incident types, and the node size is determined by the degree of the node. The correlation between the causal factors is regarded as an undirected edge between the nodes; if 2 causal factors occur in one event at the same time, there is an edge between the two points, and the weight of the edge is the number of accidents in which both factors concurrently appear. Figure 9 shows that A 01 and H 02 are central nodes in the network graph. In contrast to the unsupervised algorithm results shown in Fig. 5, where only A 02 emerged as the central node, this revised network graph more explicitly highlights the significance of controllers in safeguarding against aircraft incidents. Through supervised learning intervention, extreme situations in the association rules of the original training set could be addressed, allowing the analysis of association rules between factors to expand beyond the connections of a single node with others.

The Apriori algorithm was utilized to mine association rules while adjusting the minimum support, confidence, and lift thresholds. Different minimum support thresholds yielded varying quantities of association rules, as detailed in Table 5.
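The construction of the co-occurrence network described above (undirected edges weighted by the number of events in which two factors appear together, node size given by the node degree) can be sketched as follows. The function name `cooccurrence_network` and the toy events are invented for illustration; a graph library could be used instead of plain counters.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(events):
    """Return (edge_weights, degree) for an undirected co-occurrence graph.

    edge_weights[(a, b)] is the number of events containing both a and b,
    with a < b so that each undirected edge is stored once.
    degree[node] is the number of distinct neighbours of the node.
    """
    edge_weights = Counter()
    for event in events:
        for a, b in combinations(sorted(set(event)), 2):
            edge_weights[(a, b)] += 1

    degree = Counter()
    for a, b in edge_weights:
        degree[a] += 1
        degree[b] += 1
    return edge_weights, degree


# Invented toy events; the codes follow the naming pattern of Table 3.
events = [
    {"H02", "A01"},
    {"H02", "H12", "A01"},
    {"H02", "A02"},
    {"M14", "A03"},
    {"H12", "A01"},
]
edge_weights, degree = cooccurrence_network(events)
```

In this toy graph, H02 has the highest degree because it co-occurs with three different codes, which is the kind of node that would be drawn largest in the network diagram.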
To ensure the analysis accuracy, thresholds were set to filter out low-reliability association rules while obtaining a sufficient quantity for analysis.Thus, by setting the minimum support, confidence, and lift thresholds to 0.05, 0.1, and 1, respectively, rule mining was performed.After data mining, 128 association rules were obtained.A scatter plot of their support, confidence, and lift is shown in Fig. 10.
Figure 11 shows that certain nodes occur at the centre of the network graph, exhibiting more complex connections with other nodes. These nodes include general incident A 02, serious incident A 03, inadequate safety pressure transmission M 18, inadequate regulations M 21, inadequate onsite management duties H 30, inadequate coping ability H 45, and inadequate rigorous risk control measures M 01. These nodes exhibit higher degrees than other nodes, indicating more frequent association rules with other nodes. This emphasizes the need for specific attention and strict control of these nodes in civil aviation safety management. Based on the diagram, it can be preliminarily determined that A 01, A 02 and A 03 hold central positions in the network graph and possess significant weights. In contrast to the unsupervised algorithm results in Fig. 6, despite containing fewer nodes, the variations among different association rule indicators increased. This approach is more advantageous for subsequent analyses of the relationships between association rules, enhancing the credibility of the analysis results.

Analysis of the high-support association rules
Fifty association rules with high support were extracted from all association rules. These high-support association rules are detailed in Table 6.
The 50 high-support association rules exhibited support values varying between 0.057142 and 0.142858, confidence values varying between 0.2 and 0.833334, and lift values varying between 1 and 4.375.High-support association rules indicate frequent relationships between factors, with higher support indicating stronger rules.A diagram of the high-support association rules is shown in Fig. 12.
The analysis revealed important nodes, such as A 01, A 02, A 03, H 30, M 21, and H 10. The frequent relationships between the identified factors include correlations between controller oversight H 02 and general unsafe events A 01, between inadequate manual promotion M 14 and serious incidents A 03, between improper energy allocation H 04 and general unsafe events A 01, and between each of poor conflict resolution ability H 10, inadequate onsite management duties H 30, and occurrence of luck mentality H 12 and general unsafe events A 01. These frequently co-occurring factors push the aircraft operational system into a high-risk state, leading to aircraft incidents. Hence, focused attention and preventive measures are needed for the corresponding personnel, equipment, management, and environmental factors related to these causal factors to minimize their impact.

Analysis of the high-confidence association rules
Fifty association rules with high confidence were selected from the 128 association rules.These high-confidence association rules are detailed in Table 7.
These 50 high-confidence association rules exhibited confidence values varying between 0.5 and 0.833334, support values varying between 0.057142 and 0.142858, and lift values varying between 1.25 and 11.666667.Analysis of the high-confidence association rules, represented in the network diagram shown in Fig. 13, provides more intuitive judgement of the relationships between factors.
An analysis of the diagram reveals that the occurrence of aircraft general unsafe events A 01 is highly likely due to controller oversight H 02. There is a 75% chance of personnel-related factors such as the occurrence of luck mentality H 12 causing the occurrence of aircraft general unsafe events A 01 during work. Similarly, serious incidents A 03 are 75% likely to occur due to inadequate manual promotion M 14. When a general incident A 02 occurs, there is a 66.7% likelihood of it being caused by a failure to rectify incorrect recitation H 06. Such high-confidence association rules highlight significant causal relationships, indicating that certain antecedent factors are highly likely to cause subsequent factors, thereby increasing the risk of aircraft incidents.

[...] to prioritize the aforementioned nodes. Prompt action should be taken upon detecting tendencies towards the occurrence of these nodes to prevent the occurrence of unsafe incidents. Given the importance of ensuring aviation operational safety and preventing events, learning from past events is crucial. However, due to limitations in data sampling and the finite cognitive understanding of manually labelled experiences, it is possible that the results of machine learning might possess directional bias. Therefore, during the next phase of research, efforts will focus on expanding the dataset and employing human experiences more systematically, possibly along the direction of deep learning methodologies.

Figure 1. Civil and general aviation aircraft accidents and incidents in China from 2013 to 2022.
Figure 2. Causes of incidents and general unsafe events in China from 2019 to 2022.
Figure 5. Initial association network of the causal relationships from unsupervised learning.
Figure 6. Network graph of the discovered association rules.
Figure 7. Scatter plot of the support, confidence and lift from unsupervised learning.
Figure 8. High-confidence network graph from unsupervised learning.
Figure 9. Incident cause correlation network from supervised learning.
Figure 10. Scatter plot of the support, confidence, and lift of the association rules from supervised learning.
Figure 11. Diagram of the 128 association rules.
Figure 12. Diagram of the high-support association rules.
Figure 13. Diagram of the high-confidence association rules.

Table 1. Items of incident investigation statistics.
Table 2. Main investigation attributes after data preprocessing.

Table 3. Codes of the causes of aircraft incidents.

Human Layer: [...] H 01, Controller H 02, Duty manager H 03, Improper energy allocation H 04, Mishearing H 05, Failure to rectify incorrect recitation H 06, Lack of preparation for unlandable situations H 07, Shift in work focus H 08, Lack of conflict judgement H 09, Poor conflict resolution ability H 10, Inadequate monitoring H 11, Occurrence of luck mentality H 12, Failure to detect flight conflicts in advance H 13, Scattered work style H 14, Weak safety awareness H 15, Unauthorized departure during duty H 16, Cumulative risk of personnel fatigue H 17, Violation of operational manual regulations H 18, Some regulations were orally requested but not included in the manual H 19, Weak team cooperation awareness H 20, Inadequate risk assessment H 21, Insufficiently stringent work procedures H 22, Inadequate risk identification H 23, Inadequate ability to manage complex situations H 24, Plans did not fully consider the interference caused by sudden situations H 25, Occurrence of inertia thinking H 26, Delayed issuance of landing permits H 27, Inadequate radar monitoring H 28, Insensitivity to alarms H 29, Inadequate onsite management duties H 30, Failure to conduct radar identification as required for aircraft H 31, Insufficient understanding of "highlight display" and "conflict line" significance H 32, Inadequate recognition of key risks in hotspots H 33, Poor work status H 34, Inadequate emergency management H 35, Inconsistent situational awareness H 36, Inadequate mastery of specific content during flight training H 37, Failure to request ascent height in time H 38, Inadaptability to new work procedures H 39, Inadequate technical prevention measures H 40, Weak risk awareness H 41, Excessive delegation of authority H 42, Inadequate estimation of weather impacts H 43, Unclear basic concepts H 44, Inadequate coping ability H 45, Inaccurate language in voice clearance communications H 46, Failure to comply with instructions H 47, Intercepting incorrect flight paths H 48, Inadequate attention to airborne situations H 49

Management Layer: Inadequate rigorous risk control measures M 01, Insufficiently detailed conflict resolution training M 02, Some individuals adopt a sense of luck regarding information reporting M 03, Insufficient safety management M 04, Lack of effective supervision and evaluation mechanisms M 05, Inadequate onsite management precision M 06, Inadequate grasp of ideological dynamics in a timely manner M 07, Insufficient differentiated training M 08, Ineffective communication between relevant units M 09, Ineffective work style of management personnel M 10, Inadequate analysis and research of risks M 11, Inadequate daily education of controllers M 12, Inadequate implementation of responsibilities M 13, Inadequate manual promotion M 14, Inconsistent understanding of important regulations M 15, Lack of strict daily supervision M 16, Inadequate depth in case analysis M 17, Inadequate safety pressure transmission M 18, Failure to maintain relevant data as per agreement M 19, Inadequate training M 20, Inadequate regulations M 21, Inadequate combination M 22, Existing loopholes in implementing relevant requirements M 23, Lack of rigorous work processes M 24, Inadequate personnel qualification management M 25, Inadequate management of team resources M 26

Equipment Layer: Existing technical prevention measures failed to control risks E 01, Meteorological radar equipment did not respond to airborne weather conditions in real time E 02, Lack of alarm prompts when the automated system encounters abnormalities E 03, Lack of equipment support E 04, Difference between permission heights displayed in standby and primary systems E 05, Occurrence of equipment failure during primary and secondary automated system switching E 06, Sudden malfunction of instrument landing system equipment E 07

Environment Layer: Noisy operational environment S 01, Delay in specific transfer of flight information from the previous control unit S 02, Momentary busy period of the tower control channel S 03, Nonstop flight construction S 04, Low-level flight volume leading to relaxation S 05, Scattered thunderstorm weather conditions S 06, Presence of air force activities S 07, Sudden isolated thunderstorm weather conditions S 08

Event Type: General unsafe event A 01, General incident A 02, Serious incident A 03

Scientific Reports | (2024) 14:13440 | https://doi.org/10.1038/s41598-024-64360-6

Table 4. Some of the high-confidence association rules from unsupervised learning.
Table 5. Changes in the association rules for different minimum support thresholds.
Table 6. Some of the high-support association rules from supervised learning.
Table 7. Some of the high-confidence association rules from supervised learning.
Table 8. Some of the high-lift association rules from supervised learning.