Prioritizing flight simulators of the Brazilian Air Force by the analytic hierarchy process and hypothesis tests


METHODOLOGY
The research was carried out in five steps. Initially, the literature was reviewed to survey the methodologies used in similar problems and to choose the decision support algorithm. The defense sector, driven by the growing need for increasingly advanced systems in an environment of budgetary constraints, requires a project prioritization tool based on technical criteria to employ scarce resources efficiently (Arnaut et al., 2012; Stromgren et al., 2018; Janzwood, 2021). The purpose of the research is to apply a decision support method to prioritize flight simulators of the Air Force Command. Thus, the literature search focused on multicriteria decision making (MCDM) methods that support this objective, rather than on identifying a research gap. In Step 1, it was found that several authors applied MCDM to prioritize solutions in the defense area. Matos et al. (2018) explored a limited budget scenario and developed a model that allowed choosing which projects would be the object of intervention, based on a multicriteria analysis using the Analytic Hierarchy Process (AHP). Camilo, Gavião and Kostin (2020) and Silva, Belderrain and Pantoja (2010) also used the AHP to prioritize strategic aerospace projects for the Brazilian Air Force, given a similar context of economic scarcity and increasingly frequent budget cuts in the country. Salgado (2021) identified a sample of polar research ships and their respective capacities to support the construction of a new Brazilian Antarctic research vessel. He explored a hybrid AHP-TOPSIS model and PBC as a benchmarking methodology, proposing improvements and simplifications to the acquisition of naval assets. Santos et al. (2021) also considered the scenario of budgetary constraints to select a medium-sized warship for the Brazilian Navy using AHP. Bimo et al. (2022) used AHP to select amphibious aircraft models for the Indonesian Navy.
Hamurcu and Eren (2020) proposed a methodology based on AHP and TOPSIS to evaluate unmanned aerial vehicle (UAV) alternatives in a selection process. In the Portuguese Navy, the AHP was explored for the prioritization of naval projects (Simplício, Gomes and Romão, 2017). The AHP stands out among the various methods that support multicriteria decisions because of its logical and computational simplicity, being indicated by Abastante et al. (2019), Agápito et al. (2015) and Balusa and Gorai (2019) as one of the most adopted methods for solving problems of this nature. In the area of project or portfolio selection, AHP is also widely used (Agápito et al., 2019; Goswami, Behera and Mitra, 2020; Souza et al., 2022).
In Step 2, the hierarchical structure of the problem was built. The top is the objective to be achieved, followed by the evaluation criteria and sub-criteria, ending with the possible alternatives to the problem. This structure follows the AHP model (Saaty, 1980; Wind and Saaty, 1980). Fig. 1 illustrates this hierarchical structure. The general objective is to prioritize flight simulators from the point of view of the defense sector, considering the country's budget constraints. In this hierarchy, the 1st level is composed of criteria selected from the attributes listed by the specialists, which consider technical aspects, the demand for training from FAB and the maintenance costs of the simulators. The 2nd level is composed of the technical sub-criteria considered in the research. The 3rd level is composed of the simulators to be prioritized. In AHP, this hierarchical tree is similar to the traditional decision matrix of other MCDM methods, because it indicates the criteria, subcriteria and alternatives of the problem. However, the evaluations that complete this matrix are different, as they derive from pairwise evaluations rather than from the isolated performance of each alternative in each criterion.

The 1st-level technical criteria were obtained from ICAO (2015) (Bass, Clements and Kazman, 2003; Zheng et al., 2009). The criteria "Training Demand" and "Maintenance Costs" derive from DCA 11-45 (Brazil, 2018a) and PCA 11-47 (Brazil, 2018b). These guidelines encourage the use of simulation devices to improve the operational training of pilots, including effective logistical support and preventive and corrective maintenance. The criteria and subcriteria used in the modeling are presented in Table 2, whose columns are Criteria, Sub-criteria, Description and Research question.

Technical features

Flight simulation
This sub-criterion involves four aspects: the structure and layout of the flight deck, the flight modeling (aerodynamics and engine), the aircraft systems, and the flight controls and forces.
The layout of the flight cabin involves its physical structure, internal environment, instrument presentation, controls and crew seats.
Flight modeling (aerodynamics and engine) involves the mathematical models and associated data to be used to describe the aerodynamic and propulsion characteristics needed to be modeled in the flight simulator.
Aircraft systems include hydraulic, fuel, electric power, among others. Modeled systems will allow normal, abnormal and emergency procedures to be carried out.
Flight controls and forces are the mathematical models and associated data that describe the required dynamic characteristics that have been modeled in the flight simulator.
Which simulator has the best technical features?

Effects simulation
This sub-criterion involves two aspects: sound effects and visual effects.
Sound effects are related to sounds generated outside the cabin environment, such as sounds from aerodynamics, propulsion, road noise and weather effects, and those internal to the cabin.
Visual effects encompass the projection system used to display an image outside the cockpit (e.g., collimated or non-collimated) and the field of view (horizontal and vertical) that must be seen by pilots using the flight simulator from their reference point of view. Technical requirements such as contrast ratio and spotlight details are also considered. Where applicable, the Head-Up Display (HUD) may also be considered.

Environment simulation
This sub-criterion involves three aspects: navigation, weather conditions, and aerodrome and terrain modelling.
Navigation represents the simulated navigational aids, systems and networks with which flight crew members are required to operate, such as GPS, VOR, DME, ILS or NDB.
Weather conditions can be simulated, from ambient temperature and pressure to storm modeling, etc.
The aerodrome and terrain modeling should detail its characteristics and include such items as generic aerodromes versus custom aerodromes, visual scenery requirements, terrain elevation and Enhanced Ground Proximity Warning Systems (EGPWS) databases.

Instructor station
Instructors initiate exercise sessions and engage students by exposing them to variables they will experience in the real world. Options include the ability to set the time of day as well as weather conditions, including fog, wind speed and wind direction. At any time, instructors can assist students with unexpected occurrences, including weather events, obstacle loads and mechanical failures, and can define normal, abnormal and emergency procedures.

Training demand
The training demand is a management criterion, arising from the number of pilots to be trained, the number of simulators available and the difficulty inherent to the type of aircraft (some types need more training hours due to their flight missions), among other related aspects.
Which simulator has the greatest training demand?

Maintenance costs
As the simulators are already in operation, the acquisition costs were not considered. This criterion considers the costs of spare parts, the costs of technical teams needed to repair the simulators, among other related aspects.
Which simulator has the lowest maintenance costs?
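The hierarchy described above (objective, criteria, sub-criteria and alternatives) can be written down as a simple data structure. The sketch below is only illustrative: the identifiers C1 to C3 and SC1 to SC4 and the six simulators follow the text, while the dictionary layout and the helper names `leaf_nodes` and `pairwise_count` are assumptions introduced here, not part of the original study.

```python
# Illustrative encoding of the AHP hierarchy described in the text.
# The layout and helper names are assumptions made for this sketch.
HIERARCHY = {
    "goal": "Prioritize flight simulators",
    "criteria": {
        "C1": {
            "name": "Technical features",
            "subcriteria": {
                "SC1": "Flight simulation",
                "SC2": "Effects simulation",
                "SC3": "Environment simulation",
                "SC4": "Instructor station",
            },
        },
        "C2": {"name": "Training demand", "subcriteria": {}},
        "C3": {"name": "Maintenance costs", "subcriteria": {}},
    },
    "alternatives": ["A-1", "A-29", "C-105", "C-95M", "F-5M", "T-27"],
}


def leaf_nodes(hierarchy):
    """Return the nodes under which the alternatives are compared directly."""
    leaves = []
    for cid, criterion in hierarchy["criteria"].items():
        if criterion["subcriteria"]:
            leaves.extend(criterion["subcriteria"].keys())
        else:
            leaves.append(cid)
    return leaves


def pairwise_count(n):
    """Number of comparisons in a full pairwise matrix of order n."""
    return n * (n - 1) // 2


leaves = leaf_nodes(HIERARCHY)
full_comparisons = pairwise_count(len(HIERARCHY["alternatives"]))
```

With six alternatives, a full pairwise matrix needs 15 comparisons under each of the six leaf nodes, which gives a sense of the effort each questionnaire would demand without any reduction.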
The 3rd Step focused on questionnaires to collect information from experts about the criteria, subcriteria and simulators. These assessments were used in AHP.
The 4th Step focused on choosing specialists with training and experience to assess their preferences for flight simulators. Table 3 presents the demography of the experts consulted. In addition to the qualification indicated, this body of experts is responsible for providing high-level advice on this topic in the Air Force.
The 5th Step consisted of modeling the assessments using the AHP algorithm. This process is composed of a sequence of calculations that produce the final weights of the alternatives, in which the highest value indicates the flight simulator preferred by the specialists. Initially, the specialists' assessments need to be standardized, as each respondent chooses their own reference for the assessment of the others, based on their experience and knowledge. The procedure for standardizing the assessments follows the principle of additive transitivity, as presented in Alonso et al. (2008), Alonso et al. (2009), Li et al. (2019) and Gavião et al. (2021). Thus, the number of pairwise assessments of each specialist is considerably reduced, which reduces the response time and the effort required by the specialist to answer the questionnaire. Assessments are carried out on the nine-point scale proposed by Saaty (1980). For the pairwise assessments, the scale indicated in Fig. 2 (Saaty, 1980) was used. After the pairwise evaluation matrix described in Equation (1) is completed, the sequence of Equations (2) to (6) is applied to calculate the weights of the alternatives and compute the Consistency Ratio (RC) of the evaluations. The literature records several techniques for calculating AHP weights. Here, we opted for the original model deriving from linear algebra, based on the eigenvalues and eigenvectors of the evaluation matrices. The equations used were described in Liu and Lin (2016). RC indicates whether the expert's judgments are considered logically consistent. Evaluations with RC values greater than 10% are considered inconsistent, requiring a new round of evaluations.
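As a rough illustration of the eigenvector-based calculation mentioned above, the following sketch approximates the principal eigenvector of a pairwise matrix by power iteration and derives a consistency ratio from it. It assumes the standard formulation usually attributed to Saaty, with CI = (λmax − n)/(n − 1) and RC = CI/RI, and the commonly cited Random Index table; it is not claimed to reproduce Equations (1) to (6) of the study. The 3×3 matrix is hypothetical.

```python
# Commonly cited Random Index (RI) values by matrix order; assumed here.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector (weights) and lambda_max."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = matvec(matrix, weights)
        total = sum(product)
        weights = [p / total for p in product]  # keep the unit sum
    product = matvec(matrix, weights)
    # lambda_max estimated as the mean of (A w)_i / w_i
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """Consistency ratio; defined as 0 for matrices of order 1 or 2."""
    if n <= 2:
        return 0.0
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical, perfectly consistent judgments on the 1-9 scale.
example = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
weights, lambda_max = ahp_weights(example)
rc = consistency_ratio(lambda_max, len(example))
acceptable = rc <= 0.10  # threshold of 10% stated in the text
```

For this consistent example the weights are proportional to 4:2:1 and the ratio is essentially zero; a matrix with contradictory judgments would yield a larger λmax and therefore a larger RC.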
The process was carried out for each expert, calculating their respective final weights for the alternatives. The harmonic averages of the 32 sets of weights were calculated and adjusted to a unit sum. The harmonic mean has already been applied with the AHP to calculate the consistency ratio (Stein and Mizzi, 2007; Zheng and Ma, 2018). Here, a measure of central tendency helped define the final results, based on the 32 expert responses, simplifying the decision-making process. Chakrabarty (2021) highlights the existence of seven measures of central tendency, capable of summarizing a set of data in a measure that represents them. For positive data, the harmonic mean is never greater than the geometric mean, which in turn is never greater than the traditional arithmetic mean (Vogel, 2022). Thus, it is possible to assume that the use of harmonic means reflects a conservative position for decision making: if preferences are confirmed at the smallest differences between the results, by hypothesis tests for instance, the largest differences will also be statistically significant. For conciseness, Table 5 presents a sample of seven evaluations.
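The aggregation step described above can be sketched as follows: for each alternative, take the harmonic mean of the weights assigned by the experts, then rescale the resulting vector to a unit sum. The function name `aggregate_experts` and the three weight vectors are hypothetical and only serve to illustrate the calculation; they are not data from the study.

```python
from statistics import harmonic_mean


def aggregate_experts(expert_weights):
    """Harmonic mean per alternative, rescaled so the result sums to 1.

    expert_weights: list of weight vectors, one per expert, all of the
    same length and with strictly positive entries.
    """
    n_alternatives = len(expert_weights[0])
    means = [
        harmonic_mean([weights[k] for weights in expert_weights])
        for k in range(n_alternatives)
    ]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical final weights from three experts for three alternatives.
experts = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
]
group_weights = aggregate_experts(experts)
```

The rescaling is needed because harmonic means of vectors that each sum to one do not, in general, sum to one themselves.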

At the different hierarchical levels, it is possible to assess the specialists' marginal preferences based on average weights. Among the various averages, the harmonic average can indicate a point value that is more representative of a data set than the arithmetic and geometric averages when a few values are much larger than the rest. For example, in a set of ten values, where nine of them are unity and the last is ten, the arithmetic mean is 1.9, the geometric mean is 1.26, and the harmonic mean is 1.1, indicating that the latter is closer to most values in the sample. Table 7 presents the harmonic average of the 32 experts' weights, by level.
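The numerical example above can be checked with a few lines using Python's standard `statistics` module; the variable names are arbitrary.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Nine values equal to one and a single value equal to ten.
values = [1.0] * 9 + [10.0]

arithmetic = fmean(values)          # 19 / 10
geometric = geometric_mean(values)  # 10 ** (1 / 10)
harmonic = harmonic_mean(values)    # 10 / (9 + 1 / 10)

summary = (round(arithmetic, 2), round(geometric, 2), round(harmonic, 2))
```

The ordering harmonic ≤ geometric ≤ arithmetic holds for any set of positive values, which is why the harmonic mean stays closest to the nine unit values here.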

Initially, the harmonic mean was applied to the 32 weights of Level 1, the criteria. The means were C1 = 0.234, C2 = 0.238 and C3 = 0.133, showing a balance between the technical characteristics and the training demand of the FAB, with the maintenance costs in last place.
The similarity between the results of C1 and C2 motivated the checking of results by hypothesis testing, to verify, statistically, whether this difference is significant or not. In other words, the hypothesis test makes it possible to identify whether it makes sense to consider that C2 is preferable to C1 or if this difference of 0.004 between them is statistically insignificant. The use of hypothesis tests in support of AHP was applied in Lin et al. (2013), Ateş and Önder (2021), Lee et al. (2000) and Mufazzal et al. (2021).
Means describe specific values of a sample, so they should be considered a preliminary preference, to be statistically tested. As the samples are not normally distributed, the non-parametric Wilcoxon-Mann-Whitney hypothesis test was applied to verify whether the differences between the results are statistically significant at a defined significance level. Thus, the 32 final results at each level were submitted to hypothesis tests to verify whether the differences between them were significant, clearly indicating a preference relationship, or not significant.
In a hypothesis test, the p-value is a probability that measures the evidence against the null hypothesis, and it is compared with a chosen significance level. A significance level (denoted α) of 0.05 is conventional in statistics. This level indicates a 5% risk of concluding that there is a difference between the data sets when in fact there is none. Thus, for a p-value ≤ α, the difference between the data medians is statistically significant, so we reject the null hypothesis, which states that there is no difference between the data sets. Otherwise, if the p-value > α, we do not reject the null hypothesis and assume a similarity between the data. In that case, it is not possible to conclude that the difference between the population medians is statistically significant.
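The decision rule above can be illustrated with a small, dependency-free sketch of the Wilcoxon-Mann-Whitney test. It uses the normal approximation with tie and continuity corrections, which is an assumption of this sketch: the text does not state which software or which variant of the test (exact or approximate) was used, so p-values obtained this way may differ slightly from those reported. The two samples are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and tie sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u1, 1.0
    # Continuity correction of 0.5, never letting z drop below zero.
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u1, p_value


def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis when p-value <= alpha."""
    return p_value <= alpha


# Hypothetical weights given by eight experts to two alternatives.
sample_a = [0.30, 0.32, 0.35, 0.31, 0.33, 0.36, 0.34, 0.37]
sample_b = [0.10, 0.12, 0.11, 0.14, 0.13, 0.15, 0.16, 0.09]
u_ab, p_ab = mann_whitney(sample_a, sample_b)
u_aa, p_aa = mann_whitney(sample_a, sample_a)
```

In this hypothetical case the two samples do not overlap, so the null hypothesis is rejected at α = 0.05, whereas comparing a sample with itself gives no evidence of a difference.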
The Wilcoxon-Mann-Whitney test applied to criteria 1 and 2 indicated a p-value = 0.7112, well above the 0.05 significance level, so it is assumed that there is no significant preference for C2 over C1. However, the lower preference for C3 compared with C1 and C2 is more evident and was confirmed by the hypothesis test. The p-value for the comparison between C1 and C3 was 0.005145 and between C2 and C3 was 0.003358, both below α = 0.05, indicating significant differences. Equation (7) shows the final preference ($\succ$) or equivalence ($\sim$) relationship between these criteria.

$$C_1 \sim C_2 \succ C_3 \quad (7)$$
The harmonic averages of the 32 weights for the Subcriteria indicated the marginal preferences at this level (SC1 = 0.4127, SC2 = 0.1266, SC3 = 0.1162 and SC4 = 0.0989). The results show a strong preference for "Flight Simulation" and an equivalence between the other Subcriteria. The difference between SC1 and the others was statistically significant, with p-values close to zero. However, the p-values for the comparisons between SC2, SC3 and SC4 were well above 0.05, so it is possible to assume that their differences are not significant, as shown in Equation (8).

$$SC_1 \succ SC_2 \sim SC_3 \sim SC_4 \quad (8)$$

Regarding the simulators under SC1, the C-105 possibly had the greatest preference because it was the most reliable, as its aerodynamic model and engine were identical to those of the real aircraft. Another relevant point is the position of the C-95M, close to the T-27, since both were built using the same technology and by the same Center to which the specialists belong.
The Wilcoxon-Mann-Whitney test showed that the difference between the C-105 and T-27 simulators is not statistically significant (p-value = 0.1193). However, the differences between the C-105 simulator and the four remaining simulators were significant, according to Equation (9). Between the T-27 and A-29 simulators the p-value was 0.0820, but between the T-27 and the three remaining simulators the difference was significant, according to Equation (10). The other preference relations are indicated in Equation (11), in which the A-29, F-5M and C-95M simulators are equivalent to each other, but preferable to the A-1 simulator.
The harmonic averages of the simulators for SC2 were A-1 (Alt.1) = 0.0559, A-29 (Alt.2) = 0.0768, C-105 (Alt.3) = 0.1702, C-95M (Alt.4) = 0.1187, F-5M (Alt.5) = 0.0764 and T-27 (Alt.6) = 0.1296. The results showed the C-105 alone ahead, followed by the T-27 and C-95M with close values, then by the A-29 and F-5M pair, with the A-1 in the last position. Possibly, the C-105 had the greatest preference because it is the most reliable simulator for the aircraft, with sound effects closer to reality, and the only one with a collimated visual system. The Wilcoxon-Mann-Whitney test confirmed that the C-105 simulator stands out in relation to the next one (T-27) and the others, with p-value = 0.0077, according to Equation (12).
The harmonic averages of the weights for SC3 were A-1 = 0.0480, A-29 = 0.0851, C-105 = 0.1661, C-95M = 0.1493, F-5M = 0.0765 and T-27 = 0.1162. The results showed the C-105 in first position, followed by the C-95M, T-27, A-29 and F-5M simulators with similar values, and the A-1 in the last position. Once again, the C-105 had the greatest preference, for having a more reliable simulation system, in addition to having several visual scenarios with well-defined airports.
The Wilcoxon-Mann-Whitney test confirmed the general preference for the C-105 simulator in relation to the second place (C-95M), with p-value = 0.0142, and the other relations, according to Equation (13).
The harmonic averages of the weights for SC4 were A-1 = 0.0493, A-29 = 0.0932, C-105 = 0.1536, C-95M = 0.1224, F-5M = 0.0793 and T-27 = 0.1046. The results showed four well-defined groups: the C-105 simulator isolated in first position, followed by the C-95M and T-27 close together, then the A-29 and the F-5M, and again the A-1 isolated in the last position. Once again, the C-105 was preferred because it has a more reliable simulation system, in addition to possibly being easier to use with the functions provided by the Instructor Station. The Wilcoxon-Mann-Whitney test confirmed the general preference for the C-105 simulator in relation to the C-95M, with p-value = 0.0052, and the other relationships, according to Equations (14) and (15).

The harmonic averages of the weights for C2 were A-1 = 0.0435, A-29 = 0.1433, C-105 = 0.0764, C-95M = 0.1141, F-5M = 0.0707 and T-27 = 0.2668. The results showed the T-27, A-29 and C-95M in the top three positions. The T-27 simulators are used for the ground school of cadets at the Air Force Academy and for the instructors' training. The A-29 is used in three squadrons and the simulators are essential in the training of new pilots and of their flight instructors. The C-95 is a transport aircraft in high demand in the Brazilian Air Force and also requires training simulators for its crews. The Wilcoxon-Mann-Whitney test confirmed the general preference for the T-27 simulator in relation to the A-29 simulator, with p-value close to zero, and the other relationships, according to Equation (16).
The harmonic averages of the weights for C3 were A-1 = 0.0812, A-29 = 0.0705, C-105 = 0.0308, C-95M = 0.2684, F-5M = 0.0725 and T-27 = 0.1825. The results showed the C-95M and T-27 in the first two positions for being the simulators with the lowest maintenance costs and built in 2018 and 2019, respectively. Then the A-1, F-5M and A-29 simulators had similar values, followed by the C-105, which has a very high maintenance cost. The last three placed still share the use of several components of the real aircraft (avionics), which can increase maintenance costs. The Wilcoxon-Mann-Whitney test confirmed the general preference for the C-95M simulator in relation to the T-27, with p-value = 0.050, and the other relationships, according to Equation (17).
Finally, the harmonic averages of the simulators' weights were calculated, as shown in Fig. 3.

Figure 3 - Final weights
The T-27 Tucano simulator had the highest marginal preference, 0.3118, followed by the C-95M Bandeirante simulator with 0.2077. In third place was the C-105 Amazonas with 0.150 and in fourth was the A-29 Super Tucano with 0.1383. The simulators with the lowest preferences were the F-5M Tiger II with 0.1039 and the A-1 AMX with 0.0883, which achieved similar results.
The T-27 Tucano occupied the first position due to its regularity in the evaluations of the criteria and sub-criteria. Although the C-105 simulator had the highest preference in the four technical subcriteria, a low value in "Training Demand" and a very low performance in "Maintenance Costs" caused it to be repositioned to third place.
The Wilcoxon-Mann-Whitney test confirmed the general preference for the T-27 simulator in relation to the C-95M, with p-value close to zero, and the other relations, according to Equation (18).
This final list of preferences indicates that, in the event of a scarcity of resources to serve all simulators, the demands of the T-27 simulator should be met first, followed by those of the C-95M or C-105 simulators. Next, the needs of the A-29 simulators and, finally, those of the F-5M or A-1 simulators should be addressed.

CONCLUSION
This research aimed to apply a decision support method that allows prioritizing the flight simulator projects of the Air Force Command in view of the country's budget constraints. Over the years, it has become evident that the biggest problem for Defense is the restriction of budgetary resources, as the amounts made available are insufficient to meet the financial needs of the Armed Forces, requiring the prioritization of the most relevant and urgent projects. In this context, a search was carried out in research databases to survey studies that used decision support models, and AHP was chosen because it is a widely used method for solving similar problems. It is also worth noting that the use of hypothesis tests to assess the statistical differences between the AHP marginal preferences made the description of the preference or equivalence relationships between the simulators more rigorous.
The COMAER flight simulators selected for this work were the A-1 AMX, A-29 Super Tucano, C-105 Amazonas, C-95M Bandeirante, F-5M Tiger II and T-27 Tucano. For modeling the hierarchical structure of the problem, the following criteria were defined: technical features of the simulators, training demand in the Air Force and maintenance costs. The first criterion was subdivided into four subcriteria: flight simulation, effects simulation, environment simulation and instructor station.
Data were collected through questionnaires, sent to 32 experts with experience in the criteria raised, to enable the application of the AHP method. The analysis and treatment of the collected data made it possible to indicate a prioritization of projects for the COMAER flight simulators.
The results indicated a prioritization among the projects analyzed, with the T-27 Tucano simulator as the most preferred, followed by the C-95M Bandeirante and C-105 Amazonas simulators, which were statistically similar to each other. In fourth place was the A-29 Super Tucano simulator. The two simulators with the least preference were the F-5M Tiger II and the A-1 AMX, which achieved results that were statistically close to each other.
This research can be improved. First, it is possible to expand data collection to other groups of specialists, coming from other sectors of the defense industrial base and from the Ministry of Defense, among others. Second, the use of other multicriteria decision support methods can bring new perspectives to decision makers, although it requires the development of new questionnaires to adapt data collection to the chosen methodology.