Integrated Component Reliability Modeling for Helicopter Service Industry

The helicopter service industry is concerned with component reliability, because poor component reliability jeopardizes the safe operation of aircraft. Currently, the maintenance process for component overhauls and replacements is typically based on maximum intervals, called Hard Time (HT) limits, that are recommended by the manufacturer without reference to real-world reliability data. In this case study, an integrated component reliability modeling procedure is proposed to identify proper component overhaul and replacement intervals for a leading company in the helicopter service industry. The procedure combines removal rate analysis, mean time between failures (MTBF) analysis, average life analysis, data distribution analysis, and a total quality management (TQM) shop survey. It can be used as a framework to support reliability programs in the helicopter industry and as a decision-support tool for modifying the manufacturer's recommended intervals. An illustrative example is provided to show the use of this modeling procedure. Future work could link inventory analysis to component reliability modeling, leading toward total productive maintenance.


Introduction
The generally accepted definition of reliability is the probability that a device will provide adequate operation for a given time in its intended application. When system performance is time dependent (i.e., the length of time a system is expected to operate), then reliability can be measured in terms of mean life, failure rates, and mean time between failures. Reliability modeling is used to reveal recurring patterns of failure and underlying causes of those failures. When the failure data are known, decisions can be made concerning reliability expectations, corrective actions, maintenance procedures, and costs of repair or replacement [1].
When it comes to maintaining aircraft, service time is affected by the strengths of the components. Mechanical components age with use, particularly components of helicopters. These types of aircraft, often high in utilization, are subject to harsh operating environments and mechanical degradation from gear meshing and aging. When considering component reliability, a helicopter is viewed as a system made up of literally hundreds of sub-systems and components. Thus, the reliability of individual helicopter components becomes crucial for the reliability of the aircraft as a functioning whole.
The primary component maintenance process used in the helicopter service industry is the "Hard Time" (HT) limit, which defines the maximum interval for performing a maintenance task. According to the U.S. Federal Aviation Administration (FAA) [2], HT limits typically apply to component overhauls, but also include component retirement life.
The HT limit is provided by the component manufacturer. A short HT limit benefits the manufacturer financially, because it means components are overhauled or replaced more frequently. When the HT limit is set to an interval that is too short, the operator is subjected to unreasonably high maintenance and operating costs. Conversely, if the HT limit is set too long, the safety of the aircraft is compromised by poor component reliability.
With safety held as the highest standard by the FAA, HT limits on components are chosen as the primary maintenance target, but how would an operator know whether the manufacturer's HT limits are the safest and most cost-effective choice? In general, there are many recommendations for reliability data analysis, and some industry movement toward HT limit modifications, but step-by-step procedures for evaluating HT limits are not found. For example, Meeker and Escobar [3] focus on statistical classification of failure distributions. A model is proposed in [4] using the concepts of soft life and hard life to optimize the total maintenance cost. Raju et al. [5] develop a spoon-shaped curve model to reveal the relation between maintenance and product life, whereas Leung et al. [6] use systematic mapping of time-to-failure patterns to scrutinize component life characteristics. While there are several types of reliability analysis and various reliability modeling approaches, none draws on the different reliability models together to support managers in the helicopter service industry in deciding on component replacement intervals.
This case study focuses on developing an integrated component reliability modeling procedure using several data analysis methods for a leading company in the helicopter service industry. The concept of measuring degradation during inspections to produce an estimate for reliability is introduced as a key feature of our modeling approach. The model provides formulas for the probability distribution and hazard function, and uses a computer program to carry out the calculations. The model assesses the useful life of components that fail only due to degradation and points to other sources for other component replacement problems. The four phases of the modeling procedure are developed in the case study using real reliability data, and the results are consolidated to assist the decision-making process for improving current reliability levels.

Integrated Reliability Modeling Procedure
The component reliability modeling in this case study makes use of several types of data analyses that can be performed on a collected reliability data set to support the decision-making process. While these analyses appear to be independent of each other, all of them in fact rely on the same data set. Performed in parallel, they provide slightly different perspectives for management decision making. There are four phases in the integrated component reliability modeling procedure, as depicted in Figure 1. The four sequential phases of the modeling procedure are:
1. The first phase involves data collection, in which reliability data are gathered.
2. The second phase involves groundwork analysis, which prepares the gathered data for reliability analysis. The groundwork analysis includes data grouping and window selection, that is, choosing an appropriate time period for data analysis. It organizes the data for statistical calculations, allowing the rest of the component reliability analysis procedure to flow smoothly. Data plotting methods such as the Pareto chart [7] can be used in this step to identify critical problems, and qualitative methods such as root-cause analysis [8] can also be used to suggest corrective action.
3. The third phase involves the integrated component reliability modeling, which is performed in five predominant analyses including both quantitative and qualitative measures.
3.1. Removal Rate Analysis: When a component is replaced as part of scheduled or unscheduled maintenance, it is called a removal. The "Component Removal Rate" (CRR) is calculated by taking the number of removals (both scheduled and unscheduled) for a time interval and dividing it by the "Total Number of Flight Hours" (TNFH) for that same interval. The rate of unscheduled component removals, or "Component Failure Rate" (CFR), is calculated in the same way but excludes the scheduled removals. The "Percentage of Failing Removed Components" (PFRC) shows the percentage of components that failed out of all the removed components for the time period analyzed. It is calculated by taking the "Total Number of Unscheduled Component Removals" (TNUCR) and dividing it by the "Total Number of Component Removals" (TNCR), which includes scheduled and unscheduled removals. If the PFRC is low, it may suggest that the HT limit is set too short, and if the PFRC is high, it may suggest that the HT limit is set too long [9].
3.2. MTBF analysis: Mean Time Between Failures (MTBF) provides a calculated average HT limit based on real data for comparison with the manufacturer's HT limit [10,11]. In many industries, calculating component reliability simply means [13].
Using real reliability data, an advanced reliability analysis is modeled with a Weibull distribution fitted using regression to generate a table of results showing the probability of component failure with different HT limits [14]. This analysis is a three-step process that performs "Time since Overhaul" (TSO) averaging, followed by the computation of the Manufacturer's HT "δ" Ratio, and finally the Weibull plot. Here a Weibull distribution is suited for randomly failing components because it takes early failures into consideration.
3.5. TQM shop survey: The Total Quality Management (TQM) shop survey provides a qualitative approach for analysis, gathering information based upon component experts' opinions [15]. Experts' experience and inherent knowledge are valuable and could bear weight in decision support for HT limit modifications.
The five different component reliability analyses are performed in parallel on the component under consideration. These analyses constitute the main portion of the modeling performed in this procedure. Each individual analysis generates a slightly different set of results, thereby allowing five different perspectives on the component's reliability. This provides a greater amount of data for comparison and for the decision support process. The analyses can be accomplished in any order since they all rely on the same set of raw data.
The last phase involves the result comparison and decision support process whereby results are consolidated and reviewed for improving current reliability levels.
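The removal rate quantities defined in the procedure can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the case study: the function name `removal_rates` and the `per_hours` scaling argument are assumptions introduced here, and the counts are the C-box figures reported in the illustrative example (17 unscheduled removals, 16 scheduled removals, 87,763 flight hours).

```python
def removal_rates(unscheduled, scheduled, flight_hours, per_hours=100.0):
    """Return (CRR, CFR, PFRC) for one analysis window.

    CRR and CFR are scaled to `per_hours` flight hours (an assumed
    convention; 100 matches the "per 100 flight hours" reporting).
    PFRC is returned as a fraction between 0 and 1.
    """
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    total_removals = unscheduled + scheduled          # TNCR
    crr = total_removals / flight_hours * per_hours   # all removals / TNFH
    cfr = unscheduled / flight_hours * per_hours      # unscheduled only / TNFH
    # PFRC = TNUCR / TNCR; undefined when nothing was removed
    pfrc = unscheduled / total_removals if total_removals else float("nan")
    return crr, cfr, pfrc


crr, cfr, pfrc = removal_rates(unscheduled=17, scheduled=16, flight_hours=87_763)
print(f"CRR  = {crr:.4f} removals per 100 flight hours")
print(f"CFR  = {cfr:.4f} failures per 100 flight hours")
print(f"PFRC = {pfrc:.1%}")
```

Running the same function on each year of the window separately would give the year-by-year view needed to spot spikes in CFR or PFRC.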

Illustrative Example
The company used in this example has 13 fleets with over 230 helicopters, accompanied by large maintenance and inspection programs mandated by the FAA to support the safe operation of the helicopter fleets. Each helicopter literally has hundreds of components interacting to keep the aircraft operational, but there are a dozen or so

Data Collection
The company considered in this case study has previously implemented an "Enterprise Resource Planning" (ERP) solution. The ERP is a web-based software program used to collect and store company data. This software stores the data in a large Oracle database, organized by the software into thousands of supporting tables. Stored data are retrieved with dynamic queries in Microsoft Access and exported to Microsoft Excel for analysis. Table 1 lists the data fields that are selected for our analyses.

Groundwork Analysis
In this example, the groundwork analysis was quickly accomplished because only one component (i.e., the 412 C-box) was selected for the analysis procedure. Window selection is simply the technique of choosing an appropriate time interval for data analysis. Since no component modifications were identified, the entire time period was used for each of the subsequent reliability analyses. Table 2 shows the result of data grouping. The Weibull plotting sub-step is at the core of all the component reliability analyses within the entire procedure. Microsoft Excel was used to accomplish this sub-step and produced a table of results that relates HT limit modifications to the percentage risk of C-box failures, as in Table 2. The Weibull results table was then plotted, as seen in Figure 2, to provide a visual aid for managers if necessary.
A Pareto chart can be used to plot data by reason for removal, and it shows clearly that the leading problem with C-box failures is metal contamination. Metal contamination comes from gear and/or bearing wear or breakdown. Each incident was investigated by first reviewing the maintenance procedures. It was concluded that the metal contamination was not a result of poor maintenance procedures. Further investigation is needed to identify possible causes.
Table 3 summarizes the removal rate calculations. A spike in CFR was identified in Year 4, with PFRC at 67%: about 67% of the components removed failed before reaching the HT limit, and investigation is needed to determine whether the HT limit is too long.
Using the same data, ratios for the MTBF analysis can be calculated. The mean time between unscheduled removals (MTBUR) was calculated by dividing 87,763 hours by 17 unscheduled removals to obtain 5,162.5 hours. The mean time between scheduled removals (MTBSR) was obtained by dividing 87,763 hours by 16 scheduled removals to obtain 5,485.2 hours, and the mean time between removals (MTBR) was computed by dividing 87,763 hours by 33 (i.e., 16 + 17) to obtain 2,659.5 hours. The manufacturer's recommended HT limit (i.e., 4,000 hours) is 23% lower than the MTBUR (i.e., 5,162.5 hours), suggesting that the HT limit may be too low.
For the average life analysis, 17 unscheduled removals (i.e., failures) occurred during the 4-year period. The sum of all components' time since overhaul (TSO) for the 17 removals is 39,876.2 hours. There were 16 scheduled removals, whose TSO is 16 multiplied by 4,000 hours, for a total of 64,000 hours. The failure rate is estimated as 17 divided by the sum of 39,876.2 and 64,000, which is 0.000164 per hour. The average life is then estimated as 1 divided by 0.000164, which is 6,110 hours. The HT limit of 4,000 hours is about 35% lower than this average life.
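The MTBF and average life arithmetic for the C-box can be reproduced with a short Python sketch. The constants are the figures quoted in the text (using the 39,876.2-hour TSO total for the unscheduled removals); the variable names and the helper `percent_below` are assumptions introduced here for illustration.

```python
FLIGHT_HOURS = 87_763.0      # total flight hours in the analysis window
UNSCHEDULED = 17             # unscheduled removals (failures)
SCHEDULED = 16               # scheduled removals at the HT limit
HT_LIMIT = 4_000.0           # manufacturer's HT limit, hours
TSO_UNSCHEDULED = 39_876.2   # summed time since overhaul of the failures, hours


def percent_below(limit, reference):
    """How far `limit` sits below `reference`, as a percentage of `reference`."""
    return (reference - limit) / reference * 100.0


# MTBF-style ratios: flight hours divided by the relevant removal count
mtbur = FLIGHT_HOURS / UNSCHEDULED
mtbsr = FLIGHT_HOURS / SCHEDULED
mtbr = FLIGHT_HOURS / (UNSCHEDULED + SCHEDULED)

# Average life: failures divided by accumulated TSO, where each scheduled
# removal contributes a full HT interval
total_tso = TSO_UNSCHEDULED + SCHEDULED * HT_LIMIT
failure_rate = UNSCHEDULED / total_tso
average_life = 1.0 / failure_rate

print(f"MTBUR = {mtbur:,.1f} h, MTBSR = {mtbsr:,.1f} h, MTBR = {mtbr:,.1f} h")
print(f"HT limit is {percent_below(HT_LIMIT, mtbur):.0f}% below MTBUR")
print(f"Failure rate = {failure_rate:.6f} per hour")
print(f"Average life = {average_life:,.0f} h")
print(f"HT limit is {percent_below(HT_LIMIT, average_life):.0f}% below average life")
```

Expressing both comparisons through the same helper keeps the two percentages on a common basis, namely the gap as a share of the data-derived figure rather than of the HT limit.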

Component Reliability Modeling
Using real reliability data, a Weibull distribution is fitted using regression, and the failure probability can be estimated for different HT limits as shown in Table 4. The resulting R-square of the fit is about 89%, indicating that the Weibull distribution explains about 89% of the variation in the data, which is generally acceptable. The current HT limit of 4,000 hours corresponds to 45% of C-boxes failing before reaching the limit; about 55% of the components can last longer than 4,000 hours. The graph of Weibull results comparing HT limit modifications to the percentage risk of C-box failures is shown in Figure 2.
The TQM shop survey is summarized by three items. The CSTP is a numeric count of how many components come through the shop each month. The AV is the value of the CSTP at which the shop expert would believe something is going wrong. The MCFO is a description of the leading problem with component failures based upon the shop expert's experience and opinion. In this case study the CSTP per month is equal to 1, with an AV of 2 or more. The MCFO was noted as metal.
Table 5 was assembled containing key results and values of all the analyses for this case. The observations can be summarized as below:
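A Weibull fit by regression of the kind described above can be sketched as follows. This is one common variant (median rank regression with Bernard's approximation, regressing the linearized failure probability on log time) and is not necessarily the exact spreadsheet method used in the case study. The function names and the TSO values are hypothetical, and the sketch treats every observation as a failure, ignoring the censoring introduced by scheduled removals.

```python
import math


def fit_weibull_mrr(failure_times):
    """Fit a two-parameter Weibull by median rank regression.

    Returns (beta, eta, r_squared): shape, scale, and the R-square of the
    straight-line fit on the Weibull plot.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)             # Bernard's median rank estimate
        xs.append(math.log(t))                # x = ln(t)
        ys.append(math.log(-math.log(1.0 - f)))  # y = ln(-ln(1 - F))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("failure times must not all be equal")
    beta = sxy / sxx                          # slope is the shape parameter
    intercept = mean_y - beta * mean_x        # intercept = -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta, sxy ** 2 / (sxx * syy)


def failure_probability(t, beta, eta):
    """Weibull CDF: probability of failure before reaching t hours."""
    return 1.0 - math.exp(-((t / eta) ** beta))


# Hypothetical TSO hours at failure, for illustration only (not the case data)
tso_hours = [1200, 1850, 2400, 2900, 3300, 3700, 4100, 4600, 5200, 6000]
beta, eta, r2 = fit_weibull_mrr(tso_hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, R-square = {r2:.2f}")
for ht_limit in (3500, 4000, 4500, 5000):
    risk = failure_probability(ht_limit, beta, eta)
    print(f"HT limit {ht_limit} h -> failure risk {risk:.0%}")
```

The loop at the end produces the same kind of table that the data distribution analysis presents to managers: one row per candidate HT limit with its projected failure risk.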

Result Comparison and Decision Support
The Pareto analysis within the removal rate analysis and the MCFO result of the TQM survey were the same (i.e., both metal), indicating that statistical analysis and human experience within the company point towards the same leading component failure problem.
The CFR from the removal rate analysis (0.019 failures per 100 flight hours) and the failure rate from the average life analysis (0.01637 failures per 100 flight hours) are similar figures, both based on the C-box failure data.
The removal rate analysis concluded that approximately half of all combining gearbox failures were a result of metal contamination. A vendor audit was recommended to detect any possible manufacturing problems and address them with corrective action, in an attempt to decrease the number of metal failures and thereby improve the C-box's reliability.
The MTBF Analysis concluded that the manufacturer's HT limit was set too low and supported raising the limit. The average life analysis also concluded that the manufacturer's HT limit was set too low and supported raising the limit.
The data distribution analysis presented a C-box failure percentage versus HT limit table so that managers can assess the risk involved with using various HT limits. About 55% of the time the component can last over the HT limit before failure.
A manager who is satisfied with the component's current reliability would choose to remain within the "borderline" range, specifically, HT limits ranging from 4,000 hours to 4,900 hours. A manager who feels that the current HT limit is unsatisfactory for component performance and safety reasons may opt to reduce the HT limit and retreat into the conservative range, using HT limits of less than 4,000 hours. On the other hand, a manager who feels confident in the company's preventive-maintenance practices may want to take an aggressive approach to setting the HT limit, specifically, selecting values on the order of the MTBUR (i.e., 5,162.5 hours). In our case study the management decided to extend the HT limit to 4,500 hours. Should a manager opt not to modify the HT limit at all, proactive methods are provided by this modeling procedure in the analysis actions sub-step, which in this case would specifically target metal failures through a vendor audit. Ultimately, the key managers responsible for the company's directives, safe operations, and future goals are left with decisions like these and more.

Conclusion
This research was conducted to combine various reliability modeling methods and assimilate their results into a procedure that aids managers in decision making with regard to component reliability, especially HT limit modifications. Once the procedure was developed, a real-world case study was carried out. The case study demonstrated the soundness of the component reliability modeling procedure, which proved advantageous as originally anticipated.
In the case study, the result comparison table alerts managers to metal contamination as the leading failure problem for combining gearbox failures. The results inform managers that the current HT limit of 4,000 hours is, on paper, close to a conservative approach with a projected risk of 45% component failure, but its real-world performance is of a borderline nature: it runs the risk of 52% component failure, which is how a 4,700-hour overhaul would be projected to perform. Even though the procedure was applied to only one component for the company in the case study, the value of the Weibull distribution as a tool for managers is confirmed.
The management decided to modify the HT limit from the manufacturer's recommended 4,000-hour overhaul to a 4,500-hour overhaul. The new HT limit remains comfortably within the borderline range of component failure risk and allows additional revenue flight time with less frequent replacement costs. This is accomplished while maintaining a healthy level of safety, since the statistical component failure risk increases by only 5% with the new HT limit. The results from the case study confirmed the advantage of using reliability modeling by providing component reliability improvement options through HT limit modifications and through targeting component weaknesses.