An integrated process model for root cause failure analysis based on reality charting, FMEA and DEMATEL

Article history: Received: September 11, 2018 Received in revised format: September 11, 2019 Accepted: December 12, 2019 Available online: December 12, 2019
Root cause failure analysis (RCFA) is a structured process for identifying the cause-and-effect relationships behind failures or unfavorable events in an organization in order to prevent or reduce their recurrence. Various tools support RCFA, such as interviewing, fault tree analysis (FTA), 5 whys, failure mode and effects analysis (FMEA), Pareto analysis and the storytelling method, each with different strengths and weaknesses. In this paper, an integrated process model is developed using Reality Charting, FMEA and DEMATEL to understand and implement RCFA effectively. The proposed process model has seven main steps. The model is implemented in a case study of the UTD-20 engine; in addition to the supporting research used in developing the model, real data and the approval of the UTD-20 analysis team members help validate it. © 2020 by the authors; licensee Growing Science, Canada.


Introduction
Identifying the roots of poor quality and failures is a key approach and an essential step in improving processes; analyzing the causes of failures is therefore important. Mechanical, chemical, environmental and physical factors play a critical role in failures or defects (Mannan, 2013; Khan et al., 2015). One of the key challenges for organizations is to find the right way to prevent and control failures and avoid further damage (Hekmatpanah et al., 2011; Tahan et al., 2014). Because preventing the recurrence of failures is so important, root cause analysis of failures has become a vital component of industries such as aviation, nuclear, oil and gas, heating, ventilation and air conditioning (HVAC), power and communication, and steel. Root cause failure analysis (RCFA) is a method that emphasizes identifying the root causes of failures and prevents similar failures by eliminating the identified causes (York et al., 2014; US-DOE, 2010). The main objective of RCFA is to find and analyze all causes that lead to product or process errors and to identify and control the related risks. RCFA can reduce costs by improving processes and products, and improvements made in the early stages of product or process development require simpler and less costly changes (Mobley, 2002). In RCFA, the causes of the problem are identified, and then solutions are proposed to eliminate, change or control these causes in order to prevent recurrence. The proposed solutions are evaluated and refined based on two criteria: effectiveness and applicability (Hussin et al., 2016). RCFA is a systematic, step-by-step method for determining the root causes of failures (Zavagnin, 2008). Before carrying out RCFA, it is necessary to establish the nature and frequency of the failure in order to decide whether RCFA is warranted.
The analysis begins with gathering comprehensive information and, once that information has been analyzed, ends by presenting solutions to prevent failures (Mahto & Kumar, 2008).

Background of the study
A review of previous studies shows that various studies have been conducted based on the RCFA method. Mahto and Kumar (2008) use a root-cause identification methodology to eliminate dimensional defects in the cutting operation of a CNC oxy-flame cutting machine (Mahto & Kumar, 2008). Shrouti et al. (2013) propose an approach based on computer experimentation for root cause analysis of product failures, linking warranty failure modes (represented by Key Failure Characteristics, or KFCs) to geometrical design parameters (represented by Key Product Characteristics, or KPCs) (Shrouti et al., 2013). Penrose and Frost (2015) examine the failures of an electric machine; they describe RCFA as an effective approach to failure analysis and also discuss repair-versus-replace decisions (Penrose & Frost, 2015). Aurisicchio et al. (2016) provide a root cause analysis approach based on the Issue Based Information System (IBIS) and the Function Analysis Diagram (FAD) (Aurisicchio et al., 2016). Hussin et al. (2016) investigate RCFA practices in the oil and gas industry and propose areas for improvement; they conduct a survey among RCFA researchers covering various aspects of RCFA, including the investigation team, data collection, process knowledge, tool competency, reporting, recommendations and the RCFA system in the organization (Hussin et al., 2016). Azis et al. (2017) perform root cause failure analysis (RCFA) and troubleshooting of a failure in a power plant (Azis, Nurbanasari, Hermanto, & Kristyadi, 2017). Nugroho et al. (2017) discuss the use of root cause failure analysis to prevent damage to hydropower generators (Nugroho et al., 2017). Peeters et al. (2017) propose a method that combines FTA and FMEA for RCFA: FTA is performed first, yielding a set of failure modes, which is then assessed with FMEA to select the critical system-level failure modes (Peeters et al., 2017).

Table 1
A review of some published works (columns: Research (Year); Approach-Technique(s); Case study). Recoverable row: Mahto and Kumar (2008); Eliminating …

… (Januardi, et al., 2018). Lokrantz et al. (2018) propose a machine learning framework that uses Bayesian networks to model the causal relationships between manufacturing stages based on expert knowledge, and demonstrate its usefulness on two simulated manufacturing processes (Lokrantz et al., 2018). Jafarzadeh Ghoushchi et al. (2019) argue that, despite the wide application of FMEA, the method has shortcomings that can lead to unrealistic results. To address this, they propose combining FMEA with Z-MOORA, which applies Z-number theory to multi-objective optimization by ratio analysis (MOORA) in order to prioritize failures (Jafarzadeh Ghoushchi, Yousefi, & Khazaeili, 2019). Zhao et al. (2019) investigate the failure mode and root cause of a vehicle drive shaft failure by examining the macroscopic and microscopic morphology of the fracture surface, the chemical composition, the metallographic structure and the mechanical properties of the material, and by finite element calculations of the drive shaft (Zhao et al., 2019). Table 2 shows the comparison of RCFA methods and tools based on Gano's criteria (Gano, 2011).

Reality Charting
Reality Charting differs in structure from other RCFA methods. According to Gano (2011), it is the only method that gives a graphical representation of causes and their relationships together with the supporting evidence. It allows experts to add their comments and broadens the analysis by depicting a clear understanding of reality, which helps prevent further failures. Gano (2011) also argues that it is the only method that provides working, effective procedures for eliminating the root causes of failures, because it builds a correct understanding of cause and effect and a clear, common view of reality among experts, yielding a deeper understanding than other methods. Using the Reality Charting tool, Gano (2011) presented the Apollo root cause failure analysis. The Apollo technique avoids the weaknesses of other methods, but when failures are numerous it lacks an organized structure for investigating, recognizing and prioritizing them. It also does not analyze the interrelations among failure roots in order to choose the most effective root cause for preventive action.
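As a minimal illustration of the data structure behind a cause-and-effect chart, the Python sketch below (our own assumption, not Gano's software or any Reality Charting product) stores each effect with its causes and attached evidence, collects the leaves as candidate root causes, and lists the boxes that still lack evidence. The names `CauseNode`, `add_cause`, `root_causes` and `unsupported` are hypothetical, and the example encodes only a truncated fragment of the chain reported in the UTD-20 case study, so its leaves are not that study's root causes.

```python
from dataclasses import dataclass, field


@dataclass
class CauseNode:
    """One box in a cause-and-effect chart (hypothetical structure)."""
    label: str
    evidence: list[str] = field(default_factory=list)      # supporting evidence
    causes: list["CauseNode"] = field(default_factory=list)  # causes of this effect


def add_cause(effect: CauseNode, cause: CauseNode) -> CauseNode:
    """Attach a cause to an effect and return the cause for further chaining."""
    effect.causes.append(cause)
    return cause


def root_causes(node: CauseNode) -> list[CauseNode]:
    """Return the leaves below node: causes with no recorded cause of their own."""
    if not node.causes:
        return [node]
    roots: list[CauseNode] = []
    seen: set[int] = set()
    for cause in node.causes:
        for root in root_causes(cause):
            if id(root) not in seen:  # a shared cause is reported only once
                seen.add(id(root))
                roots.append(root)
    return roots


def unsupported(node: CauseNode) -> list[str]:
    """List the labels of boxes that have no evidence attached yet."""
    missing = [] if node.evidence else [node.label]
    for cause in node.causes:
        missing.extend(unsupported(cause))
    return missing


# Truncated fragment of the case-study chain; the evidence text is hypothetical.
problem = CauseNode("Mixing oil and fuel", evidence=["hypothetical oil analysis report"])
burning = add_cause(problem, CauseNode("Crude fuel burning"))
air = add_cause(burning, CauseNode("Insufficient oxygen/air"))
add_cause(air, CauseNode("Congestion of exhaust pipe"))
add_cause(burning, CauseNode("Worn Farsonga"))

print([r.label for r in root_causes(problem)])
# ['Congestion of exhaust pipe', 'Worn Farsonga']
```

Requiring evidence on every box mirrors the emphasis on evidence described above; in a real chart the leaves would be pursued further until the team agrees they are actionable roots.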

Failure modes and effects analysis (FMEA)
FMEA is an analysis for diagnosing, reducing and eliminating potential failures in a system. The types of FMEA are Design (DFMEA), Process (PFMEA) and Service (SFMEA). FMEA is a systematic approach for evaluating and classifying the potential and actual risks in a product or process, ranking them so that corrective steps can eliminate the risks with the highest consequences, and re-evaluating them in a continuous improvement cycle (Stamatis, 2003). The use of FMEA can be divided into three parts: (1) qualitative analysis, based on recognizing all failure modes, causes and effects; (2) quantitative analysis, based on evaluating the Risk Priority Number (RPN); and (3) modifying analysis, based on defining improvement strategies aimed at reducing risk levels. After the risks are recognized and extracted, the RPN is calculated for each failure mode and effect as RPN = S × O × D, where S is severity (effect), O is occurrence (probability) and D is detection. Each factor is rated on a scale from 1 to 10, so the RPN ranges from 1 to 1000, and it is the basis for prioritizing failure modes. A high RPN for a failure mode indicates a higher risk to system or product reliability, and the evaluation team has to take proper corrective steps to reduce it. Regardless of the RPN, failures with high severity also require attention. After corrective steps are taken for the failure modes, the RPN has to be recalculated to ensure that the risks have been reduced (Liu et al., 2013).
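The quantitative part of FMEA can be sketched in a few lines of Python, assuming integer ratings of 1 to 10 for S, O and D. The names `FailureMode` and `prioritize`, the ratings for failure modes A to C, and the severity threshold of 9 used to flag high-severity modes regardless of RPN are all illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S, rated 1-10
    occurrence: int  # O, rated 1-10
    detection: int   # D, rated 1-10

    def __post_init__(self) -> None:
        for value in (self.severity, self.occurrence, self.detection):
            if not 1 <= value <= 10:
                raise ValueError(f"{self.name}: ratings must be in 1..10")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: RPN = S x O x D, between 1 and 1000."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode], severity_alert: int = 9):
    """Rank modes by RPN (highest first) and flag high-severity modes.

    severity_alert is an assumed threshold: any mode with S >= severity_alert
    is flagged regardless of its RPN.
    """
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    flagged = [m.name for m in modes if m.severity >= severity_alert]
    return ranked, flagged


modes = [  # hypothetical ratings for illustration only
    FailureMode("A", severity=6, occurrence=6, detection=5),
    FailureMode("B", severity=9, occurrence=2, detection=3),
    FailureMode("C", severity=4, occurrence=5, detection=4),
]
ranked, flagged = prioritize(modes)
print([(m.name, m.rpn) for m in ranked])  # [('A', 180), ('C', 80), ('B', 54)]
print(flagged)  # ['B']
```

Mode B ranks last by RPN but is still flagged, which reflects the rule that high-severity failures need attention whatever their RPN.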

DEMATEL Technique
DEMATEL (Decision Making Trial and Evaluation Laboratory) can be applied to structure a set of information so that the intensity of the relations among factors can be investigated and scored, feedback relations and their importance can be identified, and non-transferable relations can be calculated. DEMATEL is based on the assumption that a system consists of a set of criteria whose pairwise relationships can be modeled mathematically. The technique was originally devised to investigate highly complex global problems (Wu, 2008; Tseng, 2009). It is based on graph theory and is a comprehensive method for building and analyzing models of the complex cause-and-effect relationships among the elements of a problem. Its diagrams depict the intensity of cause-and-effect interrelations in numerical form (Wu, 2008; Tseng, 2009).
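The computation usually associated with DEMATEL can be sketched with NumPy as follows, using the notation of the case study: R for the row sums and J for the column sums of the total relation matrix. The sketch assumes the common normalization by the largest row or column sum and that I - N is invertible; the function name `dematel` and the 3 × 3 example ratings are hypothetical.

```python
import numpy as np


def dematel(direct):
    """Standard DEMATEL steps on an n x n direct-relation matrix.

    direct[i][j] is the (averaged) expert rating of how strongly factor i
    influences factor j, with zeros on the diagonal.
    """
    a = np.asarray(direct, dtype=float)
    # 1. Normalize by the largest row sum or column sum.
    s = max(a.sum(axis=1).max(), a.sum(axis=0).max())
    n = a / s
    # 2. Total relation matrix T = N (I - N)^-1, i.e. N + N^2 + N^3 + ...
    #    (assumes I - N is invertible).
    t = n @ np.linalg.inv(np.eye(len(a)) - n)
    # 3. R: influence each factor gives; J: influence each factor receives.
    r = t.sum(axis=1)
    j = t.sum(axis=0)
    # 4. Prominence (R + J) and net relation (R - J); R - J > 0 marks a cause.
    return t, r, j, r + j, r - j


ratings = np.array([[0, 3, 2],
                    [1, 0, 1],
                    [0, 0, 0]])  # hypothetical expert ratings
t, r, j, prominence, relation = dematel(ratings)
print(np.round(relation, 3))  # factor 0 is a net cause; factors 1 and 2 are net effects
```

Plotting each factor at (R+J, R-J) gives the influential network relation map; since every entry of T is counted once in R and once in J, the R-J values always sum to zero.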

Development of integrated process model
In the present study, the weaknesses of the Apollo method described above are addressed with FMEA and DEMATEL, which enriches the Reality Charting approach.

Modeling and integration
The Apollo method does not take the number of failures into account: it analyzes only one failure at a time. Therefore, the failures can first be recognized, and FMEA can be applied to extract their RPNs before the enriched Reality Charting method is deployed. Applying FMEA screens out less important failures and allows the analysis to concentrate on failures of higher priority. Reality Charting then reveals the cause-and-effect relations governing the appearance of the failures, and their roots are extracted. At this stage, the Apollo method cannot identify the most effective root among the failure roots. Therefore, in the enriched Reality Charting approach, the DEMATEL technique is applied after the failure roots are recognized to analyze their interrelations (effectiveness and susceptibility) and identify the most effective failure root. Fig. 1 depicts the proposed process model.
Fig. 1. Proposed process model. Step 1: Forming the analysis team; Step 2: Identification and prioritization of failures using FMEA and problem definition; Step 3: Creating a cause-and-effect network using "Reality Charting" method; Step 4: Selection of the most important roots by DEMATEL technique; Step 5: Identifying effective solutions/strategies for the most important roots; Step 6: Implementing selected solutions; Step 7: Controlling the results.

Details of integrated process model (steps)
The steps of the process model are detailed below.
Step 1: Forming the analysis team
An analysis team is formed to conduct the root cause analysis. Its members are selected from the business processes (or departments) of the organization that experience the problem, so the analysis team may be a cross-functional team (CFT).
Step 2: Identification and prioritization of failures using FMEA and problem definition
Using FMEA, the failures of the product are identified and prioritized based on their RPNs. The rest of the analysis can then focus on the actual or potential failures with the highest RPNs.
Step 3: Creating a cause-and-effect network using "Reality Charting" method
In this step, the Reality Charting tool is used to identify the root causes of the failure. The Apollo method is the name Gano (2011) gives to the application of the Reality Charting method; it is not a new or separate technique. Naturally, the members of the analysis team have diverse views and different insights regarding the failure and its causes. Taking advantage of all of these opinions is difficult and cannot be beneficial without suitable methods. In Reality Charting, the cause-and-effect relationships are shown in a chart, which makes it possible to observe all of the relations together in a simple manner. This also limits the undue influence of dominant individuals on the analysis.

Step 4: Selection of the most important roots by DEMATEL technique
After the failure roots are identified, the DEMATEL method is used to analyze the cause-and-effect relationships among them and the extent of each root's effectiveness (the influence it exerts on the others) and susceptibility (the influence it receives from the others). The roots with the highest scores are selected for later modification.
Step 5: Identifying effective solutions/strategies for the most important roots
In this step, the analysis team determines solutions to address the important root causes. These solutions should be as quick and effective as possible. Criteria such as time, cost, feasibility and effectiveness can help in choosing the best solutions.
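One way to apply such criteria is a simple weighted-sum score; the sketch below is only an assumption, since the source does not prescribe a scoring rule. The weights, the 1-to-5 rating scale (where a higher time or cost rating means faster or cheaper) and all ratings are hypothetical; the first two candidate names echo the solutions later proposed in the case study, and "Hypothetical option C" is invented for contrast.

```python
# Assumed criterion weights (summing to 1); not taken from the source.
WEIGHTS = {"time": 0.2, "cost": 0.2, "feasibility": 0.3, "effectiveness": 0.3}


def score(ratings: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of 1-5 ratings; for time and cost, higher means faster/cheaper."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    return sum(weights[c] * ratings[c] for c in weights)


candidates = {  # hypothetical ratings
    "Improve needle re-order point": {"time": 4, "cost": 4, "feasibility": 5, "effectiveness": 4},
    "Keep spare needles per customer": {"time": 5, "cost": 3, "feasibility": 4, "effectiveness": 4},
    "Hypothetical option C": {"time": 1, "cost": 1, "feasibility": 2, "effectiveness": 5},
}
ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
print(ranking[0])  # Improve needle re-order point
```

A weighted sum keeps the trade-off transparent to the team; option C scores highest on effectiveness yet ranks last because it is slow, costly and hard to implement.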
Step 6: Implementing selected solutions
In this step, the selected solutions are implemented; for this purpose, the managers of the organization can use a responsibility assignment matrix (RAM). RAM is widely adopted in project management for human resource planning. Because the project team is temporary, RAM is used to assign responsibilities to project team members, which makes it an ideal tool for an incentive system in project management (Yang & Chen, 2009).
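A responsibility assignment matrix is often written in the RACI form (Responsible, Accountable, Consulted, Informed). The sketch below assumes that form and the common convention that each task has exactly one accountable role and at least one responsible role; the tasks, roles and the `check_ram` helper are hypothetical.

```python
RACI_CODES = {"R", "A", "C", "I"}  # Responsible, Accountable, Consulted, Informed

ram = {  # hypothetical task -> {role: code} assignments
    "Revise needle re-order point": {
        "Warehouse manager": "A", "Purchasing": "R", "Maintenance lead": "C"},
    "Issue spare needles to customers": {
        "Maintenance lead": "A", "Technicians": "R", "Customers": "I"},
}


def check_ram(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Report unknown codes, tasks without exactly one A, and tasks without an R."""
    problems = []
    for task, roles in matrix.items():
        codes = list(roles.values())
        unknown = set(codes) - RACI_CODES
        if unknown:
            problems.append(f"{task}: unknown codes {sorted(unknown)}")
        if codes.count("A") != 1:
            problems.append(f"{task}: needs exactly one accountable role")
        if "R" not in codes:
            problems.append(f"{task}: no responsible role")
    return problems


print(check_ram(ram))  # []
```

Running such a check before rollout catches tasks that nobody owns, which is the gap a RAM is meant to close.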
Step 7: Controlling the results
Finally, the results of implementing the selected solutions are controlled. Recalculating the RPN in this step and comparing it with its value before Step 6 is an effective way to control the results. After the results are controlled, corrective actions may be needed to address nonconformities.
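A before-and-after RPN comparison is simple arithmetic; the sketch below assumes RPNs are recorded per failure mode in each review. The baseline of 360 for "mixing oil and fuel" comes from the case study, whereas the post-implementation ratings (S = 6, O = 3, D = 5) and the function names are hypothetical.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number, RPN = S x O x D."""
    return severity * occurrence * detection


def rpn_reduction(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage drop in RPN for each failure mode present in both reviews."""
    return {mode: 100.0 * (before[mode] - after[mode]) / before[mode]
            for mode in before if mode in after}


before = {"mixing oil and fuel": 360}          # RPN reported in the case study
after = {"mixing oil and fuel": rpn(6, 3, 5)}  # hypothetical re-rating, RPN 90
reduction = rpn_reduction(before, after)
print(reduction)  # {'mixing oil and fuel': 75.0}

# Modes whose RPN did not drop are candidates for further corrective action.
needs_action = [mode for mode, pct in reduction.items() if pct <= 0]
print(needs_action)  # []
```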

Verification and Validation of proposed process model
To verify the proposed model, note that many of its steps are supported by prior research (see Table 3).

Table 3
The evidence for verification of the proposed process model. Recoverable entries: Step 2: Identification and prioritization of failures using FMEA and problem definition; Gano (2011); Step 3: Creating a cause-and-effect network using "Reality Charting" method; Step 7: Controlling the results.

To enrich the model, analyze the interactions between the roots and find the more important (more influential) roots, the authors propose Step 4. The other steps are then integrated into the process model. The validity of the process model is tested by applying it to real data: in a car repair company, the proposed process model is used to analyze UTD-20 engine failures. The steps of the proposed model, their sequence and the results for the UTD-20 engine failures were approved by the members of the maintenance company's analysis team. Table 3 shows the evidence of the validity of the process model from supporting research. The following section discusses the case study of the UTD-20 engine and the application of the process model to its failure analysis.

Case study: RCFA of UTD-20 engine using the proposed process model
To validate the proposed process model, the UTD-20 engine was chosen. In 1966, the Russian company "Barnaul Transmash" designed and produced the UTD-20 engine to meet its industrial needs in the field of heavy vehicles. It is a six-cylinder V engine with a 120-degree bank angle, a displacement of 15.8 liters and an output of 330 horsepower.

Implementing Steps 1 and 2
The analysis team was chosen from UTD-20 engine experts with academic education and at least 15 years of related experience. An initial list of failures was prepared from the repair history and analyzed by the team members. Then, in analysis team sessions and by consensus, FMEA was performed to determine the most important failures. Based on the experts' views, "mixing oil and fuel", with a risk priority number (RPN) of 360, was selected as the most significant failure.

Implementing Step 3 (developing cause and effect network by reality charting)
The cause-and-effect analysis based on Reality Charting showed that the most important documented cause of oil and fuel mixing was "crude fuel burning" (incomplete combustion of fuel) in the UTD-20 and the consequent penetration of fuel into the crankcase (cartel) past the rings. Various problems lead to crude fuel burning, the most important of which are "insufficient oxygen/air" and "worn Farsonga" (fuel injector). The most important reason for insufficient air is congestion of the exhaust pipe. This problem increases the silicon level in the fuel and results in premature wear. Fig. 2 depicts the part of the Reality Charting cause-and-effect network related to the "mixing oil and fuel" failure. After further investigation, the members of the analysis team identified eight root causes of the mixing of fuel and oil in the UTD-20 engine, listed in Table 4.

Performing Step 4 (analyzing interrelations of the root causes with DEMATEL technique)
To analyze the interrelations between the root causes of the selected failure, the DEMATEL technique was used, so that the experts in the analysis team could express their views on the relations between the root causes more precisely. Table 5 shows the matrices related to the DEMATEL calculations (Wu, 2008; Tseng, 2009).
Table 9
Matrix of undirected relation
The sum of each row of the full relation matrix (R) for a root cause shows the extent of its influence on the other causes (its effectiveness). On this basis, lack of Farsonga needle was the most effective root cause, followed in decreasing order by exposure of tanks to dust in the open air, lack of instructions, system tests, lack of washer, penetration of oil into the filter and cyclone, insufficient awareness of engine rpm, and penetration of gasoil into the filter and air cyclone. The sum of each column (J) for a root cause shows the extent to which it is influenced by the other causes of the system (its susceptibility).

Fig.3. Influential network relation map (INRM) related to the root causes of "mixing oil & fuel"
On this basis, lack of Farsonga needle was the most susceptible root cause, followed by penetration of gasoil into the filter and air cyclone, penetration of oil into the filter and cyclone, system tests, lack of washer, lack of awareness of engine rpm, exposure of tanks to dust in the open air, and lack of instructions. The horizontal vector (R+J) measures the combined effectiveness and susceptibility of a root cause in the system; in other words, the larger a cause's R+J, the more it interacts with the other causes. Accordingly, lack of Farsonga needle has the highest interaction among the causes in this study. The vertical vector (R-J) shows the net influence of each cause: in general, if R-J is positive, the variable is considered a cause, and if it is negative, an effect. On this basis, lack of Farsonga needle, system tests, penetration of oil into the filter and cyclone, and penetration of gasoil into the filter and air cyclone were considered effects, while the other root causes were counted as causes. Finally, the influential network relation map (INRM) (Wu, 2008; Tseng, 2009) was drawn (Fig. 3). The horizontal axis shows R+J and the vertical axis shows R-J, so the position of each root cause is given by (R+J, R-J).

Implementing Steps 5 to 7
After the most important (most influential) root cause, "lack of Farsonga needle", was recognized, related solutions had to be determined and applied to neutralize or reduce its effects. The objective of the solutions is to move from the unfavorable condition to the favorable one. Generally, these strategies/solutions can be classified into three categories (IAEA, 2015; Gano, 2011): … issue. The members of the analysis team understood that a Farsonga defect can be recognized from the sound of the engine, so the needles need to be replaced early. This strategy can prevent larger negative consequences. Therefore, it was suggested that (1) the needle re-order point (ROP) be improved, and (2) a few additional needles be allocated to each customer as spares for preventive maintenance.
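The source does not say how the re-order point was improved. For reference, a common textbook formulation sets ROP to the expected demand over the replenishment lead time plus a safety stock; the sketch below assumes independent, roughly normal daily demand, and the function name, demand, lead time and service factor are hypothetical.

```python
import math


def reorder_point(daily_demand: float, lead_time_days: float,
                  demand_sd: float = 0.0, z: float = 1.65) -> float:
    """Re-order point = expected lead-time demand + safety stock.

    Assumes independent, roughly normal daily demand with standard deviation
    demand_sd; z = 1.65 targets roughly a 95% cycle service level.
    """
    safety_stock = z * demand_sd * math.sqrt(lead_time_days)
    return daily_demand * lead_time_days + safety_stock


# Hypothetical figures: 2 needles/day on average, 9-day lead time, SD of 1/day.
rop = reorder_point(daily_demand=2, lead_time_days=9, demand_sd=1)
print(math.ceil(rop))  # 23: reorder when stock falls to 23 needles
```

Raising the safety factor z, or the allowance for spares held at customers, trades extra inventory for a lower chance of running out of needles.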
After effective strategies/solutions are found and implemented, the frequency of the failures may decrease, and their risk priority numbers (RPNs) may be lower in future reviews.

Conclusions
In this article, a process model has been proposed based on FMEA, Reality Charting (the Apollo method) and DEMATEL. The Apollo method is the name Gano gives to the application of the Reality Charting method; it is not a new or separate technique. The proposed model has been implemented for the UTD-20 engine in order to validate and apply it. The model may help prevent unfavorable accidents and repetitive failures, which are important advantages over other methods. Its advantages include: (1) initial screening of failures using FMEA to reduce analysis time; (2) recognition of the most effective cause using the DEMATEL method; (3) application of Reality Charting; (4) greater power in preventing unfavorable accidents; and (5) suitability for practical training of operators and staff of organizations.
In recent years, preventive actions have become more valuable; consequently, developing such methods and skills is critical and worthwhile for organizations.