Dynamic reliability assessment of a complex recovery system using fault tree, fuzzy inference and discrete event simulation

A. Nobakhti (st_a_nobakhti@azad.ac.ir), S. Raissi (Raissi@azad.ac.ir), K. K. Damghani (k_khalili@azad.ac.ir), R. Soltani (r.soltani@khatam.ac.ir)

Abstract
Any failure of the recovery system causes considerable environmental damage as well as energy loss. Two types of alternatives may be installed: a fast opening valve system (FOVS) or a seal drum system (SDS). This article focuses on the decision stage, where the preferred option is chosen on the basis of a reliability assessment. The major challenge is that pressure and temperature change during operational cycles, which significantly affects reliability. In addition, the lack of historical data complicates the reliability assessment. Hence, we propose a hybrid approach using fault tree analysis (FTA) and Mamdani fuzzy inference to estimate the reliability response as a function of a few frequently occurring operating pressures and temperatures. Discrete-event simulation is also used to evaluate the system reliability at different operating conditions. The comparison reveals that the FOVS outperforms the SDS by 22.4% on average, and it is therefore recommended for purchase.

Highlights

• FTA helped to estimate the reliability fitness function for each alternative.
• The Mamdani fuzzy inference handled multi-attribute failure risks using a questionnaire.
• The FOVS performs better under dynamic conditions according to the discrete-event simulation.

Introduction
Today, saving energy and preventing environmental pollution caused by burnt fossil fuels are two important issues in the refinery equipment selection process. Burning flare gases in refineries is no longer acceptable: it wastes a good energy source and harms the environment.
In almost all societies, tackling the significant environmental damage caused by fossil fuel emissions is on the agenda of senior executives. They must use all means in strategic decision-making to reduce environmental losses to save the future. Countries participating in the Paris climate agreement, developed under the United Nations Framework Convention on Climate Change (UNFCCC), are bound to take actions to reduce their greenhouse gas emissions to meet a nationally determined contribution (NDC). According to [8], flaring reduction plays a major part in facilitating the reduction of emissions and reaching the targeted NDC. To this end, refinery industries are setting aside portions of their budgets to expand refineries lacking flare gas recovery units (FGRUs) to recover a large portion of flare gas, making it available for energy production etc.
It is vital that the FGRU operates continuously, as it prevents extra emissions to the atmosphere and returns large benefits, so a proper safety sub-system is required to prevent failures caused by out-of-range gas pressure and temperature. Decision-makers therefore face the daunting task of choosing among the proposed plans for the FGRU safety structures. The chosen alternative must be fully justified in terms of resilience against the volatile operating conditions, hence the need for a comprehensive prediction of system reliability.
The selected alternative will operate for more than two decades, and a wrong choice can lead to huge financial losses or heavy pollution because of more frequent failures. Since the decision is made in the pre-installation and purchasing stage, failure data are not available. Moreover, a precise reliability prediction must account for the changes in reliability caused by operating conditions and contributing factors. The generated reliability values must respond to more than time alone, which is the only variable in the output of traditional reliability methods. Showing how reliability changes with particular contributing factors requires initial data able to describe such changes and a suitable technique to process those data.
Expert elicitation is widely used in the literature to compensate for unavailable information when a prediction is needed, and it is flexible enough to provide a researcher with the desired type of data. The failure probability of components can be assessed under different circumstances using the opinions of a group of experts. Changes in failure probability versus contributing factors make it possible to calculate the corresponding changes in safety system reliability, which is the main requirement for a prognostic study. Quantification techniques generate probabilities from linguistic possibilities so that calculations can be performed. Data are gathered for the key components whose failure causes the failure of the system; identifying those components requires a functional and physical breakdown.
Fault tree analysis (FTA) can represent the functional and physical breakdowns of the system and can accept different types of data. This makes it possible to branch and divide failure causes and to detect the groups of components whose failures cause a major system failure. FTA uses logical gates such as "AND" and "OR" to describe the effects of component failures on system breakdown using Boolean algebra and has been widely used for safety, risk, and reliability analysis. However, like most techniques, FTA has its limitations. For instance, FTA's routine calculation methods can't describe how the output changes as the basic event values change. Time is one of the factors able to change the basic event values, and in certain cases similar output values over a specified period of time could lead to choosing the wrong alternative. Yet time isn't the only factor that influences FTA's output. For systems operating under volatile conditions, stress factors also affect FTA's output, but their influence can't be modeled.
FTA requires numerical data, so the linguistic data gathered from expert elicitation must be quantified, and the fuzzy inference system (FIS) is a powerful technique for this quantification. It helps describe how basic event probabilities change with stress factor levels. A fusion of FIS and FTA therefore yields a responsive FTA output. That said, little research has been reported on techniques for FTA enrichment, and no work has been reported on the combination of FIS and FTA.
The main goal of this paper is to develop a practical technique to compute the changing output of the FTA, which is reliability in this paper, versus contributing factors so that an accurate prediction of performance can be made. Performance predictions help in making a justified choice between the proposed alternatives in the absence of historical data. The combined use of fuzzy inference and FTA (FIFTA) is the innovative approach proposed here to tackle reliability estimation in the absence of historical data. In addition, for more accurate calculation, a combination of FTA and discrete-event simulation is used as an alternative method to calculate the reliability of the two systems.
These two methods can also graphically illustrate the reliability response surface as a function of the relevant covariates to provide insights, especially for decision-making in the purchasing stage. Together they provide an applicable method for straightforward computational prediction of future performance that aims to replace the use of failure rates with a combination of instructed expert elicitation, a fuzzy inference system, and discrete-event simulation.
In section 2, there is a review of papers with similar cases. Section 3 provides information about the proposed alternatives for the safety sub-systems. In section 4, the challenges of calculating valid reliability for the mentioned alternatives are discussed. The proposed method for this is found in sections 5 and 6. Two methods are implemented on the alternatives, and the results are presented in section 7 where further discussion is also made for more clarification. A conclusion is made in section 8.

Literature review
Reliability engineering is a major sub-discipline of systems engineering that assesses the probability that a system survives over time. It focuses on lifetime evaluation under stated conditions for a specified period of time. Various researchers have tried to provide efficient methods for estimating system reliability based on empirical data. Some of them proposed an aggregate method of selecting a theoretical distribution for empirical data [19], applying three criteria for assessing goodness of fit.
If the operating conditions change, reliability analysis becomes a difficult task, which is the subject of dynamic reliability. Dynamic reliability analysis provides a mathematical framework capable of handling interactions among components and process variables. In principle, it constitutes a more realistic modeling of systems for the purposes of reliability, risk, and safety analysis. Dynamic reliability requires more sophisticated tools than non-dynamic reliability, including more complicated mathematical methods that take into account changes or evolution of the system structure.
Changes in process parameters may be random or deterministic. Indeed, reliability modeling of the former is far more difficult than the latter and is often accomplished by computer simulation techniques. Interested readers could refer to [11] for deterministic changes, [4] for stochastic changes and [18] for ranking defects.
Ambiguity and vagueness arise from the unknown characteristics of complex systems or from insufficient historical failure data; they lead to rough estimations and hence increased error in the final results. To minimize this error, fuzzy logic may be a proper alternative [16]. Combining FTA and fuzzy logic gives the fuzzy fault tree analysis (FFTA) technique, which has been widely studied in recent years; expert elicitation is used to obtain linguistic values as possibilities, which are then transformed into quantitative probabilistic values for the basic events of the fault tree. [15] employed a combination of fuzzy logic and expert elicitation to deal with the vagueness and subjectivity of the information, generated basic event failure probabilities without relying on quantitative historical failure data, and performed a sensitivity analysis using importance measures. Yazdani et al. [21] used a qualitative fault tree analysis technique to identify various potential causes of crude oil tank fire and explosion (COTFE) and used a hybrid approach based on fuzzy set theory to quantify the COTFE fault tree; the results were compared with those of a conventional fault tree. Weak links were identified using importance measures of the basic events. [14] proposed a fuzzy-based reliability approach that handles qualitative linguistic terms to evaluate the failure likelihood of the basic events of a nuclear power plant safety system, and validated the results by benchmarking the generated failure probabilities against the actual failure probabilities collected from the operating experience of the Davis-Besse design of the Babcock and Wilcox reactor protection system.
Certain papers went further and tried to improve the elicitations rather than relying solely on raw opinions. Baig et al. [3] used corrosion simulation software and provided the experts with the obtained results to improve the elicitations. They gathered information to estimate the failure probability of a CO2-transporting pipeline using FTA. Computer simulation for estimating reliability has received attention from various researchers in recent years, because of the presence of different random variables and the complexity of analyzing systems by analytical methods. Examples include [1,11] and [13], whose methods have been cited by many researchers.
To deal with the uncertainty in linguistic data, researchers have often recommended the use of fuzzy methods. A two-dimensional fuzzy fault tree analysis that incorporates a hesitation factor for expert elicitation, in which linguistic terms are expressed with a degree of hesitation, was introduced by [20]. Applying this technique, the probability of chlorine release was estimated for Indian conditions.
In cases where historical data are insufficient but a failure rate obtained from a given static failure distribution can satisfy the desired accuracy, information can be obtained from data banks such as OREDA. Elsayed [6] performed a four-step procedure to estimate reliability with failure and repair data from OREDA and calculated availability and maintainability as well. Zhang et al. [22] graded a floating offshore wind turbine (FOWT) system structurally and functionally, thereby assessing the sequentially dependent failures and redundancy failures using a dynamic fault tree. Reliability estimation was based on failure data obtained from OREDA. To nominate a diagnostic method and measure the total predictive performance score, an integrated fuzzy DEMATEL-fuzzy analytic network approach was presented in [12].
In the case study of the current research, none of the aforementioned alternatives for the recovery of flare gases was available in practice, and no empirical data on their performance existed, so we had to rely on the experience of technical experts in similar matters. The data were therefore collected as linguistic variables, and appropriate quantification tools were needed to perform the calculations. As a new initiative, a combination of the Mamdani fuzzy inference system (FIS) and fault tree analysis (FTA) has been used to investigate the various failure modes under different operating conditions. FIS itself, however, has been used in several cases for approximation and estimation with different purposes. Azadeh et al. [2] used FIS as a means of approximating human reasoning to provide knowledge for correct and timely diagnosis of pump failures. Choi et al. [5] used FIS in combination with a relative risk score (RRS) as a new approach for liquid and gas pipeline risk assessment and showed that the new method provides more accurate results than the conventional method. Elvidge et al. [7] used Mamdani and Sugeno FIS as an alternative to the qualitative risk matrix to handle multiple-attribute risk problems with imprecise data. They found that while the Mamdani method is intuitive and well suited to human inputs, the Sugeno method is computationally more efficient and guarantees the continuity of the final risk output surface.
Nematkhah et al. [9] investigated methodologies for decreasing the energy consumption and reducing the environmental pollution of flare systems. In this study, three different scenarios were evaluated using an environmental flow diagram in a gas refinery in southern Iran. The results showed that pressurizing the gas and injecting it into oil wells is one of the best ways to reduce flaring in the flare gas system. [22] studied three different system configurations for flare gas recovery to identify the most efficient one. In this study, systems with liquid ring compressors and aqueous amine solvents for the abatement of acid gases are used in a refinery complex. The results show that amine consumption in some configurations is much lower than in others.
Recently, two designs of flare gas recovery systems were developed, and reliability was chosen as the deciding factor for comparing them. First, failure models of the two designs were implemented. Second, a stochastic hybrid method was used to evaluate the probability of disaster in these failures [8].

System description
Two alternatives are proposed as FGRU safety sub-systems to keep it intact against out-of-range characteristics of the passing gas. These alternatives share many similarities, but their main difference is in the pre-flaring section, which can be either a fast-opening valve system (FOVS) or a seal drum system (SDS). The relevant diagrams are depicted in Figure 1 as (a) and (b) respectively.
Various incidents can lead to damage or FGRU breakdown. A dangerous scenario may occur when the gas pressure or gas temperature goes out of control. Three hazardous scenarios are discussed in section 5.3. The purpose of installing a safety system is to block the routes leading to the FGRU to keep it intact and to open more capacity to the flaring tower to prevent piping ruptures.
Each safety system has pre-defined responses to each scenario, which are initiated when a dangerous temperature or pressure is detected by the sensors and the proper messages are sent to the valve actuators. The actuators receive the signals from the sensors and open or close a valve's body, thereby directing the gas with a dangerous temperature or pressure level to the flaring tower. If the safety subsystem fails to respond to a dangerous scenario, not only is the FGRU put at risk, but the safety subsystem itself is likely to be damaged.
A general view of the FGRU is depicted in Figure 2. Gas enters from the flare header into the safety system and is directed in a proper volume to the compressor to be prepared for recovery. The route leading to the recovery section is called the 'vertical route'. The extra gas, or gas with dangerous characteristics, is transferred through the 'horizontal route' to be burnt in the flaring tower. The components' names and symbols are provided in Table 1. There are 3 main components in the vertical route that prevent gas with dangerous characteristics from entering the compressor, among them a rotary valve (RV1) and a control valve (CV1). RV1's task is to close on the sensors' message, and CV1 must close when a pressure difference is detected between the system entrance and the entrance to the seal drum (SD). The horizontal route leads to the flaring tower, before which (in the area depicted with a dashed line) the safety system (FOVS or SDS) must be installed to react to the signals sent from the sensors. This part of the safety system opens more capacity to the pipes, so that the extra gas is emitted without causing any damage, or helps direct some extra gas to the flaring tower to prevent flashbacks. Flashback is the result of very low pressure in the horizontal route, which reverses the direction of the gas and damages pipes and components. Of the two safety systems, SDS is a collection of the SD and its accompanying valves, which are two rotary valves (RV2 and RV3) and two pressure safety valves (PSV1 and PSV2). The SD in SDS contains a proper level of water to keep the gas flowing in a single direction (from inlet to outlet), which helps prevent flashback and makes it quite useful in the pre-flaring section. The SD also blocks the gas outlet until the pressure reaches a desired, often predetermined, level.
On the other hand, FOVS is a collection of valves that respond to different scenarios through the coordinated action of sensors to provide a safe passage for gas, so that damage to the piping systems or to the FGRU is prevented. It comprises a control valve (CV2) with a reserve line for when CV2 is being repaired, a pin valve (PV), and a fast-opening valve (FOV). When pressure increases in the horizontal route, the valves in this structure open one by one to provide more capacity for gas to be released into the flaring tower.

Problem statement
The valuable components and repair costs of the FGRU make a fully justified selection of the safety subsystem imperative; the subsystem must be resilient against the volatile operating conditions to protect the FGRU against gas with dangerous characteristics. The resulting reduction in damage to the FGRU, apart from saving expenses, helps minimize the gas emitted to the atmosphere, which facilitates meeting the NDC.
The two suggested alternatives for the safety subsystem are FOVS and SDS. The one with higher reliability, and consequently fewer failures, should be chosen to decrease FGRU damage. FOVS or SDS will form the pre-flaring section of the safety subsystem, whose components interact with the components of the other sections, so an isolated reliability assessment that ignores these interrelations wouldn't be valid. To compare them in terms of reliability, the performance of the whole safety system must therefore be assessed with either of them installed.
Traditional reliability methods consider only the dependency on time and overlook environmental factors. Relying on such results leads to unpredicted failures in such a volatile environment, so the objective is to obtain the reliability of the subsystem when it is exposed to different operating conditions and different scenarios.
Generating reliability values versus the three contributing factors of the studied case (time, pressure, and temperature) requires a specific type of data able to associate an operating condition with a failure probability value. In other words, a function is required whose domain is the space spanned by the three axes of time, pressure, and temperature, limited to their boundaries (i.e. the maximum and minimum levels of the contributing factors). The codomain is the interval (0, 1), whose values describe a failure probability. That is, failure data must be provided for each component that describe its endurance under a given operating condition. Obtaining such data through measurement isn't possible because the alternatives haven't been installed yet, and no such data exist in the data banks.
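The required function can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear in the source: $p_c$ is the failure probability of component $c$, and $t$, $P$, $T$ denote time, pressure, and temperature, each bounded by its minimum and maximum level.

```latex
p_c : [t_{\min}, t_{\max}] \times [P_{\min}, P_{\max}] \times [T_{\min}, T_{\max}] \to (0, 1),
\qquad (t, P, T) \mapsto p_c(t, P, T)
```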
When calculations are needed for a system in its pre-installation stage, the available data are failure rates gathered from other similar systems under the assumption of a stable failure distribution. Reliability calculations based on failure rates show only how reliability changes with time, and the assumption of a stable failure distribution neglects the effects of the stress factors. It is well recognized that the scale of a failure distribution changes in the presence of a stress factor whose level is higher than that of the normal operating condition. This alters the area under the distribution function and consequently changes the reliability values.
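The effect of a changed distribution scale can be illustrated with a small sketch. The source does not name a failure distribution, so the two-parameter Weibull model used here is an assumption chosen only for illustration, and the shape, scale, and time values are hypothetical.

```python
import math


def weibull_reliability(t, scale, shape):
    """Reliability R(t) = exp(-(t / scale) ** shape) of a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical values, not taken from the source.
shape = 1.5            # assumed shape parameter
nominal_scale = 20.0   # assumed scale (years) at the normal operating condition
stressed_scale = 10.0  # assumed smaller scale under a higher stress level
t = 5.0                # mission time in years

r_nominal = weibull_reliability(t, nominal_scale, shape)
r_stressed = weibull_reliability(t, stressed_scale, shape)

# A smaller scale gives a lower reliability at the same time t.
print(f"nominal:  {r_nominal:.3f}")
print(f"stressed: {r_stressed:.3f}")
```

A calculation based on a single failure rate corresponds to using only one scale value, so it cannot show the difference between the two printed values.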
Apart from the need to gather a type of data that can describe the simultaneous presence of the contributing factors, a technique is required to process the data so that it is available to be used in the fault tree. It is intended to generate a response surface for reliability to study its changes versus contributing factors. Data is gathered using a designed questionnaire and the utilized technique is FIS, both of which are explained in the next section.

The proposed method to estimate system reliability surface
In order to estimate the recovery unit reliability as a function of the operating condition, dynamic fault tree analysis (FTA) is adopted as the core of the estimation. Due to the lack of historical data, expert judgment is used to assess the failure likelihood of each component under different operating circumstances. The Mamdani fuzzy inference method is then applied to quantify the linguistic data and to generate the points needed to draw the response surface for each alternative in a four-dimensional space. A general overview of the proposed method is depicted in Figure 3.
Due to the lack of historical data in the purchasing stage, we prepared a verified, reliable questionnaire (Table 2) to analyze the breakdown likelihood of each alternative's components over different process conditions based on the experts' opinions. Here temperature and pressure are taken as the main contributing factors in component failure. To overcome the ambiguity arising from linguistic terms, we converted all responses via normalized fuzzy sets.
The gathered data present component failure possibilities in association with temperature and pressure levels. Each data point reveals an expected prior possibility for a component lifetime at a given temperature and pressure using a triplet of (time, pressure, temperature). The purpose is to quantify that possibility so that a component failure probability response surface can be drawn. The surface associates each component breakdown probability with an operating condition. Converting possibility into probability requires quantification

System failures
In order to draw the FT, an explicit definition of failure is required. For that, the structural and functional breakdown of the system should be examined. The structural breakdown indicates that the critical components of the FGRU are pipes, sensors, valves, and compressors. The functional breakdown indicates that gas is directed by pipes into the compressor or the flaring section; sensors detect the temporal characteristics of the gas and send signals to the valves when gas with out-of-range pressure or temperature enters; and valves change the route of the gas and open more exit capacity so that compressors or pipes are not damaged. Compressors alter the characteristics of the gas so that it is ready to be recovered.
System failure occurs if the gas route isn't altered because of valve failures or if gas isn't directed toward the compressor because of piping damage. Valves fail under the effect of changing pressure and temperature, which accelerate valve body degradation. If valves fail, pipes and compressors are exposed to the danger of being damaged by a hazardous scenario (section 5.3). Therefore, a failure definition can be presented as follows:
1. Valves degrade gradually to the point of not being able to function on demand.
2. A hazardous scenario occurs, i.e. gas with an out-of-standard level of temperature or pressure enters.
3. The automated system fails to respond, i.e. gas with out-of-range pressure or temperature isn't directed appropriately because of valve failures.
4. Gas causes damage to the compressor or critical pipes, and the system fails.
This definition helps us divide the basic events of the FT and form the branches. These four segments occur in sequence, but FTA logical gates can't enforce the order of occurrence. Dynamic FTA (DFTA) gates can't be used because their solution requires failure rates, which are unsuitable for this study given the need to assess multi-dimensional data; an inhibit gate is therefore used to describe the relation between them. The second segment of the failure definition is not a failure but an event; it is nevertheless represented in the model, and its probability is taken as the percentage of time in which it happens (each percentage is presented in 5.3).
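How such gates combine probabilities can be sketched in a few lines. This is a minimal illustration that assumes independent basic events; the tree structure, the function names, and every number except the 0.33 scenario share and the 0.02 valve example quoted elsewhere in the text are hypothetical and do not reproduce the fault trees in Figures 4 and 5.

```python
from functools import reduce


def and_gate(*probs):
    """Probability that all independent input events occur."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def or_gate(*probs):
    """Probability that at least one independent input event occurs."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)


def inhibit_gate(p_input, p_condition):
    """Input event propagates only while the conditioning event holds."""
    return p_input * p_condition


# Hypothetical example structure (not the authors' tree).
p_valve_a = 0.02   # valve failure probability, e.g. from a fuzzy inference step
p_valve_b = 0.05   # hypothetical second valve
p_scenario = 0.33  # share of time in which the hazardous scenario occurs
p_damage = 0.6     # hypothetical damage probability given no response

# Either valve failing leaves the scenario unanswered.
p_no_response = or_gate(p_valve_a, p_valve_b)
# The failure matters only while the hazardous scenario is present.
p_unmitigated = inhibit_gate(p_no_response, p_scenario)
# Damage to pipes or compressor then completes the system failure.
p_system_failure = and_gate(p_unmitigated, p_damage)
reliability = 1.0 - p_system_failure

print(f"system failure probability: {p_system_failure:.6f}")
print(f"reliability: {reliability:.6f}")
```

Numerically the inhibit gate behaves like an AND gate whose second input is a condition rather than a failure, which is why it can carry the scenario occurrence percentage.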
Valve failure is caused by the changing pressure and temperature, so failure data are gathered using the questionnaire in Table 2. These data are the basis of the FIFTA study, where we insert different numerical levels of pressure, temperature, and time into FIFTA to study the changes in the failure probabilities of the valves and of the whole system. The failure probabilities of pipes and the compressor, by contrast, are obtained using a different questionnaire in which experts are only asked to specify the failure possibility of the components under one of the three hazardous scenarios. This is because their failure is caused by the occurrence of a hazardous scenario when there is no proper response. Defuzzification of these possibilities is performed for each scenario using the method described in [10]. The obtained probabilities are treated as constants in the FTA formula, and the basic events describing these damages are not part of the FIFTA process.
It should be stated that the independent failures of pipes and compressor (i.e. failures caused by initial defects, by degradation, by faulty design, etc.) are not considered here and it is assumed that they will remain intact in normal conditions during the predicted lifetime because of the sufficient protective measures and high-quality materials. Also, sensor failures aren't taken into account since changes in temperature or pressure have such a small effect on them that it can be neglected and since they are of high-quality materials, their independent failures are omitted from calculations.

Constructing dynamic fault tree analysis
A fault tree (FT) is constructed for both systems according to the above-mentioned failure definition. The first levels of this diagram are presented in Figure 4-a for FOVS and in Figure 4-b for SDS.

Most common hazardous scenarios
In order to examine the fault tree in dynamic circumstances, the three most probable extreme operating conditions are examined in this research, hereinafter called:
Scenario a: the system at high-pressure operating conditions, with a 33% chance of occurrence according to the historical data.
Scenario b: the system at low pressure, with a 29% probability of occurrence.
Scenario c: the system at low temperature (22% occurrence).
Since the two alternatives, FOVS and SDS, are compared under the above three scenarios, six fault tree diagrams should be constructed. Figure 5 illustrates one of them as a sample. Interested readers can obtain the other diagrams from the authors on request.

Questionnaire cell formation
As mentioned earlier, data should be gathered with a properly designed questionnaire. In the designed questionnaire, experts are asked to express their opinion about the failure possibility of a component that endures a certain operating condition created by the contributing factors. For example, condition 1, the first cell of the questionnaire, is when a component is in its early age period and endures a low pressure and a low temperature; the expert provides a linguistic value in that cell, using a fuzzy label like 'low', to describe the failure possibility of the component in that condition. These linguistic data have 3 input variables (i.e. time, pressure, and temperature) and 1 output variable (failure possibility), which makes them multi-dimensional. The purpose of gathering data in this manner is to study failures in each operating condition so that the whole system can be studied under each condition. As a result, the questionnaire should be designed so that every cell represents an operating condition. In each cell, the expert describes the failure likelihood of the component in that condition. Table 2 shows the designed questionnaire.
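The cell layout described above can be sketched as an enumeration of level combinations. The three levels per factor (low, medium, high) follow the linguistic terms used in the rule base; the function name, the dictionary layout, and the cell ordering are assumptions made for illustration and are not taken from Table 2.

```python
from itertools import product

LEVELS = ("low", "medium", "high")


def questionnaire_cells():
    """List every (time, pressure, temperature) operating condition.

    itertools.product varies the last factor fastest, so temperature
    changes first, then pressure, then time (an assumed ordering).
    """
    return [
        {"time": time, "pressure": pressure, "temperature": temperature}
        for time, pressure, temperature in product(LEVELS, repeat=3)
    ]


cells = questionnaire_cells()

# An expert fills each cell with a linguistic failure possibility.
# The single answer below is the 'low' example for condition 1.
responses = {0: "low"}

print(len(cells))  # 3 levels for each of 3 factors
print(cells[0])    # early age, low pressure, low temperature
```

One such set of responses would be collected per component and per expert before aggregation.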

The Mamdani fuzzy inference system
Failure possibility examined based on information gathered from qualified experts. Their judgments requested different operation conditions using lingual terms, which modeled by fuzzy numbers. Some researchers have used the method of fuzzy inference in the oil and gas and petrochemical industries for risk analysis [6]. Hence Mamdani FIS is applied to create a control system by synthesizing a set of linguistic control rules obtained from experienced human operators. In a Mamdani system, the output of each rule is described by a fuzzy set. Since Mamdani systems have more intuitive and easier-to-understand rule bases, they are well-suited to expert system applications where the rules are created from human expert knowledge, such as medical diagnostics. This technique generates a numerical value i.e., failure probability. E.g. it is required to know the failure probability of RV that has operated for 2 years when it endures a pressure of 50 bars and a temperature of 0℃. Each cell in the questionnaire describes this operating condition to a degree between (0, 1) i.e. membership function. The opinions for each operating condition are aggregated based on the membership functions of each cell to generate a failure probability. The generated probability by FIS suggests for the above example, that there is a 0.02 chance of failure for RV in that condition. Since there are 9 types of valves (Table 1), and opinions vary about their failure likelihood, a FIS should be developed for each of them.   If (Time is low) and (Temperature is medium) and (Pressure is low) then (possibility is 3) (2) 2.

3. If (Time is low) and (Temperature is high) and (Pressure is low) then (possibility is 3) (3)
4. If (Time is low) and (Temperature is low) and (Pressure is medium) then (possibility is 10) (4)
5. If (Time is low) and (Temperature is medium) and (Pressure is medium) then (possibility is 3) (5)
6. If (Time is low) and (Temperature is high) and (Pressure is medium) then (possibility is 3) (6)
7. If (Time is low) and (Temperature is low) and (Pressure is high) then (possibility is 17) (7)
8. If (Time is low) and (Temperature is medium) and (Pressure is high) then (possibility is 9) (8)
10. If (Time is medium) and (Temperature is low) and (Pressure is low) then (possibility is 13) (10)
11. If (Time is medium) and (Temperature is medium) and (Pressure is low) then (possibility is 11) (11)
12. If (Time is medium) and (Temperature is high) and (Pressure is low) then (possibility is 11) (12)
13. If (Time is medium) and (Temperature is low) and (Pressure is medium) then (possibility is 11) (13)
14. If (Time is medium) and (Temperature is medium) and (Pressure is medium) then (possibility is 11) (14)
16. If (Time is medium) and (Temperature is low) and (Pressure is high) then (possibility is 22) (16)
17. If (Time is medium) and (Temperature is medium) and (Pressure is high) then (possibility is 17) (17)
20. If (Time is high) and (Temperature is medium) and (Pressure is low) then (possibility is 11) (20)
21. If (Time is high) and (Temperature is high) and (Pressure is low) then (possibility is 11) (21)
22. If (Time is high) and (Temperature is low) and (Pressure is medium) then (possibility is 18) (22)
24. If (Time is high) and (Temperature is high) and (Pressure is medium) then (possibility is 11) (24)
25. If (Time is high) and (Temperature is low) and (Pressure is high) then (possibility is 24) (25)
26. If (Time is high) and (Temperature is medium) and (Pressure is high) then (possibility is 24) (26)
27. If (Time is high) and (Temperature is high) and (Pressure is high) then (possibility is 23) (27)
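To make the inference step concrete, the following is a minimal sketch of Mamdani inference over the rules listed above, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the variable ranges, the triangular membership shapes and the evenly spaced output labels are placeholders for the contents of Figure 6 and Table 4, and the names `RANGES`, `PEAKS`, `tri`, `fuzzify` and `infer` are hypothetical. AND is taken as the minimum, aggregation as the maximum, and defuzzification as the centroid, which is one common Mamdani configuration.

```python
# Minimal Mamdani inference sketch (pure Python).
# ASSUMPTIONS: ranges, triangular shapes and evenly spaced output labels are
# illustrative stand-ins for Figure 6 and Table 4, not the paper's values.
RANGES = {"time": (0.0, 25.0), "temperature": (-50.0, 150.0), "pressure": (50.0, 350.0)}
PEAKS = {"low": 0.0, "medium": 0.5, "high": 1.0}  # label peaks on a normalized axis

# (time, temperature, pressure) -> output label index, copied from the listed rules.
RULES = {
    ("low", "medium", "low"): 3, ("low", "high", "low"): 3,
    ("low", "low", "medium"): 10, ("low", "medium", "medium"): 3,
    ("low", "high", "medium"): 3, ("low", "low", "high"): 17,
    ("low", "medium", "high"): 9, ("medium", "low", "low"): 13,
    ("medium", "medium", "low"): 11, ("medium", "high", "low"): 11,
    ("medium", "low", "medium"): 11, ("medium", "medium", "medium"): 11,
    ("medium", "low", "high"): 22, ("medium", "medium", "high"): 17,
    ("high", "medium", "low"): 11, ("high", "high", "low"): 11,
    ("high", "low", "medium"): 18, ("high", "high", "medium"): 11,
    ("high", "low", "high"): 24, ("high", "medium", "high"): 24,
    ("high", "high", "high"): 23,
}


def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def fuzzify(name, value):
    """Membership degree of a crisp input in each of the three labels."""
    lo, hi = RANGES[name]
    u = (value - lo) / (hi - lo)
    return {label: tri(u, p - 0.5, p, p + 0.5) for label, p in PEAKS.items()}


def infer(time, temperature, pressure):
    """Return a crisp possibility score on the assumed 1..25 label scale."""
    m_time = fuzzify("time", time)
    m_temp = fuzzify("temperature", temperature)
    m_pres = fuzzify("pressure", pressure)
    # Firing strength: AND is the minimum; rules sharing a label combine by maximum.
    fired = {}
    for (t, temp, p), label in RULES.items():
        w = min(m_time[t], m_temp[temp], m_pres[p])
        if w > 0.0:
            fired[label] = max(fired.get(label, 0.0), w)
    if not fired:
        raise ValueError("no listed rule covers this operating condition")
    # Clip each output set at its firing strength, aggregate by max, take the centroid.
    num = den = 0.0
    for i in range(521):  # output axis sampled from 0 to 26 in steps of 0.05
        y = i * 0.05
        mu = max(min(w, tri(y, k - 1.0, k, k + 1.0)) for k, w in fired.items())
        num += y * mu
        den += mu
    return num / den


mild = infer(time=1, temperature=50, pressure=60)
harsh = infer(time=1, temperature=-50, pressure=350)
```

The returned score lives on the label scale; mapping it to a failure probability requires the output membership boundaries of Table 4, which are not reproduced here. Conditions covered only by rules absent from the listing raise an error rather than returning a guess.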
The FIS has 5 functional blocks to measure data with multiple input and output variables. Of these 5 blocks, the database block and the rule base block store predetermined data; once these two blocks are formed, the remaining blocks perform the quantification. The two blocks are formed as follows.
When the input and output variables are identified, their range of variation is specified. Then the variation range of each variable is divided by the desired number of fuzzy labels (3, 5, 10, etc.). Here, 3 labels, namely high, medium and low, are used for each input variable (their combinations build up the questionnaire cells) and 25 labels are used for the output variable (the labels of the opinions). A fuzzy membership function, consisting of a shape (e.g. triangular) and its boundaries, is defined for each fuzzy set. These data are stored in the database block, whose formation is depicted at the top right-hand side of Figure 3. Figure 6 contains the shapes and boundaries of the membership functions for each variable, which were obtained through consultation with the engineering teams concerned. The opinions provided by the experts, which are fuzzy labels of possibility, are the FIS rules; they are stored in the rule base block, whose formation is depicted at the bottom right-hand side of Figure 3. The two initial steps for rule base block formation are discussed in 5.4 and 5.6.

5.6. Questionnaire partitioning
FIS development is not possible unless every combination of input-variable levels appears in the questionnaire, but this produces so many cells that human comparison becomes quite inaccurate. In order to decrease the number of comparisons and to provide a guideline that helps the experts increase accuracy, a zoning system is used, based on how much stress a combination of contributing-factor levels (one of the questionnaire cells) creates for a component. A component is more likely to fail in a condition with a higher degree of stress. Thereby 5 stress levels were specified to create 5 regions (stress row in Table 2) for comparison instead of 27 factor-level combinations (i.e. 3 × 3 × 3).
Experts were asked to fill in these regions using the set of fuzzy labels suggested for each stress region, but they were free to choose other values for different combinations in the same stress region if they saw fit. Table 3 shows each stress level with its proper set of fuzzy labels. Table 4 shows the membership function boundaries for the output variable.
Data were gathered from a group of 12 engineers with relevant knowledge and sufficient experience from the management, maintenance, and design departments. Since opinions vary, aggregation is needed so that a single value is produced for each cell. To this end, a weighting factor was calculated for each engineer according to the weighting system in Table 5, so that a weighted mean of the opinions could be calculated. Table 6 shows the computed weighting factor for each engineer. The weighted-mean results for RV1 are presented in Table 2 as an example; the rules for this component were extracted from them and are listed below the table.
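The aggregation step can be sketched as a plain weighted mean over the experts' label indices for one questionnaire cell. The opinions and weights below are hypothetical (four experts rather than twelve, and not the values of Tables 5 and 6), the function name `weighted_opinion` is illustrative, and rounding the mean to the nearest label index is an assumption about how a rule consequent might be chosen.

```python
def weighted_opinion(labels, weights):
    """Weighted mean of expert label indices for one questionnaire cell."""
    if not labels or len(labels) != len(weights):
        raise ValueError("labels and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(l * w for l, w in zip(labels, weights)) / total


# HYPOTHETICAL data: label indices (1..25) given by four experts for one cell,
# and their weighting factors.
opinions = [10, 12, 9, 11]
weights = [3, 2, 2, 1]

cell_value = weighted_opinion(opinions, weights)  # single value for the cell
rule_label = round(cell_value)                    # assumed mapping to a rule consequent
```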

Fuzzy Inference Fault Tree Analysis; FIFTA
At this point, the FT is drawn, opinions are gathered to form the rule base block, and membership functions are stored in the database block. The desired combination of variable values (e.g. the 7th year at a pressure of 150 bars and a temperature of 50℃) is selected as an operating condition (i.e. combination (i)). This combination is fed as an input to each FIS, and a probability value is generated for each basic event. The basic event probability values are inserted into the FTA formula and the probability of the top-event is calculated (i.e. probability (i)). Then a new combination is selected to generate a new top-event probability, and so on, until enough points are generated for the response surface to be drawn. The result is an n-dimensional surface (where n-1 is the number of contributing factors) whose vertical axis shows the FT output, or cumulative failure probability (CFP), of the proposed alternative. The horizontal axes show the contributing factors. This process is presented in the middle of Figure 3. Figure 7 illustrates the process of generating one output point in more detail.
The response is drawn as a 3-D surface for both systems in Figure 8. Time is treated separately as the 4th dimension. The influence of time and of the two other contributing factors can thus be seen simultaneously, which is the unique trait of this technique.
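The point-generation loop described in this section can be sketched as follows. The fault tree (a single OR gate over an AND gate and one further event), the component names, and the function `basic_event_probability` are all hypothetical placeholders: in the paper each basic-event probability comes from that component's FIS and the tree is the one drawn for each alternative. The gate formulas assume independent basic events.

```python
from itertools import product


def or_gate(probs):
    """Probability that at least one of several independent events occurs."""
    q = 1.0
    for p in probs:
        q *= 1.0 - p
    return 1.0 - q


def and_gate(probs):
    """Probability that all of several independent events occur."""
    q = 1.0
    for p in probs:
        q *= p
    return q


def basic_event_probability(component, time, pressure, temperature):
    """HYPOTHETICAL stand-in for one component's FIS; not the paper's model."""
    base = {"valve_a": 0.01, "valve_b": 0.02, "sensor": 0.005}[component]
    stress = 1.0 + time / 25.0 + pressure / 350.0 + (150.0 - temperature) / 200.0
    return min(1.0, base * stress)


def top_event_probability(time, pressure, temperature):
    """Toy fault tree: TOP = OR(AND(valve_a, valve_b), sensor)."""
    p = {c: basic_event_probability(c, time, pressure, temperature)
         for c in ("valve_a", "valve_b", "sensor")}
    return or_gate([and_gate([p["valve_a"], p["valve_b"]]), p["sensor"]])


def cfp_surface(time, pressures, temperatures):
    """One CFP value per (pressure, temperature) point for a fixed age."""
    return {(pr, te): top_event_probability(time, pr, te)
            for pr, te in product(pressures, temperatures)}


# Surface for the 7th year on an illustrative pressure-temperature grid.
surface = cfp_surface(7, range(50, 351, 50), range(-50, 151, 50))
```

Repeating `cfp_surface` for each year gives one pressure-temperature surface per age period, which matches the way time is separated as the 4th dimension.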

Discrete-event simulation
In this section, the FTA method will be used again as the core of the method, and in addition, discrete event simulation will be used to evaluate the system reliability.
One application of discrete-event simulation is in assembly and production systems, where it gives managers and engineers a broad understanding of their system and lets them evaluate the effect of a small change on the whole system, and thus calculate the reliability of the system. For example, suppose that a change made at one station alters the performance of that station. The local change may be predictable, because the station considered in isolation from the other components is a very small system. However, the effect of that change on the efficiency and reliability of the whole system and of the other stations is very difficult, and in many cases impossible, to determine without simulation tools. Accordingly, in this section a discrete-event simulation is implemented to evaluate and compare the reliability of two common flare systems.

Gathering input data
The input data provide the driving force for the simulation model. The steps needed to create an efficient model of the input data are:
a) data collection,
b) estimating the parameters of the selected probability distribution,
c) evaluating the selected distribution and its parameters for goodness of fit.
In order to collect data, the following methods were used:
• observing the system and collecting sufficient samples of each process,
• interviews with related experts,
• imaging, video recording, and recording of system processes,
• collecting raw data from software available at the refinery.
After collecting the required data, random variables were modeled using a candidate probability distribution. Any statistical package may be used for this purpose. Table 7 lists the best-fitting probability distributions together with their estimated parameters.
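As an illustration of the fitting and goodness-of-fit steps, the sketch below fits an exponential distribution to a sample by maximum likelihood and computes the Kolmogorov-Smirnov distance between the empirical and fitted distribution functions. The data are synthetic, the exponential family is only an example (Table 7 reports the distributions actually selected), and the helper names are hypothetical. The critical value 1.36/√n is the usual large-sample 5% approximation for a fully specified distribution; it is conservative when the parameter has been estimated from the same data.

```python
import math
import random


def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate (1 / sample mean)."""
    return len(samples) / sum(samples)


def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a given CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the fitted CDF with the empirical step just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# SYNTHETIC data standing in for observed times between events (hours).
rng = random.Random(0)
observed = [rng.expovariate(0.05) for _ in range(200)]

rate = fit_exponential(observed)
d_stat = ks_statistic(observed, lambda x: 1.0 - math.exp(-rate * x))
critical = 1.36 / math.sqrt(len(observed))  # approximate 5% critical value
fit_is_acceptable = d_stat < critical
```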

Simulation model
After all the necessary information had been collected on each of the flare gas recovery systems and appropriate distributions had been fitted to the data, a computer simulation of the systems was performed using the Arena software. Arena is an application for simulating discrete-event systems that supports all steps of a simulation study. It provides templates that make it easy to create suitable animations for simulation problems; a template is a group of modules that contain entities, processes, and special language for a specific type of problem. Arena also has an input analyzer, with which the user can view the raw input data, and an output analyzer for viewing and analyzing simulation data.
The settings of the simulation model are as follows.
A) Observation Period: Since the work schedule of the flare gas system is usually determined at the beginning of each month, the observation period of each simulation sub run is considered to be 30 working days.

B) Number of replications:
In order to achieve acceptable results and reduce the length of the confidence interval of the system performance criteria, the simulation model must be run for a significant number of replications. The number of replications is determined according to the half-width of the confidence interval of the system performance criteria. The most important performance measure for this purpose is the average system reliability. Our experiments showed that with 90 replications, the half-width of this performance criterion reaches an acceptable level of about 1 to 3% of the average.

C) Warm-up period:
In order for the simulation model to reach a steady state and for the model output to be calculated in that steady state, a warm-up period is usually considered for the system. This period only serves to warm up and stabilize the system performance criteria and plays no role in the final calculations.
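The run settings can be illustrated with a small stand-alone simulation in Python rather than Arena. The sketch alternates exponentially distributed up and down periods for a single system, discards a warm-up period, and reports the mean and confidence-interval half-width over the replications. The run length, the 90 replications and the 4-day warm-up follow the values reported in this section; the failure and repair means are hypothetical, the uptime fraction is only a stand-in for the paper's reliability measure, and the half-width uses a normal quantile as an approximation to the Student t quantile.

```python
import math
import random
from statistics import NormalDist, mean, stdev

RUN_DAYS, WARMUP_DAYS, REPLICATIONS = 30.0, 4.0, 90  # settings reported in the text
MTBF_DAYS, MTTR_DAYS = 10.0, 0.5                     # HYPOTHETICAL failure/repair means


def replicate(rng):
    """One run: fraction of post-warm-up time during which the system is up."""
    clock, up, uptime = 0.0, True, 0.0
    while clock < RUN_DAYS:
        duration = rng.expovariate(1.0 / (MTBF_DAYS if up else MTTR_DAYS))
        end = min(clock + duration, RUN_DAYS)
        if up:
            # Count only the part of the up period that lies after the warm-up.
            uptime += max(0.0, end - max(clock, WARMUP_DAYS))
        clock, up = end, not up
    return uptime / (RUN_DAYS - WARMUP_DAYS)


def half_width(samples, confidence=0.95):
    """Confidence-interval half-width (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * stdev(samples) / math.sqrt(len(samples))


rng = random.Random(42)
results = [replicate(rng) for _ in range(REPLICATIONS)]
average, hw = mean(results), half_width(results)
relative_hw = hw / average  # compare with the 1 to 3% target mentioned above
```

If `relative_hw` is above the target, the number of replications can be increased and the check repeated.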
In order to calculate the warm-up period, the behavior of some system performance criteria was examined, and the time they take to reach a steady state was taken as the warm-up period. Figure 9 shows the trend chart of the average system reliability in three different replications. As can be seen from Figure 9, in all replications the reliability of the system reaches a stable state after a period of 4 days. Therefore, the warm-up period of the system is 4 days.

D) Verification of the Simulation Model: One of the basic steps after creating a simulation model is to verify it, that is, to check whether the structure of the simulation model is based on the conceptual model and its hypotheses. There are different methods of verification. In this study, the following steps were performed to verify the model:
• Checking software sub-models and debugging software codes.
• A more detailed review of the model by other experts.
• Checking model outputs for different inputs.
• Checking the model step by step and comparing the output of model variables with manual calculations.
• Preparing two-dimensional and three-dimensional animations of the model to understand and correct mistakes.

E) Validation of the Simulation Model: Validation is the study of whether the conceptual model and the specific model created accurately represent the system under study. Since simulation is an approximation of the real world, it should be noted that the model cannot be validated 100% against the real system. In this research, the three-step method proposed by Naylor and Finger has been used:
Step 1: Developing a model with high face validity. The purpose of the first step is to create a model with the highest possible face (apparent) validity, so that it seems logical from the point of view of the people involved with the modeled system. Sensitivity analysis was used to check the face validity of the model: we changed the failure rate of system components and examined its impact on system reliability. It is clear that as the failure rate decreases, the reliability of the system must increase.
Step 2: Empirical investigation of the model hypotheses. In this step, two main categories of model hypotheses were examined: those related to model structure and those related to model information. These hypotheses were tested experimentally and intuitively with the cooperation of refinery experts.
Step 3: Examining the simulation outputs. The most effective criterion for validating the model is that the simulation outputs should differ as little as possible from the actual process outputs. For this purpose, a hypothesis test has been used to validate the model outputs. In this study, the amount of system exhaust gas has been selected as the criterion for comparison with the real system and for validation of the simulation outputs. Here, gas exhaust is reported in MSCMD (Million Standard Cubic Meter per Day). One cubic meter per day (m3/d) of flow rate equals 0.000035 million standard cu-ft of gas per day (at 15°C).
In order to validate the model, the average exhaust gas of the simulation model (Y1) was compared with the actual system average (Z1), and the null hypothesis H0 that the two means are equal was tested against the alternative that they differ. If H0 is not rejected, then there is no reason to reject the equality of the average exhaust gas of the model and that of the actual system. If H0 is rejected, then the equality of the two means is rejected and the model is not valid. Since the P-value (0.067) is greater than the significance level (0.05), there is no reason to reject H0. Hence, at the 95% confidence level there is no significant difference between the outputs of the simulation model and the outputs of the real system; therefore, the resulting simulation model is considered valid.

Results and discussion
Using FIFTA, a sufficient number of points are generated to draw a surface for each alternative. The cumulative failure probability surface is a function that associates an operating condition with a probability value. This is the required function described in the problem statement: it maps a point in its domain (i.e. the space created by the axes of time, pressure, and temperature and limited to their boundaries) to a value in its codomain (i.e. CFP), which demonstrates each alternative's resilience under different operating conditions. It should be pointed out that cumulative failure probability is drawn instead of reliability to have a convex function for a better illustration. It is known that R(t) = 1 − F(t), so a rise in CFP means a fall in reliability.
In order to illustrate the surfaces, one of the dimensions of the domain space needs to be separated so that the surface is drawn on a plane. The "Time" axis is separated to study the changes of reliability on the pressure-temperature plane. This provides the opportunity for the decision-makers to investigate the effects of the behaviors of gas on the system's failure probability in its different age periods.
Since FIFTA is used to generate the data, the surfaces on the pressure-temperature plane can be drawn for any age period of the system. Twenty-five surfaces were drawn for the systems, one for each year, and the surfaces with the most significant changes were chosen to be illustrated in this paper.
The CFP surfaces were drawn so that reliability differences would help in choosing between the proposed alternatives. In the presented graphs, the CFP surface of the FOVS is always below that of the SDS, meaning that the reliability of the FOVS is higher than that of the SDS in all operating conditions. Thus, the FOVS outperforms the SDS for the refineries concerned and could be installed ahead of the flaring tower.
In order to have a simplified representation of the drawn graphs, the pressure-temperature plane is divided into nine areas, seen in Table 8.
Each of these areas stands for a general operating condition in which a system behaves in a relatively similar way. A proper number of points on each surface are selected in each area, and the average of their CFPs (ACFP) is calculated. The results can be seen in Tables 9, 10 and 11. The last column of these tables shows the percentage difference of the ACFPs (as a representative of reliability and performance) between the SDS and the FOVS. As expected, the difference in the last column is always positive because the CFP surface of the SDS is always above that of the FOVS. In other cases, unlike the alternatives studied in this paper, there might not be a clear winner after the surfaces are drawn. The surface of one alternative may be partially above and partially below that of the other, which shows different resilience in different operating conditions. Some of the values in the last column of such tables would then be negative, which could make the decision-making a lot more complicated.
To make a decision between such alternatives, different scores can be attributed to each of the 9 areas. This way, equal reliability values in different operating conditions are not equally significant. Scores for each area can be based on a number of factors (e.g., the percentage of time the conditions occur, how costly the damage caused by an operating condition can be, the likelihood of failure of the systems in each area, etc.). It is up to the decision-making team to differentiate the importance of good performance in each area. The scores attributed by the decision-making team are shown in column 2 of the above-mentioned tables.
The score of an area can be used as a weight for the ACFP of that area in order to calculate a weighted average and obtain a single numerical value for the whole surface in a time period. Based on the values calculated for each year, a 2D graph is drawn in Figure 10 to show the difference in the performance of the alternatives versus time. Table 12 shows the weighted averages of the ACFPs. As seen in Figure 10, there is a clear advantage to using the FOVS, since it has a lower cumulative failure probability during its life (a 22.4% difference on average). It can also be seen that the difference in performance grows with the passage of time, which concludes the comparisons.
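The scoring step amounts to a score-weighted average of the area ACFPs followed by a relative comparison. A minimal sketch is given below with hypothetical numbers for three of the nine areas; the area names, scores and ACFP values are invented for illustration, and the percentage-difference formula (the gap relative to the SDS value) is an assumption, since the exact definition used in Tables 9 to 12 is not stated in the text.

```python
def weighted_acfp(acfp_by_area, score_by_area):
    """Score-weighted average of the area ACFPs for one surface."""
    total = sum(score_by_area.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return sum(acfp_by_area[a] * s for a, s in score_by_area.items()) / total


def percent_difference(sds_value, fovs_value):
    """ASSUMED definition: gap between SDS and FOVS relative to SDS, in percent."""
    return 100.0 * (sds_value - fovs_value) / sds_value


# HYPOTHETICAL scores and ACFPs for three areas of the pressure-temperature plane.
scores = {"A1": 3, "A2": 1, "A3": 2}
acfp_sds = {"A1": 0.20, "A2": 0.10, "A3": 0.30}
acfp_fovs = {"A1": 0.16, "A2": 0.09, "A3": 0.27}

sds_year = weighted_acfp(acfp_sds, scores)
fovs_year = weighted_acfp(acfp_fovs, scores)
gap = percent_difference(sds_year, fovs_year)  # positive when FOVS has the lower CFP
```

A negative value in a single area would not by itself decide the comparison; the score weighting is what lets areas of greater importance dominate the yearly figure.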
Individual assessments can also be made on each alternative using the surfaces, and the following information might be of interest to the design team, the maintenance team, and the management. The most dangerous scenario for the safety system is when gas passes through the systems at high pressure (250, 350) and low temperature (-50, 0), where the likelihood of failure is at its maximum level. In addition, the safest operating condition can be identified in Figure 8.c, where the systems are at a high age: a combination of a pressure of (50, 200) and a temperature of (50, 150) is the safest operating condition, where both systems have their highest reliability level. The minimum level of pressure is taken as "50", a minimum flow that avoids flashbacks.
The above paragraph highlights another possible usage for the results obtained from FIFTA in cases where the contributing factors can be brought under control. Using the resulting surfaces from FIFTA, one can identify the best operating areas where reliability value is higher and keep the levels of the contributing factors in the identified areas. These are the standard limits that can be implemented in monitoring or controlling subsystems.
In addition, design improvements can be made in a system by identifying the operating conditions that cause the lowest reliabilities. Then, if possible, components that are sensitive to such an operating condition can be replaced with ones that are more resilient to it (e.g. if high temperature decreases the reliability, high-temperature-resilient components can be used in the system).

Conclusion
In the current research, we showed that it is possible to obtain an interactive output from FTA by fusing it with FIS and discrete-event simulation, so that output changes can be identified for different contributing factors. The proposed expert-based approach and zoning system can help gather the information required for calculations in the purchasing phase. This provides a practical approach to prognostic studies when actual assessments have not been performed on a system. We showed that, of the two alternatives proposed as a safety subsystem for an FGRU, the FOVS outperforms the SDS at different ages in terms of reliability, judging by its lower CFP. Since the systems are assessed under different operating conditions, the comparison is fairly comprehensive, which makes the final decision well justified. Taking multiple factors into account also helps prevent unforeseen failures of the safety subsystem.
The generated surfaces can also provide insight for design enhancements and control processes by indicating the system's resilience to different operating conditions. Also, if the contributing factors can be controlled, the surfaces can provide an approximation of standard limits for their levels. Here, judging by the generated results, it is suggested that the winning alternative not be exposed to a simultaneous rise in gas pressure and temperature, due to the massive plunge of reliability in this area.