An analysis to multi-state manufacturing system with common cause failure and waiting repair strategy

Abstract: This paper presents a mathematical model for a multi-state complex repairable manufacturing system subject to four types of failure (common cause, partial, human and catastrophic), incorporating waiting time to repair. After a common cause failure, the system has to wait for repair because the repair facility is unavailable. The system may also fail due to catastrophic failure or due to an incorrect start of the system by an untrained or inexperienced operator (human failure). The analysis is carried out using the supplementary variable technique and Laplace transformation to evaluate reliability measures such as availability, reliability, mean time to failure and mean time to repair, together with cost-benefit and sensitivity analyses of the system. Graphical illustrations are included to highlight the practical utility of the model.


PUBLIC INTEREST STATEMENT
• Analysis of a mathematical model for a multi-state complex repairable manufacturing system.
• Common cause failure, partial failure, human failure and catastrophic failure, together with waiting time to repair, are considered.
• The analysis is carried out using the supplementary variable technique and Laplace transformation.
• Reliability, mean time to failure, mean time to repair, cost-benefit analysis and sensitivity analysis of the system are evaluated.
• Graphical illustrations are included to highlight the practical utility of the model.

Introduction
System engineers are involved in manufacturing cells at various stages of design, planning and operation (Henoch, 1988). A wide range of modeling techniques is available for manufacturing cell design and flexibility issues. New techniques and production concepts introduce flexibility into production machines, so that a wide range of products can be manufactured in one and the same facility to meet the demand for customized products (Jain, Sharma, & Baghel, 2002). To achieve this flexibility, flexible manufacturing systems (FMSs) are designed as a trade-off between the efficiency of transfer lines and the flexibility of job shops. These systems can accomplish this trade-off because of their reduced level of human interaction and their ability to eliminate the set-up times between consecutive operations. Although such systems promise flexibility, they also generate new problems. Misconceptions in design or mistakes in implementation can lead to unreliable systems with low availability, inadequate production efficiency, low reliability and high operational cost. A high degree of reliability is essential to justify the investment (Altumi, Philipose, & Taboun, 2001).

Literature review
Reliability and availability are the most significant design issues associated with the ability of a system to perform as expected over a period of time. A reliable manufacturing system should be able to meet the reliability and availability expectations. A direct way to increase the ability to produce a part in an expected time period is to increase the manufacturing capacity (Kusiak & Lee, 1997). Reliability is regarded as the functional performance of the product. It involves not only the manufacturing system, but also product and part design. Failure is to be expected in everything, and manufacturing systems are no exception. Thus, a reliability analysis of a manufacturing system needs to include every possible failure of a component or of the system, because any type of failure (internal or external) may lower the reliability or the functional capability of the system. Yingkui and Jing (2012) contributed to the development of multi-state systems by reviewing the latest studies and advances in multi-state system reliability evaluation, optimization and maintenance. Lin and Li (2015) extended a multi-state physics model (MSPM) framework for component reliability assessment by including semi-Markov and random shock processes; a Monte Carlo simulation algorithm was implemented to compute component state probabilities. Kumar, Varshney, and Ram (2015) studied the reliability of the casting process in foundry work using a probabilistic approach. Taking this into consideration, this paper discusses the impact of four types of failure, namely common cause, catastrophic, partial and human failure, together with waiting time to repair, on a multi-state repairable manufacturing system.
A considerable amount of work has been done by previous authors in this field; it is categorized below.

Models with common cause failure
Common cause failure is the failure of several components from a common event that has been transmitted through a coupling factor. According to earlier researchers, common cause failure is the failure of multiple, first-in-line items to perform as required in a defined critical time period due to a single underlying defect or physical phenomenon, such that the effect is judged to be the loss of one or more systems. Common cause failure analysis is important in reliability and safety studies, as common cause failures frequently dominate random hardware failures. Systems affected by common cause failures are systems in which two or more events have the potential of occurring due to the same cause. Some typical common causes include impact, vibration, pressure, grit, stress and temperature. It is an important phenomenon for a system with failure-dependent parts. Dhillon and Yang (1992) presented two models representing two-identical-unit standby systems with common cause failures. In that work, the authors did not include catastrophic failure. Yin (2010) analyzed and compared several common cause failure models and presented a new common cause failure model for system reliability estimation based on a Bayesian network. From that study, the authors concluded that reliability models based on Bayesian networks are reliable. The authors did not evaluate other reliability measures such as availability, mean time to failure (MTTF), mean time to repair (MTTR) and sensitivity of the proposed model. Ram and Singh (2008) analyzed a complex system with common cause failure and two types of repair. In that work, the authors did not include human error and catastrophic failure among the failures of such a system. El-Damcese and Temraz (2012) presented a mathematical model for the reliability and availability of a parallel repairable system consisting of n identical components with degradation and common cause failure. In that study, the authors did not include any other type of failure that is possible in the stated system.

Models with human error
It is a well-known fact that human performance has several potential benefits for the study of human-machine interaction and system design. On the other hand, human error is an improper or undesirable human decision or action that reduces, or has the potential to reduce, efficiency, safety and system performance. Human error is defined as the failure to execute a particular task, which could lead to the interruption of planned operations or result in damage to property and equipment. There are many reasons for the occurrence of human errors, such as inadequate lighting in the work area, inadequate training or skill of the manpower involved, poor equipment design, high noise levels, inadequate work layout, improper tools, and poorly written equipment maintenance and operating procedures. Human error plays a key role when it remains unobserved and uncorrected. Much research has focused on this area. Dhillon and Yang (1992) presented two mathematical models representing two-identical-unit standby systems with human errors. Expressions for the reliability of the system, MTTF, variance of time to failure and time-dependent system availability were developed. The authors did not evaluate the mean time to repair, expected profit and sensitivity of the models, which are important measures of reliability. Giuntini (2000) developed a mathematical procedure for estimating the contribution of human operators in a man-machine system. The effectiveness of the human and the performance of a task were discussed by the author. Feng and Kapur (2009) discussed the reasons for measurement errors (inherent design of measurement systems, fluctuations of environmental elements and human error) during any inspection process. Hattori, Nakajima, and Ishida (2011) discussed an individual behavior model built by participatory modeling in the traffic domain. A methodology that can elicit prior knowledge for explaining human driving behavior in a specific environment was shown. The authors also constructed a driving behavior model based on the set of prior knowledge. In that study, the authors analyzed the human driving behavior model under a specific environment, but they did not analyze the same in specific conditions. Ram, Singh and Varshney (2013) investigated the reliability of a standby system under human failure, but the authors did not include common cause failure and catastrophic failure in their work. Gorguluarslan, Kim, Choi, and Choi (2014) developed a mathematical model for reliability estimation of a washing machine spider assembly via classification.

Models with catastrophic failure
Catastrophic failure is a sudden and total failure of a system from which recovery is impossible; in other words, it is characterized by the sudden and complete loss of component performance. Catastrophic failure can be the result of deliberate degradation and discontinuous failure. Lombardo, Masnata, and Settineri (1997) proposed a cutting tool beating and chipping detection system for continuous and interruption cutting based on the analysis of cutting force components. The authors did not analyze the sensitivity of the system. Pandey, Singh, and Sharma (2008) evaluated various reliability measures such as MTTF, availability, cost and transition state probabilities of a subsystem of a website that provides the "contact us" functionality. The copula technique was used by the authors for this analysis. In that work, the authors did not include common cause and human failure. Also, they did not analyze the sensitivity of the system. Ram and Singh (2008) discussed the availability of a complex system consisting of two repairable subsystems A and B in (1-out-of-2: F) and (1-out-of-2: G) arrangement, where subsystem B has n units in series (1-out-of-n: F), with partial and catastrophic failures. The authors analyzed that model under the preemptive-resume repair policy by using the supplementary variable technique, Laplace transformation and copula. Transition state probabilities, availability, MTTF and expected profit were also evaluated, along with the steady-state behavior of the system. In that work, the authors did not analyze the system under other multiple failures. Gamage and Xie (2009) investigated possible defect detection methodologies and proposed a system capable of real-time monitoring of defects in the cast extrusion manufacturing process. Ram and Singh (2009) analyzed the reliability of an engineering system with two types of repair facilities, major and minor, by using the copula approach. The authors concluded that the incorporation of copula can improve the reliability of the system. The authors did not analyze how sensitive the system is. Ram and Singh (2010) analyzed the availability, MTTF and cost of a complex system under the preemptive-repeat repair discipline by using the copula technique. The supplementary variable technique and Laplace transformation were used by the authors for that analysis. The authors did not analyze the system under multiple failures such as human, catastrophic, common cause and partial failure simultaneously. Singh, Ram, and Chaube (2011) considered a system having three units, one of which can be controlled by a controller while the other two are independent. The authors assumed that two repairmen are involved in the repair of the system. A mathematical model was obtained with the help of the copula approach, and the system was also examined by using Abel's lemma for steady-state behavior. In that study, the authors did not evaluate other important reliability measures such as MTTF, mean time to repair, expected profit and sensitivity of the system. Ram and Singh (2012) analyzed the cost benefit of a system under head-of-line repair by using the copula approach. The authors did not analyze the system under other failures.

Models with partial failure and waiting time to repair
A partial failure means that the system as a whole has not failed, despite problems either in the system or in its components. In such failures, the functioning of the system slows down, i.e. the system is in a degraded state. The reparability of any system is an important factor in ensuring its reliability. Since reparability problems arise not only in engineering systems but also in manufacturing systems, much attention has been devoted to the study of repairable systems. If the repairman is not available immediately after a failure, the system has to wait for repair. Wang, Ke, and Lee (2007) analyzed the reliability and sensitivity of a repairable system with M operating units, S standby units and R unreliable service stations under unit failure. Ruiz-Castro, Fernandez-Villodre, and Perez-Ocon (2009) studied a multi-component system subject to internal and external failures. The authors assumed that the external failure may or may not be repairable. The availability and conditional probability were calculated for finite and infinite numbers of units. In this study, the authors did not calculate the reliability of the system. Singh and Srinivasu (1987) analyzed a single-server, two-identical-unit cold standby system. The authors assumed that the service facility is available whenever the operating unit fails, and the failed unit waits for repair for a random time due to preparation for repair. The authors used the regeneration point technique to study the stochastic behavior of the system. The analysis was carried out under the supposition that the failure time distribution is negative exponential, whereas the inspection, preparation-for-repair and repair time distributions are arbitrary. In that work, the authors did not include common cause failure, which is possible in such types of systems. Hsu and Shu (2010) considered the problem of reliability assessment and determination of the optimal replacement time for a machine tool under wear deterioration. A non-homogeneous continuous-time Markov process for modeling the tool wear process was also proposed by the authors. El-Said and El-Sherbeny (2010) carried out the cost-benefit analysis of a two-unit standby system with two-stage repair and waiting time. In the first stage, the repair of the unit is started but not completed; the process is completed in the second stage. That analysis was carried out by using regenerative point process techniques to measure the effectiveness of the system. In that work, the authors did not include catastrophic, common cause and human failure among the failures of the system. Ibrahim and Saminu (2012) developed a stochastic model for an identical two-unit parallel system with two types of failures. The authors assumed that the system works in three modes: normal, deterioration and failure. From that study, the authors concluded that the mean time to system failure and the availability of a system exposed to various degrees of deterioration decrease more than those of a system without deterioration. A further study investigated the reliability of a standby system under waiting repair strategy and human failure. The authors assumed that the repair of the main and standby units follows the general distribution, whereas the repair due to human failure is obtained with the help of the copula. Various reliability measures such as availability, reliability, MTTF and profit function were evaluated by using the supplementary variable technique and Laplace transformation. In that study, the authors did not analyze the sensitivity of the system.
Building on the research reviewed above, it can be said that waiting time to repair, caused by the immediate unavailability of the repair facility, can arise together with common cause failure and human error; this seems possible in many manufacturing systems and in the current economic scenario. The present paper considers a complex repairable manufacturing system that can fail completely due to common cause failure, human error, catastrophic failure and unavailability of the repair facility. The present contribution is structured as follows: Section 2 reviews the literature related to the work; Section 3 gives the details of the mathematical model, including the nomenclature, the description with assumptions, the state transition diagram, and the formulation and solution of the proposed model; Section 4 covers particular cases of the reliability measures; Section 5 presents the results and discussion; and Section 6 concludes the proposed analysis.

Nomenclature
$t$ / $s$ / $i$: Time scale / Laplace transform variable / numeric value
$S_j$: Transition state for j = 0, 1, 2, 3, 4, 5
$P_0(t)$: The probability that the system is in good state $S_0$ at an instant t
The probability that the system is in failed state $S_1$ due to common cause failure at an instant t
$P_{3i}(t)$: The probability that the system is in degraded state $S_3$ due to the partial failure at an instant t
$P_l(x, t)$: The probability density function that the system is in state $S_m$ at epoch t and has an elapsed repair time of x, where l = 2i, c, h and m = 2, 4, 5
$P(s)$: Laplace transformation of $P(t)$
$P_{up}(s)$ / $P_{down}(s)$: Laplace transformation of the probabilities that the system is in good or degraded states / failed state
$A(t)$: Availability of the system at time t
$R(t)$: Reliability of the system at time t
$\eta(x)$ / $\psi(x)$ / $\phi(x)$: Repair rates from the states of unavailability of repairman / catastrophic failure / human failure
Failure rates of the system due to common cause failure / partial failure / catastrophic failure / human failure / waiting time to repair
$E_p(t)$: Expected profit during the interval [0, t)
$K_1$, $K_2$: Revenue and service cost per unit time, respectively

Model description and assumptions
Manglik and Ram (2013) analyzed a complex system consisting of four subsystems in series. The first three subsystems have a single unit in series, while the last subsystem has one main and two cold standby units. In that work, the authors considered unit failure. In another work, Ram and Manglik (2014) analyzed a complex repairable system having three units in parallel with partial failure, catastrophic failure, human failure and waiting time to repair.
In the context of these previous works, the present paper analyzes a complex repairable manufacturing system having multiple failures. Initially, the system is in good working condition. After a common cause failure, the system goes into a failed state where it waits for repair and, owing to the non-availability of the repair facility, moves to a further failed state. The system goes to the degraded state due to the partial failure of its components, and it can also fail from that state due to human error and catastrophic failure. A partial failure means that the system as a whole has not failed, despite problems in either the hardware or the software, while a catastrophic failure means a complete, sudden and often unexpected breakdown of a machine, electronic system, computer or network. The system can be repaired in both cases. Repairs are perfect, so that after repair every subsystem is as good as new. The system configuration and state transition diagram of the proposed model are shown in Figures 1 and 2, respectively. Failure and repair rates are assumed to be constant in general. With the help of the supplementary variable technique and Laplace transformation, the following reliability measures of the system have been evaluated:
(i) Transition state probabilities of the system.
(ii) A series of reliability measures such as availability, reliability, MTTF, busy period (MTTR), sensitivity analysis and cost effectiveness of the system.
Some numerical examples are also presented to illustrate the model mathematically.
The state specification of the system is as given below:

State: State description
$S_0$: The system is in good working condition
$S_1$: The system is in failed state due to common cause failure (Ram, 2010) and waiting for repair
$S_2$: The system is in failed state due to the unavailability of repairman
$S_3$: The system is in a degraded state due to the partial failure
$S_4$: The system is in failed state due to the catastrophic failure
$S_5$: The system is in failed state due to the human failure
That is, $S_0$ is the state where the system is in good working condition. $S_1$, $S_2$, $S_4$ and $S_5$ are the states where the system has failed due to common cause failure, waiting time, catastrophic failure and human failure, respectively, and $S_3$ is the state where the system is in degraded mode due to partial failure.
The following assumptions are associated with the model:
(i) Initially the system is in good state.
(ii) The system has three states, namely good, degraded and failed.
(iii) The system fails completely due to common cause, human and catastrophic failure.
(iv) The system waits for repair after a common cause failure.
(v) All failure and repair rates are constant.
(vi) The system can be repaired only when it is in a completely failed mode.
(vii) The repaired system works like a new one.
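The state description and assumptions above can be encoded as a small continuous-time Markov chain. The following is a minimal sketch, not the authors' formulation: the state transition diagram (Figure 2) is not reproduced in the text, so the transition list is an assumption inferred from the model description, and the assignment of α to common cause failure and β to partial failure is likewise an assumption. Repair rates are taken as constants, in line with assumption (v). The names `RATES`, `TRANSITIONS` and `generator_matrix` are hypothetical.

```python
# ASSUMED rate assignment: alpha -> common cause, beta -> partial failure.
RATES = {
    "alpha": 0.030,     # common cause failure rate (assumed mapping)
    "beta": 0.020,      # partial failure rate (assumed mapping)
    "lambda_c": 0.035,  # catastrophic failure rate
    "lambda_h": 0.045,  # human failure rate
    "omega": 0.010,     # waiting-time-to-repair rate
    "eta": 1.0,         # repair rate from S2 (repairman unavailable)
    "psi": 1.0,         # repair rate from S4 (catastrophic failure)
    "phi": 1.0,         # repair rate from S5 (human failure)
}

STATES = ["S0", "S1", "S2", "S3", "S4", "S5"]
UP_STATES = {"S0", "S3"}  # good and degraded states

# ASSUMED transition structure (source state, target state, rate name).
TRANSITIONS = [
    ("S0", "S1", "alpha"),     # good -> common cause failure, waiting
    ("S0", "S3", "beta"),      # good -> degraded
    ("S0", "S4", "lambda_c"),  # good -> catastrophic failure
    ("S0", "S5", "lambda_h"),  # good -> human failure
    ("S1", "S2", "omega"),     # waiting -> repairman unavailable
    ("S3", "S4", "lambda_c"),  # degraded -> catastrophic failure
    ("S3", "S5", "lambda_h"),  # degraded -> human failure
    ("S2", "S0", "eta"),       # repair
    ("S4", "S0", "psi"),       # repair
    ("S5", "S0", "phi"),       # repair
]


def generator_matrix(rates):
    """Build the generator matrix Q; each row sums to zero."""
    index = {name: k for k, name in enumerate(STATES)}
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for src, dst, key in TRANSITIONS:
        q[index[src]][index[dst]] += rates[key]
    for k in range(n):
        q[k][k] = -sum(q[k])  # diagonal is minus the total exit rate
    return q


Q = generator_matrix(RATES)
```

Keeping the transitions in a flat list makes it easy to swap in the actual arcs of Figure 2 if they differ from the ones assumed here.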

Formulation and solution of the mathematical model
The transition state probabilities of the designed manufacturing system are obtained as per Appendices 1-3. The Laplace transformations of the probabilities that the system is in the up state (i.e. either good or degraded) and in the failed state at any time are as follows:
$$P_{up}(s) = P_0(s) + P_{3i}(s) = \left[1 + \frac{\ldots}{s + \lambda_h + \lambda_c}\right] P_0(s) \tag{1}$$

Availability analysis
Taking the parameter values as $\lambda_h = 0.045$, $\lambda_c = 0.035$, $\alpha = 0.030$, $\beta = 0.020$, $\omega_i = 0.010$ and $\eta(x) = \psi(x) = \phi(x) = 1$, substituting them in Equation (7) and then taking the inverse Laplace transform gives the availability of the system as a function of time, Equation (9). Varying the time t from 0 to 15 in Equation (9), we obtain Table 1 and, correspondingly, Figure 3, which represent the behavior of the availability of the system with respect to time.
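As a numerical cross-check on the Laplace-transform route, availability can also be obtained by integrating the Kolmogorov forward equations directly when all rates are constant. The sketch below does this with a classical Runge-Kutta step. It rests on assumptions not stated in the text: the transition structure (inferred from the model description, since Figure 2 is not available), the mapping of α to common cause failure and β to partial failure, and the helper names `build_q`, `rk4_step` and `availability_curve`. It is therefore not a reproduction of Table 1.

```python
def build_q(alpha, beta, lam_c, lam_h, omega, eta, psi, phi):
    """Generator matrix for states [S0, S1, S2, S3, S4, S5] (ASSUMED arcs)."""
    q = [[0.0] * 6 for _ in range(6)]
    q[0][1], q[0][3], q[0][4], q[0][5] = alpha, beta, lam_c, lam_h
    q[1][2] = omega                     # waiting -> repairman unavailable
    q[3][4], q[3][5] = lam_c, lam_h     # degraded -> failed
    q[2][0], q[4][0], q[5][0] = eta, psi, phi  # repairs back to good
    for k in range(6):
        q[k][k] = -sum(q[k])
    return q


def derivative(p, q):
    """Kolmogorov forward equations: dP_j/dt = sum_i P_i * q[i][j]."""
    return [sum(p[i] * q[i][j] for i in range(6)) for j in range(6)]


def rk4_step(p, q, dt):
    """Advance the state probabilities by one classical Runge-Kutta step."""
    k1 = derivative(p, q)
    k2 = derivative([a + 0.5 * dt * b for a, b in zip(p, k1)], q)
    k3 = derivative([a + 0.5 * dt * b for a, b in zip(p, k2)], q)
    k4 = derivative([a + dt * b for a, b in zip(p, k3)], q)
    return [a + dt * (b + 2 * c + 2 * d + e) / 6.0
            for a, b, c, d, e in zip(p, k1, k2, k3, k4)]


def availability_curve(q, t_end=15, steps_per_unit=100):
    """Return [(t, A(t))] for t = 0..t_end, with A(t) = P_S0(t) + P_S3(t)."""
    dt = 1.0 / steps_per_unit
    p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # system starts in the good state
    curve = [(0, p[0] + p[3])]
    for t in range(1, t_end + 1):
        for _ in range(steps_per_unit):
            p = rk4_step(p, q, dt)
        curve.append((t, p[0] + p[3]))
    return curve


# Parameter values quoted in the text; repair rates equal to 1.
Q = build_q(alpha=0.030, beta=0.020, lam_c=0.035, lam_h=0.045,
            omega=0.010, eta=1.0, psi=1.0, phi=1.0)
CURVE = availability_curve(Q)
for t, a in CURVE:
    print(f"t = {t:2d}  A(t) = {a:.5f}")
```

Direct integration avoids symbolic inversion, at the cost of being restricted to constant (exponential) repair rates rather than the general repair-time distributions that the supplementary variable technique can handle.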

Reliability analysis
Setting all repair rates equal to zero and taking the inverse Laplace transform of Equation (7) gives the reliability of the system. Let us fix the failure rates as $\lambda_h = 0.045$, $\lambda_c = 0.035$, $\alpha = 0.030$, $\beta = 0.020$. By putting all these values in Equation (7) and varying the time t from 0 to 15, one can obtain Table 2 and Figure 4, which represent the variation of the reliability of the system.
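With repairs switched off, the failed states become absorbing and the reliability is the probability of still being in the good or degraded state. The sketch below evaluates a closed form for that probability under an assumed transition structure (good state left at total rate α + β + λ_c + λ_h, degraded state entered at rate β and left at rate λ_c + λ_h). Both the structure and the reading of β as the partial failure rate are assumptions, so the numbers are illustrative and are not claimed to match Table 2.

```python
import math


def reliability(t, alpha=0.030, beta=0.020, lam_c=0.035, lam_h=0.045):
    """R(t) = P_S0(t) + P_S3(t) with all repair rates set to zero.

    ASSUMED structure: S0 is left at rate alpha + beta + lam_c + lam_h,
    S3 is entered from S0 at rate beta and left at rate lam_c + lam_h.
    Requires alpha + beta > 0.
    """
    h = lam_c + lam_h            # exit rate of the degraded state
    total = alpha + beta + h     # exit rate of the good state
    p_good = math.exp(-total * t)
    # Solution of dP3/dt = beta * P0 - h * P3 with P3(0) = 0.
    p_degraded = beta / (alpha + beta) * (math.exp(-h * t) - math.exp(-total * t))
    return p_good + p_degraded


TABLE = [(t, reliability(t)) for t in range(16)]
for t, r in TABLE:
    print(f"t = {t:2d}  R(t) = {r:.5f}")
```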

Mean time to failure analysis
The MTTF can be obtained as given in Equation (11). Setting $\lambda_h = 0.045$, $\lambda_c = 0.035$, $\alpha = 0.030$, $\beta = 0.020$ and varying $\lambda_h$, $\lambda_c$, $\alpha$ and $\beta$ one at a time from 0.1 to 0.9 in steps of 0.1 in Equation (11), one may obtain the variation of MTTF with respect to the failure rates (Table 3 and Figure 5).
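The one-at-a-time sweep described above can be sketched as follows. MTTF is the limit of the up-state Laplace transform as s tends to 0 with repairs set to zero, which equals the integral of R(t) over time. The closed form used here follows from an assumed transition structure (the same one as in the reliability sketch, with β read as the partial failure rate); it stands in for Equation (11), which is not shown, and `mttf`, `BASE` and `sweep` are hypothetical names.

```python
def mttf(alpha=0.030, beta=0.020, lam_c=0.035, lam_h=0.045):
    """MTTF under the ASSUMED structure: (1 + beta / h) / (alpha + beta + h)."""
    h = lam_c + lam_h  # exit rate of the degraded state
    return (1.0 + beta / h) / (alpha + beta + h)


BASE = {"alpha": 0.030, "beta": 0.020, "lam_c": 0.035, "lam_h": 0.045}


def sweep(name):
    """Vary one failure rate from 0.1 to 0.9 in steps of 0.1, others fixed."""
    values = [round(0.1 * k, 1) for k in range(1, 10)]
    return [(v, mttf(**{**BASE, name: v})) for v in values]


for name in BASE:
    print(name, [round(m, 4) for _, m in sweep(name)])
```

Under this assumed structure the MTTF rises with the partial failure rate and falls with the other three rates, which agrees qualitatively with the trend reported for Figure 5.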

Expected profit
Cost control is critical to maintaining product reliability in any manufacturing system. Clearly, reliability alone will not guarantee product viability. Similarly, arbitrary cost cutting can be detrimental to profit when the resulting system reliability is too low. Let the service facility be always available; then the expected profit during the interval [0, t) is given as: Using Equation (9), the expected profit for the same set of parameters is given by:
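The profit expression itself is not shown above. A form commonly used in this kind of cost analysis, stated here as an assumption rather than as the authors' equation, credits revenue $K_1$ per unit of up time and charges service cost $K_2$ per unit of elapsed time:

```latex
E_p(t) \;=\; K_1 \int_0^{t} A(\tau)\,\mathrm{d}\tau \;-\; K_2\, t
```

With $K_1$ fixed, this expression decreases linearly in $K_2$ at every t, so a higher service cost always lowers the expected profit.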

Busy period analysis or mean time to repair
MTTR is the average time that a system will take to recover from any failure. For MTTR, taking all repairs to zero in Equation (8)

Sensitivity of reliability
Sensitivity analysis is a technique for predicting the outcome of a decision if a situation turns out to be different from the key prediction. It is very useful when attempting to determine the impact that the actual value of a particular variable will have if it differs from what was previously assumed. Sensitivity analysis is used to determine how sensitive a model is to changes in the values of its parameters and to changes in its structure. The sensitivity of reliability to the system parameters $\lambda_h$, $\lambda_c$, $\alpha$ and $\beta$ is obtained by differentiating Equation (10) with respect to each of these failure rates. Putting $\lambda_h = 0.045$, $\lambda_c = 0.035$, $\alpha = 0.030$, $\beta = 0.020$, we get the numerical values of these partial derivatives. Taking t = 0 to 10 units of time in the partial derivatives of reliability with respect to the different failure rates, we obtain Table 6 and Figure 8.

Sensitivity of MTTF
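The sensitivity of MTTF describes the change in MTTF resulting from changes in the system parameters, i.e. the failure rates $\lambda_h$, $\lambda_c$, $\alpha$ and $\beta$. Differentiating Equation (11) with respect to each failure rate gives the values reported in Table 7 and Figure 9.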
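When an analytic derivative is inconvenient, the same sensitivities can be approximated by central finite differences. The sketch below applies this to an MTTF closed form that follows from an assumed transition structure (β read as the partial failure rate, degraded state left at rate λ_c + λ_h); it stands in for Equation (11), which is not shown, so the values are illustrative and are not those of Table 7. The names `mttf`, `sensitivity` and `BASE` are hypothetical.

```python
def mttf(rates):
    """MTTF under the ASSUMED structure: (1 + beta / h) / (alpha + beta + h)."""
    h = rates["lam_c"] + rates["lam_h"]
    return (1.0 + rates["beta"] / h) / (rates["alpha"] + rates["beta"] + h)


def sensitivity(func, rates, name, step=1e-6):
    """Central-difference estimate of d func / d rates[name]."""
    up = dict(rates)
    down = dict(rates)
    up[name] += step
    down[name] -= step
    return (func(up) - func(down)) / (2.0 * step)


BASE = {"alpha": 0.030, "beta": 0.020, "lam_c": 0.035, "lam_h": 0.045}
SENS = {name: sensitivity(mttf, BASE, name) for name in BASE}
for name, value in SENS.items():
    print(f"d MTTF / d {name} = {value:.4f}")
```

The same `sensitivity` helper can be pointed at any scalar measure, for example reliability at a fixed time, by passing a different function.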

Sensitivity of MTTR
The sensitivity of MTTR describes the change in MTTR resulting from changes in the system parameters. Differentiating Equation (13) with respect to each parameter gives the values reported in Table 8 and Figure 10.

Result discussion
From the results of the analysis of the designed system, the following conclusions can be drawn:
(i) The state availability is a number greater than zero and less than one. It is equal to zero when no repair is performed and equal to one when the equipment does not fail. Table 1 gives the availability of the stated system with respect to time t. A critical examination of the corresponding Figure 3 shows that the availability decreases almost uniformly as time increases.
(ii) The numerical value of reliability R(t) always lies between one and zero, i.e. R(0) = 1, R(∞) = 0, and R(t) is a non-increasing function between these limits. Table 2 shows the trend of the reliability of the designed system with respect to time when all the failure and repair rates have fixed values. From the graph (Figure 4), it is concluded that the reliability of the system decreases more sharply with the passage of time.
(iii) Table 3 shows the MTTF of the system with respect to the various failure rates. A critical examination of Figure 5 shows that the MTTF decreases as the common cause, catastrophic and human failure rates increase, and increases as the partial failure rate increases. From this study, one can conclude that common cause failure, catastrophic failure and human failure need to be controlled much more closely during system operation, whereas partial failure appears to need less control.
(iv) Table 4 and the corresponding Figure 6 represent the cost function versus time. Here, one can easily observe that increasing service cost leads to a decrease in expected profit. The study shows that the minimum service cost leads to the maximum expected profit and, conversely, the maximum service cost leads to the minimum profit. It is concluded that high profit can be attained by controlling the service cost.
(v) Table 5 and Figure 7 show the MTTR as a function of the failure rates. A critical examination of Figure 7 shows that the MTTR increases as the common cause and catastrophic failure rates increase, and decreases as the partial failure, human failure and waiting-time-to-repair rates increase. Hence the study shows that the system takes more time to recover from common cause failure and catastrophic failure than from partial failure, human failure and waiting time to repair.
(vi) The sensitivity of the reliability, MTTF and MTTR of the system has been evaluated. The sensitivity of the system reliability with respect to common cause failure, partial failure, catastrophic failure and human failure is shown in Table 6 and Figure 8. It reveals that the sensitivity initially decreases and then tends to increase as time passes. It is clear from the graph that the system reliability is more sensitive with respect to partial failure. Moreover, Table 7 and Figure 9 show the sensitivity of MTTF with respect to common cause failure, partial failure, catastrophic failure and human failure, which demonstrate that it increases as the common cause, catastrophic and human failure rates increase, and decreases as the partial failure rate increases. Critical observation of the graph points out that the MTTF of the system is more sensitive with respect to partial failure. Finally, Table 8 and Figure 10 show the sensitivity of MTTR with respect to common cause failure, partial failure, catastrophic failure and human failure, which indicate that the MTTR sensitivity increases with respect to variation in partial failure, catastrophic failure and human failure, and remains constant with respect to variation in common cause failure. Critical observation of the graph points out that the MTTR of the system is more sensitive with respect to common cause failure.

Conclusion
With the advent of complex manufacturing systems, reliability has become an important performance measure for evaluating a system. The importance of early evaluation of reliability is recognized in concurrent engineering. Manufacturing system reliability is considered from three perspectives, i.e. part design, system design and the integration of both. This paper is aimed at evaluating various reliability measures with the help of the Markov process, the supplementary variable technique and Laplace transformation. It is shown that the reliability of such manufacturing systems can be predicted under various types of failures. From this study, the authors conclude that the performance of the manufacturing system can be improved by improving the procedures, by proper training of employees and by proper maintenance of the system. From the theoretical point of view, the research in this paper is based mainly on system reliability theory and stochastic processes. The results achieved in this paper are valuable for studies on improving the reliability of systems. Additionally, they can be used extensively in many engineering disciplines.
The probability that the system is in state $S_i$ at time t and remains there in the interval (t, t + Δt), and/or, if it is in some other state at time t, that it transits to the state $S_i$ in the interval (t, t + Δt), provided that a transition exists between the states and Δt → 0.
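The verbal rule above corresponds to the standard probability balance over a short interval. As a sketch for the constant-rate case, with $q_{ij}$ denoting the transition rate from $S_i$ to $S_j$ (a symbol introduced here for illustration, not taken from the paper):

```latex
P_i(t + \Delta t) = \Bigl(1 - \sum_{j \neq i} q_{ij}\,\Delta t\Bigr) P_i(t)
                  + \sum_{j \neq i} q_{ji}\,\Delta t\, P_j(t) + o(\Delta t)
\quad\Longrightarrow\quad
\frac{\mathrm{d}P_i(t)}{\mathrm{d}t} = -\Bigl(\sum_{j \neq i} q_{ij}\Bigr) P_i(t)
                  + \sum_{j \neq i} q_{ji}\, P_j(t)
```

The first term is the probability of staying in $S_i$, the second the probability of arriving from another state; dividing by Δt and letting Δt → 0 gives the differential equation. For the repair states, where the repair rate may depend on the elapsed repair time x, the supplementary variable technique replaces the time derivative with partial derivatives in both t and x.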