Reliability Importance of Components in a Real-Time Computing System with Standby Redundancy Schemes

Component importance analysis measures the effect of component reliabilities on system reliability, and is used to guide system design from the reliability point of view. To guarantee high reliability of real-time computing systems, redundancy has been widely applied and plays an important role in enhancing system reliability. One commonly used type of redundancy is standby redundancy. However, redundancy increases not only the complexity of a system but also the complexity of associated problems such as common-mode errors. In this paper, we consider the component importance analysis of a real-time computing system with warm standby redundancy in the presence of Common-Cause Failures (CCFs). Although CCFs are known to degrade system reliability, it is difficult to evaluate component importance measures analytically in their presence. This paper introduces a Continuous-Time Markov Chain (CTMC) model for the real-time computing system, and applies a CTMC-based component-wise sensitivity analysis that can evaluate the component importance measures without any structure function of the system. In numerical experiments, we evaluate the effect of CCFs by comparing the system performance measure and component importance of the system without CCFs with those of the system with CCFs. We also compare the effect of CCFs on the system in warm and hot standby configurations.


Introduction
Nowadays, real-time computing systems are widely used in our daily lives, e.g., Anti-Lock Braking Systems (ABS) in cars, telephone networks, and patient care systems. A real-time computing system is a system in which timeliness is as important as the correctness of its outputs (Laplante, 1997). A delayed output in a real-time system is not acceptable even if its value is correct, so the reliability of these systems is especially important. To guarantee high reliability of real-time computing systems, redundancy has been widely applied. Redundancy is defined as the use of additional components or subsystems beyond the number actually required for the system to operate reliably, and it plays an important role in enhancing system reliability. However, redundancy increases not only the complexity of a system but also the complexity of associated problems such as common-mode errors. In such systems, it is necessary to ensure that the critical components are operational with high reliability. Sensitivity analysis is an effective way to detect these critical components: it estimates the magnitude of the deviations of performance indices when the system configuration changes. Generally, the parametric sensitivity is considered, which is the first derivative of a performance index with respect to a model parameter. Besides evaluating the effects of parameters, the parametric sensitivity can also be combined with mathematical programming to optimize system performance. Nevertheless, in reliability engineering, component importance analysis is preferred to parametric sensitivity analysis. Component importance analysis, also called component-wise sensitivity analysis, estimates the first derivatives of the reliability measures of the system with respect to the reliability measures of its components. Thus, component importance analysis can directly detect the critical components from the reliability point of view.
On the other hand, in practice, system failures are often caused by dependent failures among components, such as Common-Cause Failures (CCFs). A CCF is defined as any condition or event that affects several components, inducing their simultaneous failure or malfunction (Fricks and Trivedi, 1997), and is synonymous with simultaneous failures or multiple failures. When a CCF occurs, all the components affected by the common-cause event fail. Dependent failures are known to degrade system reliability, and they make it difficult to evaluate the component importance measures analytically. Several earlier studies considered real-time computing systems with failure dependencies. For example, Fricks and Trivedi (Fricks and Trivedi, 1997) studied the effect of failure dependencies in a real-time computing system using Stochastic Petri Nets (SPNs) and Continuous-Time Markov Chains (CTMCs). They also classified different types of failure dependencies that can arise in the reliability model of a real-time computing system, and illustrated how several of them can be incorporated in an SPN model. Their research shows that failure dependencies strongly influence system reliability and therefore should never be ignored.
Moreover, Fricks and Trivedi (Fricks and Trivedi, 2003) considered three kinds of component importance measures for the Markov Reward Model (MRM), in contrast to the common method of computing importance measures using combinatorial models (e.g., Fault Tree (FT) and Reliability Block Diagram (RBD)) and a structure function, which represents the relationship between component failures and system failure. Pan and Nonaka (Pan and Nonaka, 1995) presented a quantitative method to evaluate the importance of each CCF event. More precisely, they divided the CCFs into two groups: one with a clear relationship between the causes and effects, and the other with no such relationship. For the first group of CCFs, they evaluated the structure function importance and the probability importance of the common root cause events modeled using FT. For the second group of CCFs, which are modeled with a parametric model, they considered the Birnbaum importance. Furthermore, a novel component-wise sensitivity analysis has been applied to derive the availability upgrading functions when components are statistically independent and described by general CTMCs. That method can derive the component importance measures from a CTMC model alone, without any structure function of the system. In addition, Zheng et al. introduced a CTMC model for a real-time computing system with hot standby redundancy in the presence of CCFs, and applied the CTMC-based component-wise sensitivity analysis to evaluate three kinds of importance measures. This paper extends the work of Zheng et al., in which we considered the real-time computing system in a hot standby configuration. In hot standby redundancy, the redundant component works in parallel with the active components, so it can take over the functions of the active component in a very short time.
However, one disadvantage of the hot standby configuration is that the redundant component is subject to aging to the same extent as the active one. That is, during operation, the redundant component accumulates life-cycle operational hours that ultimately lead to its failure (Ayers, 2012). The failure of the redundant component in turn decreases the reliability of the real-time computing system. To address this problem, we consider the warm standby redundancy configuration, in which the redundant component is partially active and can take over the functions of the active component in a short time. This paper therefore considers the real-time computing system in a warm standby configuration, represented by a hybrid model consisting of an RBD and CTMCs. The RBD is the top-level description of the system and illustrates how component and subsystem reliabilities contribute to the success or failure of the system. The RBD allows us to model the failure relationships of complex systems, but cannot describe the dynamic reliability behavior of systems. The CTMC, on the other hand, describes the dynamic behavior of a system well, and is used to model the three subsystems in the real-time computing system. Based on these models, we evaluate the three kinds of importance measures considered by Fricks and Trivedi (Fricks and Trivedi, 2003) for the system components and subsystems. We also evaluate the effect of CCFs by comparing the system performance measure and component importance of the system without CCFs with those of the system with CCFs, and compare the warm and hot standby configurations.
The rest of this paper is organized as follows. Section 2 introduces the real-time computing system in a warm standby configuration. In particular, we model the failure relationships of system using RBD, and describe three subsystems modeled by CTMCs. In Section 3, we evaluate the system reliability from structure functions and CTMC analysis respectively. Section 4 is devoted to sensitivity analysis of reliability. In Section 5, we evaluate three kinds of importance measures based on structure functions and Markov-based component-wise sensitivity analysis. Section 6 is devoted to numerical experiments. Finally, we conclude this paper with some remarks in Section 7.

Redundancy Schemes
Redundancy is defined as the use of additional components or subsystems beyond the number actually required for the system to operate reliably, and is commonly used in system design to enhance system reliability, especially when it is difficult to increase component or subsystem reliability itself (Kuo and Zhu, 2012). In general, there are two types of redundancy: (i) active redundancy, where all $n$ ($n \ge 2$) components in a parallel system are used simultaneously and only one component needs to be functioning in order for the system to function; and (ii) passive (standby) redundancy, in which components are set to have two states (active and standby). In the latter case, the standby components are applied only when an active component fails, and a sensing and switching mechanism is used to monitor the operation of the active component. There are three types of standby redundancy (Kuo and Zhu, 2012):
• Hot standby: The standby component, also called an active redundant component, has the same failure rate as the active one.
• Cold standby: The standby component does not fail while in the standby state, i.e., the failure rate of a cold standby component is zero.
• Warm standby: The standby component is not an active component but may fail while in the standby condition due to dormant failure, i.e., the failure rate of a warm standby component lies between 0 and the failure rate of the active component.
In the system, the PM subsystem is implemented by a pair-and-a-spare fault-tolerant scheme (Johnson, 1988) with a warm standby configuration. Concretely, two processor modules operate online in synchrony, and a spare module runs simultaneously with the pair but does not process data or requests. Data is mirrored in real time, however, so the processor modules hold identical data. Upon failure of the pair modules, the spare immediately takes over and replaces them. On the other hand, parallel (active) redundancy schemes are adopted for all the other critical system components (e.g., shared memories, input/output (I/O) bus, and digital switches). We also assume that there is a single infallible repair station for each component.

Subsystem Models
In Fricks and Trivedi (Fricks and Trivedi, 1997), the behaviors of all subsystems are described by CTMCs which are commonly used to represent the variations caused by failures and repairs of components in the system structure. In particular, we consider the CCFs occurring among the components in PM subsystem.

Common-Cause Failure
As mentioned before, the Common-Cause Failure (CCF) is defined as any condition or event that affects several components inducing their simultaneous failure or malfunction. Generally, there are three types of common-cause failures, that is (i) human errors, which can result in damage to equipment and property or disruption of scheduled operations of the system; (ii) system environment, including the characteristics of the environment where the system operates and the natural factors such as earthquake, fire, and flood; and (iii) intercomponent, which means that the failure of a component may affect adversely other components as a result of a chain reaction or domino effect.
Since it is difficult to measure the probability of a common-cause event accurately, parametric approaches such as the β-factor model (Fleming, 1975; Hughes, 1987; Rausand and Høyland, 2004) and the α-factor model (Mosleh et al., 1994) have been widely used to analyze failure dependency quantitatively. The parameter values are chosen based on engineering experience and published statistics of common-cause failures. In this paper, the β-factor model is applied to describe the intercomponent failure dependency, due to its simplicity.

β-Factor Model
The β-factor model was first introduced by Fleming (Fleming, 1975) and is still the most popular CCF model because of its simplicity, requiring only one extra parameter, $\beta$. The factor $\beta$ gives the probability that a failure in a specific component causes all components to fail, and $1 - \beta$ gives the probability that the failure involves just that component. Suppose that $\lambda_i$ is the independent failure rate, which does not cause other components' failures, and $\lambda_c$ is the common-cause failure rate, which covers all component failures caused by a shared cause event. Then the total failure rate $\lambda$ of a particular component can be written as the sum of the independent and common-cause failure contributions:

$$\lambda = \lambda_i + \lambda_c. \qquad (1)$$

Thus $\beta$ is defined as the fraction of the total failure rate attributable to common-cause failure:

$$\beta = \frac{\lambda_c}{\lambda} = \frac{\lambda_c}{\lambda_i + \lambda_c}.$$

For example, consider two redundant components A and B. If A and B are identical, they share the same failure rate and the same beta factor. However, dissimilar components may have different failure rates and different beta factors.
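The split described by the β-factor model can be illustrated with a few lines of Python. This is only a sketch: the function name `split_failure_rate` and the numerical values are hypothetical and are not taken from the paper's parameter tables.

```python
def split_failure_rate(total_rate, beta):
    """Split a total failure rate into (independent, common-cause) parts.

    total_rate : total failure rate of one component (per hour)
    beta       : fraction of the total rate attributed to common-cause failure
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    common = beta * total_rate
    independent = (1.0 - beta) * total_rate
    return independent, common


# Hypothetical values chosen only for illustration.
lam_i, lam_c = split_failure_rate(total_rate=1.0e-4, beta=0.1)
print(lam_i, lam_c)
```

With `beta = 0` the component fails only independently, while `beta = 1` means every failure is a common-cause failure.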

PM Subsystem
The CTMCs of the PM subsystem are depicted in Figs. 3 and 4. More precisely, Fig. 3 illustrates the Markov model for the PM subsystem without CCFs, and Fig. 4 shows the Markov model for the system with CCFs. In these figures, white and gray nodes represent active and failure states, respectively. Table 1 shows the state notations, which are based on the current conditions of the components. Concretely, each state is indicated by three characters. The first character gives the state of component PM1: it is `A' when PM1 is active and `F' when it has failed. The second character represents the state of component PM2 in the same manner. The third character gives the state of the standby component PM3: it is `S' when PM3 is in the standby state, `A' when it is active, and `F' when it has failed. In particular, Table 1 also defines the states of simultaneous failures of components for the case in which CCFs occur in the PM subsystem.
The model parameters are listed in Table 2. For example, the reciprocal of the failure rate of an active PM component is its Mean Time to Failure (MTTF), and the failure rate itself is a transition rate in the CTMC. For the PM components, we assume that they have the same beta factor, which gives the probability that a failure in one component causes all components to fail. As seen in Fig. 4, the transitions for simultaneous failures of all PMs and of two PMs are highlighted by dashed lines.

SM and DS Subsystems
Assume that there is no CCF occurring in the SM and DS subsystems. Then we have the 4-state CTMC models represented in Fig. 5 for SM and DS subsystems. In this figure, the state notations are given in the same manner as the PM subsystem, and shown in Table 3. The parameters of the 4-state CTMC models are also given in Table 2.

Reliability Function

Structure Function
The structure function is a binary function that indicates the state of the system (success or failure) given the state of each component (Jensen and Bard, 2003). Given the structure function of a system, we can compute its reliability. Generally, the structure function can be derived from FT and RBD.
Let $\mathbf{x} = (x_{PM1}, x_{PM2}, x_{PM3}, x_{SM1}, x_{SM2}, x_{DS1}, x_{DS2})$ be the state vector of the real-time computing system, where the $k$-th element of $\mathbf{x}$ is a binary variable representing the condition of component $k \in \{PM1, PM2, PM3, SM1, SM2, DS1, DS2\}$:

$$x_k = \begin{cases} 1, & \text{if component } k \text{ is operational}, \\ 0, & \text{if component } k \text{ has failed}. \end{cases}$$

The structure function represents the relationship between component failures and system failure. In general, it is defined by

$$\phi(\mathbf{x}) = \begin{cases} 1, & \text{if the system is operational}, \\ 0, & \text{if the system has failed}. \end{cases}$$

For example, consider a system consisting of $n$ components. If the system is a series system, namely, the system fails when any component fails, the structure function is given by

$$\phi(\mathbf{x}) = \prod_{k=1}^{n} x_k.$$

If the system fails only when all the components fail, the so-called parallel system, then the structure function is

$$\phi(\mathbf{x}) = 1 - \prod_{k=1}^{n} (1 - x_k).$$

According to the RBD in Fig. 2, the structure function of the real-time computing system is obtained in the same way. Let $p(\mathbf{x}, t)$ be the probability mass function of the system being in state $\mathbf{x}$ at time $t$. Then the reliability function of the system can be computed by

$$R_S(t) = \sum_{\mathbf{x} \in \Omega} \phi(\mathbf{x})\, p(\mathbf{x}, t),$$

where $\Omega$ is the state space of the system. In practice, the above equation is often also called the structure function, since it represents the effect of component reliabilities on the system reliability.
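To illustrate how a structure function turns component states into a system reliability, the following sketch enumerates all binary state vectors of a small hypothetical system. It assumes statistically independent components; the names `phi_series`, `phi_parallel`, and `reliability`, as well as the probabilities 0.9 and 0.8, are illustrative and do not come from the paper.

```python
from itertools import product


def phi_series(x):
    """Series structure function: 1 only if every component is up."""
    return int(all(x))


def phi_parallel(x):
    """Parallel structure function: 1 if at least one component is up."""
    return int(any(x))


def reliability(phi, p):
    """Sum phi(x) * P(x) over all binary state vectors x.

    p[k] is the probability that component k is up; components are
    assumed to be statistically independent.
    """
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        prob = 1.0
        for xk, pk in zip(x, p):
            prob *= pk if xk else 1.0 - pk
        total += phi(x) * prob
    return total


p = [0.9, 0.8]  # hypothetical component reliabilities
r_series = reliability(phi_series, p)
r_parallel = reliability(phi_parallel, p)
print(r_series, r_parallel)
```

The enumeration grows as $2^n$, so it is only practical for small systems such as the seven-component example considered here.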

CTMC Analysis
Suppose that two components A and B are connected in a series configuration as shown in Fig. 6, and let $A$ and $B$ denote their CTMC generator matrices, respectively.
In the case of CTMCs, the tensor sum of the matrices $A$ and $B$ is defined in terms of tensor products (Plateau and Stewart, 2000) as

$$A \oplus B = A \otimes I_{n_2} + I_{n_1} \otimes B,$$

where $n_1$ is the order of $A$, $n_2$ is the order of $B$, and $I_n$ is the identity matrix of order $n$.
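The tensor (Kronecker) sum can be sketched in plain Python as follows. The helper names `kron`, `identity`, and `kron_sum`, and the two 2-state generators with failure and repair rates, are hypothetical examples rather than the matrices of the paper.

```python
def identity(n):
    """Identity matrix of order n as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def kron_sum(a, b):
    """Kronecker sum: a (x) I_n2 + I_n1 (x) b, for square a and b."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(left, right)]


# Hypothetical generators: state 0 = up, state 1 = down.
A = [[-1.0, 1.0], [2.0, -2.0]]
B = [[-3.0, 3.0], [4.0, -4.0]]
q_sys = kron_sum(A, B)
for row in q_sys:
    print(row)
```

The result is the generator of the joint process of two independent components, with joint states ordered as (A state, B state); every row still sums to zero, as required of a CTMC generator.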
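Generally, in the availability modeling of a CTMC, the states of the system can be classified into two sets: $U$, the set of up (operational) states in which the system is available; and $D$, the set of down (failed) states in which the system is unavailable. We define $U_k$ and $D_k$ as the sets of states where component $k$ is up or down, respectively. Similarly, $U_S$ and $D_S$ are the sets of states where the system is up or down. Then the reward vectors for component $k$ and for the system can be defined by

$$[\mathbf{r}_k]_i = \begin{cases} 1, & i \in U_k, \\ 0, & i \in D_k, \end{cases} \qquad [\mathbf{r}_S]_i = \begin{cases} 1, & i \in U_S, \\ 0, & i \in D_S, \end{cases}$$

respectively, where $[\cdot]_i$ means the $i$-th element of a vector. Using the reward vectors, the reliability functions for component $k$ and for the system are given by

$$R_k(t) = \boldsymbol{\pi}(t)\,\mathbf{r}_k, \qquad R_S(t) = \boldsymbol{\pi}(t)\,\mathbf{r}_S,$$

where $\boldsymbol{\pi}(t)$ is the state probability vector, which can be computed by solving the following Kolmogorov differential equation (Trivedi, 2001):

$$\frac{d}{dt}\boldsymbol{\pi}(t) = \boldsymbol{\pi}(t)\,Q_S, \quad \text{given } \boldsymbol{\pi}(0), \qquad (18)$$

where $Q_S$ is the generator matrix of the system CTMC and $\boldsymbol{\pi}(0)$ is a given initial probability vector.
The solution of Eq. (18) at any time point is a transient solution of the CTMC, and can also be represented by the matrix exponential

$$\boldsymbol{\pi}(t) = \boldsymbol{\pi}(0)\exp(Q_S t).$$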
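As a minimal worked example (not taken from the paper), consider a single non-repairable component with a constant failure rate $\lambda$, an up state 1, and an absorbing down state 2. The symbols $Q$, $\boldsymbol{\pi}$, and $\mathbf{r}$ below are assumed to denote the generator, the state probability vector, and the reward vector of this toy model:

$$Q = \begin{pmatrix} -\lambda & \lambda \\ 0 & 0 \end{pmatrix}, \qquad \boldsymbol{\pi}(0) = (1, 0), \qquad \mathbf{r} = (1, 0)^{\mathsf{T}}.$$

Solving the Kolmogorov equation gives

$$\boldsymbol{\pi}(t) = \boldsymbol{\pi}(0)\exp(Qt) = \left(e^{-\lambda t},\; 1 - e^{-\lambda t}\right), \qquad R(t) = \boldsymbol{\pi}(t)\,\mathbf{r} = e^{-\lambda t},$$

which recovers the familiar exponential reliability function.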

Sensitivity Analysis of Reliability
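Sensitivity analysis is an important tool for system design, and is commonly defined as the first partial derivative of reliability with respect to a model parameter. Thus, the sensitivity of reliability indicates the rate of variation of the outcome measure with respect to an input factor. This section deals with the sensitivity analysis of the reliability function for the Markov model. Let $\theta$ be a model parameter. The sensitivity of the state probability vector is given by the first derivative of $\boldsymbol{\pi}(t)$ with respect to $\theta$:

$$\mathbf{s}(t) = \frac{\partial \boldsymbol{\pi}(t)}{\partial \theta}. \qquad (20)$$

Then the sensitivity of the reliability function $R_S(t)$ with instantaneous reward $\mathbf{r}_S$ is given by

$$\frac{\partial R_S(t)}{\partial \theta} = \mathbf{s}(t)\,\mathbf{r}_S + \boldsymbol{\pi}(t)\,\frac{\partial \mathbf{r}_S}{\partial \theta}.$$

Note that this sensitivity function becomes simple when the reward vector is not sensitive to the parameter $\theta$, since the second term then vanishes. Taking the first derivative of the sensitivity function in Eq. (20) with respect to $t$ and using Eq. (18), the following ordinary differential equation (ODE) is obtained:

$$\frac{d}{dt}\mathbf{s}(t) = \mathbf{s}(t)\,Q_S + \boldsymbol{\pi}(t)\,\frac{\partial Q_S}{\partial \theta}.$$

We then combine Eq. (18) with the above ODE, and obtain

$$\frac{d}{dt}\bigl[\boldsymbol{\pi}(t),\, \mathbf{s}(t)\bigr] = \bigl[\boldsymbol{\pi}(t),\, \mathbf{s}(t)\bigr]\,\tilde{Q}(\theta), \qquad \tilde{Q}(\theta) = \begin{pmatrix} Q_S & \partial Q_S / \partial \theta \\ O & Q_S \end{pmatrix}.$$

Since the diagonal elements of $\tilde{Q}(\theta)$ are the same as those of $Q_S$, we can apply uniformization (Zheng et al., 2013) to the following matrix exponential form:

$$\bigl[\boldsymbol{\pi}(t),\, \mathbf{s}(t)\bigr] = \bigl[\boldsymbol{\pi}(0),\, \mathbf{s}(0)\bigr]\exp\bigl(\tilde{Q}(\theta)\,t\bigr).$$

Likewise, the sensitivities of the subsystem reliability functions with respect to model parameters can be computed by the same method.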
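The following sketch shows one way the sensitivity computation could be carried out numerically: the state probabilities and their parameter sensitivities are propagated together through a block matrix built from the generator and its derivative, and the matrix exponential is evaluated by a uniformization-style Poisson-weighted series. The function name `expm_action`, the single-component model, and the values of `lam` and `t` are assumptions made for illustration, not the authors' implementation.

```python
import math


def expm_action(v, M, t, tol=1e-14, max_terms=10_000):
    """Return the row vector v * exp(M t) using uniformization.

    Uses exp(M t) = sum_k Poisson(k; q t) * (I + M / q)^k for any q > 0.
    """
    n = len(M)
    q = max(abs(M[i][i]) for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + M[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)      # Poisson probability of k = 0
    acc = weight                   # accumulated Poisson mass
    term = list(v)                 # v * P^k
    out = [weight * x for x in term]
    for k in range(1, max_terms):
        if 1.0 - acc <= tol:
            break
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        acc += weight
        out = [o + weight * x for o, x in zip(out, term)]
    return out


# Hypothetical model: one non-repairable component with failure rate lam.
lam, t = 0.5, 2.0
Q = [[-lam, lam], [0.0, 0.0]]          # generator
dQ = [[-1.0, 1.0], [0.0, 0.0]]         # derivative of Q with respect to lam
n = len(Q)

# Block matrix [[Q, dQ], [0, Q]] acting on the row vector [pi(t), s(t)].
M = [Q[i] + dQ[i] for i in range(n)] + [[0.0] * n + Q[i] for i in range(n)]
y = expm_action([1.0, 0.0] + [0.0] * n, M, t)
pi_t, s_t = y[:n], y[n:]

r = [1.0, 0.0]                          # reward: 1 in the up state
rel = sum(p * x for p, x in zip(pi_t, r))
sens = sum(s * x for s, x in zip(s_t, r))
print(rel, sens)
```

For this toy model the results can be checked against the closed forms $R(t) = e^{-\lambda t}$ and $\partial R(t)/\partial\lambda = -t\,e^{-\lambda t}$. For stiff models with a large product of rate and time, a production implementation would need a more careful truncation of the Poisson weights.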

Component Importance Analysis

Birnbaum Importance Measure
Birnbaum (Birnbaum, 1968) defined the component importance from the reliability point of view, as the first derivative of the system reliability function with respect to the component reliability:

$$\mathrm{RI}^{B}_k(t) = \frac{\partial R_S(t)}{\partial R_k(t)}.$$

Let $\phi_k(\mathbf{x})$ be the first derivative of the structure function with respect to the state of component $k$:

$$\phi_k(\mathbf{x}) = \frac{\partial \phi(\mathbf{x})}{\partial x_k}.$$

By combining Eqs. (10) and (27), the Birnbaum importance can be written in terms of the structure function. In general, we compute the RIB by taking the first derivative of the system reliability with respect to the component reliability, after obtaining the system reliability structure function as in Eq. (11).

Case I: There is no CCF Occurring in the System
Suppose that there is no CCF in the real-time computing system, that is, all components are statistically independent, and the CTMC of the PM subsystem is as shown in Fig. 3. Then the sensitivities of the system performance index with respect to the component performance indices can be obtained analytically from the structure function. For example, for the component PM1 in the system without CCFs, the Birnbaum reliability importance is obtained by differentiating Eq. (11) with respect to the reliability of PM1, where the reliability of each component can be computed from Markov analysis.
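As a short illustration of the analytical route, consider two textbook configurations under the assumption of statistically independent blocks; the symbols $R_{PM}$, $R_{SM}$, $R_{DS}$ for subsystem reliabilities and $R_{SM1}$, $R_{SM2}$ for the two SM components are notation assumed here. For three subsystems in series,

$$R_S(t) = R_{PM}(t)\,R_{SM}(t)\,R_{DS}(t) \quad\Longrightarrow\quad \frac{\partial R_S(t)}{\partial R_{PM}(t)} = R_{SM}(t)\,R_{DS}(t),$$

so the Birnbaum importance of one subsystem is the product of the reliabilities of the others. For a parallel pair,

$$R_{SM}(t) = 1 - \bigl(1 - R_{SM1}(t)\bigr)\bigl(1 - R_{SM2}(t)\bigr) \quad\Longrightarrow\quad \frac{\partial R_{SM}(t)}{\partial R_{SM1}(t)} = 1 - R_{SM2}(t),$$

so a component matters more when its partner is less reliable.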

Case II: CCFs Occur Among Components in the PM Subsystem
However, in practice, system failures often occur due to CCFs, for example in the real-time computing system where intercomponent dependent failures occur among the components of the PM subsystem, as seen in Fig. 4. In such a case, we cannot obtain the above sensitivities analytically from the structure function. We therefore consider the Markov-based component-wise sensitivity analysis, which computes the importance measures from the sensitivities of the reliabilities of components (or subsystems) and of the system, obtained by using the sensitivity analysis described in Section 4.
Then the estimate of

Criticality Importance Measure
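The criticality measure was proposed by Henley and Kumamoto (1981), and represents the probability that, when the system fails, the failure of component $k$ is a cause of the system failure. They defined the criticality importance of component $k$ as a fractional sensitivity given by

$$\mathrm{RI}^{CF}_k(t) = \frac{\partial F_S(t)}{\partial F_k(t)} \cdot \frac{F_k(t)}{F_S(t)}, \qquad (32)$$

where $F_S(t) = 1 - R_S(t)$ and $F_k(t) = 1 - R_k(t)$ are the unreliability functions of the system and of component $k$. Similarly, according to Frank (1978), Eq. (32) can also be represented by the reliability functions of the system and the component, i.e.,

$$\mathrm{RI}^{CR}_k(t) = \frac{\partial R_S(t)}{\partial R_k(t)} \cdot \frac{R_k(t)}{R_S(t)}.$$

Essentially, these measures can be computed from the Birnbaum importance $\mathrm{RI}^{B}_k(t)$, since $\partial F_S(t)/\partial F_k(t) = \partial R_S(t)/\partial R_k(t)$.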
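A short worked case shows how the reliability-based and unreliability-based criticality measures can rank blocks differently. Assume the usual fractional-sensitivity forms, with $\mathrm{RI}^{B}_k$ the Birnbaum importance, $R$ a reliability and $F = 1 - R$ an unreliability (notation assumed here), and consider independent subsystems connected in series, so that $R_S(t) = \prod_j R_j(t)$ and $\mathrm{RI}^{B}_k(t) = R_S(t)/R_k(t)$. Then

$$\mathrm{RI}^{CR}_k(t) = \mathrm{RI}^{B}_k(t)\,\frac{R_k(t)}{R_S(t)} = \frac{R_S(t)}{R_k(t)} \cdot \frac{R_k(t)}{R_S(t)} = 1,$$

so every subsystem of a series system is equally critical in the reliability-based sense, whereas

$$\mathrm{RI}^{CF}_k(t) = \mathrm{RI}^{B}_k(t)\,\frac{F_k(t)}{F_S(t)} = \frac{R_S(t)}{R_k(t)} \cdot \frac{1 - R_k(t)}{1 - R_S(t)}$$

grows as $R_k(t)$ decreases, which singles out the least reliable subsystem.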

Upgrading Function
The upgrading function is the parametric sensitivity function with respect to a failure rate (Fricks and Trivedi, 2003). According to the definition, we have the reliability upgrading function (RIU) for component $k$:

$$\mathrm{RI}^{U}_k(t) = \frac{\partial R_S(t)}{\partial \lambda_k},$$

where $\lambda_k$ is the failure rate of component $k$.
In fact, the failure rates of components in a system where CCFs occur change as time increases.
Thus, we use the time-dependent failure rate $\lambda_k(t)$. Generally, the relationship between the reliability function and the failure rate is given by

$$\lambda_k(t) = -\frac{1}{R_k(t)} \cdot \frac{d R_k(t)}{dt}.$$

The sensitivity of $R_k(t)$ with respect to the failure rate is also needed.

Numerical Experiments
For systems in which components are statistically independent, both the common method using combinatorial models (e.g., FT and RBD) with a structure function and the Markov-based component-wise sensitivity analysis can be applied to compute the importance measures. For systems with CCFs, the Markov-based component-wise sensitivity analysis is applied. Moreover, based on the numerical results, we evaluate the effect of CCFs on system performance and component importance, and contrast it with the effect of CCFs on the system in a hot standby configuration in Zheng et al. The model parameters are given in Table 5. In Fig. 7, we plot the time-dependent reliability curves for all components, subsystems, and the system with warm standby redundancy under both Case I (no CCF) and Case II (CCFs occur).
According to the reliability curves in Fig. 7(a), the component reliabilities are clearly ordered, with the warm spare PM3 the most reliable and component SM the least reliable. Among the subsystems, the PM subsystem has the highest reliability because of the pair-and-a-spare fault-tolerant scheme; e.g., in the case of the system without CCFs, the reliability of the PM subsystem is approximately equal to 1 before $t = 4200$ hours. The SM subsystem, in contrast, is more prone to failure due to the unreliable SM components. The reliabilities of components, subsystems, and the system decrease as time increases in both cases. In either case, the reliability of component SM decreases sharply with time because it has the highest failure rate, whereas the reliability of component PM3, a warm spare with a low failure rate, decreases relatively slowly.
In addition, the reliabilities of the PM components, among which CCFs occur, are smaller than the reliabilities of the PMs in the case of no CCF at the same time. This is because CCFs increase the risk of failure of all PMs, thereby decreasing their reliabilities. Consequently, the reliabilities of the PM subsystem and of the whole system are also reduced by the CCFs (see Fig. 7(b)).
Moreover, since no CCF occurs in the SM subsystem, the reliability of component SM is the same in the two cases, regardless of whether CCFs occur in the system. The same conclusion holds for the DS subsystem.
For instance, Tables 6 and 7 show the reliabilities of the components, and of the subsystems and system, respectively, with warm standby redundancy at time $t = 8000$ hours under the different cases. From these tables, we see that, in either case, the warm standby component PM3 has the highest reliability due to its lowest failure rate in the inactive state, and the PM subsystem is the most reliable subsystem. In addition, the reliabilities of the PM components, among which CCFs occur, are smaller than the reliabilities of the PMs in the system without CCFs. For example, the CCFs reduce the reliability of component PM3 from 0.8636083 to 0.8430424, a decrease of 2.38%, which is larger than the relative decrease in the reliability of component PM1 (PM2) (1.86%). The reasons are twofold: (i) the risk of failure becomes higher due to CCFs; (ii) the warm standby component has a dormant failure rate, so it may fail while in the inactive state, and its failure rate increases when it becomes active. In addition, the CCFs reduce the system reliability from 0.8248021 to 0.7997196, a decrease of 3.04%. As mentioned earlier, since the CCFs are assumed to occur only in the PM subsystem, the reliabilities of the SM and DS subsystems are not affected by the CCFs.

Reliability for Hot Standby Assumption
On the other hand, the reliabilities of components, subsystems and system with hot standby redundancy at time $t = 8000$ hours under the different cases are shown in Tables 8 and 9. In hot standby redundancy, the standby component is also called an active redundant component and has the same failure rate as the active component. Also, in the real-time computing system, the components in the same subsystem are considered to be identical; thus, for each subsystem, we only consider one component in the hot standby case. For comparison with the reliability of the PM subsystem with warm standby in Table 7, the reliability of the PM subsystem with hot standby at $t = 8000$ hours is given in Table 9.

Table 7. Reliabilities of subsystems and system at time t = 8000 hours (warm standby)
Table 8. Reliabilities of components at time t = 8000 hours (hot standby)
Table 9. Reliabilities of subsystems and system at time t = 8000 hours (hot standby)
Table 10. Failure rates of components at time t = 8000 hours (warm standby)
Table 11. Failure rates of components at time t = 8000 hours (hot standby)

Effect of CCFs on Failure Rates
Furthermore, to investigate how CCFs affect the reliabilities of components, we evaluate the time-dependent failure rates of components shown in Tables 10 and 11. Concretely, Table 10 presents the failure rates of components in the system with warm standby redundancy at time $t = 8000$ hours, and Table 11 gives the results for the hot standby case. From Table 10, we find that the failure rate of component PM3 in the system without CCFs has increased at $t = 8000$ hours due to the characteristic of warm standby, whereas the failure rates of the other components remain the same, since the lifetime of each of them follows an exponential distribution with a constant failure rate; in other words, these components' failure rates are time-independent. The failure rates of the PM components in the system with CCFs are higher than those in the system without CCFs. This confirms that CCFs increase the risk of failure of the relevant components, thereby increasing their failure rates. For example, the CCFs raise the failure rate of component PM3 from 2.281302e-05 to 2.530558e-05; the difference is 9.85% of the value with CCFs, or equivalently an increase of about 10.9% over the value without CCFs. Compared with the failure rate of the active component PM in the system with warm standby redundancy, the CCFs have a larger effect on the failure rate of the active component PM in the hot standby case (see Table 11). The failure rates of components SM and DS remain the same regardless of whether CCFs occur.
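The percentage quoted for PM3 depends on which value is taken as the base. The small sketch below computes the relative change of the PM3 failure rate both ways, using the two values quoted in the text; the helper name `relative_change` is hypothetical.

```python
def relative_change(old, new, base="old"):
    """Relative change from old to new, in percent.

    base="old" divides by the original value (the usual convention);
    base="new" divides by the changed value.
    """
    ref = old if base == "old" else new
    return (new - old) / ref * 100.0


without_ccf = 2.281302e-05  # PM3 failure rate without CCF, from the text
with_ccf = 2.530558e-05     # PM3 failure rate with CCFs, from the text

vs_old = relative_change(without_ccf, with_ccf, base="old")
vs_new = relative_change(without_ccf, with_ccf, base="new")
print(f"{vs_old:.2f}% relative to the no-CCF value")
print(f"{vs_new:.2f}% relative to the with-CCF value")
```

The 9.85% figure corresponds to dividing by the value with CCFs; dividing by the value without CCFs gives a somewhat larger figure.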

Importance Measures
This section considers the three types of importance measures (i.e., Birnbaum importance (RIB), criticality importance (RICR and RICF), and upgrading function (RIU)) of all components in the system with the warm standby configuration under Case I (no CCF) and Case II (CCFs occur). The importance measures of components in the system with the hot standby configuration are also taken into account. Based on the numerical results, we compare the importance of components in the warm and hot standby configurations. Note that RIB, RICR, and RIU are defined by the reliability functions of the components and the system, whereas RICF is another criticality measure defined by the unreliability functions. RICF quantifies the probability of a component being responsible for system failure before a given time instant; that is, its value for component $k$ at time $t$ gives the probability of component $k$ being responsible for system failure before $t$ hours.

Importance Measures for Warm Standby Assumption
Tables 12 and 13 show the importance measures of components and subsystems, respectively, in the system with warm standby at time $t = 8000$ hours in Case I. For Case II, the importance measures are given in Tables 15 and 16. The component-level results for Case I are listed in Table 12 and the subsystem-level results in Table 13. From Table 13, we find that the ranking is consistent across RIB, RICF, and RIU. From the viewpoint of RIB, RICF, and RIU, the SM subsystem is the most critical one, because component SM is the most important component and has the highest failure rate. In contrast, the PM subsystem is the least critical subsystem, owing to the high reliability provided by the pair-and-a-spare fault-tolerant scheme. For example, the probability of the PM subsystem being responsible for system failure before 8000 hours is only 6.48%, which is much smaller than the corresponding probability of 48.64% for the SM subsystem. However, from the viewpoint of RICR, the PM subsystem and the DS subsystem are as important as the SM subsystem. This is explained by the fact that the three subsystems are connected in a series configuration (see Fig. 2), namely, the system fails when any subsystem fails.
Tables 15 and 16 show the importance measures of components and subsystems in the system with warm standby at time $t = 8000$ hours for the case in which CCFs occur in the system. Compared with the importance measures of components in the system without CCFs shown in Table 12, the importance of the PM components, among which CCFs occur, becomes larger according to all importance measures. For example, the probability of component PM3 being responsible for system failure before 8000 hours roughly doubles, from 2.38% (Case I) to 4.86% (Case II); the difference amounts to 50.99% of the Case II value. However, the importance of components SM and DS with respect to RIB and RICF decreases in the CCF case. For instance, the CCFs reduce the probability of component SM being responsible for system failure from 48.64% to 41.25%, a decrease of 15.18%. In addition, the importance of components SM and DS with respect to RICR and RIU does not change.
From Table 16, we find that the importance of the PM subsystem with respect to RIB is the same in Case II as in Case I. There are two reasons: (i) all subsystems are connected in series, so by the definition of Birnbaum importance the RIB of the PM subsystem is the product of the reliabilities of the SM and DS subsystems; (ii) the reliabilities of the SM and DS subsystems are not affected by the CCFs. However, the importance of the PM subsystem with respect to RICF increases considerably: the probability that the PM subsystem is responsible for system failure before 8000 hours increases from 6.48% (Case I) to 18.19% (Case II), an increase of 64.38% relative to the Case II value, due to CCFs. Moreover, the importance of the SM and DS subsystems with respect to RIB and RICF decreases in the CCF case. Furthermore, the importance ranking of components and subsystems remains the same as in the system without CCFs, shown in Table 14.
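The series-system argument above can be illustrated with a minimal Python sketch. The subsystem reliabilities below are hypothetical placeholders, not values from the tables, and RICF and RICR are written in their commonly used criticality forms, which is an assumption here because the formal definitions are given elsewhere in the paper. Only the PM reliability is lowered between the two cases, mimicking the effect of CCFs.

```python
from math import prod

def series_importance(rel):
    """Importance measures for subsystems connected in series.

    rel maps a subsystem name to its reliability at the mission time.
    Assumed (commonly used) forms:
      RIB  = dR_sys/dR_i = product of the other reliabilities
      RICF = RIB * (1 - R_i) / (1 - R_sys)
      RICR = RIB * R_i / R_sys
    """
    r_sys = prod(rel.values())
    out = {}
    for name, r in rel.items():
        rib = prod(v for k, v in rel.items() if k != name)
        out[name] = {
            "RIB": rib,
            "RICF": rib * (1.0 - r) / (1.0 - r_sys),
            "RICR": rib * r / r_sys,
        }
    return out

# Hypothetical reliabilities at the mission time (illustration only).
case_1 = {"PM": 0.97, "SM": 0.80, "DS": 0.85}   # without CCFs
case_2 = {**case_1, "PM": 0.92}                 # CCFs degrade only PM

imp_1 = series_importance(case_1)
imp_2 = series_importance(case_2)
for name in case_1:
    print(name, imp_1[name], imp_2[name])
```

Under these assumptions the RIB of PM is identical in both cases, its RICF rises, the RIB and RICF of SM and DS fall, and RICR equals 1 for every subsystem, which matches the qualitative pattern reported above.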

Importance Measures for Hot Standby Assumption
We next consider the importance measures for the system with hot standby redundancy, and again evaluate the effect of CCFs on component importance. The importance measures of components and subsystems at time t = 8000 hours in the system without CCFs are given in Tables 17 and 18, respectively. For the system with CCFs, the results are given in Tables 20 and 21. As seen in Table 17, component SM is also the most important component in the system with hot standby redundancy, as in the warm standby case. Compared with the values in Table 12, the importance of components SM and DS with respect to RIB and RICR becomes smaller in the hot standby case, whereas the importance of the active component PM with respect to each measure is higher than in the warm standby case. From Table 18, the importance of the SM and DS subsystems with respect to RIB and RICF also decreases slightly in the system with hot standby, compared with the importance measures in Table 13. However, the Birnbaum importance of the PM subsystem with hot standby is almost the same as that of the PM subsystem with warm standby. In addition, the probability that the PM subsystem is responsible for system failure before 8000 hours increases from 6.48% (warm standby) to 7.82% (hot standby), an increase of 17.09% relative to the hot standby value, due to the hot standby configuration. This indicates that the PM subsystem becomes more critical in the system with hot standby redundancy. The importance ranking of components and subsystems is given in Table 19.
Moreover, for the case in which CCFs occur in the system, the importance measures of components and subsystems at time [...] Table 17. For example, the probability that component PM is responsible for system failure before 8000 hours increases from 7.82% (Case I) to 11.02% (Case II), an increase of 29.10% relative to the Case II value, due to CCFs. In addition, the CCFs raise the probability that the PM subsystem is responsible for system failure from 7.82% (Case I) to 20.98% (Case II), an increase of 62.75% relative to the Case II value.
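The percentage changes quoted in Section 6.2 can be cross-checked against the rounded probabilities. The sketch below is an illustrative check, not part of the original analysis. It suggests that each quoted change is the absolute difference divided by the larger of the two values, rather than the conventional change relative to the starting value. The function names are hypothetical, and small residual differences are expected because the inputs are rounded to two decimals.

```python
def change_relative_to_larger(old, new):
    """Absolute change divided by the larger of the two values, in percent."""
    return abs(new - old) / max(old, new) * 100.0

def conventional_relative_change(old, new):
    """Signed change divided by the starting value, in percent."""
    return (new - old) / old * 100.0

# (starting value %, resulting value %, change quoted in the text %)
quoted = [
    (2.38, 4.86, 50.99),    # component PM3, warm standby, Case I -> Case II
    (48.64, 41.25, 15.18),  # component SM, warm standby, Case I -> Case II
    (6.48, 18.19, 64.38),   # PM subsystem, warm standby, Case I -> Case II
    (6.48, 7.82, 17.09),    # PM subsystem, warm -> hot standby, Case I
    (7.82, 11.02, 29.10),   # component PM, hot standby, Case I -> Case II
    (7.82, 20.98, 62.75),   # PM subsystem, hot standby, Case I -> Case II
]

for old, new, reported in quoted:
    print(
        f"{old:6.2f} -> {new:6.2f}  reported {reported:6.2f}  "
        f"vs larger {change_relative_to_larger(old, new):6.2f}  "
        f"conventional {conventional_relative_change(old, new):8.2f}"
    )
```

Measured conventionally against the Case I value, the same increases are considerably larger; for example, 6.48% to 18.19% is roughly a 181% increase.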

Effects of β-Factor
It is also important to investigate how the importance ranking of components changes with the value of β. We therefore consider the case β = 15%. The importance measures of components and subsystems with warm standby redundancy at time t = 8000 hours are given in Tables 22 and 23, respectively, and Table 24 presents the importance ranking of components and subsystems.
Table 17. Importance measures of components at time t = 8000 hours (Case I, hot standby)
Table 18. Importance measures of subsystems at time t = 8000 hours (Case I, hot standby)
Table 19. Importance ranking of components and subsystems at time t = 8000 hours (Case I, hot standby)
International Journal of Mathematical, Engineering and Management Sciences, Vol. 3, No. 2, 64-89, 2018, ISSN: 2455-7749
Table 20. Importance measures of components at time t = 8000 hours (Case II, hot standby)
Table 21. Importance measures of subsystems at time t = 8000 hours (Case II, hot standby)
Table 22. Importance measures of components at time t = 8000 hours (β = 15%)
Table 23. Importance measures of subsystems at time t = 8000 hours (β = 15%)
Table 24. Importance ranking of components and subsystems (β = 15%)
As discussed in Section 6.2.1, the importance ranking of components and subsystems for β = 5% is given in Table 14. Compared with that ranking, the importance ranking of components and subsystems changes when β = 15%. For example, when β = 15%, component PM3 becomes more important than PM1 (PM2) according to RICR. Additionally, the PM subsystem becomes the most critical subsystem according to RIB and RICF. This is because, with a large β-factor, the CCFs make the PM subsystem more prone to failure than the other subsystems.
We therefore conclude that the CCFs affect not only the reliabilities of components and subsystems but also their importance ranking.
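To give a feel for why a larger β-factor makes a redundant group more failure-prone, the following sketch uses the standard β-factor split of a failure rate into an independent part and a common-cause part. It is deliberately simplified: it models a two-unit hot parallel group with exponential lifetimes, not the pair-and-a-spare warm standby CTMC analysed in this paper, and the failure rate is a hypothetical value chosen only for illustration.

```python
from math import exp

def split_rates(lam, beta):
    """Beta-factor split of a total failure rate.

    Returns (independent rate, common-cause rate).
    """
    return (1.0 - beta) * lam, beta * lam

def parallel_pair_reliability(lam, beta, t):
    """Reliability at time t of two identical hot units in parallel.

    The group survives if no common-cause shock has occurred and
    at least one unit has not failed independently.
    """
    lam_ind, lam_ccf = split_rates(lam, beta)
    q_ind = 1.0 - exp(-lam_ind * t)          # independent unit failure probability
    return exp(-lam_ccf * t) * (1.0 - q_ind ** 2)

LAMBDA = 1.0e-4   # hypothetical failure rate per hour (assumption)
T = 8000.0        # mission time in hours, as in the experiments

results = {
    beta: parallel_pair_reliability(LAMBDA, beta, T)
    for beta in (0.0, 0.05, 0.15)
}
for beta, r in results.items():
    print(f"beta = {beta:.2f}: R({T:.0f} h) = {r:.4f}")
```

In this simplified setting the group reliability falls steadily as β grows from 0 to 15%, because the common-cause shock bypasses the redundancy entirely.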

Conclusions
In this paper, a Markov-based component-wise sensitivity analysis method is applied to evaluate the importance measures of components and subsystems of a real-time computing system in a warm standby configuration, and thereby to rank them in order of importance according to distinct measures. The relative importance ranking of components and subsystems helps identify the most efficient way to improve system reliability by upgrading the weak components, and supports the diagnosis of system failure by generating a repair checklist for an operator to follow. In this system, component SM is the reliability bottleneck, so efforts to reduce the failure rate of component SM are more efficient in enhancing system reliability. The PM subsystem is the most reliable subsystem owing to the pair-and-a-spare fault-tolerant scheme. Our numerical experiments show that the effect of CCFs in decreasing the reliability of component PM and of the system is larger in the hot standby case than in the warm standby case. In addition, the CCFs affect not only the reliabilities of components and subsystems but also their importance ranking.
Some challenges remain to be addressed in future work: (i) investigating the importance of Common Root Cause Events (CRCEs) in a real-time computing system where CCFs occur, in order to find efficient defense strategies against CCFs; (ii) studying optimization policies that maximize the improvement in system reliability based on the obtained importance measures and relative ranking of components.