Reliability measures of a computer system with priority to PM over the H/W repair activities subject to MOT and MRT

Article history: Received September 18, 2014; Accepted December 15, 2014; Available online December 16, 2014

This paper concentrates on the evaluation of reliability measures of a computer system of two identical units having independent failure of h/w and s/w components. Initially one unit is operative and the other is kept as a spare in cold standby. There is a single server who visits the system immediately whenever needed. The server conducts preventive maintenance of the unit after a maximum operation time. If the server is unable to repair the h/w components within the maximum repair time, the components in the unit are replaced immediately by new ones. However, the s/w components are only replaced at their failure. Priority is given to preventive maintenance over the repair activities of the h/w. The time to failure of the components follows a negative exponential distribution, whereas the distributions of preventive maintenance, repair and replacement times are taken as arbitrary. The expressions for some important reliability measures of system effectiveness have been derived using a semi-Markov process and the regenerative point technique. The graphical behavior of the results has also been shown for a particular case.

© 2015 Growing Science Ltd. All rights reserved.


Introduction
Over the past few decades, the demand for reliable h/w and s/w components has increased manifold due to their applications in every sphere of life, particularly in industrial management. Reliable computer systems are therefore needed for successful operation and to protect the integrity of stored information. The failure of a computer system can cost an organization several hours or days of downtime. A major challenge for industrialists is thus to provide highly reliable computer systems to their customers, and for this purpose they are exploring new techniques for improving the reliability of their products. In spite of these efforts, little work has been carried out on the reliability modeling of computer systems. In addition, most of the research carried out so far on s/w and h/w reliability has been limited to the consideration of either the h/w subsystem alone or the s/w subsystem alone. However, there are many complex systems in which h/w and s/w components work together to provide computer functionality. Friedman et al. (1992) and Welke et al. (1995) tried to develop a combined reliability model for the whole system in which hardware and software components work together. Lai et al. (2002) proposed a model for availability analysis of distributed hardware/software systems. Recently, many researchers such as Malik and Anand (2010), Kumar and Malik (2011), Malik et al. (2009) and Malik (2013) studied reliability models of a computer system with different repair policies. Barak and Barak (2013) discussed a reliability model of a cloud under the concept of maximum operation and repair times.
Further, it is common knowledge that the continued operation and ageing of operating systems gradually reduce their performance, reliability and safety. Moreover, a breakdown of such systems is costly and dangerous, and may create confusion in our society. It is, therefore, of great importance to maintain the reliability of such systems at a sufficiently high level. It has also been shown that preventive maintenance can slow the deterioration process of a repairable system and restore the system to a younger age or state. Thus, preventive maintenance can be used to improve the reliability and profit of a system. The concept of preventive maintenance was used by Malik and Nandal (2010) while analyzing a redundant system with maximum operation time. Kumar et al. (2012) and Malik and Kumar (2012) proposed a reliability model for a computer system introducing the concept of preventive maintenance of the unit after a maximum operation time and repair time. Further, the reliability of a system can be enhanced by replacing the components with new ones in case the repair time is too long, i.e., if it extends to a pre-specified time. Singh and Agrafiotis (1995) stochastically analyzed a two-unit cold standby system subject to maximum operation and repair time. Kumar and Malik (2012) developed a reliability model for a computer system with priority to s/w replacement over h/w replacement under the assumption of maximum operation time. Sureria et al. (2012) established a reliability model of a computer system with priority to s/w replacement over h/w repair under the assumption of independent h/w and s/w failures. Anand and Malik (2012) suggested a reliability model of a computer system with arbitrary distributions for h/w and s/w replacement time.
In view of the above facts and to fill this gap, a stochastic model for a computer system of two identical units having independent failure of h/w and s/w components has been developed. Initially one unit is operative and the other is kept as a spare in cold standby. There is a single server who visits the system immediately whenever needed. The server conducts PM of the unit after a maximum operation time. If the server is unable to repair the h/w components in the unit within the maximum repair time, the components are replaced immediately by new ones. However, the s/w components are only replaced at their failure. Priority is given to preventive maintenance over the repair activities of the h/w. The time to failure of the components follows a negative exponential distribution, whereas the distributions of preventive maintenance, repair and replacement times are taken as arbitrary. The expressions for some important reliability measures of system effectiveness, such as mean time to system failure (MTSF), availability, busy period of the server due to PM, busy period of the server due to h/w repair, busy period of the server due to h/w replacement, busy period of the server due to s/w replacement, expected number of h/w replacements, expected number of s/w replacements, expected number of visits of the server and the profit function, are obtained using a semi-Markov process and the regenerative point technique. The graphical behavior of MTSF, availability and the profit function has also been observed for a particular case.


Transition Probabilities and Mean Sojourn Times
Simple probabilistic considerations yield the following expressions for the non-zero elements where A= 1 and B= aλ1+bλ2+α0+β0.
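As an illustration of how such elements arise, the following minimal sketch assumes that the system leaves its initial operative state through three competing exponential clocks: h/w failure at rate aλ1, s/w failure at rate bλ2, and expiry of the maximum operation time at rate α0. Under that assumption each transition probability is the corresponding rate divided by the total rate, and the mean sojourn time is the reciprocal of the total rate. The function name, the event labels and all numerical values are hypothetical and are not taken from the paper.

```python
def initial_state_transitions(a, lam1, b, lam2, alpha0):
    """Transition probabilities and mean sojourn time for the initial state.

    Assumption (not from the paper): the state is left by whichever of three
    independent exponential clocks fires first.
    """
    rates = {
        "hw_failure": a * lam1,          # h/w failure, rate a*lambda1
        "sw_failure": b * lam2,          # s/w failure, rate b*lambda2
        "max_operation_time": alpha0,    # unit sent for PM, rate alpha0
    }
    total = sum(rates.values())
    # For competing exponentials, P(event fires first) = its rate / total rate.
    probs = {event: rate / total for event, rate in rates.items()}
    # The time spent in the state is exponential with the total rate.
    mean_sojourn = 1.0 / total
    return probs, mean_sojourn


# Hypothetical parameter values, chosen only to exercise the function.
probs, mu0 = initial_state_transitions(a=0.7, lam1=0.002, b=0.3, lam2=0.001, alpha0=0.005)
for event, p in probs.items():
    print(f"{event}: {p:.4f}")
print(f"mean sojourn time: {mu0:.2f}")
```

For states in which a repair with arbitrary distribution is in progress, the corresponding elements involve integrals over the repair-time distribution and do not reduce to simple rate ratios.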
Taking the LST of Eq. (6) and solving for the transform associated with state 0 gives Eq. (7). The reliability of the system model can be obtained by taking the inverse Laplace transform of Eq. (7). The mean time to system failure (MTSF) is given by
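For reference, the general relation between a reliability function and the MTSF can be written as below. It holds for any reliability function R(t) with a finite mean and is not specific to this model. The symbol R*(s) for the Laplace transform of R(t) is notation assumed here, not taken from the paper.

```latex
\mathrm{MTSF} = \int_0^\infty R(t)\,dt = \lim_{s \to 0} R^{*}(s),
\qquad
R^{*}(s) = \int_0^\infty e^{-st}\,R(t)\,dt .
```

In other words, once the transform of the reliability is known, the MTSF follows by evaluating it at s = 0, without inverting the transform.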

Let the busy-period functions of the server, the last of which is $B_i^{HRp}(t)$, be the probabilities that the server is busy in preventive maintenance of the system, in repairing the unit due to hardware failure, in replacement of the software components and in replacement of the hardware components, respectively, at an instant t, given that the system entered state i at t = 0. The recursive relations for these functions are as follows:

where j is any successive regenerative state to which the regenerative state i can transit through n transitions, and $W_i(t)$ is the probability that the server is busy in state $S_i$ due to preventive maintenance, hardware failure or software failure up to time t without making any transition to any other regenerative state or returning to the same state via one or more non-regenerative states. Taking the LT of relations (12), these functions can then be solved for in the transform domain.

Cost-Benefit Analysis
The profit incurred to the system model in steady state can be obtained as

For the particular case considered, the graphs for mean time to system failure (MTSF), availability and profit are drawn with respect to the preventive maintenance rate (α) for fixed values of the other parameters, as shown in Figs. 2 to 4, respectively. These figures indicate that MTSF, availability and profit increase with the PM rate (α) and the repair rate (θ) of the hardware components, but the values of these measures decrease with the increase of maximum operation time (α0). However, if the maximum constant rate of repair time (β0) is increased, the value of MTSF increases while availability and profit follow a declining trend. It is also observed that availability and profit decrease when the values of a and b are interchanged, i.e. a = 0.3 and b = 0.7. Hence, it is suggested that a computer system of two identical units having independent failure of h/w and s/w components can be made more profitable by
(i) reducing the maximum repair time of the h/w components;
(ii) replacing the hardware components by new ones in case the repair time is too long;
(iii) controlling the failure rate of the software.

Fig. 2. MTSF vs. preventive maintenance rate

The symbols in Eq. (19) are defined as follows:
K0 = Revenue per unit up-time of the system
K1 = Cost per unit time for which the server is busy due to preventive maintenance
K2 = Cost per unit time for which the server is busy due to hardware failure
K3 = Cost per unit replacement of the failed software component
K4 = Cost per unit replacement of the failed hardware component
K5 = Cost per unit replacement of the failed hardware
K6 = Cost per unit replacement of the failed software
K7 = Cost per unit visit by the server
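A minimal sketch of how a steady-state profit of this kind can be evaluated is given below. It assumes the common revenue-minus-cost form, in which K0 multiplies the steady-state availability and each cost K1 to K7 multiplies its corresponding steady-state measure (busy-period fraction, expected number of replacements or expected number of visits). That form, the function name and every numerical value are assumptions made for illustration, not results from the paper.

```python
def steady_state_profit(revenue_rate, availability, cost_terms):
    """Profit per unit time as revenue minus the sum of cost contributions.

    revenue_rate -- revenue per unit up-time (K0)
    availability -- steady-state availability of the system
    cost_terms   -- iterable of (cost, measure) pairs, e.g. (K1, busy fraction for PM)

    Assumed form (not confirmed by the source): K0 * A0 - sum(Ki * measure_i).
    """
    total_cost = sum(cost * measure for cost, measure in cost_terms)
    return revenue_rate * availability - total_cost


# Hypothetical inputs, chosen only to exercise the function.
profit = steady_state_profit(
    revenue_rate=5000.0,
    availability=0.95,
    cost_terms=[(200.0, 0.10), (150.0, 0.05), (75.0, 0.02)],
)
print(f"profit per unit time: {profit:.2f}")
```

Passing the costs as (cost, measure) pairs keeps the sketch independent of which measure each Ki is paired with.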