Optimizing preventive replacement schedule in standby systems with time consuming task transfers

doi:10.1016/j.ress.2020.107227

Reliability Engineering & System Safety

Volume 205, January 2021, 107227

https://doi.org/10.1016/j.ress.2020.107227

Highlights

• Warm standby systems with predetermined preventive replacements are considered.
• The replacement process includes task transfer during which two elements must operate.
• The replacement time depends on the amount of work completed before the replacement.
• The mission succeeds if all its parts are completed and no operating elements fail.
• The optimal preventive replacement scheduling problem is solved.

Abstract

In many industrial and technological systems, due to factors such as deterioration, corrosion, attacks, etc., preventive replacement is typically performed according to a predetermined schedule to renew the worn or aged online element using a standby element, enhancing the mission success probability. This paper models such a standby system with its mission time being divided into a certain number of mission parts (MPs). During different MPs, according to a pre-specified sequence, different system elements are activated to perform the mission operation. Upon completing each MP, following the preventive element replacement, a time-consuming transfer procedure must be conducted to start the new MP. The mission is successful when the last MP can be accomplished by an available system element. An event transition-based method is proposed to evaluate the mission success probability (MSP). The optimal preventive replacement scheduling problem is then solved, which finds the number of MPs and their durations maximizing the MSP. In the case of heterogeneous system elements, a combined optimization problem that finds the optimal element activation sequence and the optimal preventive replacement schedule to maximize the MSP is formulated and solved. Effects of several model parameters (the number of MPs, the number of system elements, task transfer time) are investigated through examples.

Introduction

In many industrial and technological systems (e.g., power, aviation, healthcare, railway), it is a common practice to perform preventive maintenance to remove or reduce accumulated deterioration of system elements, thus improving the overall system reliability [1], [2], [3]. The preventive maintenance can be triggered based on a certain pre-determined schedule or by the presence of some system condition [4,5]. In this work, we focus on the scheduled preventive maintenance for systems with standby elements.

There exists a rich body of work on modeling and optimizing the preventive maintenance (PM) policy. For example, in [6], a reliability-based periodic PM model was proposed for systems with deteriorating elements. In [7], the optimal PM scheduling problem was solved for a composite power system balancing maintenance, reliability, and failure costs. In [8], the optimal PM policy was determined for a k-out-of-n system considering the threshold number of malfunctioning elements to avoid the entire system failure. In [9], the optimal periodic PM policy was investigated for a single-element system working under time-varying operating conditions modeled by a continuous-time Markov process. In [10], the optimal PM policy was investigated for a parallel system subject to common-cause failures. In [11], a hybrid age-based and condition-based PM policy was proposed for systems operating in random environments undergoing Poisson-distributed external shocks. In [12], periodic PM and warranty policies were co-optimized for repairable products. In [13], the optimum preventive replacement interval was studied for a parallel redundant system with a damage self-healing mechanism. In [14], three different age-based PM models (replace first, replace last, and replace next) were studied and compared.

Recently, some researchers have investigated preventive maintenance planning for standby systems, where one or multiple elements are initially online and operating while extra elements stay in the standby mode; when an online element fails, an available standby element is activated to take over the mission task [15], [16], [17]. For example, in [18], a degradation level-based PM model was suggested for a two-unit standby cooling system. In [19], the optimal PM interval was studied for a two-unit priority standby system to maximize the expected profit value per unit time. In [20], a backup-based PM policy was modeled and optimized for a standby system undergoing periodic backups (checkpointing), preventive replacements, and corrective replacements. In [21], the model of [20] was extended by allowing different overheads (cost and time) incurred by corrective and preventive replacement actions. In [22], a standby system undergoing random inspections and state-based PM was modeled and optimized. In [23], the periodic inspection and PM policies were investigated to maintain degrading standby elements in acceptable conditions, maximizing the overall system reliability. In [24], a shock-based preventive replacement policy was studied for a heterogeneous standby system to maximize the mission availability over a finite time horizon.

Despite the rich literature on modeling and optimizing PM policies, to the best of our knowledge, none of the existing works has explicitly addressed the task transfer incurred by the preventive replacement. The task transfer often takes a significant amount of time and may fail due to the malfunction of the system elements involved, greatly affecting the system reliability. For example, suppose some cargo must be transported over a certain distance. The transportation units (i.e., ships) deteriorate and can fail during transportation, which causes the loss of the cargo (the ship sinks). To reduce the failure probability, the route is divided into several parts, and different units cover different parts of the route, as in a relay race. To change units, a time-consuming, failure-prone cargo reloading procedure must be performed. Several units can wait for the cargo at each reloading point (port) in the standby mode. When one of the standby units turns out to be unavailable, the next one can take over the transportation task. The mission succeeds when the cargo is delivered to the destination point.

As another example, a product must undergo some technological process in reactors operating in a corrosive environment. Each reactor can deteriorate during the process, and the failure of a reactor causes the loss of the product. To reduce the time during which each reactor is exposed to the corrosive environment, the process is divided into several stages performed in different reactors. Between stages, the product must be reloaded from the previous reactor to the next one. The amount of product can change during the process. Thus, the reloading time depends on the elapsed process time.

Time-consuming task transfers also take place in distributed systems performing computational tasks. Each processor performing the task attracts the attention of attackers and can be corrupted with a probability that increases with operation time. The intensity of attacks increases with the time elapsed since the attack began, as more attackers obtain information about the operating processor's location. Idle processors can also be attacked and corrupted, but with lower probabilities. To increase the chance of task completion, the software migrates among processors. The processor completing its part of the task must transfer the software and/or the produced data to the next one. The amount of data to be transferred depends on the number of computation operations performed since the beginning of the mission. If any operating processor is corrupted during processing or data transfer, the computational task fails.

In this paper, we explicitly model the time-consuming task transfer procedure in an event transition-based reliability analysis of standby systems undergoing preventive replacements per a predetermined mission time schedule. Based on the reliability (mission success probability) evaluation, we make a further contribution by solving the optimal preventive replacement scheduling problem to maximize the mission success probability (MSP). In the case of heterogeneous standby elements, the element activation sequence matters. Therefore, we also solve a joint optimization problem that finds the optimal replacement schedule and element activation sequence, maximizing MSP.

The remainder of the paper is organized as follows. Section 2 presents the standby system model and assumptions made by the proposed solution method. Section 3 presents the MSP evaluation method for the considered system. Section 4 describes the optimal element replacement scheduling problem and the optimization method. Section 5 presents examples to illustrate the proposed method and optimization. Section 6 concludes the paper and points out a few directions for future research.

Section snippets

System model

The system consists of K heterogeneous elements characterized by increasing failure rates. It has to perform a mission that requires time T of operation. To reduce the mission failure probability, preventive element replacements are implemented in predetermined time instances. Specifically, the mission time is divided into J parts where different elements perform the mission operation in a predetermined sequence. The duration of the jth mission part (MP) is $\tau_j$ such that $\sum_{j=1}^{J} \tau_j = T$. When a
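To see why dividing the mission among elements with increasing failure rates can help, the following minimal Python sketch may be useful. It is a deliberately simplified illustration and not the model of this paper: it assumes a Weibull time-to-failure distribution (shape greater than 1 gives an increasing failure rate), a fresh element for every MP, and it ignores task transfers, standby-mode deterioration, and takeover by other elements. The function names and the scale and shape values are hypothetical; only T = 100 echoes the illustrative example.

```python
import math


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Probability that a fresh element survives operation time t.

    Assumption: Weibull time to failure; shape > 1 means the failure
    rate increases with operation time.
    """
    return math.exp(-((t / scale) ** shape))


def naive_success_probability(durations, scale: float, shape: float) -> float:
    """Product of per-part survival probabilities, one fresh element per part.

    Simplification: transfers between parts are instantaneous and cannot
    fail, and standby elements do not deteriorate.
    """
    p = 1.0
    for tau in durations:
        p *= weibull_survival(tau, scale, shape)
    return p


T = 100.0  # mission time
# Hypothetical Weibull parameters, identical for all elements.
single = naive_success_probability([T], scale=150.0, shape=2.0)
split = naive_success_probability([T / 4] * 4, scale=150.0, shape=2.0)

print(f"one element for the whole mission: {single:.4f}")
print(f"four equal mission parts:          {split:.4f}")
```

Under these assumptions, splitting the mission always raises the success probability, so the naive model would favor as many MPs as possible. The time-consuming, failure-prone transfers and standby deterioration considered in the paper are what make the number of MPs and their durations a genuine trade-off.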

Evaluation of MSP

An event transition-based method is suggested in this section to evaluate the successful completion probability of a multi-phase mission with each phase/MP performed by a different element and time-consuming task transfers in-between MPs.

Optimal replacement scheduling

To obtain the optimal element replacement schedule, which includes the number of MPs and their durations, we apply the genetic algorithm (GA) heuristic [29,30]. The GA requires representing solutions in the form of strings. Given the number J of MPs, the optimal replacement scheduling problem becomes the optimal MP durations problem and the solution can be encoded by a string consisting of J integer numbers $(x_1, \ldots, x_J)$ ranging from 0 to H each. The duration $\tau_j$ is determined as $\tau_j = T x_j / \sum_{i=1}^{J} x_i$. It
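The decoding of a GA string into MP durations described above can be sketched as follows. This is a minimal sketch in Python, not the authors' implementation: the function name `decode_durations` and the handling of an all-zero string (rejected here, since the formula is undefined when all entries are zero) are assumptions made for illustration.

```python
from typing import List, Sequence


def decode_durations(x: Sequence[int], mission_time: float) -> List[float]:
    """Map an integer string (x_1, ..., x_J) to MP durations tau_j.

    Each duration is mission_time * x_j / sum(x), so the durations are
    proportional to the string entries and always sum to mission_time.
    """
    if any(v < 0 for v in x):
        raise ValueError("string entries must be non-negative")
    total = sum(x)
    if total == 0:
        # Assumption: an all-zero string is treated as invalid, because
        # the normalization would divide by zero.
        raise ValueError("at least one string entry must be positive")
    return [mission_time * v / total for v in x]


durations = decode_durations([1, 3, 0, 4], mission_time=100.0)
print(durations)       # → [12.5, 37.5, 0.0, 50.0]
print(sum(durations))  # → 100.0
```

Because of the normalization, any string with a positive sum decodes to a schedule satisfying the constraint that the durations sum to T, so the GA operators never need to repair a solution. A zero entry yields a zero-length mission part under this formula.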

Illustrative example

Consider a tank that should contain a pressurized aggressive liquid during time T = 100. The liquid can cause corrosion of the tank, leading to penetration of its shell and serious damage to the environment. To reduce the risk of such damage, the liquid is periodically transferred to other tanks. During the transfer the pressure is reduced, which creates milder conditions for the tank. The empty tanks waiting in the standby mode can also be corroded by the ambient factors (air humidity, pollution

Conclusion and future directions

This paper evaluates and optimizes the success probability of a mission composed of multiple MPs performed by different system elements. At the end of each MP, a preventive element replacement is performed followed by a time-consuming task transfer to the newly activated system element before starting the next MP. An event transition-based method has been suggested to evaluate the MSP. Applying the GA heuristics, the optimal preventive replacement scheduling problem has been solved to determine

Author statement

The paper has been revised according to the reviewers’ comments.

Declaration of Competing Interest

There is no conflict of interests associated with this paper.

References (31)

E. Zio et al. Evaluating maintenance policies by quantitative modeling and analysis. Reliab Eng Syst Saf (2013)

J. Lin et al. Reliability analysis for preventive maintenance based on classical and Bayesian semi-parametric degradation approaches using locomotive wheel-sets as a case study. Reliab Eng Syst Saf (2015)

Z.L. Lin et al. Non-periodic preventive maintenance with reliability thresholds for complex repairable systems. Reliab Eng Syst Saf (2015)

N.C. Caballé et al. A condition-based maintenance of a dependent degradation-threshold-shock model in a system with multiple degradation processes. Reliab Eng Syst Saf (2015)

M. Doostparast et al. A reliability-based approach to optimize preventive maintenance scheduling for coherent systems. Reliab Eng Syst Saf (2014)

J. Hu et al. Periodic preventive maintenance planning for systems working under a Markovian operating condition. Comput Ind Eng (2020)

M. Nourelfath et al. Integrating production, inventory and maintenance planning for a parallel system with dependent components. Reliab Eng Syst Saf (2012)

L. Yang et al. Hybrid preventive maintenance of competing failures under random environment. Reliab Eng Syst Saf (2018)

Y.-S. Huang et al. Cost analysis of two-dimensional warranty for products with periodic preventive maintenance. Reliab Eng Syst Saf (2015)

W. Dong et al. Reliability modeling and optimal random preventive maintenance policy for parallel systems with damage self-healing. Comput Ind Eng (2020)

M. Hamidi et al. New one cycle criteria for optimizing preventive replacement policies. Reliab Eng Syst Saf (2016)

G. Levitin et al. Optimal structure of series system with 1-out-of-n warm standby subsystems performing operation and rescue functions. Reliab Eng Syst Saf (2019)

D. Yang et al. Reliability and availability analysis of standby systems with working vacations and retrial of failed components. Reliab Eng Syst Saf (2019)

X. Ma et al. Reliability analysis and condition-based maintenance optimization for a warm standby cooling system. Reliab Eng Syst Saf (2020)

R. Hirata et al. Study on preventive maintenance for priority standby redundant system. IFAC-PapersOnLine (2019)

