Research Online

Optimization of maintenances following proof tests for the final element of a safety-instrumented system

Abstract

© 2019 The Authors. Safety-instrumented systems (SISs) have been widely installed to prevent accidental events and mitigate their consequences. Mechanical final elements of SISs often become vulnerable with time due to degradation, but the particulars of SIS operation and assessment impede the adoption of state-of-the-art research results on maintenance in this domain. This paper models the degradation of a SIS final element as a stochastic process. Based on the information observed during a proof test, an optimal maintenance strategy is determined by choosing between preventive maintenance (PM) and corrective maintenance (CM), and by deciding what degree of degradation mitigation is sufficient in the case of a PM. When the appropriate condition for initiating a PM and the optimal maintenance degree are identified, the lifetime cost of the final element can be minimized while still satisfying the integrity level requirement for the SIS. A numerical example illustrates how the presented methods are used to examine the effects of maintenance strategies on the cost and the average probability of failure on demand (PFDavg) of a SIS. Intervals of upcoming tests can thus be updated to give maintenance crews more guidance on cost-effective tests without weakening safety.


HIGHLIGHTS
- Developing a specific algorithm for calculating the average probability of failure on demand for a degrading final element of a safety-instrumented system.
- Improving decisions on initiating and completing preventive maintenance by using the degradation information collected in proof tests of SISs.
- Optimizing the intervals of upcoming proof tests of SISs to save maintenance cost while maintaining safety integrity.

Introduction
For production safety and environmental protection, many safety-instrumented systems (SISs) have been employed in different industries. For example, on an offshore oil and gas production platform, emergency shutdown (ESD) systems are installed to protect the facility in case of an undesired event. Normally, a SIS such as an ESD system consists of sensor(s) (e.g. pressure transmitters), logic solver(s) and final element(s) (e.g. shutdown valves) [1]. The final element performs one or more safety-instrumented functions (SIFs), for example by closing to stop the gas flow in a pipeline if an emergency occurs in production. The facility protected by the ESD system is called the equipment under control (EUC) in this context.
An ESD system is a typical SIS operating in low demand mode, where the activation frequency is in general less than once per year. The final elements of such a SIS are mainly dormant unless there is a proof test or a real shock on the EUC [1]. Therefore, some failure modes of final elements stay hidden until the element is activated. These hidden failures are called dangerous undetected (DU) failures if they can result in serious accidents. The average probability of failure on demand (PFDavg) is a commonly used measure of the unavailability of SISs in low demand mode [2], and DU failures are the main contributors to PFDavg. In the IEC standards, the value of PFDavg is used to determine the safety integrity level (SIL) of a SIS.
Many studies have addressed the calculation of PFDavg, using simplified formulas [1,3], Markov methods [4][5][6][7] and Petri nets [8][9][10]. Most of these methods share the assumption of constant failure rates for all elements in a SIS. In practice, such an assumption is generally considered valid for electronic components, but its validity for mechanical components is in question.
Mechanical components, such as many final elements of SISs, including shutdown valves, are operated in harsh conditions and are rather vulnerable to creeping or other degradation processes [11]. Thus, their failure rates, namely the conditional probability of failure in the next short time period, increase with time. Several authors have assessed the unavailability of SISs with non-constant failure rates [11,12]. Meanwhile, several dynamic reliability methods, e.g. the multiphase Markov process, have been applied to SISs for reliability assessment [5][13][14][15][16]. Their findings show that PFDavg changes with time and differs from one proof test interval to the next. A changing PFDavg makes it necessary to update the proof test interval according to the SIL requirement.
With the development of sensor technologies, more data about operating conditions and system degradation status can be collected in periodic proof tests. Information about degradation is helpful for the assessment of system performance [17]. Numerous parameters, such as lubricant composition and corrosion extent, can be measured and used for failure prediction and diagnosis [18]. When any deviation from normal behaviour, or an early-phase signal of failure, is identified, the upcoming tests and the following maintenance actions need to be rescheduled.
The final elements of an ESD system can suffer from several failure mechanisms, including erosion, corrosion and cracks, which cause the capacity to perform safety functions to degrade with time [19]. For example, the closing time on demand is an indicator of the performance of a shutdown valve. Once the degradation of the valve reaches a certain level, the final element is in a faulty/failed state. Such a DU failure stays hidden until a proof test reveals that closing the valve takes too long.
However, even though the shutdown valve passes a proof test, the final element may not be as-good-as-new. Namely, the closing time is below the acceptable maximum value, but it is still longer than when the SIS was first put into operation. The as-good-as-new assumption after each proof test is an extension of the constant failure rate assumption, meaning that PFDavg takes the same value in each test interval [20]. Since the unavoidable gradual degradation of mechanical components challenges the constant failure rate assumption, the unavailability of the final element should be expected to increase with time.
In the simple calculation of PFDavg, more frequent proof tests are considered to lower risk, but some practical issues can weaken such a conclusion. If a proof test of a SIS fully stops the process, or completes a whole shutdown trip, the stoppage and restart of the process will cause production loss, especially in offshore engineering and facilities [1]. In addition, such a full shutdown trip may damage the valve to some degree (e.g. wear of the valve seat area) due to the high stress level [11,21]. Hence, it is reasonable to consider how to use the information from proof tests to schedule future tests more effectively (e.g. to avoid unnecessary tests), while keeping the SIS availability at the required level.
After the observation of a shutdown valve in a proof test, three follow-up options are possible: (1) no action if the tested valve is working well; (2) preventive maintenance (PM) if a certain degradation has been identified; (3) repair or replacement of the valve if it has failed. Repair/replacement can be regarded as perfect, restoring the SIS to an as-good-as-new condition. In a PM, degradation of the valve can be mitigated but not eliminated, so that the probability of failure before the next test is reduced. The degree of mitigation can naturally be assumed to be positively correlated with the resources and time spent on the PM, namely the cost of the PM. However, it is challenging to decide the optimal degree of PM that balances the cost and the SIS availability. In addition, the level of degradation that should initiate a PM is an open question. In other words, when the closing time of a valve is slightly longer than the design value, a decision needs to be made on whether the degradation can be ignored or some action should be taken immediately. Ignoring it means exposing the EUC to more risk, but actions are costly, especially when they are not needed.
It should be noted that even though many studies on maintenance optimization with degradation have been conducted, they are not directly suitable for SIS final elements. As mentioned above, failures and degradation of SISs are hidden and can only be observed periodically. Decision-making on maintenance is not based on instantaneous availability but should be based on an estimate of system performance in the next test interval. In addition, to comply with international standards, the effects of maintenance should be connected with the average unavailability of a SIS over a period (PFDavg), which should always be a strict constraint when devising any testing and maintenance strategy. Maintenance models for renewal systems have some similarities with SISs, but they assume perfect PM or CM [22][23][24][25][26] and focus on the average long-run cost rate [27][28][29]. For SISs, however, the total cost over the designed service time (e.g. 20 years) is of more interest, and perfect PMs are often not practical or necessary.
Therefore, the main objective of this paper is to deal with both the challenges that degradation poses to SIS assessment and the challenges that SISs pose to maintenance optimization, in order to identify the optimal PM strategy for a SIS. Specifically, the optimal combination of two threshold values of a SIS final element is sought: the degree of degradation that initiates a PM (ω_a), and the degree of degradation at which the PM can be considered complete (ω_b).
The remainder of this paper is organized as follows: Section 2 explains how a SIS final element operates and states the assumptions of the analysis; Section 3 investigates the calculation of the instantaneous unavailability of the SIS, PFDavg and the expected cumulative maintenance cost; Section 4 discusses the optimal values of the two PM thresholds based on the minimum expected cost and the SIL requirement, respectively; Section 5 illustrates a method to update the test interval; and conclusions are given in Section 6.

System states and performance requirements
Without loss of generality, we use an ESD system to study the behaviour and operation of SISs. The ESD system is designed to maintain the EUC in, or bring it to, a safe state, e.g. a normal process pressure. One of the main SIFs of an ESD valve is to cut off the flow when high pressure occurs. To keep the risk of the EUC within an acceptable level, the valve is designed with a specific closing time, for example 12 seconds. The actual performance requirement for this valve is normally the designed target value with an acceptable deviation, e.g. 3 seconds. This means that the valve is considered to be functioning (with respect to this particular function) as long as the closing time is within the interval (9, 15) seconds.
If the valve closes too slowly, e.g. in 18 seconds, it will not, as a safety barrier, meet the performance requirements for risk mitigation of the EUC. A failure has occurred on this valve since the required function is terminated. The corresponding failure mode is called "closing too slowly", which is one of the dangerous failure modes of an ESD valve [1]. Degradation such as corrosion or erosion due to the harsh environment causes such a failure. Meanwhile, even if the closing time is still within the acceptable interval, the criticality of the deviation increases as the closing time moves away from the target value (12 seconds) [20]. In most cases, it is not possible to observe this kind of failure without activating the valve, so the failure mode "closing too slowly" is a DU failure. The closing time checked in proof tests can therefore be collected to reflect the valve status and degradation [30].
When the closing time is beyond 15 seconds, the valve is in a failed state. When the closing time is shorter than a certain value, e.g. 14 seconds, we can regard the valve as being in good condition. If the closing time is between 14 and 15 seconds, we can consider the valve to have degraded performance but still be functioning. Therefore, we consider three different states of the valve: working, degraded and failed, as shown in Table 1. It should be noted that degradation can still exist in state 0, but it can be accepted without any maintenance action.

Table 1. State description
State | Status   | Description
0     | Working  | The system is functioning as specified
1     | Degraded | The system has a degraded performance but is functioning
2     | Failed   | The system has a fault

Because maintenance or replacement after each proof test is often expensive, taking no action is preferred when the estimate based on the observed condition shows that the failure probability of the SIS before the next test is rather low. Specifically, when the valve is in the working state (state 0), no maintenance is executed. When the valve is in the degraded state (state 1), even though it is still functioning, a PM with reasonable cost is carried out. The degradation is mitigated but not eliminated, since perfect maintenance is too costly. When the valve is in the failed state (state 2), replacement is needed.

System operation and test
A possible cause of the "closing too slowly" failure mode is the loss of stiffness of a spring [1,31,32]. According to [33,34], this kind of degradation can be described by a stochastic process. The gamma process has been justified by practical applications for modelling degradation [35,36] owing to its monotonically increasing paths [37][38][39].
The final element of such a SIS is assumed to be subject to a homogeneous gamma degradation process, and a hidden failure occurs when the degradation level exceeds a predefined threshold L. The SIS is periodically tested at τ, 2τ, …, where τ is the test interval, e.g. one year. In a proof test, the degradation level is checked. As shown in Figure 1, at 4τ the degradation level is found to be beyond the failure threshold L, and the failed system is replaced by a new one. When the degradation level is found to be beyond ω_a L in a proof test, a PM is needed. For example, at 6τ or 8τ in Figure 1, a PM is executed and the degradation level goes back to a specific level (ω_b L) rather than 0.

Consider a one-unit system that is subject to a continuous aging degradation process. The degradation process is modelled by a gamma process with initial state X_0 = 0. Then, the degradation X(t) follows a gamma probability density function (PDF) with shape parameter αt and rate parameter β:

$$f_{X(t)}(x) = \frac{\beta^{\alpha t}}{\Gamma(\alpha t)}\, x^{\alpha t - 1} e^{-\beta x}, \quad x \ge 0 \qquad (1)$$

The mean and variance of X(t) are αt/β and αt/β², respectively. Periodic proof tests are executed. Proof tests are assumed to be perfect in this study and to have no direct influence on the degradation process. In addition, we assume that the time spent on repair and testing is negligible compared with the much longer test intervals.
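The degradation model above can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation: the function name and the values of α, β and τ are hypothetical and are not the parameters of Table 3. It relies only on the property that increments of a homogeneous gamma process over an interval of length τ are independent gamma variables with shape ατ and rate β.

```python
import random

def simulate_gamma_path(alpha, beta, tau, n_tests, x0=0.0, rng=None):
    """Degradation levels seen at the proof tests tau, 2*tau, ..., n_tests*tau.

    No maintenance is applied here; the path only accumulates degradation.
    """
    rng = rng or random.Random()
    levels = []
    x = x0
    for _ in range(n_tests):
        # random.gammavariate takes (shape, scale); scale = 1 / rate
        x += rng.gammavariate(alpha * tau, 1.0 / beta)
        levels.append(x)
    return levels

# Hypothetical parameters, chosen for illustration only
ALPHA, BETA, TAU = 1.0, 1.0e4, 1.0
path = simulate_gamma_path(ALPHA, BETA, TAU, n_tests=20, rng=random.Random(1))
print(path[:3])
```

Averaging the final level over many such paths should approach αt/β, which gives a quick sanity check of the parameterization.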

Maintenance modeling of a final element
The SIS is periodically tested with an interval τ, and each proof test costs C_PT. During each proof test, if the observed degradation level X(t) of the final element is less than the predefined ω_a L, no action is carried out and the total cost is only C_PT. If the degradation level is higher than ω_a L but less than L, a PM is performed with cost C_PM, where C_PM > C_PT. However, if the system is found to have failed, it is replaced by a new one at cost C_CM, where C_CM > C_PM. In addition, the cost C_D related to the risk to the EUC during SIS downtime needs to be considered; C_D is calculated as the product of the demand rate λ_de and the possible loss in an EUC accident.
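The decision taken at each proof test can be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, a level exactly equal to ω_a L is treated as triggering a PM (the text leaves this boundary open), and L = 1.25×10^-3 is used only as an illustrative threshold.

```python
def maintenance_action(x, L, omega_a, omega_b):
    """Return (action, level_after_action) for a level x observed in a proof test."""
    if x >= L:
        return "CM", 0.0             # failed: replacement, as-good-as-new
    if x >= omega_a * L:
        return "PM", omega_b * L     # degraded: imperfect PM back to omega_b * L
    return "none", x                 # working: no action, degradation is kept

# Illustrative values: omega_a = 0.8 and omega_b = 0.1 as in Scenario 3
for level in (0.5e-3, 1.1e-3, 1.3e-3):
    print(level, maintenance_action(level, 1.25e-3, 0.8, 0.1))
```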
The long-run cost rate can be calculated with the renewal theorem [29]:

$$\lim_{t\to\infty} \frac{C(t)}{t} = \frac{E[C(S_1)]}{E[S_1]}$$

where C(t) is the cumulative maintenance cost by time t, and S_1 is the length of the first renewal cycle.
The designed service time of most SISs is finite, and thus the steady-state assumption may not hold. We instead evaluate the cost over the SIS lifetime, which combines the number of proof tests N_i(t), the number of CMs N_CM(t), the number of PMs N_PM(t) and the expected downtime T_d(t) in [0, t], each weighted by the corresponding unit cost C_PT, C_CM, C_PM and C_D.
This lifetime cost is a function of the maintenance parameters, including the failure threshold L, the PM coefficients (ω_a, ω_b) and the test interval τ.
Here, minimization of the cost over the designed life (e.g. 20τ) is the criterion for selecting a suitable maintenance strategy.
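A form of the lifetime cost that is consistent with the symbols defined in this section is sketched below. It is an assumption about how the terms combine, not necessarily the authors' exact expression; the expectation operator E[·] is added here for illustration.

```latex
% Finite-horizon (lifetime) cost: each count is weighted by its unit cost
E[C(t)] = C_{PT}\,E[N_i(t)] + C_{PM}\,E[N_{PM}(t)] + C_{CM}\,E[N_{CM}(t)] + C_D\,E[T_d(t)]
```

With t = 20τ, this would be the quantity to minimize over (ω_a, ω_b) for a given L and τ.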

Unavailability calculation
We start by estimating the availability A(t) of the maintained final element at time t, namely the conditional probability that the component is working at time t given X_0 = x, with x ∈ [0, ω_a L]. A(t) is the probability that the system can perform its required function at time t, i.e. that the degradation level is less than the predefined failure threshold L.
In the case t ≤ τ, there is no maintenance action in [0, t), so

$$A(x,t) = P\big(X(t) < L \mid X_0 = x\big), \quad 0 < t \le \tau.$$

From the second interval onwards, the result of the prior test acts as the condition for estimating the instantaneous availability. For τ < t ≤ 2τ, the availability is conditioned on the degradation level µ observed at time τ. Similarly, A(x,t) for (i-1)τ < t ≤ iτ, i ≥ 2, is obtained by conditioning on the level observed at (i-1)τ.

The valve fails to function when the degradation level reaches or exceeds the predefined critical threshold L. PFDavg, the widely used measure for a low demand SIS, is not the long-term approximation here, but the average proportion of time in which the system is not able to perform the required safety function within one test interval [1]. PFDavg in the first test interval is

$$PFD_{avg}^{(1)} = \frac{1}{\tau}\int_0^{\tau} \big[1 - A(x,t)\big]\,dt,$$

while PFDavg in the second interval (τ, 2τ), with known degradation level µ at time τ, is the same average of the unavailability taken over (τ, 2τ). Similarly, PFDavg in the i-th interval can be calculated using Eq. (8).
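For the first interval, the availability has a closed form, because X(t) − x is gamma distributed with shape αt and rate β. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the values of α, β and L are illustrative rather than those of Table 3, and the regularized incomplete gamma function is evaluated with a hand-written power series so that only the standard library is needed.

```python
import math

def gamma_cdf(shape, x):
    """Regularized lower incomplete gamma P(shape, x), by its power series."""
    if x <= 0.0:
        return 0.0
    # First term: x^shape * exp(-x) / Gamma(shape + 1)
    term = math.exp(shape * math.log(x) - x - math.lgamma(shape + 1.0))
    total, n = term, 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (shape + n)
        total += term
    return min(total, 1.0)

def availability_first_interval(t, alpha, beta, L, x0=0.0):
    """A(x0, t) = P(X(t) < L | X_0 = x0) for 0 < t <= tau (no maintenance yet)."""
    return gamma_cdf(alpha * t, beta * (L - x0))

def pfd_avg_first_interval(tau, alpha, beta, L, x0=0.0, n=200):
    """Midpoint-rule average of 1 - A(x0, t) over (0, tau)."""
    dt = tau / n
    return sum(1.0 - availability_first_interval((k + 0.5) * dt, alpha, beta, L, x0)
               for k in range(n)) / n

# Hypothetical parameters, for illustration only
print(pfd_avg_first_interval(tau=1.0, alpha=1.0, beta=1.0e4, L=1.25e-3))
```

The series is adequate for the moderate arguments used here; for large β(L − x) a library routine for the incomplete gamma function would be the safer choice.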
Each SIF should comply with a specific SIL. IEC 61508 [2] specifies four SILs, with SIL4 being the strictest in terms of safety. The SILs and their associated values of PFDavg are shown in Table 2.

Table 2. SILs and associated PFDavg
SIL  | PFDavg
SIL4 | 10^-5 ~ 10^-4
SIL3 | 10^-4 ~ 10^-3
SIL2 | 10^-3 ~ 10^-2
SIL1 | 10^-2 ~ 10^-1

To estimate the degradation of the SIS element in each test interval, Monte Carlo simulation is implemented here, generating random events to obtain the probability distributions of the variables of the problem. Monte Carlo methods have been used in a number of papers in the domains of reliability, availability, maintainability and safety (RAMS) [40][41][42][43].
The main idea here is to randomly generate M degradation paths to simulate M possible components and use the average value in each test interval to estimate the performance.
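The idea of averaging over M simulated paths can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each test interval is split into sub-steps to locate hidden failures (which overestimates downtime by at most one sub-step), the downtime cost is charged as k_1 C_PT per unit time, and the values of α, β, L and k_1 are hypothetical rather than taken from Table 3.

```python
import random

def simulate_policy(alpha, beta, L, omega_a, omega_b, tau=1.0, n_intervals=20,
                    n_paths=500, steps=10, k1=50.0, k2=10.0, k3=2.0, seed=0):
    """Return (PFDavg per test interval, mean lifetime cost in units of C_PT)."""
    rng = random.Random(seed)
    dt = tau / steps
    downtime = [0.0] * n_intervals           # summed over paths, per interval
    cost = 0.0
    for _ in range(n_paths):
        x = 0.0                               # X_0 = 0
        for i in range(n_intervals):
            for _ in range(steps):
                # gamma increment over dt: shape alpha*dt, scale 1/beta
                x += rng.gammavariate(alpha * dt, 1.0 / beta)
                if x >= L:                    # failure stays hidden until the test
                    downtime[i] += dt
                    cost += k1 * dt           # downtime cost, C_D = k1 * C_PT
            cost += 1.0                       # proof test at (i + 1) * tau
            if x >= L:
                cost += k2                    # CM: replacement, as-good-as-new
                x = 0.0
            elif x >= omega_a * L:
                cost += k3                    # PM: level reset to omega_b * L
                x = omega_b * L
    pfd = [d / (n_paths * tau) for d in downtime]
    return pfd, cost / n_paths

# Hypothetical alpha, beta and L; with PM versus CM only
pfd_pm, cost_pm = simulate_policy(1.0, 1.0e4, 1.25e-3, omega_a=0.8, omega_b=0.1)
pfd_cm, cost_cm = simulate_policy(1.0, 1.0e4, 1.25e-3, omega_a=1.0, omega_b=0.0)
print(sum(pfd_pm) / 20, sum(pfd_cm) / 20)
print(cost_pm, cost_cm)
```

Setting ω_a = 1 and ω_b = 0 disables PM, which corresponds to the corrective-only strategy and gives a baseline for comparison.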

Optimization criteria
As mentioned in Eq. (4), the cost is a function of several parameters, including the failure threshold L, the test interval τ and the PM coefficients (ω_a, ω_b). It is difficult to obtain exact values of the cost parameters [44], especially those related to the production loss of a process shutdown and the potential effects of a hazardous event due to the failure of a SIS. Therefore, cost ratios, instead of absolute costs, are used here in the optimization. Taking C_PT as the unit cost, C_D, C_CM and C_PM can be expressed as k_1 C_PT, k_2 C_PT and k_3 C_PT, respectively, where k_1 > k_2 > k_3 ≥ 1.
For a SIS, the optimal (ω_a, ω_b) should strike a trade-off between the minimum lifetime cost and the required SIL. Taking an ESD valve as an example, its required SIL is SIL3 (see Table 2), meaning that PFDavg should be in the range (10^-4, 10^-3).
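The selection rule described here, lowest lifetime cost subject to the SIL constraint, can be sketched as below. The SIL bands are the low demand PFDavg ranges of IEC 61508; the function names are hypothetical and the candidate tuples are made-up numbers for illustration, not results from this study.

```python
def sil_from_pfd(pfd_avg):
    """SIL band for a low demand PFDavg (0 if worse than SIL1)."""
    for sil, upper in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd_avg < upper:
            return sil
    return 0

def cheapest_compliant(candidates, required_sil):
    """Pick the lowest-cost candidate whose worst PFDavg meets the required SIL.

    candidates: iterable of (omega_a, omega_b, lifetime_cost, worst_pfd_avg).
    """
    ok = [c for c in candidates if sil_from_pfd(c[3]) >= required_sil]
    return min(ok, key=lambda c: c[2]) if ok else None

# Made-up candidates: (omega_a, omega_b, lifetime cost in C_PT, worst PFDavg)
candidates = [
    (0.9, 0.1, 40.0, 4e-3),
    (0.8, 0.1, 45.0, 8e-4),
    (0.7, 0.0, 50.0, 5e-4),
]
print(cheapest_compliant(candidates, required_sil=3))
```

A PFDavg below 10^-4 is treated as meeting SIL3 as well, since it is better than the required band.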

Numerical example
To illustrate the proposed method for optimizing maintenance strategy, a numerical example is employed with the degradation and operation parameters listed in Table 3.

Instantaneous Availability
The degradation level X(t), availability A(t) and PFDavg of such an element can be plotted based on Eq. (1), Eqs. (6)-(8) and Eqs. (9)-(11), respectively, as depicted in Figure 2. At the starting point, X_0 = 0 and A(0) = 1. As time elapses, the degradation level X(t) accumulates; meanwhile, A(t) decreases and PFDavg increases. Given the periodic proof tests, the system status is updated after each proof test. The A(t) curve has a certain periodicity, but A(t) drops faster in later intervals due to the accumulation of degradation. The PFDavg curve indicates that even if the valve is functioning at each proof test, PFDavg increases with time. This implies that the final element becomes more fragile than it was at the beginning. Given that the accumulated degradation level X(t) exceeds the PM threshold ω_a L at 8τ, a PM is applied. After that, the degradation level is set back to ω_b L, and the instantaneous availability improves correspondingly. In other words, the SIS returns to a condition in which it performs its SIF well. But due to the remaining degradation, PFDavg is still higher than in the first test interval. At 12τ, the degradation level X(t) goes beyond the failure threshold L, and a replacement is executed. The system availability A(t) improves, and PFDavg decreases to the level of the first test interval. A similar PM is executed at 18τ.

Scenarios with different maintenance strategies
With the parameters given in Table 3, the expected cumulative costs over 20τ under three scenarios are compared:
(1) Scenario 1: The valve is only repaired to as-good-as-new once a failure has occurred, i.e. ω_a = 1, ω_b = 0.
(3) Scenario 3: The initial state is X_0 = 0, under the proposed maintenance strategy with ω_a = 0.8, ω_b = 0.1.

Two maintenance strategies are considered: one is reflected by Scenario 1, without PM; the other is reflected by Scenarios 2 and 3, with PMs. The latter two differ in the initial degradation introduced in manufacturing or installation. More specifically, Scenario 3 represents higher manufacturing and installation quality.
With the parameters in Table 3, the cost curves of these three scenarios are obtained as shown in Figure 3. The maintenance costs of the three scenarios are almost the same until around 10τ, because PM or CM is seldom carried out before this time. After that, the cost of Scenario 1 increases significantly, mainly due to the potential downtime cost. The cost curves of Scenarios 2 and 3 are very similar, with that of Scenario 2 slightly higher. Comparing the cumulative costs of Scenario 1 and Scenarios 2 and 3 over the 20 test intervals shows that PMs reduce the total lifetime cost dramatically, whereas the cost difference between Scenario 2 and Scenario 3 is quite small.

The PFDavg of the three scenarios is shown in Figure 4. At the beginning, PFDavg increases with time (approximately until 10τ) because of the continuous degradation process. For Scenario 1, PFDavg keeps increasing after 10τ without PM and the SIS is within SIL1 most of the time, while for Scenarios 2 and 3, PFDavg is always lower and no worse than SIL2. PMs thus improve SIS availability effectively, especially after half of the designed service time.
In practice, due to materials or mis-operation in the manufacturing or installation process, zero degradation is too ideal for a valve even when it is new. Comparing Scenarios 2 and 3, initial degradation is found to have only a slight negative effect on performance over the whole life cycle. When rescheduling proof tests, it is therefore not necessary to prioritize the consideration of initial degradation.

Effect of PM strategies on lifetime costs
With the parameters in Table 3, the expected maintenance cost of the final element is calculated based on Eq. (4). The expected lifetime cost is a function of (ω_a, ω_b) for different (k_2, k_3), as shown in Figure 5.

Figure 5. Mesh plot of the expected total maintenance cost for different (k_2, k_3)

The CM cost is fixed at k_2 = 10, and Figure 5 illustrates the impact of k_3, i.e. the expensiveness of PMs, on the lifetime cost. In general, when k_3 is larger, a PM is more costly, and the lifetime cost over 20 test intervals increases as well.
In Figure 5(a), k_3 = 1 means that the PM cost is very low, the same as the test cost. Given a fixed ω_a, the total lifetime cost increases slightly with ω_b. A higher ω_b leads to more PMs, but because each PM is cheap, the expected lifetime cost remains almost unchanged for the same ω_a. However, given a fixed ω_b, the expected lifetime cost increases significantly with ω_a. When ω_a is close to 1, the PM threshold ω_a L is near the failure threshold L, so PMs are effectively avoided. The CM cost thus dominates the increase in lifetime cost.
In Figure 5(b), the PM cost is still quite low compared with the CM cost, so the overall tendency of the lifetime cost is similar to that shown in Figure 5(a). Within the assumed range of k_3 and (ω_a, ω_b), the optimal value of (ω_a, ω_b) is found to be (0.70, 0).
In Figure 5(c) and Figure 5(d), PMs are more expensive. The lifetime cost increases with ω_b, while it first decreases and then increases with ω_a. There is a trade-off between the PM cost and the potential downtime cost: a smaller ω_a increases the PM expenses, whereas a larger ω_a results in a higher failure probability, which increases the CM and downtime costs. This phenomenon becomes more obvious in Figure 5(d), where the PM cost is equivalent to 80% of the CM cost.
These findings can support the decision-making of SIS maintenance crews. If PM costs are much lower than the costs caused by a SIS failure, it is reasonable to perform more PMs to keep the system safe. Otherwise, if PM costs are close to CM costs, frequent PMs are not essential.
However, we have so far assumed that the PM cost is the same regardless of the value of ω_b. In practice, the PM cost often increases as a system ages. The PM factor ω_b should therefore be linked with the time since installation and the actual health status of the system.
Meanwhile, the failure threshold L also affects the lifetime cost. When L = 1.45×10^-3, the lifetime cost increases only slightly from ω_a = 0.7 to ω_a = 0.9, because the threshold is so high that the chance of a failure event is very low. When L is lower, e.g. 1.05×10^-3, the difference in lifetime cost between ω_a = 0.7 and ω_a = 0.9 is more apparent: with a lower failure threshold and a higher ω_a, the degradation level exceeds the failure threshold with higher probability.
Given a fixed ω_a, the lifetime cost decreases with a higher threshold L, because a smaller threshold L increases downtime.
The failure threshold L can be affected by the manufacturing process and the risk acceptance criteria. In manufacturing, high-quality material could lead to a higher degradation-tolerant threshold. In operation, when it is acceptable to tolerate more risk to the EUC, the failure threshold could also be set higher.
In determining the optimal value of ω_a, the failure threshold should also be considered. When the failure threshold is quite high, ω_a could be set to a higher value from the perspective of maintenance cost, because the failure probability is low.

Effects of PM strategies on PFDavg
Here we study how PM strategies with different (ω_a, ω_b) influence PFDavg. The PFDavg of such a SIS can be obtained by simulation based on Eqs. (9)-(11). PFDavg in each test interval is illustrated in Figure 7. PFDavg is strongly correlated with the parameters (ω_a, ω_b). The effect of ω_a on PFDavg in Figure 7(a) is analyzed with ω_b = 0.1. At the early stage, for example until around t = 8τ, PFDavg increases over time but remains within SIL3. After 8τ, PFDavg falls into SIL2 for ω_a = 0.9. PFDavg then becomes stable across intervals and fluctuates only within a small range (the same SIL). These curves show that the value of PFDavg in each test interval decreases as ω_a decreases: the lower ω_a is, the earlier a PM is carried out, and after a PM the degradation is mitigated so that the probability of failure is reduced.
The effect of parameter ω_b on PFDavg in Figure 7(b) is evaluated with ω_a = 0.75. Compared with ω_a, parameter ω_b has only a slight impact on the system PFDavg.
The combined effect of (ω_a, ω_b) on the system PFDavg in several intervals is then depicted in Figure 8. The overall tendency of PFDavg is almost the same in each test interval, and PFDavg stays mainly within SIL3 and SIL2. Given a fixed ω_b, PFDavg increases with ω_a. However, given a fixed ω_a, PFDavg keeps almost the same value for different ω_b.
The failure threshold L is set to [1.05, 1.15, 1.25, 1.35, 1.45]×10^-3 in turn, to observe the effect of the threshold on PFDavg. The mesh plot is shown in Figure 9. For a given threshold L, PFDavg decreases with lower ω_a. This finding can be regarded as a guideline for maintenance management: for the same SIS, the earlier the PM is executed, the more reliable the system is. Without considering the PM cost, ω_a should be as small as possible.
Meanwhile, for a fixed ω_a, PFDavg increases with a lower threshold L. For L = 1.45×10^-3, ω_a = 0.8 is enough to keep the system within SIL3, whereas ω_a = 0.7 is needed for L = 1.05×10^-3.

Updating test intervals with the information from tests
For low demand SISs, it might not always be worthwhile to run proof tests periodically, especially if the shutdown and restart of the process is costly. In this case, the date of the next proof test can be determined based on the degradation state observed in the current test. The interval to the next test can be longer if the SIS element is very healthy, and should be shorter as the element deteriorates. When the degradation level is approaching the PM threshold, more frequent tests are expected.
Having considered degradation and diverse maintenance strategies, it is interesting to introduce non-periodic proof tests. According to the study of [45], 3 years is roughly set as the maximum length of a proof test interval to maintain system safety.
To account for degradation, the PM parameters are set as ω_a = 0.75 and ω_b = 0.05. The general expected test interval length is generated by Monte Carlo simulation.
The main steps of the simulation algorithm for the expected test intervals are as follows.


Step 1: Set X_t = 0 and N = 1. While N ≤ N_max, the process goes through the following steps.
Step 2: Generate n degradation paths, and obtain for each path the time at which it first reaches the failure threshold L.
Step 3: Take the 5th percentile of these arrival times as the potential arrival time τ_1. Compare τ_1 with 3 years: if τ_1 < 3 years, take τ_1 as the new test interval of the system; if τ_1 ≥ 3 years, use 3 years as the new test interval.
Step 4: Use the mean and variance of the gamma process in Section 2.2 to estimate the increment X_(0~τ_1) in (0, τ_1). A safety margin is also considered: the 97.5th percentile (ρ = 0.975) is used as the potential increment in (0, τ_1).
Step 5: Compare the potential degradation level at time τ_1, X_τ1, with the PM threshold and the CM threshold to decide whether a maintenance action is required. The value of X_τ1 after this comparison is the new starting point.
Step 6: Repeat Steps 2 to 5 and set N = N + 1.

The time to reach the failure threshold L from X_t = 0 is verified to follow a normal distribution. Different increment percentiles are investigated, and the updated lengths of each test interval are listed in Table 4. With different percentile values, the test interval lengths differ from the third updated test onwards. With ρ = 0.975, a PM is executed after the second interval and the degradation is mitigated. When ρ is set to 0.90 or 0.825, the third test interval is shorter, with a length of 0.5τ and 1.2τ, respectively.
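The six steps above can be sketched as follows. This is an illustration under several assumptions that are not confirmed by the text: time is measured in years, first-passage times are located on a grid of width dt, the ρ-percentile increment in Step 4 is taken from a normal approximation built on the gamma mean and variance, and the values of α, β and L are hypothetical rather than those of Table 3.

```python
import random
from statistics import NormalDist

def first_passage_time(x0, L, alpha, beta, dt, rng, t_max):
    """Time at which one simulated path starting at x0 first reaches L (capped near t_max)."""
    x, t = x0, 0.0
    while x < L and t < t_max:
        x += rng.gammavariate(alpha * dt, 1.0 / beta)
        t += dt
    return t

def update_intervals(alpha, beta, L, omega_a, omega_b, n_updates=5, n_paths=500,
                     max_interval=3.0, rho=0.975, dt=0.05, seed=0):
    """Return the sequence of updated proof test intervals."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(rho)             # safety-margin quantile
    x, intervals = 0.0, []                    # Step 1: start from X_t = 0
    for _ in range(n_updates):
        # Step 2: first-passage times; paths beyond max_interval need not be followed
        times = sorted(first_passage_time(x, L, alpha, beta, dt, rng, max_interval)
                       for _ in range(n_paths))
        # Step 3: 5th percentile, capped at the maximum interval
        tau = min(times[int(0.05 * n_paths)], max_interval)
        # Step 4: rho-percentile increment (normal approximation, an assumption)
        x += alpha * tau / beta + z * (alpha * tau) ** 0.5 / beta
        # Step 5: maintenance decision gives the new starting level
        if x >= L:
            x = 0.0                           # CM: replacement
        elif x >= omega_a * L:
            x = omega_b * L                   # PM: partial restoration
        intervals.append(tau)                 # Step 6: repeat
    return intervals

# Hypothetical alpha, beta and L; omega_a = 0.75 and omega_b = 0.05 as in the text
print(update_intervals(1.0, 1.0e4, 1.25e-3, omega_a=0.75, omega_b=0.05))
```

Lowering ρ reduces the assumed increment per interval, which changes when a PM or CM is triggered and therefore the lengths of the later intervals.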
It is worth mentioning that the degradation parameters (α, β) directly affect the degradation rate. The simulation results in Table 4 are based on the assumed (α, β) in Table 3 and only serve as a reference method for updating test intervals.
If the exact degradation level µ can be observed in each proof test, the test intervals can be updated with the required SIL as the main constraint. Considering the degradation process, the first interval τ_1 can be calculated based on Eq. (12) with the given limit value of PFDavg.
For calculating the second interval, the degradation level µ_1 at τ_1 is taken into consideration.
Using Eq. (13), the value of τ_2 can also be updated. By following a similar solution process for the later intervals, flexible test intervals can be calculated and updated.
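One way to read this SIL-constrained update is as the longest interval whose average unavailability stays below the PFDavg limit. The following is a sketch under that assumption, with PFD_lim a hypothetical symbol for the limit (e.g. 10^-3 for SIL3); it is not a reproduction of Eqs. (12) and (13).

```latex
\tau_1 = \max\Big\{\tau > 0 : \frac{1}{\tau}\int_0^{\tau}
         P\big(X(t) \ge L \mid X_0 = 0\big)\,dt \le PFD_{lim}\Big\}

\tau_2 = \max\Big\{\tau > 0 : \frac{1}{\tau}\int_0^{\tau}
         P\big(X(\tau_1 + s) \ge L \mid X(\tau_1) = \mu_1\big)\,ds \le PFD_{lim}\Big\}
```

Because degradation is non-decreasing, the probability inside each integral grows with time, so the averaged unavailability is non-decreasing in τ and the largest admissible τ can be found by bisection. If a PM or CM is carried out at τ_1, µ_1 would be replaced by the level after maintenance.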

Conclusion
A stochastic process-based availability analysis for the final element of a SIS is carried out, and three states of the element are considered. This forms the basis for determining the maintenance strategies following proof tests. Algorithms for the instantaneous availability of the SIS element and the expected lifetime cost of SIS operation are developed. PFDavg of the SIS element is calculated based on the homogeneous gamma process.
The findings of the case studies show that the PM strategy, i.e. the values of (ω_a, ω_b), and the cost of PMs relative to CMs are influential factors for the lifetime cost and the SIL of a SIS.
PFDavg of the SIS is affected significantly by the PM threshold ω_a, especially after half of the service lifetime, but only slightly by ω_b. The effects of ω_a on PFDavg become more obvious with a lower threshold L. When the failure threshold L is quite high, the value of ω_a has only a slight effect on PFDavg given the low probability of failure.
Based on the above findings, suggestions on updating test intervals are given. Maintenance crews can benefit from these suggestions by saving maintenance costs through a reduced frequency of proof tests.
For further studies, it would be interesting to consider the availability and maintenance cost of k-out-of-n architectures.