Economic Statistical Design of Integrated X-bar-S Control Chart with Preventive Maintenance and General Failure Distribution

The application of Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices to achieve high product quality, small frequency of failures, and cost reduction in a production process. However there are some points that have not been explored in depth about its joint application. First, most SPC is performed with the X-bar control chart which does not fully consider the variability of the production process. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC.


Introduction
Control charts are tools of Statistical Process Control (SPC) that monitor the state of a production process, identifying when the quality attributes of a product change. The concept of ''control'' is related to the quality attribute that is within specified limits (control limits) to ensure production stability and quality of products. If the attribute (i.e., weight, length, dimensions, etc.) is not within these limits, then the process is in an ''out-of-control'' state. In such case, is necessary to find and correct the assignable cause that originated this state (failure).
A control chart is defined by three main parameters: the size of the sample (n), the sampling interval between samples (h), and the coefficient of the control limits (k). These parameters are selected based on economic and statistical restrictions because there are costs and times associated with sampling and searching of assignable causes: high sampling frequency would take more time from the process cycle time, and depending on the nature of the item, product loss. Also, close control limits would increase the frequency of failure alarms and rejection of products which not necessarily would be of low quality. The chart parameters must be selected following a methodology in order to minimize the ''cost of quality'' [1].
The Economic Design (ED) of control charts (the estimation of the parameters) considers the costs (in time and money) associated with sampling and searching/repairing of assignable causes. On the other hand, the Economic Statistical Design (ESD) additionally considers the statistical requirements, such as the probabilities of error Type I (detecting an out-of-control state when the process is fine) and II (not detecting an out-of-control state when the process is not fine) in the estimation of the parameters.
The ED of control charts was introduced in 1956 by Duncan [2] for X-bar ( X X ) charts that monitor the mean of the quality characteristic of produced items. It had the following assumptions: the failure mechanism of the process had an Exponential probability distribution, there was only one assignable cause, and the sampling interval was constant. Other works extended the ED to ESD and covered other control charts: R, S, and EWMA control charts were proposed to monitor variability [3][4][5][6][7]; p and np control charts were proposed to monitor proportion or number of nonconforming units within samples [8].
Variability is an important factor to control in a process because raw material, operators skills, machine calibration, etc., increase variability without affecting the process mean [9]. To keep control in both the mean and variability of a process the X X {R control chart has been used, although the R chart loses reliability when nw10 [10]. In this case the X X {S or X X {S 2 control charts are more suitable. Collani, Sheil [9], and Yang [7] proposed the ESD of S charts, considering the importance of the error Type I and II for minimization of costs. The ED and ESD of X X {R and X X {S control charts was proposed by Davis and Saniga [6,11,12], pointing out the importance of controlling the mean and variance of the process. However in these cases, it was assumed that the sampling intervals were constant and that the process failed with an Exponential distribution.
An extension of these works was presented by Chiu [13] who considered the importance of Preventive Maintenance (PM) in the ED of X X {S 2 control charts to reduce long-term variability and failures that are only evident when the process reaches an out-ofcontrol state. In [14] the ED of an X X control chart combined with an age-replacement PM policy was presented. It was observed that reduction in operating costs was superior to the reduction achieved by using only the control chart or the PM policy. The relationship between SPC and PM has been recognized in other studies as in [15][16][17][18][19][20] identifying a link between equipment maintenance and product quality: ''Equipment maintenance, either corrective or preventive in nature, has a direct impact on the reliability of the equipment, and thus the performance of the equipment. Under the assumption that the equipment is used to manufacture some type of product, with improved performance of the equipment comes increased product quality'' [1].
This paper extends on the application of SPC with PM as some points were not completely covered by previous studies. First, most SPC is performed with the X X control chart which does not consider the variability of the production process [1,14,15,[20][21][22][23]. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures [14][15][16]19,20]. Third, the effect of PM on processes with different failure probability distributions has not been explored as most of the studies consider one distribution (i.e., Exponential [18,19,23] or Weibull [20,24]).
Hence, this paper presents the Economic Statistical Design (ESD) of joint X X {S control charts to monitor mean and variability in a production process. In addition, the cost model integrates PM with general failure distribution (cases with Exponential, Gamma, and Weibull distributions are presented) and constant and variable sampling intervals. Experiments showed that PM decreases costs for processes with high failure rates and reduces the sampling frequency of units for testing under SPC.

Background
Reliability Function. Consider that a cumulative distribution function F (t) represents the probability that a unit, randomly taken from a population, will fail at most in time t [25]. Now consider that, instead of taking one unit, n units are taken at the end of a time interval h. If it is of particular interest to get the distribution of the survival of the process, then the cumulative distribution function F(h) can be defined as the probability of the process failing (changing to an out-of-control state) at the end of the sampling interval h.
Because the reliability (or survival) function of the process, R(t)~1{F (t), represents the probability that a unit will be working beyond time t [25], the probability that a process will be working properly (in-control state) after the sampling interval h can be expressed as R(h)~1{F (h).
Hence, the following probabilities are associated with the control states of a process: F (h) out À of À control state at most in h: Detection of States: Significance Level and Power. The Significance Level a is the probability of the error Type I (false positive), which is the detection of an out-of-control state when the true state is in-control. Thus, if the null hypothesis H 0 = process is in-control state: a~Pr( Reject H 0 jH 0 is true )~false positive alarm : ð3Þ The probability of the error Type II (false negative), represented as b, consists in the null detection of the out-of-control state when the process is truly in out-of-control state. Using Eq. 3 and 4 as reference: b~Pr( Do not Reject H 0 jH 0 is false)~false negative alarm : ð5Þ 1-b is also known as the Statistical Power of the control chart, which represents the ability of the chart to detect the out-of-control state when the process is indeed in such state. Thus, the levels of a and b must be low and controlled when designing the control chart.
Significance Level and Power for X X {S Control Charts. Although the cost model of a process may be used with different control charts, the definitions of a and b are dependent of the control chart. For the X X control chart, b in terms of the control limits (Upper Control Limit UCL, Lower Control Limit LCL), is expressed as: where the random variable of interest is X X with N(m, s 2 =n) distribution. If m and s 2 are known, the control limits are expressed as: and, if m and s 2 are unknown, these can be estimated from the m samples of size n as:m giving the following control limits: In Eq. 9 and 10, S S is the mean standard deviation of the m samples, and C 4 is a constant that depends on the size of the sample (n). Note that m 1 in Eq. 7 is the value that represents the change in the mean of the process, which is equal to mzds, where d is the magnitude of that change. Hence, the error Type II probability of Eq. 7 can be expressed as: The error Type I for the X X chart, a X X can be expressed as: For the S control chart, the Power in terms of the control limits can be expressed as: where the random variable is S with x 2 distribution with n{1 degrees of freedom, and s 1 is the change in the standard deviation of the process (s 1 ws 0 , where s 0 is the initial value). The control limits can be expressed in terms of the known standard deviation (s), or an estimation of the same ( S S), as follows: s unknown : Commonly, k in Eq. 8, 10, 15, and 16 is restricted to 3 [26]. The relationship between the probability of the error Type II and the parameters of the S control chart then is expressed as: Similar formulations for b S have been used by Saniga [6] and Collani [9]. The error Type I for the S chart, a S , then can be expressed as: Finally, the joint error probabilities for the X X {S control chart are defined as: 1{bP r( Reject H 0 under the X X control chart jH 0 is false )| Pr( Reject H 0 under the S control chart jH 0 is false): bP r( Do not Reject H 0 under the X X control chartjH 0 is false)\ Pr( Do not Reject H 0 under the S control chartjH 0 is false): Note that for the X X {S control chart two control limits coefficients are considered: k X X for the X X control chart, and k S for the S control chart. Also, because in the X X {S control chart two variables are monitored, two changes are considered: m?m 1 , and s?s 1 . Hence, Eq. 12 is extended for the estimation of b X X in Eq. 24 as follows: By integrating Eq. 1 and 2 with 20 and 24, the probabilities associated with the possible states of a process are obtained, and these are presented in Table 1.

Base Cost Model with Constant Sampling Intervals
A production cycle is defined as the interval from the starting production time (in-control state) until the time when a change, caused by an assignable cause, occurs. This cycle includes the time required to detect and repair the assignable cause. Because a production cycle can be also defined as the time between successive in-control periods [27], the process can be considered as a serial of equally distributed independent cycles, a renewal process.
Under this assumption, the cost per cycle can be estimated as the accumulated cost from the beginning until the end of one cycle, and the average cost per unit of time can be estimated as the ratio of E(C)=E(T), where E(C) is the Expected Cost per Cycle and E(T) the Expected Cycle Length. The objective of the ESD is to minimize the costs per unit of time of a process: min Z~E(C)=E(T) [21,27,28]. The Renewal Theory Approach proposed by Rahim and Banerjee [28] was presented as an alternative to obtain the equations for E(T) and E(C) for Markovian and non-Markovian stochastic processes considering these assumptions.
A stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state, not on the sequence of events that preceded it. The Renewal Approach [28] studies the state of the system at the end of the first sampling interval. Depending upon the state of the system, the expected residual cycle length and cost can be computed. Then these values, together with the associated probabilities, define the renewal equations for E(T) and E(C).
The basic model studied by Duncan [2] had the Markov property and considered that a production cycle was integrated by the following components: (1) the in-control period; (2) the out-ofcontrol period; (3) the time required to take a sample and interpret the results; and (4) the time needed to find the assignable cause. In [28] these components led to define the following states of the system at the end of the first sampling interval: (1) in-control state and no alarm; (2) in-control state and false alarm; (3) out-of-control and no alarm; and (4) out-of-control and true alarm. Then the equations for E(T) and E(C) were obtained as the sum of the ð23Þ expected residual cycle length and cost multiplied by the probability associated with each of these states.
The expressions for E(C) and E(T) obtained with this approach in [28] were confirmed with those obtained with traditional approaches as that of Lorenzen and Vance [29] and Heikes et al. [30] for Markovian and non-Markovian models respectively. The approach also has been used to derive the equations of cost models with specific elements as that of Yang [27] which considered two assignable causes. This made the Renewal Approach suitable for the development of the cost models presented in this paper that are adaptations of the model of Rahim and Banerjee [28] which considered Exponential failure distribution and constant sampling intervals for the ED of X X control charts.
The adapted base cost model under the Renewal Theory Approach assumes the following issues about the process: 1. The process starts in a stable in-control state with mean m and variance s 2 . The event of an assignable cause changes the variance of the process from s 2 to d 2 1 s 2 , where s 2 1 w1 is the magnitude of the change and is known. 2. When a data point of the control chart is outside the control limits an alarm is generated, then the process is stopped and the search and repairing of the assignable cause starts. After the assignable cause is repaired the process returns to the in-control state, starting a new production cycle. The process is stopped also when there is a false alarm. 3. There is only one assignable cause and the process does not self-repair. 4. The time between failures has a general distribution. 5. The states of the system at the end of the first sampling interval are identified as: (1) S 00 -in-control state and no alarm; (2) S 01in-control state and false alarm; (3) S 10 -out-of-control and no alarm; and (4) S 11 -out-of-control and true alarm. The probabilities associated with each state are presented in Table 1 and the details of the expected residual cycle length and cost associated with each state are presented in the following sections.

Renewal Equations of the Expected Cycle Length E(T).
N State S 00 : the state of the process is evaluated at the end of the first sampling interval h, and depending on this the expected residual cycle length is estimated. As shown in Figure 1, in this case the process is in-control state with no alarm. Because there are no other events associated with this scenario, the expected residual cycle length is E(T).
N State S 01 : in this case there is a false out-of-control alarm which causes the process to be stopped, an action that involves loss of time and money. This scenario is shown in Figure 2, where the variable Z 0 represents the time used to search the assignable cause when there is a false alarm. After that time the process is restarted and the expected residual cycle length is equal to E(T)zZ 0 which considers the delay caused by the false alarm.
N State S 10 : in this case the process is in out-of-control state and there is no alarm (no detection). Here it is important to consider the necessary time or intervals to detect the failure. Because each sampling interval is constant with length h, the necessary time to detect the failure can be expressed in terms of the number of samples before the alarm is generated. As show in Figure 3, this number is a geometric random variable with mean 1=(1{b), which is known as the Average Run Length (ARL) [3].
N Hence, the necessary time to detect the out-of-control state is h(ARL), or h=(1{b). Observe that hARL~ATS (Average Time to Signal), the average time to produce an alarm. When the out-of-control is detected, the procedure to find the assignable cause and restore the process to an in-control state is performed. In Figure 3, Z 1 is the time associated with these tasks. When the process is restored a new cycle begins. Hence, the expected residual cycle length is equal to h=(1{b)zZ 1 .
N State S 11 : as shown in Figure 4, in this case the alarm is generated at the end of the interval where the process changed to the out-of-control state, thus there was a correct detection. In such scenario the only action that has to be performed is to find the assignable cause and restore the process, which only requires a time Z 1 . Hence, the expected residual cycle length is equal to Z 1 .
The total E(T) thus can be expressed as the sum of the expected residual cycle lengths, multiplied by their associated probabilities, of all states [27,28]: Renewal Equations of the Expected Cost. E(C). The process has the following associated costs: N Sampling: always that a sample of size n is taken, the cost azbn takes place, where a is the constant cost, and b the variable cost of the sample.
N Producing in-control and out-of-control states: the cost per hour for producing in-control state is defined as D 0 and the cost of producing in out-of-control state as D 1 (D 1 wD 0 ). N State S 00 : as presented in Figure 5, in this scenario only the costs associated with sampling and producing in-control state in the first interval D 0 h are considered. Thus, the expected cost is azbnzD 0 hzE(C), where E(C) is the expected residual cost for this state.
N State S 01 : as shown in Figure 6, in this case besides the costs described above, there is a cost associated with a false alarm (Y ), which implies losses because the process is stopped unnecessarily (for a time Z 0 ). Thus, the expected cost for this state is azbnzD 0 hzY zE(C), where Y zE(C) is the expected residual cost. N State S 10 : observe in Figure 7 that an assignable cause occurs within the first sampling interval in time t which changes the process to an out-of-control state. t is a variable that was introduced by Duncan [2] for the case of the ED of a X Xcontrol chart when the failure mechanism had an Exponential distribution (f (t)~le {lt , where l is the number of failures per unit of time). For a general f (t), t is defined as: N In Figure 7 observe that, in the interval from t~0 to t~t the process is in-control state, and that from t~t until t~h (the end of the interval h) the process is in out-of-control state.
Because of this, in the first sampling interval h there are the following costs: -Sampling cost: azbn; -Cost for producing in-control state: D 0 t; -Cost for producing in out-of-control state: D 1 (h{t).
N The evaluation of the process is performed at the end of the interval h (sampling), however in this case the out-of-control state is not detected (there is no alarm). Hence, in the following intervals the process will continue producing in out-of-control state until the detection is successful, which happens after 1=(1{b) samples (ARL). Meanwhile, during these intervals there are sampling costs (azbn) and losses for producing in out-of-control state (D 1 h). Thus, the cost of producing in outof-control state until the detection takes place is given by ARL(azbnzD 1 h).
N When detection is performed, the process is stopped and searching and repairing of the assignable cause is done with an associated cost W . Finally, the expected cost for this state is defined as:  The process is in-control state but there is a false out-of-control alarm which causes the process to be stopped. This involves a time Z 0 required to search for an assignable cause. After Z 0 the process is restarted and the expected residual cycle length is equal to E(T)zZ 0 . doi:10.1371/journal.pone.0059039.g002 N State S 11 : as presented in Figure 8, in this case the detection of the out-of-control state is performed successfully at the end of the first sampling interval h, hence prompt procedures to find the assignable cause and restore the process are implemented with a cost W . Thus, the expected cost for this scenario is azbnzD 0 tzD 1 (h{t)zW , where W is the expected residual cost.
The total E(C) thus can be expressed as the sum of the expected costs, multiplied by their associated probabilities, of all states [27,28]: ð29Þ

Base Cost Model with Variable Sampling Intervals
In the model given by Eq. 27 and Eq. 30 when all sampling intervals are constant or fixed, h j~h for all j samples. When the sampling intervals are variable, h j is different for each j sample. In [21] Rahim et al. proposed to consider a specific number of m samples (sampling intervals) in the production cycle, j~1,2,:::,m, so the production cycle could be considered as truncated [31]. A truncated production cycle starts when a new component is installed and ends with a repair or after a fixed number of m sampling intervals (at a given age w m ). The cost model derived in this section is the model of Rahim and Banerjee [21] for general failure distribution and variable sampling intervals. The deduction was important to understand the model in order to develop the integrated cost model with PM.
The model makes the following assumptions: 1. The first interval h 1 is randomly chosen.
2. The length of the next sampling intervals are chosen as h j~r h j{1 , where h j is the sampling interval for sample j, and r is a decrement factor. The sampling intervals h j are computed by applying the decrement factor to the successor sampling interval, thus h 1 wh 2 wh 3 w:::wh m , because as time continues The process is in out-of-control state and there is no alarm (no detection). The necessary time to detect the failure can be expressed in terms of the number of samples before the alarm is generated. This number is a geometric random variable with mean 1=(1{b) which is known as the Average Run Length (ARL). Thus, the necessary time to detect the out-of-control state is h(ARL) or h=(1{b). When the out-of-control is detected, the procedure to find the assignable cause and restore the process to an in-control state is performed with a time Z 1 . The expected residual cycle length is equal to h=(1{b)zZ 1 . doi:10.1371/journal.pone.0059039.g003 The process is in out-of-control state and detection is performed at the end of the interval where the process changed to this state. The only action that has to be performed is to find the assignable cause and restore the process. This only requires a time Z 1 which also represents the expected residual cycle length. doi:10.1371/journal.pone.0059039.g004 the sampling frequency must increase given the natural wear and tear of the components of the process [21]. 3. The number of sampling intervals is fixed and given (m § 2). 4. The objective is to find n, h j (j~1,2,:::,m), and k that minimize Z~E(C)=E(T). 5. There is an additional cost S(w m ) in E(C) which is associated with the salvage cost of an equipment of age w m . 6. F (w j ) is the cumulative distribution function of failure when the equipment (process) is of age w j , which is accumulated accordingly to the sampling over time. Hence, the age of a process at a given sampling interval j is given by: 7. The failure probability (out-of-control probability) for a specific interval j can be estimated as: Renewal Equations of the Expected Cycle Length E(T) N State S 00 ,S 01 : when the process is in-control state, the stop condition is given by (1) an alarm (true or false), or (2) by the age of the equipment ( = w m ). When there is no alarm at all, the stop condition is given only by w m . When the sampling intervals are variable, the probability to be in-control state cannot be generalized as 1{F (h) (Eq. 1), because each interval has an associated probability which is dependent on the age of the equipment (Eq. 32).
N In Figure 9, F (w 1 ) represents the probability of being in outof-control state at most in time w 1 , and 1{F (w 1 ) the probability of being in-control state from time w 1 . However this does not represent the probability of being in-control state in the interval h 1 . To include this interval, which starts in w 0 and ends in w 1 , the corresponding probability must be 1{F (w 0 ). Hence, for the range of intervals from w 0 to w m , the following probabilities are defined for each sampling interval h j : h 1 ?1{F (w 0 ); h 2 ?1{F (w 1 ); :::; h m ?1{F (w m{1 ).
N From these probabilities, the expected time when the process is in-control state and no alarm is generated (State S 00 ) can be expressed as:   The process is in out-of-control state and there is no alarm (no detection). The assignable cause occurs within the first sampling interval at time t which changes the process to an out-of-control state. Thus, in the interval from t~0 to t~t the process is in-control state, and from t~t until t~h the process is in out-of-control state. Because of this, in the first sampling interval h there are sampling costs (azbn), in-control production costs (D 0 t), and out-of-control production costs (D 1 (h{t)). Then, sampling and out-of-control production costs take place while there is no detection (number of intervals estimated by ARL~1=(1{b)). Finally, when detection is performed there is a cost W associated with interrupting the process, searching the assignable cause and repairing the process. Hence, the expected cost for this state is defined  The process is in out-of-control state and detection is performed at the end of the interval where the process changed to this state. In addition to sampling costs (azbn), in-control production costs (D 0 t), and out-of-control production costs (D 1 (h{t)) there is a cost W associated with interrupting the process, searching the assignable cause and repairing the process. Thus, the expected cost for this state is defined as azbnzD 0 tzD 1 (h{t)zW , where W is the expected residual cost. doi:10.1371/journal.pone.0059039.g008 N The probability of a false alarm when the process is in-control state (State S 01 ) is represented by a (Eq. 3), and is generated at the end of the first sampling interval. Because of this, it is not necessary to consider the in-control probability for this interval, and the in-control probability associated with other intervals can be expressed as: N Thus, the expected time to find an assignable cause when there is a false alarm (State S 01 ) is expressed as: N State S 10 , S 11 : in State S 10 the process is already in out-ofcontrol state (has failed), but there is no alarm. To derive the renewal equations some points must be considered: -the interval where the process changed to the out-of-control state; -the interval where the out-of-control state would be detected.
N Suppose that the process changes to the out-of-control state at some point within the interval h 2 and it is not detected at the end of the same interval. By considering Eq. 32, the out-ofcontrol probability in h 2 is given by +F(w 2 )~F(w 2 ){F(w 1 ).
When the out-of-control probability is determined, it is necessary to consider the next intervals where detection can be performed (in this case, h 3 , h 4 , h 5 ,..., h m ). Thus, in general, if the assignable cause occurs in h j , the detection can be performed in any interval h i where i~jz1,:::,m. If detection is performed in h 3 (i~3), the no-detection probability can be expressed as b g~0 , because the state was detected in the immediate following interval after the assignable cause occurred (thus there were g~0 intervals with no detection). If however, detection takes place in interval h 5 (i~5), this means that the state was not detected in intervals h 3 and h 4 (g~2), and thus there were two consecutive intervals where no detection was performed with probability b 2 . The index g of b g follows the sequence g~i{j{1, so in the case that the detection takes place until the end of the sampling intervals in h m , the probability of no detection would be b m{2{1 . In general terms, the expected time to detect the out-of-control state can be expressed as P m i~jz1 h i b i{j{1 for each interval j where an assignable cause occurs with a probability +F (w j ).
N Thus, the expected time to detect the assignable cause when the process is in out-of-control state is given by: N In State S 11 detection is successful at the end of the interval where the assignable cause occurred, thus the expected time consists of only Z 1 .
N The total E(T) thus can be expressed as the sum of Eq. 33, 35, 36, and Z 1 : Renewal Equations of the Expected Cost E(C).
N Costs of producing in-control states (S 00 ,S 01 ): Eq. 33 provided the time that the process was in-control state with no false alarm (S 00 ). Because in Eq. 30 D 0 is the cost per hour of producing in-control state, then the expected cost of producing while the process is in-control state with no false alarm can be expressed as: Figure 9. F (w 1 ) and 1{F (w 1 ) when the Sampling Interval is Variable. F (w 1 ) represents the probability of being in out-of-control state at most in time w 1 , and 1{F (w 1 ) the probability of being in-control state from time w 1 . However this does not represent the probability of being in-control state in the interval h 1 . To include this interval, which starts in w 0 and ends in w 1 , the corresponding probability must be 1{F (w 0 ). Hence, for the range of intervals from w 0 to w m , the following probabilities are defined for each sampling interval h j : h 1 ?1{F (w 0 ); h 2 ?1{F (w 1 );...; h m ?1{F (w m{1 ). doi:10.1371/journal.pone.0059039.g009 When there is a false alarm (S 01 ) the process is stopped, and the expected time to search an assignable cause is given by Eq. 35. Because now it is required to consider the associated cost, Z 0 in Eq. 35 it can be replaced by the cost Y which corresponds to a false alarm: N Costs of producing in out-of-control states (S 10 ,S 11 ): when there is a transition from the in-control to the out-ofcontrol state, the following events are considered: -The process is initially in-control state until the assignable cause occurred at some point within the sampling interval.
As in the case of t in Eq. 30, it is important to know the cost associated with the period of time in which the process was still in-control state. Because the process has a failure distribution given by f (t), the mean expected probability for the interval of time from 0 to w m is: N Thus the cost: represents the expected cost associated with the fraction of time within the interval in which the process is in-control.
-The process is in out-of-control state, and in this case, the costs depend on the age of the equipment w j at the moment of the failure. As the age increases there will be intervals h j where the out-of-control probability will be more significant.
Note that the process can change to an out-of-control state in any h j with a probability of +F (w j ). The associated cost of producing in out-of-control state can be expressed as: -As there is no detection of the out-of-control state, it is important to consider the cost associated with the intervals where no-detection is performed (the number of intervals until detection is successful). For this, Eq. 36 gives the time expected to detect the out-of-control state. Because during this time the process is in out-of-control state, the associated cost for this period can be expressed as: -Detection is successful and the repairing procedure starts. In this situation, the costs only consist of finding and repairing the assignable cause (W ).
N Sampling Costs: sampling is performed when the process is in-control state and while there is no detection (true alarm) of the out-of-control state. With this in mind, the first cost would be: which corresponds to the first sampling interval which is performed independently of the state of the interval. At the evaluation point of this interval a decision is made about continuing or not (in the case of a false alarm) with the process. For these in-control intervals the corresponding sampling costs are: N Observe that j~1,:::,m{2, because the first and last intervals are not considered. The last one is not considered because there is already a stop condition given by w m .
N Now the associated costs of samples taken when the process is in out-of-control state and there is no alarm (no detection) are considered. Rahim et al. [21] defined this cost as Q j , which is the expected number of samples taken after w j considering that the process is in out-of-control state from this time and there is no detection: N As in Eq. 45, the first and the last intervals are not considered.
Because there is no detection, it is necessary to consider the error Type II probability together with the out-of-control probability in the interval h j given by +F (w j ). Hence, for each interval there is an associated cost Q j , and the sampling cost when there is no detection is given by: N The total sampling cost is then expressed as the sum of Eq. 44, 45, and 47: N Salvage value for a machine of age w m : The model of Rahim et al. [21] considers a salvage value for the equipment used, allowing the possibility of replacement of the equipment depending on its age w m before a failure. This is only significant when the replacement produces an economic benefit. The salvage value for the equipment S(w m ) exists only when the process is in-control state within w m , and so the corresponding cost during this period is: Observe that this value represents a saving and not a cost.
The total Expected Cycle Cost E(C) is expressed as the sum of all costs described in this section which are given by Eq. 38, 39, 41, 42, 43, 48, and 49: Eq. 37 and 50 match the model presented by Rahim et al. [21], which gives confidence about the deduction of the cost equations and hence, of the understanding of the cost function model to integrate the preventive maintenance.

Integrated Cost Models with Preventive Maintenance
Preventive Maintenance (PM) has been proposed by diverse studies to increase the long-term reliability of equipment in a production process by reducing failure rates and age of the system [23,32]. Chiu [13] integrated PM in a cost function for the Economic Design (ED) of X X {S 2 control charts assuming the following: 1. The process had increasing failure rate. 2. PM is performed at the evaluation point of constant sampling intervals. If the process is in-control state in time h j , then PM is performed with an associated cost. 3. M includes costs associated with small adjustments or changes in machines or in other parts of the process (Mv repairing cost). 4. PM does not restore the process from an out-of-control state to an in-control state. 5. The process is stopped when the PM is performed.
These assumptions were similar to those presented by more recent studies which also had significant additional considerations. In [15] Ben-Daya and Rahim also considered performing PM at the evaluation point of constant sampling intervals. However Chen et al. [23] stated that performing PM at each evaluation point would increase costs. As an alternative they proposed a ''threshold'' for the quality characteristic measured during a sampling interval to decide whether or not to perform PM. Rahim et al. [32] proposed that PM activities could be performed at L integer multiples of evaluation points, considering also that production ceases during PM.
Mehrafrooz and Noorossana [18] proposed different types of maintenance: Preventive, Corrective, Compensatory, and Planned maintenance. In their work, ''true'' out-of-control signals require Preventive maintenance while ''false'' alarms require Compensatory maintenance. Corrective maintenance is performed whenever process stops due to a failure, and Planned maintenance is the one scheduled to be performed after (mz1)h in-control intervals. However, a common assumption of some works (i.e., [18,32]) is that PM is capable of restoring an equipment to a ''good-as-new'' condition, something that is not a realistic situation as discussed in [23]. Also, a single failure distribution is considered (i.e., Exponential [18,23]) and thus, the effect of PM is not fully studied.
In this paper is assumed the following: 1. PM does not restore the process to a ''good-as-new'' condition although it decreases the failure rate after each implementation [32]. Failure rate was considered to be reduced by extending the period of time between failures. For this, a constant h was defined as the possible gain in the life expectancy of the process and was integrated in the period of time between failures. It was considered to be at least of 10% of the original time between failures. 2. In terms of [18], Corrective maintenace is implicit in the activity of searching/repairing an assignable cause. PM is performed at each evaluation point while the process is detected to be in-control state (thus, Preventive~Compen-satory~Planned maintenance). 3. The process has general failure distribution and the following are considered: Exponential, Weibull, and Gamma. 4. Sampling intervals are constant and variable. 5. The process can continue or be stopped while performing PM: M 1 is the cost of PM if the process continues, and M 2 the cost of PM if the process is stopped (M 1 vM 2 ). 6. Taking as reference the cost of repairing the process from an out-of-control state, the PM cost was set to 10% and 30% of W if the process continues while performing PM or if is stopped respectively.
Thus, the study of Chiu [13] about PM is extended for the ESD of X X {S control charts with the more complex cost function model of Rahim et al. [21] for variable sampling intervals and general failure distributions. The work of Linderman [33] was also reviewed to allow, by means of a binary variable (l~0,1), the modelling of the situation of performing PM without interrupting the process. Thus, a more comprehensive insight is presented about the effect of PM on the reliability of a process. In order to keep consistency with the base models of Rahim et al. [21,28], the proposed models share the same terminology for E(T) and E(C).
Depending on the kind of process, if it is necessary to stop the process while performing PM (l~1), then a delay Z 2 is added to each sampling interval if the process is in-control state. This is common in situations when some machine parts are worn-out and need to be replaced, or too much waste is accumulated in a machine. Another scenario that requires attention, independently if the process is stopped or not, is lubrication of mechanic parts, which can be performed with the process working without any delay Z 2 , although it still implies a cost. Thus, Z 2 represents the ð50Þ Figure 10. Effect of PM on the Exponential distribution: h~2:0, Z 2~2 :5, Initial Unit = 20. l is the main parameter of the distribution and represents the known number of failures per unit of time. When PM is performed a gain h in the life expectancy can be obtained. This would increase the Initial Unit of time where failures are likely to take place. In addition, if the process is stopped during the performance of PM (l~1) the associated delay lZ 2 can be considered as another gain in the life expectancy of the process. Thus, the unit of time where failures would occur can be expressed as Unit of Time = Initial Unit + h + lZ 2 . It is observed that for l~l l~1 and l~l l~0 (PM with/without interruption of the process), the failure probabilities decreased at time t. The lowest failure probability is accomplished when the process is stopped while performing PM, and the highest when there is no PM. doi:10.1371/journal.pone.0059039.g010 Figure 11. Effect of PM on the Weibull distribution: h~5:0, Z 2~6 :0, m~3, Initial c = 40. c represents the time where the process would fail, identifying in this way the life expectancy of the process. c is known as the scale parameter and m as the form parameter. When m~1 the Weibull distribution is approximated to the exponential distribution with l~1=c. When PM is performed a gain h in the life expectancy can be obtained. This would increase the Initial c, and if the process is stopped during the performance of PM (l~1), the associated delay lZ 2 can be considered as another gain in the life expectancy of the process. Thus, the time at which the system would fail is expressed as: c = Initial c zhzlZ 2 . Although all cases achieve the same probability level by t~100, there is a marked delay when PM is performed. While in the original case with no PM the failure probability is 50% by t~35, when PM is performed without interruption the probability at t~35 is 40% (50% is reached when t~40). When there is PM with interruption, in t~35 the failure probability is 27%, reaching 50% in t~45. doi:10.1371/journal.pone.0059039.g011 expected time to perform PM. In the following sections the integration of these concepts is presented.
Constant Sampling Intervals. The modified Eq. 26 for the Expected Cycle Length E(T) is: Note that, for state S 10 , h=(1{b) is the time required to detect that the process is in out-of-control state. Thus, lZ 2 =(1{b) is defined as the time that PM was performed while the process was in out-of-control state before detection. Eq. 51 for E(T) is reduced to the following expression: In similar way, the expression for the Expected Cost E(C) of Eq. 29 with PM is derived: ð53Þ Variable Sampling Intervals. When the sampling intervals are constant, PM is performed at the end of the sampling interval Figure 12. Effect of PM on the Gamma distribution: h~2:0, Z 2~2 :5, Initial Unit = 20. r is termed as the form parameter, and l the scale parameter. For convenience in this work r~2 was used. l is related to the life expectancy of the process which is considered to be increased by h when PM is implemented. If PM is performed with interruption of the production process (l~1) then the associated time to this task Z 2 can be considered as another gain in the life expectancy. Because l = 1/(Unit of Time until Failure), then 1/l = Initial Unit of Time + h + lZ 2 . As presented, PM with interruption presents the lower probability of a failure for t~1,::,100. For example, for t~40, the failure probability is approximately of 48% and 53% for PM with l~1 and l~0 respectively. However if no PM is implemented, the failure probability is near 60%. doi:10.1371/journal.pone.0059039.g012 h as long as the process is in-control state (or the out-of-control is not detected). For such case, a new sampling interval is established that includes the associated PM: If the process is not interrupted (l~0) while performing the PM then there is no delay, hence h 0~h from the original cost model. Now, for variable sampling intervals, the same principle can be applied: Because it is considered that PM is constant for each sampling interval, the Eq. 31 for w j is adjusted as follows: Thus, Eq. 37 for E(T) with PM is modified as: Eq. 50 for E(C) is modified when adding PM given by P~(1{l)M 1 zlM 2 : where: is the cost of performing PM when the process is in-control state.
N Pb P m{2 i~1 +F (w j )Q j is the cost of performing PM when the process is in out-of-control state and there is no detection. Hence, this cost depends on the number of samples taken while there is no true alarm. Table 3. Cont.

Effect of PM on the Failure Distribution
It is expected that, as PM involves continuous adjustments and replacements of soon-to-be faulty parts, it would increase the reliability of the process in the long term. This could be reflected as a decrease in the number of failures in a given time period. Hence, this can have a direct effect on the life expectancy of the process, which is associated with the parameters of the failure distribution modelled by f (t).
In this paper three probability distributions are considered for the failure mechanism of the process: Exponential, Weibull, and Gamma, and the effect of PM on these distributions are presented in the following sections.
Exponential Distribution. For the Exponential distribution: where l is the main parameter of the distribution and represents the known number of failures per unit of time. When PM is performed it is assumed that the life expectancy of the process can be increased, changing the length of the unit of time in which failures would occur. It is considered that, by performing PM a gain h in the life expectancy is obtained. Additionally, if the process is stopped during the performance of PM (l~1) the associated delay lZ 2 can be considered as another gain in the life expectancy of the process. Thus, the unit of time where failures would occur can be expressed as: Unit of Time~Initial Unit zhzlZ 2 : Hence, the adjusted parameter l for the time between failures can be expressed as: l~k nown number of failures Initial Unit zhzlZ 2 : Because F (t) represents the probability that a unit selected randomly from a population will fail at most in time t, for the Exponential distribution F (t) is expressed as: In Figure 10, it is observed that for l~l l~1 and l~l l~0 (PM with/without interruption of the process), the failure probabilities decreased at time t. Thus a process with such patterns would be more reliable. Note that the lowest failure probability is accomplished when the process is stopped while performing PM, and the highest when there is no PM.
Weibull Distribution. For the Weibull distribution: where c represents the time when the process is likely to fail, identifying in this way the life expectancy of the process. c is known as the scale parameter and m as the form parameter. Note that when m~1 the Weibull distribution is approximated to the Exponential distribution with l~1=c. Similar to Eq. 62 with PM, the time at which the system would fail is expressed as: As in the Exponential case, in Figure 11 the behavior of the Weibull failure distribution is shown when PM is performed. Although all cases achieve the same probability level by t~100, there is a marked delay when PM is performed. While in the original case with no PM the failure probability is 50% by t~35, when PM is performed without interruption the probability at t~35 is 40% (50% is reached when t~40). When there is PM with interruption, in t~35 the failure probability is 27%, reaching 50% in t~45.
For real purposes, h depends on the type of process, and it can be estimated from experiments performed to measure the strength or resistance of the system before and after the PM. In this work Z 2~0 .75 and h~0.5c were used.
Gamma Distribution. For the Gamma distribution: where r is termed as the form parameter, and l the scale parameter. For convenience, in this paper r~2 is used, which gives the following expression for F (t): l is related to the life expectancy of the process which is considered to be increased PM by h when PM is implemented. If PM is performed with interruption of the production process (l~1) then the associated time to this task Z 2 can be considered as another gain in the life expectancy. If l = 1/(Unit of Time until Failure) then: As presented in Figure 12, the failure distribution follows the same pattern as in the Exponential and Weibull cases. PM with interruption presents the lower probability of a failure for t~1,::,100. For example, for t~40, the failure probability is approximately of 48% and 53% for PM with l~1 and l~0 respectively. However if no PM is implemented, the failure probability is near 60%.

Effect of the PM on the ESD of X X {S Control Charts
Matlab 2008b was used as the programming platform required to compute the cost models and perform the algorithm to achieve the ESD of the joint X X {S control chart. Diverse algorithms have been used to optimize the chart parameters for a given case. Among them, Genetic Algorithms (GAs) [8,24,31] and Tabu Search (TS) [32] have shown success for these tasks.
Previously a TS algorithm was developed to solve cost models for the ED and ESD of X X , X X {S charts. This algorithm improved the ratio E(C)=E(T) when compared with GAs, Hooke and Jeeves (HJ) and Combinatorial Methods (CB) as presented in [34]. Because the TS algorithm was validated with different cost models (Rahim et al. [21,28], Ruvalcaba [31], Saniga et al. [6]), it was used for the optimization of the X X {S with PM presented in this work. In Table 2 the data used for the ESD of the X X {S control chart with PM and constant/variable sampling intervals with Exponential, Weibull, and Gamma failure distributions is presented. The results, presented in Tables 3 and 4 were obtained with 20 iterations of the solving algorithm, and these are discussed in the following sections.
Constant Sampling Intervals. The results of the tests with constant sampling intervals are presented in Table 3, where BASE represents the solution of the base cost function model (Eq. 27 and 30) applied for the ESD of X X {S control charts. PM represents the integrated cost function model (Eq. 52 and 54) with l~0,1.
In Table 3 for the Exponential distribution with l = 0.0505 there are reductions (savings) in the costs (3.73% and 1.12%) when PM is implemented without interruption (l~0) or with interruption (l~1) of the process. These reductions are higher when the failure rates increase: 5.47% and 7.64% for l = 0.1010; 6.79% and 16.08% for l = 0.2525; and 6.74% and 21.93% for l = 0.5050.
For the Weibull distribution, when the failure rate is small (1=c = 0.0505) and m~2,3,4, small or no reductions are obtained when PM is performed without interruption of the process: 1.88%, 0.30%, and 20.93% respectively. Reductions are obtained when the failure rate increases to 1=c = 0.1010: 3.67%, 1.36%, and 0.91% respectively. However, if PM is performed with interruption of the process for 1=c = 0.0505 and 1=c = 0.1010, the costs are higher than the baseline (BASE) and negative reductions are obtained.
A similar pattern is observed for the Gamma distribution, where there are reductions when the failure rate is small and the process is not interrupted during PM: 2.11% for l = 0.0505 and 4.24% for l = 0.1010. Negative reductions are obtained when the process is interrupted with the same failure rates: 27.31% and 2.86% respectively. Consistent reductions are obtained when l = 0.2525 and l = 0.5050: 7.13% and 9.65% when l~0, and 8.30% and 20.14% when l~1 respectively. Note than in all cases with PM the length of the sampling interval (h) was increased. A paired Student's t-Test was performed to determine the statistical significance of the results presented in Table 3. The overall reduction obtained with l~0 was significant with a p-value of 0.000035481 v 0.05, 0.01. For l~1 the reduction was significant with a p-value of 0.003052452 v 0.05, 0.01.
A factorial analysis was performed on the data presented in Table 3 to assess the effect of PM on the overall cost reductions when considered with the other factors in the cost models. Minitab ver.15.1.30.0. was used for this purpose and in Figure 13  The first plot shows that overall costs decrease based on the failure distribution used to model the failure behavior. The cost model with Exponential distribution has the higher costs (given by the ratio E(C)=E(T)) while the Gamma has the lowest costs considering all the other factors (failure rate and PM). The second plot shows that as failure rate increases from 0.0505 to 0.5050 the cost increases considering the other factors (failure distribution and PM). The third plot shows that, considering failure distributions and failure rates, PM is responsible for decreasing costs from the base model where no PM is performed (-). The maximum reduction is achieved when PM is performed with interruption of the production process (l = 1).
In Figure 14 the Interaction Plots for E(C)=E(T) are presented and the following is observed: N Failure Distribution vs. Failure Rate. All costs increase as failure rate increases. Costs are the highest for the Exponential distribution, and the lowest for the Gamma distribution.  The same analysis was performed for the results obtained with the Weibull distribution. This was performed to assess the effect of PM when considered with the form parameter of the failure distribution. In Figures 15 and 16 the Main Effects Plots and the Interaction Plots for E(C)=E(T) are presented. As presented in Figure 15 the cost increases as the failure rate does. However there is an inverse relationship between the form parameter and the cost given by E(C)=E(T). Considering the failure rate and the form parameter, performing PM decreases the ratio E(C)=E(T).
When analyzing the interaction plots ( Figure 16) it is observed that as m increases the cost decreases for all failure rates. Performing PM has no significant effect on cost reduction for small failure rates (1=c = 0.0505, 1=c = 0.1010). Reductions are achieved for higher failure rates as 1=c = 0.2525 and 1=c = 0.5050. There is PM -There is no significant difference in cost when m varies from 3 to 4 and thus no relationship between PM and the form parameter is evident. When m~1 the most significant reduction is achieved, however this is the case where the Weibull distribution is approximated to the Exponential. When m~2 the reduction is less evident. doi:10.1371/journal.pone.0059039.g016 no noticeable difference in cost when m varies from 3 to 4 and thus no relationship between PM and the form parameter of the Weibull distribution is evident. When m~1 the highest reduction is achieved, however this is the case where the Weibull distribution is approximated to the Exponential. When m~2 the reduction is less evident.
Thus, the m parameter of the Weibull distribution has no significant effect on the performance of PM. Also, for all distributions with failure rates over 0.15 the PM generates reductions in E(C)=E(T).
Variable Sampling Intervals. The results of the tests with variable sampling intervals are presented in Table 4, where BASE represents the solution of the base cost function model (Eq. 37 and 50) applied for the ESD of X X {S control charts. PM represents the integrated cost function model (Eq. 58 and 59). As presented in Table 4, Weibull and Gamma distributions with small failure rate (0.0505) have some instances where the reductions in costs are very small or even negative when PM is implemented with l~1. For higher failure rates the reductions increase to approximately 10% of the BASE model when l~0.
In general, the results presented in Table 4 for PM with l~0 were statistically significant with a p-value of 0.0000033257 v 0.05, 0.01. For PM with l~1, the results were significant with a pvalue of 0.0096247835 v 0.05, 0.01.
In the Main Effects Plot of Figure 17 is observed that the cost model with Gamma distribution has a lower ratio E(C)=E(T) than the Weibull distribution. Also, in general terms, the ratio increases as the failure rate does. Note however that, in comparison with constant sampling intervals, when failure rate is within 0.1010 and 0.2525 the highest reduction is achieved when PM is performed without interruption of the production process (l~0). In addition, the length of the sampling intervals is increased (in this case, starting from h 1 ).
In Figure 18 the Interaction Plots for E(C)=E(T) are presented and the following is observed: N Failure Distribution vs. Failure Rate. All costs increase as failure rate increases. Costs are higher for the Weibull distribution.
N Failure Distribution vs. PM. PM decreases costs for the Weibull and Gamma distributions, being the highest for the model with Weibull distribution. For the Gamma distribution there is no observable difference in the performance of PM with or without interruption in the production process.
N Failure Rate vs. PM. In overall, if the failure rate is small (0.0505) there are no savings or cost reductions. As the failure rate increases the savings increase when PM is performed. For failure rates of 0.1010 and 0.2525 the highest reduction is obtained when PM is performed without interruption of the process. However, for the highest failure rate (0.5050) the maximum reduction is achieved when PM is performed with interruption as observed in the case of constant sampling intervals.
The TS algorithm and the estimation of parameters led to lower levels of a and b than those specified for the restrictions in the ESD Figure 17. Main Effects Plots for E(C)=E(T) and Variable Sampling Intervals -All Failure Distributions. Two main factors were considered: Failure distributions (two levels: Weibull and Gamma), Failure rate (four levels: 0.0505, 0.1010, 0.2525, and 0.5050), and PM (three levels: no implementation -, implementation with l~0, and implementation with l~1). The first plot shows that overall costs decrease based on the failure distribution used to model the failure behavior. The cost model with Weibull distribution has the highest costs (given by the ratio E(C)=E(T)) while the Gamma has the lowest costs considering all the other factors. The second plot shows that as failure rate increases from 0.0505 to 0.5050 the cost increases considering the other factors. The third plot shows that, considering failure distributions and failure rates, PM is responsible for decreasing costs from the base model where no PM is performed (-). In contrast with constant sampling intervals, when failure rate is within 0.1010 and 0.2525 the most significant reduction is achieved when PM is performed without interruption of the production process (l~0). doi:10.1371/journal.pone.0059039.g017