The properties of the geometric-Poisson exponentially weighted moving control chart with estimated parameters

The geometric-Poisson exponentially weighted moving average (EWMA) chart has been shown to be more effective than the Poisson EWMA chart in monitoring the number of defects in production processes. In these applications, it is assumed that the process parameters are known or have been accurately estimated. In practice, however, the process parameters are rarely known and must be estimated from a reference sample to construct the geometric-Poisson EWMA chart. Because of variability in parameter estimation, the performance of the chart might differ from the known-parameter case. This article explores the effect of estimated parameters on the conditional and marginal performance of the geometric-Poisson EWMA chart. The run length characteristics are calculated using a Markov chain approach, and the effect of estimation on the performance of the chart is shown to be significant. Recommendations about the proper choice of sample size, smoothing constant, and dispersion parameter are made. The results of this study highlight the practical implications of estimation error and offer advice to practitioners when constructing/analyzing a phase-I sample.
Subjects: Science; Mathematics & Statistics; Statistics & Probability; Statistics; Statistical Computing; Statistics & Computing; Statistical Theory & Methods; Statistics for Business, Finance & Economics


PUBLIC INTEREST STATEMENT
The semiconductor industry utilizes expensive production equipment and has rigorous requirements for its production environment, which results in a high production cost. The current paper is an application of statistics in industrial engineering. In these applications, it is assumed that the process parameters are known or have been accurately estimated. In practice, however, the process parameters are rarely known and must be estimated from a reference sample to construct the geometric-Poisson EWMA chart. The effect of estimation on the performance of the chart is shown to be significant. Recommendations regarding sample size, smoothing constant, and clustering parameter are provided. Finally, a real example is used to highlight the practical implications of estimation error and to offer advice to practitioners when constructing/analyzing a phase-I sample.

Introduction
Statistical process control (SPC) is a collection of tools used to examine a process and improve the quality of its products (see Montgomery, 2009). Among these methods, control charts are among the most effective tools for process monitoring because they allow practitioners to draw conclusions about the state of the process (in-control or out-of-control). These conclusions depend on whether the applied monitoring approach is a phase-I or a phase-II method. In phase I, the historical data of the process are used to test the process stability and to estimate the in-control parameters. In phase II, the process is monitored in real time to quickly detect shifts from the baseline established in phase I. Some researchers, such as Quesenberry (1995), recommend self-starting charts that bypass phase I and begin monitoring quickly.
Most SPC research has been carried out on developing phase-II control charting methods, where it is assumed that the in-control parameters are known or can be accurately estimated. However, the parameters are rarely known with certainty in practice. Therefore, accurate estimates of the parameters are required to make the statistical performance of the control charts reliable. It is also helpful to provide practitioners with phase-I guidelines, so that the effect of estimation error on the performance of control charts in phase II can be better understood (see Woodall & Montgomery, 1999). Alongside variables control charts, the performance of attribute charts with estimated control limits has been an active research topic over the past decade. Jensen, Jones-Farmer, Champ, and Woodall (2006) and Szarka and Woodall (2011) provided detailed reviews of the effect of estimation error on control chart performance. Yang, Xie, Kuralmani, and Tsui (2000), Tang and Cheong (2004), and Zhang, Peng, Schuh, Megahed, and Woodall (2013) studied the performance of geometric charts with estimated control limits. They concluded that a large phase-I sample size is required when the in-control proportion of non-conformity is low. Other related studies on the performance of attribute control charts with estimated parameters include Shu, Tsung, and Tsui (2004), Chakraborti and Human (2006), Testik, McCullough, and Borror (2006), Testik (2007), Ozsan, Testik, and Weiß (2010), Lee, Wang, Xu, Schuh, and Woodall (2013), Chiu and Tsai (2013), Mahmoud and Maravelakis (2013), Saleh, Mahmoud, and Abdel-Salam (2013), and Saghir and Lin (2013). In fact, the study of the statistical performance of control charts with estimated control limits is a general research issue of importance.
The geometric-Poisson distribution is a natural extension of the Poisson distribution and an adequate model for monitoring the number of defects over time in production processes. Chen, Randolph, and Liou (2005) developed CUSUM control charts based on the geometric-Poisson compound distribution. Chen (2012) proposed an exponentially weighted moving average (EWMA) control chart, based on the same compound distribution, for monitoring the number of defects over time. The results of that study reveal that the proposed chart, namely the geometric-Poisson EWMA chart, is more effective in monitoring and improving quality in a production environment than the usual Poisson EWMA chart. Chen (2012) investigated the performance of the geometric-Poisson chart under the assumption that the parameters of the geometric-Poisson process are known. In practice, however, the parameters are unknown and are estimated from historical data.
The current article investigates the effect of estimation error on the performance of the geometric-Poisson EWMA chart. The run length (RL) properties, such as the average run length (ARL), the standard deviation of the run length (SDRL), and percentiles of the RL distribution, are analyzed using the Markov chain approach following Saghir and Lin (2013) and Chen (2012). Both the conditional and the marginal performance of the RL metrics of the chart are evaluated. The conditional analysis shows how overestimating or underestimating the parameters affects the RL performance of the chart. The marginal analysis is useful for providing recommendations regarding the minimum sample size and the choice of smoothing constant and dispersion parameter, because it considers the distribution of the estimated parameters and thus accounts for the variability introduced through parameter estimation.
The rest of the article is organized as follows. In Section 2, the geometric-Poisson EWMA control chart with estimated parameters is given. Section 3 describes different performance evaluation measures. Section 4 evaluates the performance of the EWMA chart with estimated control limits under two different conditions. Finally, the conclusions of the study are discussed in Section 5.

The geometric-Poisson EWMA chart with known parameters
Let Y(t) be the random variable of the number of defective items, and X(t) be the random variable of the number of defects that occur up to t, where t > 0. According to Chen (2012), the probability mass function of the geometric-Poisson compound distribution with parameters λ (rate) and ρ (dispersion) for any t > 0 is
$$P\{X(t) = x\} = \begin{cases} e^{-\lambda t}, & x = 0, \\ \displaystyle\sum_{n=1}^{x} \frac{e^{-\lambda t}(\lambda t)^{n}}{n!} \binom{x-1}{n-1} \rho^{x-n} (1-\rho)^{n}, & x = 1, 2, \ldots, \end{cases} \tag{1}$$
where λ > 0, 0 < ρ < 1. The expected value and variance of total defects in a fixed unit t = 1 are derived by Chen et al. (2005) as $\mu = \lambda/(1-\rho)$ and $\sigma^{2} = \lambda(1+\rho)/(1-\rho)^{2}$, respectively. Clearly, the variance of the geometric-Poisson distribution is greater than or equal to the mean. If the variance equals the mean (ρ = 0), the geometric-Poisson reduces to the Poisson.
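The distribution above can be checked numerically. The following is a minimal Python sketch, assuming the standard compound form of the geometric-Poisson law (a Poisson number of clusters, each with a geometric number of defects on 1, 2, …) over one unit (t = 1). The function name `geometric_poisson_pmf` and the truncation point of the support are illustrative choices, not part of the original study.

```python
import math

def geometric_poisson_pmf(x, lam, rho):
    """P(X = x) for the geometric-Poisson law over one unit (t = 1)."""
    if x == 0:
        return math.exp(-lam)
    total = 0.0
    for n in range(1, x + 1):
        # Probability of n clusters (Poisson with rate lam)
        poisson_n = math.exp(-lam) * lam**n / math.factorial(n)
        # Probability that n geometric cluster sizes (each >= 1) add up to x
        clusters = math.comb(x - 1, n - 1) * rho**(x - n) * (1 - rho)**n
        total += poisson_n * clusters
    return total

lam, rho = 2.0, 0.2
support = range(0, 80)  # truncated support; the tail beyond 80 is negligible here
pmf = [geometric_poisson_pmf(x, lam, rho) for x in support]
mean = sum(x * p for x, p in zip(support, pmf))
var = sum((x - mean) ** 2 * p for x, p in zip(support, pmf))

print("mean:", mean, "theory:", lam / (1 - rho))
print("variance:", var, "theory:", lam * (1 + rho) / (1 - rho) ** 2)
```

The numerical mean and variance should agree with λ/(1 − ρ) and λ(1 + ρ)/(1 − ρ)² up to truncation error, which illustrates the overdispersion relative to the Poisson case.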
Assume that the sequence $X_1, X_2, \ldots$ from a repetitive production process consists of i.i.d. geometric-Poisson compound random variables with the probability mass function defined in Equation 1. To detect changes from the in-control mean $\mu = \mu_0$ to an out-of-control mean $\mu = \mu_1$, Chen (2012) proposed an EWMA control chart. The EWMA statistic is defined as
$$Z_i = wX_i + (1-w)Z_{i-1}, \quad i = 1, 2, \ldots, \tag{2}$$
with $Z_0 = \mu_0$ and $w$ $(0 < w \le 1)$ being the smoothing constant. Since the EWMA can be viewed as a weighted average of all past and current observations, it is very insensitive to the normality assumption. It is therefore an ideal control chart to monitor individual observations and can effectively detect small and moderate changes in manufacturing processes. For an in-control process, the mean and variance of the EWMA statistic are
$$E(Z_i) = \mu_0, \qquad \operatorname{Var}(Z_i) = \sigma_0^{2}\,\frac{w}{2-w}\left[1-(1-w)^{2i}\right], \tag{3}$$
where $\sigma_0^{2} = \lambda_0(1+\rho_0)/(1-\rho_0)^{2}$. The control limits for the EWMA control chart based on the geometric-Poisson compound distribution are defined as
$$\mathrm{UCL}_i = \mu_0 + K_u\sqrt{\sigma_0^{2}\,\frac{w}{2-w}\left[1-(1-w)^{2i}\right]}, \qquad \mathrm{LCL}_i = \mu_0 - K_l\sqrt{\sigma_0^{2}\,\frac{w}{2-w}\left[1-(1-w)^{2i}\right]}. \tag{4}$$
For large values of i, the limits in Equation 4 reduce to the asymptotic limits
$$\mathrm{UCL} = \mu_0 + K_u\sqrt{\sigma_0^{2}\,\frac{w}{2-w}}, \qquad \mathrm{LCL} = \mu_0 - K_l\sqrt{\sigma_0^{2}\,\frac{w}{2-w}},$$
where $K_l$ and $K_u$ are control chart constants and $\mu_0 = \lambda_0/(1-\rho_0)$ is the target mean value of the geometric-Poisson compound process. Chen (2012) provided the values of $K_l = K_u = K$ for various combinations of $w$, $\rho_0$, $\lambda_0$, and desired in-control ARL using the Markov chain approach. The lower control limit LCL should be set to zero when its computed value is less than zero. This is because the quality characteristic of interest $X_i$ is a compound random variable, and therefore the EWMA statistic $Z_i$ in Equation 2 is non-negative. The choice of the smoothing constant $w$ typically depends on how fast a mean shift of a given size should be detected. It is generally accepted that smaller values of $w$ are more effective in rapidly detecting smaller mean shifts and vice versa.
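As an illustration of how the control limits follow from the in-control parameters, here is a short Python sketch. It assumes the usual EWMA variance formula with the geometric-Poisson variance λ₀(1 + ρ₀)/(1 − ρ₀)² and a common constant K for both limits. The value K = 3.0 is a placeholder chosen for the example, not a constant taken from the tables of Chen (2012), and `ewma_limits` is a hypothetical helper name.

```python
import math

def ewma_limits(lam0, rho0, w, K, i=None):
    """Return (LCL, UCL): exact limits at sample i, or asymptotic limits if i is None."""
    mu0 = lam0 / (1 - rho0)                       # in-control mean
    sigma2 = lam0 * (1 + rho0) / (1 - rho0) ** 2  # in-control variance
    factor = w / (2 - w)
    if i is not None:
        factor *= 1 - (1 - w) ** (2 * i)          # time-varying correction
    half_width = K * math.sqrt(sigma2 * factor)
    lcl = max(0.0, mu0 - half_width)              # LCL is floored at zero
    return lcl, mu0 + half_width

# Placeholder design: lam0 = 2, rho0 = 0.2, w = 0.1, K = 3.0 (K is assumed)
lcl, ucl = ewma_limits(lam0=2.0, rho0=0.2, w=0.1, K=3.0)
print("asymptotic limits:", lcl, ucl)
print("limits at i = 1:", ewma_limits(2.0, 0.2, 0.1, 3.0, i=1))
```

The limits at small i are narrower than the asymptotic ones, which is why the time-varying form matters mainly at the start of monitoring.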
After setting the control limits for the geometric-Poisson EWMA chart, the EWMA statistic given in Equation 2 is plotted against i. For an in-control process, all of the $Z_i$ should lie inside the control limits, whereas an out-of-control process is signaled by one or more $Z_i$ falling below the LCL or above the UCL. Once a signal is given, the chart remains in the out-of-control state until the process is adjusted manually. In other words, the out-of-control region acts as an "absorbing" state.
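The monitoring rule described above can be sketched in a few lines of Python. The helper `first_signal` and the data and limits in the example are purely illustrative assumptions. The function applies the EWMA recursion with Z₀ = μ₀ and stops at the first sample whose statistic leaves the interval [LCL, UCL].

```python
def first_signal(xs, mu0, lcl, ucl, w):
    """Return (index, Z) of the first out-of-control EWMA value, or (None, last Z)."""
    z = mu0  # Z_0 is set to the in-control mean
    for i, x in enumerate(xs, start=1):
        z = w * x + (1 - w) * z  # EWMA recursion
        if z < lcl or z > ucl:
            return i, z          # the chart signals and monitoring stops here
    return None, z

# Illustrative counts with an upward shift from the fourth sample onwards
counts = [2, 3, 2, 10, 12, 11]
print(first_signal(counts, mu0=2.5, lcl=0.5, ucl=5.0, w=0.5))
```

Returning at the first signal mirrors the absorbing behaviour: no further samples are evaluated until the process has been adjusted.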

The geometric-Poisson EWMA chart when parameters are unknown
The control chart constant K for the geometric-Poisson EWMA control statistic can be calculated using the Markov chain approach if $\lambda_0$ and $\rho_0$ are known. However, when $\lambda_0$ and $\rho_0$ are unknown, their values must be estimated prior to any calculations. The method-of-moments estimators are generally used to estimate $\lambda_0$ and $\rho_0$ from phase-I samples. Equating the mean $\lambda_0/(1-\rho_0)$ and the variance $\lambda_0(1+\rho_0)/(1-\rho_0)^{2}$ to their sample counterparts gives the method-of-moments estimates (see Chen et al., 2005)
$$\hat{\rho}_0 = \frac{S^{2}-\bar{X}}{S^{2}+\bar{X}}, \qquad \hat{\lambda}_0 = \bar{X}\,(1-\hat{\rho}_0) = \frac{2\bar{X}^{2}}{S^{2}+\bar{X}},$$
where $\bar{X} = \sum_{i=1}^{m} X_i/m$ and $S^{2} = \sum_{i=1}^{m}(X_i-\bar{X})^{2}/(m-1)$ are the sample mean and variance, respectively, for initial samples of size m. The sample variance $S^{2}$ must be greater than the sample average $\bar{X}$, so that the value of $\hat{\rho}_0$ is positive. Therefore, the average number of non-conformities based on the moment estimates is $\hat{\mu}_0 = \bar{X}$, and the central limit theorem ensures that its sampling distribution is approximately normal with mean $\mu_0$ and variance $\mu_0/m$. Accordingly, these estimates can be used in any of the RL calculations for the geometric-Poisson EWMA chart.
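A minimal Python sketch of the method-of-moments step follows. It assumes the estimators obtained by equating the sample mean and sample variance to λ/(1 − ρ) and λ(1 + ρ)/(1 − ρ)². The function name `mm_estimates` and the small data set are hypothetical.

```python
import statistics

def mm_estimates(xs):
    """Method-of-moments estimates (lam_hat, rho_hat) from phase-I counts."""
    xbar = statistics.fmean(xs)
    s2 = statistics.variance(xs)  # sample variance with divisor m - 1
    if s2 <= xbar:
        # No overdispersion in the sample: rho_hat would not be positive
        raise ValueError("sample variance must exceed the sample mean")
    rho_hat = (s2 - xbar) / (s2 + xbar)
    lam_hat = xbar * (1 - rho_hat)
    return lam_hat, rho_hat

# Hypothetical phase-I counts
lam_hat, rho_hat = mm_estimates([0, 1, 2, 3, 9])
print(lam_hat, rho_hat, lam_hat / (1 - rho_hat))
```

By construction, the estimated mean λ̂/(1 − ρ̂) reproduces the sample mean, so the estimated centre line of the chart is simply the phase-I average.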
The estimated control limits for the given EWMA chart based on the MM estimates are defined as
$$\hat{h}_u = \hat{\mu}_0 + K\sqrt{\frac{w}{2-w}\cdot\frac{\hat{\lambda}_0(1+\hat{\rho}_0)}{(1-\hat{\rho}_0)^{2}}}, \qquad \hat{h}_l = \hat{\mu}_0 - K\sqrt{\frac{w}{2-w}\cdot\frac{\hat{\lambda}_0(1+\hat{\rho}_0)}{(1-\hat{\rho}_0)^{2}}}. \tag{5}$$
The aim of this article is to determine the effect of the phase-I sample on the performance of the geometric-Poisson EWMA chart. Statistical properties of an EWMA control chart are usually evaluated in terms of the average run length (ARL), which is the mean of the RL distribution. The ARL of a control charting procedure is defined as the expected number of sampling stages until an out-of-control condition is signaled. An effective and efficient control chart can provide an optimal ARL. More specifically, the ARL of an optimal control scheme should be large when a process is in control and small when a shift occurs (Chen & Chen, 2007). In the following section, we provide some information about our calculations.

RL distribution with estimated parameters
Chen (2012) used a Markov chain approach (proposed by Brook & Evans, 1972) to study the RL distribution of a geometric-Poisson EWMA chart. In this section, we extend the Markov chain approximation to assess the performance of the geometric-Poisson EWMA chart when the estimated parameters differ from their actual values. As in earlier studies of charts with estimated parameters, we consider both conditional and marginal RL properties. The Markov chain method and the equations used to obtain the RL properties are given in the following subsections.

The Markov chain approach
Suppose D is the number of defects; then the EWMA of D is
$$Z_i = wD_i + (1-w)Z_{i-1}.$$
The corresponding EWMA control scheme signals if $Z_i > \hat{h}_u$ or $Z_i < \hat{h}_l$, and remedial action should then be taken. To visualize the transitioning process, the decision interval $[\hat{h}_l, \hat{h}_u]$ is divided into N subintervals as explained in Chen (2012). The transition of $Z_i$ in the interval $[\hat{h}_l, \hat{h}_u]$ is a random walk, and the jth subinterval is the jth state, denoted by $E_j$ and represented by its mid-point $S_j$.
If $Z_i$ is within the decision interval, then the process is in the in-control state and no out-of-control signal is given. On the other hand, if $Z_i$ moves outside the control limits (above $\hat{h}_u$ or below $\hat{h}_l$), then the process enters the out-of-control state. Thus, the (N + 1)st state is absorbing and represents the out-of-control region.
Let $P_{ij}$ denote the probability of transition from state i to state j in one step. Then, the transition probability matrix, P, is defined as
$$\mathbf{P} = \begin{pmatrix} \mathbf{R} & (\mathbf{I}-\mathbf{R})\mathbf{u} \\ \mathbf{0}^{\top} & 1 \end{pmatrix}.$$
Note that all rows sum to unity, and that the last row consists of zeros except for the last element, which equals 1 because $E_{N+1}$ is an absorbing state. In addition, R is an N × N matrix containing the probabilities of moving from one transient state to another, I is an identity matrix of order N × N, u is an (N × 1) vector of ones, and the vector (I − R)u contains the probabilities of moving from a transient state to the absorbing state. With subinterval width $\delta = (\hat{h}_u - \hat{h}_l)/N$, the transition probabilities for the Markov chain are determined as follows:
$$P_{ij} = P\left(S_j - \tfrac{\delta}{2} < Z_k \le S_j + \tfrac{\delta}{2} \,\middle|\, Z_{k-1} = S_i\right), \quad i, j = 1, \ldots, N.$$
This transition probability, $P_{ij}$, can be written as
$$P_{ij} = P\left(\frac{S_j - \delta/2 - (1-w)S_i}{w} < D \le \frac{S_j + \delta/2 - (1-w)S_i}{w}\right),$$
where D follows the geometric-Poisson distribution with the actual process parameters.
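To make the construction of the transient part of the transition matrix concrete, the following Python sketch builds R for a given decision interval. It assumes the Brook and Evans style discretization with each state represented by the mid-point of its subinterval and half-open subintervals (lower, upper]. The function names, the limits, and the number of states in the example are illustrative assumptions and are not taken from the tables of the study.

```python
import math

def gp_cdf_table(lam, rho, xmax):
    """Cumulative probabilities P(X <= x), x = 0..xmax, of the geometric-Poisson law."""
    cdf, running = [], 0.0
    for x in range(xmax + 1):
        if x == 0:
            p = math.exp(-lam)
        else:
            p = sum(math.exp(-lam) * lam**n / math.factorial(n)
                    * math.comb(x - 1, n - 1) * rho**(x - n) * (1 - rho)**n
                    for n in range(1, x + 1))
        running += p
        cdf.append(running)
    return cdf

def transient_matrix(lam, rho, w, h_l, h_u, N):
    """N x N matrix R of one-step transitions among the in-control states."""
    delta = (h_u - h_l) / N
    edges = [h_l + j * delta for j in range(N + 1)]
    mids = [(edges[j] + edges[j + 1]) / 2 for j in range(N)]
    xmax = min(170, math.ceil(h_u / w) + 1)  # cap keeps factorials within float range
    cdf = gp_cdf_table(lam, rho, xmax)

    def F(y):
        # Distribution function of the count evaluated at a real number y
        return 0.0 if y < 0 else cdf[min(math.floor(y), xmax)]

    R = []
    for s_i in mids:
        row = []
        for j in range(N):
            lower = (edges[j] - (1 - w) * s_i) / w
            upper = (edges[j + 1] - (1 - w) * s_i) / w
            row.append(F(upper) - F(lower))  # P(lower < D <= upper)
        R.append(row)
    return R

# Illustrative limits for lam = 2, rho = 0.2, w = 0.1 (not values from the study)
R = transient_matrix(2.0, 0.2, 0.1, 1.1672, 3.8328, 11)
print("probability of staying in control from the middle state:", sum(R[5]))
```

Each row sum is the probability of remaining in control after one step, so one minus the row sum is the corresponding entry of (I − R)u.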

Conditional performance of the RL distribution
The RL of the geometric-Poisson EWMA chart is the number of steps taken, starting from a given initial state, to reach the absorbing state $E_{N+1}$. Using the Markov chain approach, the approximate ARL and SDRL performance measures are computed as follows:
$$\mathbf{ARL} = (\mathbf{I}-\mathbf{R})^{-1}\mathbf{u}, \qquad \mathbf{SDRL} = \sqrt{2(\mathbf{I}-\mathbf{R})^{-2}\mathbf{R}\mathbf{u} + \mathbf{ARL} - \mathbf{ARL}^{2}},$$
where the square and the square root are taken element by element, and each of ARL and SDRL is an N × 1 vector containing the ARLs and SDRLs corresponding to all possible starting states. Assuming that N is an odd number, the ((N + 1)/2)th elements of these vectors correspond to the zero-state ARL and SDRL.
The percentiles of the RL distribution are another important performance measure of control charts. They may be determined using the cumulative probability for the RL. Let $\mathbf{F}_r$ denote the N × 1 cumulative probability vector, where each of the N entries is for one starting value $Z_0$ and the index r = 1, 2, … represents a value of the RL. Then
$$\mathbf{F}_r = (\mathbf{I}-\mathbf{R}^{r})\,\mathbf{u}.$$
The cumulative probability times 100 gives the percentile corresponding to the RL value r. For example, the 30th percentile for the case of $Z_0 = \hat{\mu}_0$ is the smallest value of r for which the middle entry of $\mathbf{F}_r$ is greater than or equal to 0.3.
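The ARL and SDRL computations can be sketched in Python without external libraries. The sketch assumes the standard absorbing-chain results ARL = (I − R)⁻¹u and E[RL(RL − 1)] = 2(I − R)⁻²Ru, and it solves the linear systems by Gaussian elimination instead of forming an explicit inverse. The 3 × 3 matrix R in the example is a toy matrix invented for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def run_length_moments(R):
    """Return (ARL, SDRL) vectors, one entry per starting state."""
    n = len(R)
    I_minus_R = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(n)]
                 for i in range(n)]
    arl = solve(I_minus_R, [1.0] * n)                       # (I - R)^-1 u
    R_arl = [sum(R[i][j] * arl[j] for j in range(n)) for i in range(n)]
    fact2 = [2.0 * v for v in solve(I_minus_R, R_arl)]      # E[RL (RL - 1)]
    sdrl = [math.sqrt(f + a - a * a) for f, a in zip(fact2, arl)]
    return arl, sdrl

# Toy transient matrix (hypothetical numbers, N = 3)
R = [[0.90, 0.05, 0.00],
     [0.04, 0.90, 0.04],
     [0.00, 0.05, 0.90]]
arl, sdrl = run_length_moments(R)
middle = (len(R) - 1) // 2  # zero-state entry when N is odd
print("zero-state ARL and SDRL:", arl[middle], sdrl[middle])
```

With a single transient state the run length is geometric, which gives a simple check: R = [[0.9]] should yield an ARL of 10 and an SDRL of √90.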
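The percentile calculation can be illustrated with a short Python sketch. It assumes the cumulative probability of a signal by step r is 1 minus the corresponding entry of Rʳu, which is updated by repeated multiplication instead of computing matrix powers. The function name `rl_percentile` and the safety cap `max_r` are illustrative additions.

```python
def rl_percentile(R, start, q, max_r=10**6):
    """Smallest r with P(RL <= r) >= q when starting in state index `start`."""
    n = len(R)
    survive = [1.0] * n  # entries of R^0 u: probability of no signal yet
    for r in range(1, max_r + 1):
        # survive becomes R^r u, the probability of no signal in the first r steps
        survive = [sum(R[i][j] * survive[j] for j in range(n)) for i in range(n)]
        if 1.0 - survive[start] >= q:
            return r
    raise RuntimeError("percentile not reached within max_r steps")

# Single-state example: the run length is geometric with signal probability 0.1
print(rl_percentile([[0.9]], start=0, q=0.30))
print(rl_percentile([[0.9]], start=0, q=0.50))
```

For the zero-state percentiles of a chart with an odd number N of states, `start` would be the middle index (N − 1) / 2.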

Marginal performance of the RL distribution
The marginal performance can be obtained by integrating the conditional performance measures with respect to the density over the parameter space, as shown in the following equations:
$$\mathrm{ARL}_{m} = \int \mathrm{ARL}(\hat{\mu}_0)\, g(\hat{\mu}_0)\, d\hat{\mu}_0, \qquad \mathrm{SDRL}_{m} = \int \mathrm{SDRL}(\hat{\mu}_0)\, g(\hat{\mu}_0)\, d\hat{\mu}_0, \tag{10}$$
where $g(\cdot)$ is the approximate normal density of the estimated mean $\hat{\mu}_0$. The marginal performance measures are weighted averages of the conditional performance over all the values that the estimation may yield for the in-control mean $\mu_0$. These integrals can be solved using a numerical integration procedure. In our calculations, we have followed the approach of Ozsan et al. (2010) and used Simpson's quadrature method in Matlab.
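The numerical integration step can be sketched in Python as follows. This is only an illustration of composite Simpson quadrature applied to a conditional measure weighted by a normal density with mean μ₀ and variance μ₀/m, as stated earlier. The integration range of four standard deviations, the helper names, and in particular `toy_conditional_arl` (a stand-in for the Markov chain ARL) are assumptions made for the example.

```python
import math
from statistics import NormalDist

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def marginal_measure(conditional, mu0, m, width=4.0):
    """Average a conditional RL measure over the normal density of the estimated mean."""
    sd = math.sqrt(mu0 / m)
    g = NormalDist(mu0, sd)
    return simpson(lambda mu: conditional(mu) * g.pdf(mu),
                   mu0 - width * sd, mu0 + width * sd)

def toy_conditional_arl(mu_hat, mu0=2.5):
    # Hypothetical stand-in: the ARL drops as the estimate moves away from mu0
    return 500.0 * math.exp(-8.0 * (mu_hat - mu0) ** 2)

for m in (30, 100, 1000):
    print(m, marginal_measure(toy_conditional_arl, mu0=2.5, m=m))
```

In an actual study, `toy_conditional_arl` would be replaced by the zero-state ARL obtained from the Markov chain for the limits implied by each value of the estimated mean.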

Results and discussion
In this section, we have evaluated the conditional and marginal RL performance of the geometric-Poisson EWMA control chart, when the control limits are estimated. The conditional performance is summarized in Section 4.1 and marginal performance is provided in Section 4.2. All the computations are done in Matlab.

Conditional performance of the geometric-Poisson EWMA chart
In this section, some hypothetical cases of the estimated mean are considered to investigate the performance of the geometric-Poisson EWMA chart in phase II and to observe the impact of parameter estimation. Three different situations are considered in estimating the mean: the 25th (underestimation), 50th (nominal), and 75th (overestimation) percentiles of the sampling distribution of the estimated mean. This is equivalent to the actual in-control process mean being equal to the estimated in-control process mean (nominal), lower than it (overestimation), or higher than it (underestimation). Let the true in-control rate parameter be $\lambda_0 = 2$ with dispersion parameter ρ = 0.20. The mean and various percentiles of the sampling distribution of the estimate are calculated for various sample sizes and provided in Table 1. It is clear that these estimates deviate heavily from the true value $\lambda_0 = 2$ for sample sizes m ≤ 500.
The conditional RL performance of the chart is calculated for various parameter values, smoothing constants, and sample sizes. The estimated conditional RL distribution is provided in Tables 2-4 for some choices. Shifts of various sizes in the average number of defects are considered, expressed in multiples of the standard deviation of the in-control process.
The results in Tables 2-4 indicate that the effect of parameter estimation is significant. The actual in-control $\mathrm{ARL}_0$ deviates significantly from the expected value (500.00) when the parameters are estimated from m initial samples. In these tables, the nominal case refers to known parameters, so the performance of the chart in that case does not depend on the sample size. Overestimation (when parameters assume a value at the 75th percentile) results in an increase in the number of false alarms and a decrease in the variability of the RL compared with the nominal summaries. On the other hand, underestimation (25th percentile) results in a larger EWMA variance than expected. Therefore, the in-control average run length ($\mathrm{ARL}_0$) is expected to be less than the nominal value, with more frequent false alarms. In the out-of-control scenario, a shift in the average number of non-conformities due to an assignable cause is detected faster in the underestimation case than in the overestimation case, as is well known.
It is apparent that the performance of the geometric-Poisson chart is significantly affected by estimated parameters; however, the magnitude of the effect decreases as m increases. In both the over- and underestimation cases, large reference samples, of more than 1,000 observations, are required to achieve the desired in-control $\mathrm{ARL}_0$ of 500.00. In most quality control applications, 25-30 samples are recommended for different control charts, but this study suggests that more than 1,000 samples are needed for the given chart to monitor quality characteristics. Therefore, the implementation of the geometric-Poisson EWMA chart requires a minimum of 1,000 reference samples before monitoring.
The choice of smoothing constant, or EWMA weight, also has a significant effect on the performance of the chart, as is obvious from Tables 2-4. For a fixed sample size, the actual $\mathrm{ARL}_0$ value decreases as the smoothing constant increases in the case of overestimation, while the inverse holds for underestimation. To achieve the desired $\mathrm{ARL}_0$ of 500.00 with a large reference sample size, a large smoothing constant is required. However, when there is a positive shift, the ARL increases as the smoothing constant increases, as is obvious from Tables 2-4. Therefore, a smaller smoothing constant is better than a larger one for detecting small shifts in the average number of non-conformities. The dispersion parameter also has a significant effect on the performance of the chart. The larger the dispersion parameter, the smaller the effect on the ARL when the rate parameter is fixed (see Tables 2 and 4). This is because, for a fixed rate parameter, the distribution converges to the Poisson distribution as the dispersion decreases. Likewise, the larger the rate parameter for a fixed dispersion value, the smaller the effect of estimation on the performance (see Tables 2 and 3). Thus, the choice of dispersion parameter ρ also has a significant effect on the performance of the geometric-Poisson EWMA chart.
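The percentile cases used for the conditional analysis can be illustrated with a short Python sketch. It assumes the normal approximation stated earlier for the estimated mean, with mean μ₀ and variance μ₀/m, and it uses μ₀ = λ₀/(1 − ρ₀) for λ₀ = 2 and ρ₀ = 0.20. The printed values are an illustration under these assumptions and are not a reproduction of Table 1.

```python
import math
from statistics import NormalDist

def estimate_percentiles(mu0, m, probs=(0.25, 0.50, 0.75)):
    """Percentiles of the approximate sampling distribution of the estimated mean."""
    dist = NormalDist(mu0, math.sqrt(mu0 / m))
    return [dist.inv_cdf(p) for p in probs]

mu0 = 2.0 / (1 - 0.20)  # in-control mean for lam0 = 2, rho0 = 0.20
for m in (30, 100, 500, 1000):
    q25, q50, q75 = estimate_percentiles(mu0, m)
    print(m, round(q25, 4), round(q50, 4), round(q75, 4))
```

The gap between the 25th and 75th percentiles shrinks at the rate 1/√m, which is why the conditional performance approaches the nominal case only for large reference samples.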

Marginal performance of the geometric-Poisson EWMA chart
In practice, it is often not possible to know how the estimated mean compares to the true in-control mean. Therefore, it is also useful to evaluate the marginal performance of a chart, which considers the distribution of the estimated parameters to take into account the random variability introduced through parameter estimation. The marginal performance of the geometric-Poisson EWMA chart is evaluated under different parameter values, sample sizes, smoothing constants, and shift magnitudes. The ARL and SDRL of the chart based on the estimated parameters for in-control $\lambda_0 = 2.0$ and $\rho_0 = 0.20$ and 0.30 are calculated for in-control and out-of-control situations and given in Table 5. The corresponding ARLs for $\lambda_0 = 3$ and $\rho_0 = 0.20$ for different smoothing constant values are given in Figure 1(a-c), which are known as ARL curves. The RL characteristics for any other combination of the parameters could be obtained similarly. In Table 5 and Figure 1(a-b), the performance metrics are weighted averages over all the values that the estimation may yield for the in-control parameters. To study how large the sample size should be for the chart to perform essentially as in the known-parameter case, the values 30, 100, 1,000, and ∞ for n are evaluated. The infinite sample size (n = ∞) corresponds to the known-parameter, or nominal, case.
Comparing the values in Table 5 and Figure 1(a-b) with their nominal values, it is obvious that estimating control limits can cause both the ARL and the SDRL to be larger or smaller than their desired values when the process is in control. When a shift in the average number of defects occurs, the ARL performance is almost the same for the various sample sizes. For a fixed sample size and dispersion parameter, the chart produces a larger in-control ARL as the EWMA smoothing constant increases. A smaller sample size increases the in-control ARL. The choice of smoothing constant depends on how fast one wants to detect a shift of a given size in the average number of non-conformities. Table 4 shows that a larger sample size for estimating the process parameters is generally required to get an in-control ARL fairly close to the desired value. Also, a larger value of the dispersion parameter is required to minimize false alarms. Similar behavior has been observed for other choices of parameters and smoothing constant.

Conclusion and discussion
The Poisson distribution is often used to model count data in all fields. However, the Poisson distribution is not the only underlying distribution for count data. For production processes, the geometric-Poisson EWMA control chart, based on the geometric-Poisson compound distribution, is very useful for detecting process variation rapidly and thereby reducing the cost of losses. This chart could and should be used if small shifts from normal conditions are important to detect quickly.
In real applications, the actual values of the process parameters needed to design the chart are often unknown. In this situation, a typical approach is to conduct a phase-I study, where a reference sample of m observations is obtained and then used to estimate the unknown parameters. However, the performance of the control chart may differ significantly from the expected performance if the parameters are not well estimated. This article investigates the performance of the geometric-Poisson EWMA chart when the process parameters are estimated from m reference samples. The effect on the RL characteristics, such as the ARL and SDRL, has been shown to be significant even with sample sizes as large as 1,000. Furthermore, for smaller EWMA smoothing constants, say 0.05, the chart with estimated parameters produces a higher false alarm rate, which results in larger in-control ARL and SDRL than the chart with known parameters. This study suggests a sample size of more than 1,000 and a smoothing constant greater than 0.05. However, this choice depends on the sensitivity of the chart with respect to detecting changes in wafer quality. A larger value of the dispersion parameter is better for obtaining the desired in-control ARL and SDRL. The results of the study are useful for practitioners and researchers in designing a geometric-Poisson EWMA chart from a phase-I sample for detecting minor process variations in production processes and improving process quality.