Statistical testing for sufficient control chart performances during monitoring of grouped processes

With ISO 7870-8, a standardized application of charting techniques for short runs and small mixed batches was presented in 2017. Similar to various scientific approaches, it requires that sample values from grouped processes follow nearly identical distributions. In practice, however, distribution parameters tend to differ. Moreover, equal parameters do not ensure that distributions are properly aligned to the center line and control limits of the chart. These facts can lead to undesired control chart performances, which can be expressed by average run lengths (ARL) during in-control and out-of-control conditions. In this work, a statistical test for sufficient control chart performances during monitoring of grouped processes based on preliminary samples is proposed. Control chart performances are defined as sufficient when they deviate within acceptable ranges from usual performances during single process monitoring in mass production. The ARL resulting from estimated distributions and planned production sequences is used as the test statistic and calculated via the Markov chain approach. Exemplary tests are executed for scenarios with individuals and cumulative sum (CUSUM) charts. A simulative determination of error rates resulting from the ARL-based testing demonstrates its effectiveness in testing for sufficient control chart performances compared to an indirect testing with Levene's test and a one-way analysis of variance (ANOVA).

Nonrepetitive processes refer to production with completely different machines and setups. Overviews of suitable SPC techniques such as control charts with modified control limits, self-starting control charts, and change-point models are provided by Capizzi and Masarotto, 2 Elam, 3 and Marques et al. 4 Repetitive processes are characterized by the fact that the same machine is used to produce different products and quality characteristics without requiring major setup changes. Assuming a similar behavior of repetitive processes, sample values can be plotted within a joint control chart. For this purpose, newly developed chart types are presented in the scientific literature, such as Q-charts proposed by Quesenberry 5,6 and enhanced by Castillo and Montgomery 7 or the t-chart proposed by Gu et al. 8 These have some drawbacks, including difficult interpretation of the charting statistics and worse performance than the control chart types that perform well in large batch and mass production, such as Shewhart, cumulative sum (CUSUM), and exponentially weighted moving average (EWMA) charts. Control chart performances can be expressed by the average run length (ARL), which is the average number of samples required to receive an alarm. Typically, high control chart performance means a large ARL during in-control conditions and small ARLs during out-of-control conditions. In order to ensure industrial acceptance, the application of proven chart types is preferable.
Charting techniques for short runs and small mixed batches based on Shewhart-styled control charts are proposed under ISO 7870-8, 9 which was published in 2017 and is partly based on the British standard BS5702-3. 10,11 Their application is explained in three steps. In the first two steps, repetitive processes are identified and grouped. This includes the identification of process-influencing characteristics and the determination of whether differences in these characteristics cause processes to behave significantly differently. In the third step, the monitoring of grouped processes begins. Plotted samples are measurements of quality characteristics or process parameters, in the following briefly denoted as characteristics. Thus, groups of processes can also be denoted as groups of characteristics. Samples are required to have a size of one and to be approximately normally distributed. In case of different aim values or process spreads, sample values are transformed to deviations from their aim values and normalized through division by their standard deviations.
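The transformation described for samples with different aim values or spreads can be illustrated with a minimal sketch. The function name `standardize` and the measurement values are hypothetical and only serve as an illustration; ISO 7870-8 itself is not quoted here.

```python
def standardize(value, aim, sigma):
    """Return the deviation from the aim value in units of the process standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (value - aim) / sigma


# Hypothetical (measurement, aim value, standard deviation) triples from two processes
measurements = [(10.02, 10.0, 0.01), (24.97, 25.0, 0.03)]

# After the transformation, both values can be plotted on the same chart scale
plotted = [standardize(x, aim, s) for x, aim, s in measurements]
```

After this step, both processes share a nominal center line of zero and a nominal standard deviation of one, provided the aim values and standard deviations used are accurate.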
Alternative grouping approaches to ISO 7870-8 can be found in the scientific literature. These can be based on coding systems such as those proposed by Al-Salti and Statham 12 and Zhu et al 13 or on clustering algorithms as introduced by Wiederhold. 14 Shewhart-styled or similar charting techniques for grouped characteristics are also presented by Bothe, 15 Evans and Hubele, 16 Kimbler and Sudduth, 17 Koh et al, 18,19 Lin et al, 20 Koons and Luner, 21 and Wheeler. 22 A main problem with these charts is the requirement that samples from different processes follow nearly identical distributions. This requirement is meant to ensure indirectly that the resulting control chart performances are similar to usual performances during single process monitoring in large batch or mass production. Thus, it is widely proposed to statistically test distributions from grouped processes for equal means and variances based on preliminary samples. 12-14,16,23 In practice, however, parameters tend to be different. Even in case of equal parameters, distributions are not necessarily aligned to the center line and control limits of the control chart. The reason is that the transformation of samples as well as the construction of the control chart rely on predefined aim values, experience, or distribution parameters, which are estimated from a strongly limited number of sample values. Possible consequences are undesired control chart performances. Quesenberry 24 shows that estimated control limits can lead to drastic decreases of control chart performances and thus may cause high financial losses, for example, due to machine downtime. Hence, before monitoring a group of processes in a joint control chart, it is desirable to statistically evaluate the control chart performances that may be achieved.
Since no suitable evaluation approach exists, a new statistical test for sufficient control chart performances based on preliminary samples is proposed in the following. The testing results serve as a decision basis for or against monitoring a formed group of processes in a joint control chart. The test can be combined with several existing grouping approaches and control chart types. In the following, the testing is presented as an intermediate step of the procedure according to ISO 7870-8. The standard itself proposes to support the process grouping with suitable statistical analyses besides simulations, experiments, or expert knowledge. Sufficient control chart performances are defined as acceptable deviations from usual performances during single process monitoring in mass production for in-control conditions and relevant process shifts. The ARL resulting from estimated distributions and the planned production sequence is taken as the test statistic and calculated via the Markov chain approach. The testing method is applied but not limited to exemplary scenarios with Shewhart's individuals and CUSUM charts plotting samples of size one. Its effectiveness in testing for sufficient control chart performances is demonstrated through simulative determinations of error rates (type I and type II) and compared to the indirect testing with Levene's test 25

ARL-BASED TESTING FOR SUFFICIENT CONTROL CHART PERFORMANCES
The integration of the developed testing method into the procedure of ISO 7870-8 as an intermediate step 2.5 is depicted in Figure 1. After repetitive processes are identified and grouped, the desired control chart types are selected as they determine achievable control chart performances. For example, an individuals chart performs well in quickly detecting large process shifts. A CUSUM chart is preferable if small process shifts have to be detected early. 26 ISO 7870-8 recommends the use of individuals, moving mean, and moving range charts. After selection, conditions for sufficient control chart performances are formulated as null hypothesis $H_0$, and the test statistic as well as the critical values are calculated. The estimated ARL is used as the test statistic and results from estimated distributions and the planned production sequence. In accordance with the ISO standard, it is assumed that samples from the same process follow an identical normal distribution. However, normal distributions from different processes may be nonidentical. For the estimation of parameters, at least two preliminary samples per process during in-control conditions are required.
The statistical test is two-tailed, leading to two critical values. At the end of step 2.5, the determined test statistic is compared to the critical values. If it falls between the critical values, it is expected that the combination of grouped characteristics and the control chart types will lead to sufficient control chart performances. Otherwise, this expectation must be rejected and it is recommended to revise the grouping conditions. The subprocedure "Define sufficient control chart performances ($H_0$), calculate test statistic and critical values" is depicted in detail in Figure 2. It can be divided into two different tasks. The first task is to calculate the test statistic. In the second task, critical values are calculated, which also includes the formulation of the null hypothesis $H_0$. All single substeps are described below.

Define production sequence
For the developed testing method, we consider that samples from different processes tend to follow different normal distributions. Thus, the planned production sequence determines the sequence of distributions from which plotted sample values are drawn. Considering that the sequence has an impact on the ARL-based test statistic, it must be previously defined. It is assumed that production sequences are infinitely often repeated. A sequence can be simply described with a vector
$$q = (q_1, q_2, \ldots, q_l)$$
Each of the $l$ vector elements refers to one of $m$ grouped characteristics. For example, considering $m = 2$ grouped characteristics, an example production sequence can be defined with $q = (1, 1, 1, 2)$. This means that the first characteristic is produced and measured three times before the second one is produced and measured. Afterwards, the sequence starts again from its beginning. In this way, different proportions of different produced characteristics can be considered.
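How a finite sequence vector stands for an infinitely repeated production plan can be shown with a small sketch. The helper names `plotted_characteristics` and `proportions` are hypothetical and not taken from the original work.

```python
from collections import Counter
from itertools import cycle, islice


def plotted_characteristics(q, n_samples):
    """Return the characteristic behind each of the first n_samples plotted points."""
    return list(islice(cycle(q), n_samples))


def proportions(q):
    """Return the share of each characteristic within one pass of the sequence."""
    counts = Counter(q)
    return {c: counts[c] / len(q) for c in sorted(counts)}


q = (1, 1, 1, 2)                            # example sequence from the text
first_ten = plotted_characteristics(q, 10)  # the sequence restarts after every fourth sample
shares = proportions(q)                     # characteristic 1 covers three quarters of all samples
```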

Calculate test statistic
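The test statistic is the estimated ARL, denoted $\mathrm{ARL}_{est}$, which results from the chosen control chart type, the grouped distributions with estimated parameters, and the previously defined production sequence. At the beginning, the estimated mean $\bar{x}_i$ and standard deviation $s_i$ for each characteristic $i$ are calculated. It is
$$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \qquad (2)$$
$$s_i = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} \left(x_{i,j} - \bar{x}_i\right)^2} \qquad (3)$$
where $n \ge 2$ is the number of preliminary obtained samples and $x_{i,j}$ is the $j$th preliminary sample of characteristic $i$.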
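The parameter estimation from preliminary samples can be sketched as follows. The sketch assumes the usual sample standard deviation with divisor n - 1, which is what `statistics.stdev` computes; the sample values are hypothetical.

```python
from statistics import mean, stdev


def estimate_parameters(samples_by_characteristic):
    """Return {characteristic: (estimated mean, estimated standard deviation)}."""
    estimates = {}
    for characteristic, samples in samples_by_characteristic.items():
        if len(samples) < 2:
            raise ValueError("at least two preliminary samples per characteristic are required")
        estimates[characteristic] = (mean(samples), stdev(samples))
    return estimates


# Hypothetical preliminary samples (already transformed) for two characteristics
preliminary = {1: [0.2, -0.1, 0.5], 2: [1.0, 1.4, 0.6]}
estimates = estimate_parameters(preliminary)
```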

2.2.1 Calculation of $\mathrm{ARL}_{est}$ for single process monitoring ($m = 1$, $q = (1)$) via the Markov chain approach
With the Markov chain approach, the selected control chart is interpreted to determine a discrete state space. With every new plotted sample, the control chart can change from its current state to another state. For example, the individuals chart is characterized by two different states. The first state refers to a sample lying between the control limits and the second state is reached when one of the two control limits is exceeded by a sample. More complex state spaces result from Shewhart charts applying further sensitizing pattern tests such as proposed by ISO 7870-2. 27 Champ and Woodall 28 describe a procedure of how to derive all available states for different combinations of applied rules. Brook and Evans 29 and Woodall 30 describe state spaces for one- and two-sided CUSUM charts based on artificial discretization.
Assuming the existence of $h + 1$ different states, the probabilities $p_{i,j}$ referring to the transition from state $i$ to state $j$ can be summarized in a transition matrix $P$:
$$P = \begin{pmatrix} p_{1,1} & \cdots & p_{1,h} & p_{1,h+1} \\ \vdots & \ddots & \vdots & \vdots \\ p_{h,1} & \cdots & p_{h,h} & p_{h,h+1} \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$
For the calculation of $\mathrm{ARL}_{est}$, each probability is determined by the estimated distribution of the sample values. The $(h + 1)$-th state represents the out-of-control condition of the chart. Taking the example of the individuals chart, this would be the second state. It is called the absorbing state as it cannot be left anymore. Thus, the lower right probability of $P$ is set to $p_{h+1,h+1} = 1$. Brook and Evans 29 show that the ARL of the control chart can be calculated based on the submatrix $Q$, which is obtained from $P$ by removing its last row and last column. It is the first element of the vector
$$M = \sum_{r=1}^{\infty} r \, Q^{r-1} (I - Q) \mathbf{1} = (I - Q)^{-1} \mathbf{1}$$
The matrix $I$ represents the identity matrix and $\mathbf{1}$ is a column vector with elements all equal to one. The index $r$ of the infinite sum represents the run length, which typically starts from one. $Q^{r-1}(I - Q)\mathbf{1}$ is a vector in which the first element describes the probability of changing from the initial state to the absorbing state after $r$ plotted samples. Within the infinite sum, each of these vectors is weighted with the run length $r$. Thus, the first element of $M$ is the test statistic $\mathrm{ARL}_{est}$.
A simple example can be given for an individuals chart with samples drawn from a standard normal distribution. With only one state next to the absorbing state, it is $Q = p_{1,1} \approx 99.73\%$. This leads to $\mathrm{ARL}_{est} = M = (1 - Q)^{-1} \approx 370.37$.
FIGURE 3 ARLs resulting from two grouped processes monitored in an individuals chart with control limits CL = ±3 and a production sequence q = (1, 2)
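The Markov chain calculation can be sketched in plain Python. All function names are hypothetical. The CUSUM part is a simplified illustration only: it discretizes a one-sided upper CUSUM in the style of Brook and Evans into `t` transient states (the value `t = 100` is an arbitrary assumption), whereas the scenarios in this work use two-sided charts. With the unrounded in-control probability, the individuals chart gives an ARL of about 370.4.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def arl_from_q(q_matrix):
    """Return the first element of (I - Q)^-1 1, the ARL when starting in state 1."""
    n = len(q_matrix)
    i_minus_q = [[(1.0 if r == c else 0.0) - q_matrix[r][c] for c in range(n)]
                 for r in range(n)]
    return solve(i_minus_q, [1.0] * n)[0]


def individuals_q(mu, sigma, limit=3.0):
    """1 x 1 submatrix Q: probability that a sample stays between the control limits."""
    dist = NormalDist(mu, sigma)
    return [[dist.cdf(limit) - dist.cdf(-limit)]]


def cusum_upper_q(mu, sigma, k=0.5, h=5.0, t=100):
    """Submatrix Q of a one-sided upper CUSUM discretized into t transient states."""
    dist = NormalDist(mu, sigma)
    w = 2.0 * h / (2 * t - 1)                # state i is centred at i * w
    rows = []
    for i in range(t):
        row = [dist.cdf(k - i * w + w / 2)]  # transition into state 0 (statistic near zero)
        for j in range(1, t):
            centre = k + (j - i) * w
            row.append(dist.cdf(centre + w / 2) - dist.cdf(centre - w / 2))
        rows.append(row)
    return rows


arl_individuals = arl_from_q(individuals_q(0.0, 1.0))
arl_cusum_in_control = arl_from_q(cusum_upper_q(0.0, 1.0))
arl_cusum_shifted = arl_from_q(cusum_upper_q(1.0, 1.0))
```

Solving the linear system (I - Q) x = 1 avoids forming the inverse explicitly; only the first element of the solution is needed because the chart starts in the first state.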

2.2.2 Calculation of $\mathrm{ARL}_{est}$ for monitoring of grouped processes ($m \ge 2$, $q = (q_1, q_2, \ldots, q_l)$, $l \ge 2$) via the Markov chain approach
Let $Q_j$ with $j = 1, 2, \ldots, l$ describe the transition submatrix of characteristic $q(j)$ within the production sequence. Each submatrix is derived based on the discrete state space given by the control chart and on the estimated sample distributions of grouped characteristics. Let further $Q_{a,b}$ be a product of matrices defined with
$$Q_{a,b} = \prod_{j=a}^{b} Q_j = Q_a Q_{a+1} \cdots Q_b, \qquad Q_{a,b} = I \text{ for } b < a$$
The vector $M$ can now be formulated in a similar way as shown by Brook and Evans. 29 The only difference is that the $Q$ matrices can change for every new plotted sample as determined by the infinite production sequence and the estimated sample distributions of characteristics:
$$M = \sum_{r=0}^{\infty} Q_{1,l}^{\,r} \sum_{j=0}^{l-1} Q_{1,j} \mathbf{1} = (I - Q_{1,l})^{-1} \left( \sum_{j=0}^{l-1} Q_{1,j} \right) \mathbf{1} \qquad (7)$$
The transformation of the formula to the simplified expression is shown in detail in the last section (Appendix). Example ARLs resulting from known distribution parameters of two grouped characteristics monitored in an individuals and a CUSUM chart are visualized in Figures 3 and 4. Each diagram shows a heatmap consisting of 100 × 100 ARLs generated for different normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. For both cases, the production sequence is set to $q = (1, 2)$, meaning that both characteristics are produced and measured in an alternating sequence. The control limits of the individuals chart are set to $CL = \pm 3$. The monitored statistics are individual sample values $x_t$, which are directly drawn from both distributions according to the production sequence. For the CUSUM chart, the monitored statistics are
$$C_t^{+} = \max\left(0,\; C_{t-1}^{+} + x_t - k\right), \qquad C_t^{-} = \min\left(0,\; C_{t-1}^{-} + x_t + k\right)$$
An out-of-control condition is met when $C_t^{+} \ge h$ or $C_t^{-} \le -h$. For the values in Figure 4, it is $h = 5$ and the reference value is set to $k = 0.5$, which is typically chosen when process shifts of ±1 are desired to be quickly detected. 26 As a CUSUM chart does not naturally provide discrete zones that can be used for the determination of a discrete state space, an artificial discretization is required. For this, the discretization parameter proposed by Woodall 30 31.
Also in these cases, small differences of distribution parameters partly lead to similar ARLs.
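For an individuals chart, each transition submatrix reduces to a single in-control probability, so the ARL of a repeating production sequence can be sketched with scalars. The function name `grouped_individuals_arl` and the parameter values are hypothetical. The closed form used here is an assumption about the simplified expression: the sum of the partial products over one pass of the sequence, divided by one minus the product over the full pass.

```python
from statistics import NormalDist


def grouped_individuals_arl(params, q, limit=3.0):
    """ARL of an individuals chart for a repeating sequence q.

    params maps each characteristic to (mu, sigma) of its (transformed) samples.
    """
    numerator, running = 0.0, 1.0
    for characteristic in q:
        mu, sigma = params[characteristic]
        dist = NormalDist(mu, sigma)
        numerator += running                          # partial product before this sample
        running *= dist.cdf(limit) - dist.cdf(-limit)  # extend the product by this sample
    if running >= 1.0:                                 # no alarm possible in floating point
        return float("inf")
    return numerator / (1.0 - running)


# Two identical standard normal distributions reproduce the single-process ARL
arl_equal = grouped_individuals_arl({1: (0.0, 1.0), 2: (0.0, 1.0)}, (1, 2))

# Slightly different means give a similar, somewhat smaller ARL
arl_close = grouped_individuals_arl({1: (0.1, 1.0), 2: (-0.1, 1.0)}, (1, 2))

# A clearly different second characteristic in an uneven sequence
params_demo = {1: (0.0, 1.0), 2: (0.5, 1.2)}
q_demo = (1, 1, 1, 2)
arl_demo = grouped_individuals_arl(params_demo, q_demo)
```

For charts with more than one transient state, the same structure holds with matrices in place of the scalar probabilities.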
Considering that small deviations from ideal ARLs for in-control and out-of-control conditions can be acceptable from an industrial perspective, conditions for sufficient control chart performances are formulated as null hypothesis $H_0$ within the following two substeps.

Define relevant process shifts
As grouped distributions with unequal parameters lead to deviations from ideal ARLs, the user can define relevant process shifts. Corresponding control chart performances are desired to be similar to ideal ARLs. In this approach, process shifts are understood as absolute. After a process shift $\delta$, distribution means change to $\mu_{i,\delta} = \mu_i + \delta$.

Define acceptable deviations from ideal ARLs
In this substep, the understanding of sufficient control chart performances is formulated as null hypothesis $H_0$ of the statistical test. At first, a maximum acceptable deviation $d \in (0, 1)$ from ideal ARLs during in-control conditions and out-of-control conditions is defined. Considered out-of-control conditions are determined by the relevant shifts collected in the vector $\boldsymbol{\delta}$. With $\mathrm{ARL}(\delta)$ denoting the ARL of the grouped processes after a shift $\delta$ and $\mathrm{ARL}_{ideal}(\delta)$ the corresponding ideal ARL, the null hypothesis $H_0$ is defined as follows:
$$H_0:\; (1 - d)\,\mathrm{ARL}_{ideal}(0) \le \mathrm{ARL}(0) \le (1 + d)\,\mathrm{ARL}_{ideal}(0) \;\text{ and }\; \mathrm{ARL}(\delta) \le (1 + d)\,\mathrm{ARL}_{ideal}(\delta) \text{ for every } \delta \text{ in } \boldsymbol{\delta}$$
It implies that ARLs during out-of-control conditions are desired to be smaller than ideal ARLs while accepting a small range of greater ARLs. The ARL during in-control conditions is limited in both directions. While it is typically desired to be as large as possible, the upper limit is set in order to consider the continuous development of the ARL with increasing or decreasing process shifts. A very large ARL during in-control conditions would also lead to large ARLs for process shifts that are not defined as relevant but are between the zero shift and the nearest relevant process shift in positive or negative direction. Thus, with the upper limit, very large ARLs for small process shifts in positive and negative directions can be prevented. The alternative hypothesis $H_1$ applies in any situation in which at least one of the inequalities is not true.
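A check of such conditions can be sketched as follows. The inequalities are an assumed reading of the hypothesis: the in-control ARL may deviate by the fraction `d` in both directions, and shifted ARLs may exceed the ideal value by at most the fraction `d`. The names `individuals_arl` and `fulfils_h0` and the example process with a mean offset of 0.1 are hypothetical.

```python
from statistics import NormalDist


def individuals_arl(mu, sigma, limit=3.0):
    """ARL of an individuals chart with limits at +/- limit for samples from N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    p_in = dist.cdf(limit) - dist.cdf(-limit)
    return float("inf") if p_in >= 1.0 else 1.0 / (1.0 - p_in)


def fulfils_h0(arl_by_shift, ideal_by_shift, d):
    """Both dicts map a shift (0.0 means in control) to an ARL."""
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie in (0, 1)")
    for shift, ideal in ideal_by_shift.items():
        arl = arl_by_shift[shift]
        if arl > (1.0 + d) * ideal:                    # upper bound for every condition
            return False
        if shift == 0.0 and arl < (1.0 - d) * ideal:   # lower bound only when in control
            return False
    return True


shifts = (0.0, -2.0, 2.0)
ideal = {s: individuals_arl(s, 1.0) for s in shifts}           # ideal: standard normal samples
observed = {s: individuals_arl(0.1 + s, 1.0) for s in shifts}  # process mean off by 0.1

ok_wide = fulfils_h0(observed, ideal, d=0.2)
ok_narrow = fulfils_h0(observed, ideal, d=0.1)
```

In this example the small mean offset is acceptable for d = 0.2 but not for d = 0.1, because the ARL after the shift of -2 exceeds the ideal value by more than 10%.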

Define significance level
As is typical for statistical hypothesis tests, a significance level $\alpha$ must be defined in order to determine the rejection area of the distribution of the test statistic that results from a true null hypothesis. In the following, a conventional value of $\alpha = 5\%$ is chosen.

Calculate critical values
Since the developed testing is two-tailed, two critical values $c_1$ and $c_2$ must be determined such that, under a true $H_0$, the test statistic $\mathrm{ARL}_{est}$ falls outside the interval $[c_1, c_2]$ with probability $\alpha$. Thus, $H_0$ is not rejected as long as the following condition is met:
$$c_1 \le \mathrm{ARL}_{est} \le c_2$$
Both critical values $c_1$ and $c_2$ are derived through simulations written in Python. Within each simulation loop, random sample distribution parameters for each characteristic are generated until a group of distributions is found that fulfils the null hypothesis. For the calculation of ARLs, the selected control chart type and production sequence are considered. For each characteristic $i$, $n$ random preliminary samples are drawn from its representing distribution. The samples are then used to calculate estimated distribution parameters and the test statistic $\mathrm{ARL}_{est}$. After multiple simulation loops, the calculated statistics are sorted in ascending order, and the critical values $c_1$ and $c_2$ are read from the sorted list.
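The simulation loop can be sketched for an individuals chart as follows. This is not the authors' program. The parameter intervals, the equal split of the significance level between both tails, the assumed form of the null hypothesis, and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

LIMIT = 3.0  # control limits of the individuals chart at +/- 3


def grouped_arl(params, q, shift=0.0):
    """ARL of an individuals chart for a repeating sequence q after an absolute mean shift."""
    numerator, running = 0.0, 1.0
    for c in q:
        mu, sigma = params[c]
        dist = NormalDist(mu + shift, sigma)
        numerator += running
        running *= dist.cdf(LIMIT) - dist.cdf(-LIMIT)
    return float("inf") if running >= 1.0 else numerator / (1.0 - running)


def fulfils_h0(params, q, shifts, d):
    """Assumed null hypothesis: two-sided bound in control, upper bound for each shift."""
    ideal = {c: (0.0, 1.0) for c in params}
    base = grouped_arl(ideal, q)
    if not (1 - d) * base <= grouped_arl(params, q) <= (1 + d) * base:
        return False
    return all(grouped_arl(params, q, s) <= (1 + d) * grouped_arl(ideal, q, s)
               for s in shifts)


def critical_values(q, shifts, d, n, alpha, loops, rng):
    """Simulate the test statistic under a true null hypothesis and return (c1, c2)."""
    characteristics = sorted(set(q))
    simulated = []
    for _ in range(loops):
        while True:  # redraw parameters until the group fulfils the null hypothesis
            params = {c: (rng.uniform(-0.2, 0.2), rng.uniform(0.95, 1.05))
                      for c in characteristics}
            if fulfils_h0(params, q, shifts, d):
                break
        estimates = {}
        for c, (mu, sigma) in params.items():
            samples = [rng.gauss(mu, sigma) for _ in range(n)]
            estimates[c] = (mean(samples), stdev(samples))
        simulated.append(grouped_arl(estimates, q))
    simulated.sort()
    lower = simulated[int(alpha / 2 * loops)]
    upper = simulated[min(loops - 1, int((1 - alpha / 2) * loops))]
    return lower, upper


c1, c2 = critical_values(q=(1, 2), shifts=(-2.0, 2.0), d=0.2, n=10,
                         alpha=0.05, loops=400, rng=random.Random(1))
```

The rejection sampling in the inner loop becomes slow when the region of admissible parameters is small compared to the sampling intervals, which matches the calculation time issue noted for the determination of critical values.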

FIGURE 5 Estimated normal distributions of measured and transformed x-positions from three grouped drilling processes (see Table 1)

APPLICATION SCENARIOS AND TESTING POWER
For an exemplary application of the ARL-based testing method, drilling processes for three different products are grouped. Measured x-positions of drilled holes are planned to be plotted within a joint control chart. Three preliminary sample values from each of three processes are listed in Table 1 (see also Figure 5).
The x-positions of each process are assumed to be normally distributed. Since there are different expected means and standard deviations, all measured positions are transformed. Estimated means and standard deviations are derived based on Equations (2) and (3). With the application of the testing method, we investigate if sufficient control chart performances can be expected for the use of individuals or CUSUM charts. All testing parameters and results are given in Table 2. Although only three preliminary sample values per process are available, further testing results are generated for varying numbers of preliminary samples, resulting in six different scenarios. For every scenario, it is assumed that estimated distribution parameters are the same as in Table 1. The specific example from Table 1 is considered with scenarios 1 and 4. As indicated by the production sequence, we assume that different products are produced and measured one by one. Measured ARLs are calculated based on estimated parameters given in Table 2. Individuals charts are typically applied if only large shifts shall be quickly detected. Thus, relevant shifts are set to $\boldsymbol{\delta} = (-2, 2)$ as indicated by the null hypothesis. CUSUM charts offer high performances for smaller shifts. For them, relevant shifts are set to $\boldsymbol{\delta} = (-1, 1)$. As numbers of preliminary samples vary among different scenarios, corresponding critical values are derived. Through comparison of test statistics with critical values, it can be concluded that the null hypotheses are only rejected in case of larger numbers of preliminary samples. For scenarios 1, 4, and 5, considered charts are expected to provide sufficient control chart performances.
Considering that sufficient control chart performances are a crucial decision criterion for the monitoring of grouped processes, the ARL-based testing method seems to deliver a useful decision basis. For a quantitative assessment of the testing effectiveness, error rates of type I and type II are determined based on simulations hereafter. In order to demonstrate that the direct testing for sufficient control chart performances based on the test statistic $\mathrm{ARL}_{est}$ is more effective than the indirect approach requiring equal distribution parameters, the null hypotheses shown in Table 2 are also tested with a combination of Levene's test and the one-way ANOVA. The error rates resulting from this combination are reported separately from those of the ARL-based testing. With consideration of error cumulation, the significance level for each of both tests is set to $\alpha_{Eq} = 1 - \sqrt{1 - \alpha}$. Each simulation result presented in the following is based on a parameter scenario selected from Table 2. Within 50 000 simulation loops, normal distributions $N(\mu_i, \sigma_i^2)$ with random parameters $\mu_i \in [\mu_{min}, \mu_{max}]$ and $\sigma_i \in [\sigma_{min}, \sigma_{max}]$ are generated. Considering the scenario-specific control chart type and production sequence, it is determined whether generated distributions lead to sufficient control chart performances or not.
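The per-test significance level for the combined testing can be checked with a short calculation. The function name `per_test_level` is hypothetical, and the formula treats both tests as independent, which is an assumption of this sketch.

```python
def per_test_level(alpha, n_tests=2):
    """Per-test level so that n_tests independent tests jointly keep the overall level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Overall level of 5% split across Levene's test and the one-way ANOVA
alpha_eq = per_test_level(0.05)

# Probability that at least one of two independent tests rejects a true null hypothesis
overall = 1.0 - (1.0 - alpha_eq) ** 2
```

For an overall level of 5%, the per-test level is about 2.53%.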
For the determination of error type I rates, only distributions leading to sufficient control chart performances are required. Therefore, in each loop, new distributions are randomly generated until a suitable set is found. Chosen intervals for random parameter generation are expected to cover nearly all suitable combinations of distribution parameters. The determination of error type II rates requires the generation of distributions leading to insufficient control chart performances. For random parameter generation, we choose once narrow intervals and once wide intervals.
After drawing random sample values from the corresponding distributions $N(\mu_i, \sigma_i^2)$, the ARL-based testing method as well as the combination of Levene's test and the one-way ANOVA are applied. Errors of both types are detected through comparison of sample-based testing results with the known control chart performances based on the known distributions $N(\mu_i, \sigma_i^2)$. In Tables 3 and 4, the resulting error rates of type I and II are listed. Error rates of type II are also converted to powers expressing the probability of detecting real process shifts.
In both tables, the ARL-based testing mostly outperforms the combination of Levene's test and the one-way ANOVA when testing for sufficient control chart performances. Error rates of type I are always smaller, with differences of up to more than 90%. Differences increase with increasing numbers of preliminary samples. As expected, the type I error rate of the ARL-based testing is always close to 5% as determined by the significance level $\alpha$. The powers of the new statistical test are mostly higher, except for the CUSUM chart in scenarios 5 and 6 for distributions with $\mu_i \in [-0.5, 0.5]$ and $\sigma_i \in [0.5, 1.5]$. In all other cases, powers are higher by up to more than 60%.
TABLE 4 Rates of error type II while testing for sufficient control chart performances with the test statistic $\mathrm{ARL}_{est}$ ($\alpha = 5\%$) and a combination of Levene's test and the one-way ANOVA ($\alpha_{Eq} = 2.53\%$)

CONCLUSION
The presented method allows testing for sufficient control chart performances before deciding on the monitoring of grouped processes within a joint control chart. Sufficient control chart performances are defined based on acceptable deviations from usual performances during single process monitoring for in-control and selected out-of-control conditions. For the calculation of the ARL-based test statistic, a Markov chain approach was presented, which considers both the selected control chart type and the planned production sequence. For industrial application, the method can be integrated as an intermediate step into the procedure proposed by ISO 7870-8. Besides Shewhart-styled control charts, the application can be extended to other types such as CUSUM charts. The application of the testing method was demonstrated for a group of drilling processes. Through simulations of exemplary scenarios, it was shown that an ARL-based testing for sufficient control chart performances can be more effective than the indirect approach requiring equal distribution parameters.
In future research, the method applicability will be extended to nonnormal distributions such as folded normal or Rayleigh distributions. Also, the consideration of random production sequences shall be enabled. For the determination of critical values, an alternative approach for random generation of distributions is required in order to reduce calculation time.

APPENDIX
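In the following, the transformation of Equation (7) is demonstrated in detail. It is
$$\sum_{r=0}^{\infty} Q_{1,l}^{\,r} = (I - Q_{1,l})^{-1} \qquad (14)$$
where $Q_{1,l} = Q_1 Q_2 \cdots Q_l$ is the product of the transition submatrices over one pass of the production sequence and the infinite sum is known as the Neumann series. 31 Now, putting (14) into (7), it is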
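One way to carry out this substitution is sketched below. The notation is an assumption chosen for illustration: $Q_j$ is the transition submatrix at position $j$ of the sequence, $Q_{1,j} = Q_1 Q_2 \cdots Q_j$ with $Q_{1,0} = I$, and the run length is summed in its survival form, that is, over the probabilities that no alarm has occurred after $t$ plotted samples. Every $t$ is written as $t = r\,l + j$ with $r$ complete passes of the sequence and $j$ further samples.

```latex
\begin{aligned}
M &= \sum_{r=0}^{\infty} \sum_{j=0}^{l-1} Q_{1,l}^{\,r}\, Q_{1,j}\, \mathbf{1} \\
  &= \left( \sum_{r=0}^{\infty} Q_{1,l}^{\,r} \right) \left( \sum_{j=0}^{l-1} Q_{1,j} \right) \mathbf{1} \\
  &= \left( I - Q_{1,l} \right)^{-1} \left( \sum_{j=0}^{l-1} Q_{1,j} \right) \mathbf{1}
\end{aligned}
```

The Neumann series converges when the spectral radius of $Q_{1,l}$ is below one, which holds when the absorbing state can be reached from every transient state within one pass of the sequence. For $l = 1$ the expression reduces to $(I - Q)^{-1}\mathbf{1}$, the single process case.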