Tr(R2) control charts based on kernel density estimation for monitoring multivariate variability process

Abstract The multivariate control charts are not only used to monitor the mean vector but also can be used to monitor the covariance matrix. The multivariate variability charts are used to guarantee the consistency of products in the subgroup. Many researchers have been studied the multivariate control chart for variability. Nevertheless, those conventional methods have several drawbacks because it is developed based on the determinant of the covariance matrix and not free of the measurement unit. To overcome such issues, this paper proposes the multivariate control chart for variability based on trace of the squared correlation matrix. Kernel Density Estimation is used to improve estimated control limit. The kernel density estimation method is used to calculate the control limit. Through simulation studies, the performance of the proposed chart is evaluated using the average run length (ARL). The control limits of the proposed chart are produced in control ARL at about 370 for α = 0.00273. Meanwhile, the proposed chart demonstrated better performance to detect the shift for the large value of quality characteristics and sample size. The proposed chart also produces a better performance than the conventional generalized variance chart when used to monitor the real case data.


PUBLIC INTEREST STATEMENT
A control chart is one of the improvement tools to monitor how a process changes over time. A good control chart will provide precise conclusion if the special causes of variation affect the actual process. The multivariate variability charts are used to guarantee the consistency of products in the subgroup. However, the conventional methods have several drawbacks because it is developed based on the determinant of the covariance matrix and not free of the measurement unit. This paper proposes the multivariate control chart for variability based on trace of squared correlation matrix to overcome such issues. Kernel density estimation is nonparametric techniques used to improve estimated control limit. Furthermore, the proposed control chart draws a correct decision to monitor real problem compared to the conventional method.
Keywords: Multivariate variability process; trace squared correlation matrix; control chart; kernel density estimation; average run length

Introduction
One of the most powerful tools in statistical process control (SPC) is the control chart, which has been widely used in industries and services. Based on the type of monitored quality characteristics, there are two kinds of control charts: variable for interval or ratio scale (Page, 1961;Roberts, 1959;Shewhart, 1924) and attribute for categorical scale (Ahsan, Mashuri, & Khusna, 2017;Wibawati, Purhadi, & Ahsan, 2018;Wibawati, Purhadi, & Irhamah, 2016). At present, the demands of the consumers for the quality of the product are increasing, both in terms of the quality level and in terms of quality characteristics number. The quality of a product is not only determined by one characteristic but also by more than one quality characteristics. Therefore, to monitor the quality of a product, the multivariate control chart is needed in order to monitor several quality characteristics simultaneously. The multiple quality characteristics should be monitored together using a multivariate control chart. The recent development for multivariate control chart includes Hotelling's T 2 control chart (Abu-Shawiesh, Kibria, & George, 2014;Ahsan, Mashuri, Kuswanto, Prastyo, & Khusna, 2018a;Alfaro & Ortega, 2009;Ali, Syed Yahaya, & Omar, 2013;Alkindi & Prastyo, 2016), MCUSUM control chart (Arkat, Niaki, & Abbasi, 2007;Issam & Mohamed, 2008;Khusna, Mashuri, Ahsan, Suhartono, & Prastyo, 2018;Noorossana & Vaghefi, 2006), and MEWMA control chart (Chen, CHENG, & Xie, 2005;Pirhooshyaran & Niaki, 2015).
The multivariate control charts are not only used to monitor the mean vector but also can be used to monitor the covariance matrix. The control charts are not only used to monitor the multivariate mean but also used to monitor multivariate variability. Multivariate mean vector charts are useful in seeing product uniformity between subgroups of observation, while the multivariate variability charts are used to ensure the uniformity of products in the subgroup.
The advantage of using the correlation matrix chart (Sindelar, 2007) in monitoring variability process is not affected by units of measurement from the quality characteristics because it uses a correlation matrix, while the other charts based on covariance matrices or successive difference covariance matrices are not free of the measurement unit. However, the chart developed based on the determinant of the correlation matrix has weaknesses because of the matrix determinant nature. The first weakness is the quality characteristics with a small variance will result in decreasing value of multivariate variability. Second, a quality characteristic is a linear combination of the other variables, if it is used then the multivariate variability becomes small. To overcome these weaknesses, Suwanda and Djauhari (2003), Huwang, Yeh, and Wu (2007), and Djauhari (2010) proposed the trace operator. Mashuri, Haryono, Wibawati, Khusna, and Ahsan (2016) have succeeded in developing a Trace Rho 2 chart based on the squared correlation matrix using the trace operator, for the simplify henceforth it is called Tr(R 2 ). Besides having a relatively good performance, this chart has two other advantages, which are free of unit measurement and can avoid the weaknesses of the determinant properties of a matrix. Also, there are two essential advantages of trace operators compared to determinant. First, trace statistics can deal with determinant weaknesses, that is, trace values will not produce small value. Even if one of the correlations between variables is very small or near to zero. Second, the trace of a matrix can still be calculated and will not be zero, even though the covariance matrix is definite negative.
Based on the aforementioned reasons, this paper focuses on developing the multivariate variability control chart based on trace squared of correlation matrix R. The KDE method is employed to estimate the control limit of the proposed chart. The performance of the proposed chart is also evaluated using the average run length (ARL) criteria for several numbers of quality characteristics p and number of sample size n. Section 2 presents some of related work to this study. In Section 3, a brief review of the proposed Tr(R 2 ) statistics and KDE method are presented. The methodology of simulation study is exhibited in Section 4. Section 5 contains the illustrative example of Tr(R 2 ) control chart. Finally, Section 6 is devoted to the conclusion and future research.

Related works
Many works studied the utilization of a control chart to monitor the shift in variance. In univariate case, the simplest method is the Shewhart sample range (R) and sample standard deviation (S) charts. Riaz (2008) proposed Q chart based on interquartile range (IQR), for monitoring changes in process dispersion under normality assumption. For non-normal distribution, Abbasi and Miller (2012) inspected the performance of some univariate variability charts such as R, S, IQR, Downton's estimator, median absolute deviation (MAD), as well as Sn and Qn. Kang, Lee, Seong, and Hawkins (2007) developed coefficient of variation (CV) chart using rational groups of observations. Castagliola, Celano, and Psarakis (2011) developed new method to monitor the CV by means of 2 one-sided EWMA charts of the coefficient of variation squared and later improved by Yeong, Khoo, Tham, Teoh, and Rahim (2017). Calzada and Scariano (2013) developed synthetic coefficient of variation (SynCV) chart. An adaptive Shewhart control chart implementing a variable sample size strategy in order to monitor the coefficient of variation is developed by Castagliola, Achouri, Taleb, Celano, and Psarakis (2015). For short-run production, Amdouni, Castagliola, Taleb, and Celano (2015) introduced an adaptive Shewhart control chart using a variable sample size (VSS). Variable sampling interval (VSI) Shewhart chart can also be used to monitor the coefficient of variation in a short production run context (Amdouni, Castagliola, Taleb, & Celano, 2017). For detecting small and moderate shifts, Khaw, Khoo, Yeong, and Wu (2017) proposed variable sample size and sampling interval (VSSI) feature to improve the performance of the basic CV chart. The side sensitive group runs chart for the CV (SSGR CV) chart is developed. The chart surpasses the other charts under comparison, for most upward and downward CV shifts (W C Yeong et al., 2017).
The most popular multivariate variability control chart is the GV control chart (Alt & Smith, 1988) which was refined by Djauhari (2005a). The properties of this method are studied by Aparisi, Jabaioyes, and Carrion (1999). Aparisi, Jabaloyes, and Carrión (2001) developed the new design of the GV chart |S| with adaptive sample size. Grigoryan and He (2005) introduced the multivariate double sampling (MDS) |S| control chart scheme for controlling shifts in a covariance matrix. Doǧu and Kocakoç (2011) proposed the GV control chart based on maximum likelihood estimation which can be used to monitor the multivariate process dispersion and detect the time of the change in covariance matrix. Hamed (2014) improved the performance of GV chart |S| to monitor the multivariate process. Lee and Khoo (2017) developed the multivariate synthetic generalized sample variance |S| (synthetic |S|) chart. The comparative studies show that the synthetic |S| chart outperforms the conventional |S| chart in detecting shifts in the covariance matrix of a multivariate normally distributed process. The combination of the double sampling, variable sample size and variable sampling interval features (DSVSSI |S| chart) is applied to monitor the shifts in the covariance matrix of a multivariate normally distributed process .

Tr (R 2 ) statistics
Let the p dimension sample random vectors, X 1 ; X 2 ; . . . ; X n are taken from the population which follow multivariate normal distribution N p μ; Σ ð Þ: The sample correlation matrix R of those random vectors can be written as follows: (1) where r ii ¼ 1 r ii ¼ 1, for i = 1, 2, …, p and r ik is the coefficient correlation corresponded to i-th and k-th variable, for i = 1, 2, …, p and j = 1, 2, …, p. Furthermore, the Tr(R 2 ) statistics are defined as follows: Tr R 2 ¼ Tr where p denotes the number of variables, and b is a positive number where the value is determined by the correlation among the variables.
For example, three-dimension sample random vectors X 1 ; X 2 ; X 3 are taken from the population which follow the multivariate normal distribution. The sample correlation matrix can be written as R ¼ 1 r 12 r 13 r 21 1 r 23 r 31 r 32 1 2 4 3 5 Tr(R 2 ) statistics are calculated as follows: R 2 ¼ 1 r 12 r 13 r 21 1 r 23 r 31 r 32 1 2 4 3 5 1 r 12 r 13 r 21 1 r 23 r 31 r 32 1 2 4 3 5 ¼ 1 þ r 12 r 21 þ r 13 r 31 r 11 r 12 þ r 12 þ r 13 r 32 r 13 þ r 12 r 23 þ r 13 r 33 r 21 þ r 13 þ r 13 r 13 r 21 r 12 þ 1 þ r 23 r 32 r 21 r 13 þ r 23 þ r 23 r 33 r 31 þ r 32 r 21 þ r 31 r 31 r 12 þ r 32 r 22 þ r 32 r 31 r 13 þ r 32 r 23 þ 1 2 4 3 5 ¼ 1 þ r 2 12 þ r 2 13 r 12 þ r 12 þ r 13 r 32 r 11 r 13 þ r 12 r 23 þ r 13 r 21 þ r 21 þ r 23 r 13 r 2 21 þ 1 þ r 2 23 r 21 r 13 þ r 22 r 23 þ r 23 r 31 þ r 32 r 21 þ r 31 r 31 r 12 þ r 32 þ r 32 r 2 31 þ r 2 32 þ 1 2 4 3 5 with common form diagonal elements are: Because R is symmetric, r ik ¼ r ki ; forkÞi, then   (Härdle & Linton, 1994) No Kernel function Formula Tr(R 2 ) statistic for the number of quality characteristics p = 3 can be calculated as p + b, where p = 3 and b ¼ 2ðr 2 12 þ r 2 13 þ r 2 23 Þ: In general, for p quality characteristics, Tr(R 2 ) can also be written as Tr(R 2 ) = p + b, where p is the number of quality characteristic and b is two times of sum squared coefficient correlation which can be expressed as: Þ:  The minimum value of Tr(R 2 ) is p for " r ik ¼ 0and its maximum value is p 2 for " r ik ¼ 1: Figure 1 illustrated the statistics of Tr(R 2 ) for p = 3 with various coefficient correlation. It can be seen that the larger coefficient correlation the larger statistics Tr(R 2 ) produced. Because of the Tr(R 2 ) statistics is obtained from p þ b where b is two times of sum squared coefficient correlation, the distribution of Tr(R 2 ) statistics is not easy to determine. Thus, this research employs the KDE to estimate the empirical distribution of Tr(R 2 ).

Kernel density estimation
KDE method is a non-parametric method to estimate the probability density function of a random variable. This method was first introduced by Rosenblatt (1956) and Parzen (1962) so that its name is called the Rosenblatt-Parzen kernel density estimator which is the development of the histogram estimator. Chou et al. (2001) proposed KDE to estimate the distribution of T 2 statistic. Let T 2 is a Hotelling's statistic which obtained under in-control condition. The distribution of T 2 statistic could be calculated with the following kernel function: where K and h define kernel function and smoothing parameter, respectively. Table 1 presents some kernel functions displayed in (Härdle & Linton, 1994), where I is an indicator. The most used Gaussian Kernel is used in the analysis for this paper.
The control limit in equation (2) can be calculated using tables of integrals, in closed form distribution. However, the control limit might be not efficient to be calculated if the distribution is not closed form. Thus, the kernel control limit is solved using trapezoidal rule (Burden & Faires, 2011), one of the numerical integration methods to approximate the definite value of integral equation. Furthermore, the control limit of T 2 based on KDE could be estimated by taking the percentile of kernel distribution. Hence, the control limit of T 2 based on KDE equal to ½100ð1 À αÞ-th percentile of T 2 distribution which could be calculated using as follows: 3.3. Tr(R 2 ) control chart based on KDE The following procedures are used to form the Tr(R 2 ) control chart with the KDE control limit.
Procedures to form the Tr(R 2 ) control chart: (1) Specify the level of significance αand sample size n.
(2) Input data as a matrix with the dimension of mn Â p, where m denotes the number of subgroups. (3) For each subgroup i from 1 to m: (a) Calculate the correlation matrix R i with a dimension ofp Â p from i-th subgroup matrix with a dimension of n Â p.
(b) Calculate R 2 i ¼ R i :R i which has the dimension of p Â p. (c) Calculate the statistics Tr i ¼ traceðR 2 i Þ with a dimension of 1 Â 1: For i = 1,2,.,m subgroup the dimension of the vector Tr is m Â 1: (4) Create the control chart by plotting theTr i , where i = 1, 2, … m and Control Limit (CL) calculate using KDE method.
Procedures to calculate the density of Tr i using KDE: 1. Determine the type of kernel, in this study Gaussian kernel is used with the following equation: Procedures to calculate the control limit Tr(R 2 ) control chart using KDE: 1. Calculate the cumulative distribution function of f h ðTrÞ using the following equation: 2. By using the trapezoid rule method which is a numerical integration method, the integral ð Tr 0 f h ðtrÞdtr can be computed as follows: 3. The control limit of Tr(R 2 ) control chart can be estimated by taking the 100ð1 À αÞth percentile of the empirical density of f h ðtrÞas the following equation:

Simulation studies
In this section, several simulation studies are conducted to investigate the performance of the proposed chart. The quality characteristics are assumed to follow the multivariate normal distribution, X,N p ðμ; ΣÞ: First, the performance of the control limit is evaluated by calculating its ARL 0 . Further, the performance of the proposed chart to detect a shift in the process is also evaluated using ARL 1 criteria. The simulation study, ARL 0 and ARL 1 of the proposed chart is calculated using Algorithm 1.
Algorithm 1. Calculation of ARL 0 and ARL 1 1. Specify the significance level α ¼ 0:00273, number of characteristics p and sample size n.
2. Calculate the control limit (CL) for specified parameters given in step 1 using KDE procedure in section 2.
3. For 1000 repetitions, follow these steps: a. Generate the data which follow Multivariate Normal distribution with vector μ ¼ 0 and covariance matrix Σ ¼ I.
b. Calculate statistics Tr 2 from the generated data.
c. Calculate the run length (RL), number of samples until finding the first statistic Tr which is greater than CL.
4. Calculate ARL 0 by taking the average of RL over 1000 replications.

Define
where δ a shift in process.
6. For 1000 repetitions follow these steps: a. Generate the data which follow Multivariate Normal distribution with vector μ ¼ 0 and b. Calculate statistics T r from the generated data in step 6.a.
c. Calculate the RL', number of samples until finding the first statistic T r which is greater than CL.
7. Calculate ARL 1 by taking the average of RL' over 1000 replications.
8. Repeat step 1 until 7 for different value of p and n.

Control limit
In this section, the performance of the KDE control limit is evaluated using ARL 0 criteria. Tables 1 and 2 present the KDE control limits of the proposed chart. Since the simulation study is conducted using significance level α ¼ 0.00273, which corresponds to three-sigma, the ARL 0 equals to 370. Three sigma control limit is used in this study because it refers to the processes that run efficiently to create the highest quality of production goods. For several combinations of n and p, the control limits calculated using the KDE method always produce ARL 0 at about 370 which indicates that the KDE control limit is reliable for process monitoring. Moreover, it also can be seen that the value of control limits become smaller for the larger value of sample size n and it became larger as the larger value of quality characteristics p.

Performance of the proposed chart
This section provides the performance evaluation of the proposed control chart using the KDE control limit. The out of control ARL is calculated by adding shift to covariance matrix where δ ¼ 0:2 I. The simulation study is conducted for μ ¼ 0, Σ ¼ I over the various number of characteristics quality p and number of sample n. In addition, the KDE control limit is taken from Tables 2 and 3 for α ¼ 0:00237.

Level of significance sarathkumar
In this subsection, the performance of the proposed chart is evaluated using α ¼ 0:00237: Table 4 shows the ARLs of the proposed chart with n = 10 for the various number of quality characteristics. It can be known that the larger shift in the process, the faster the proposed chart to detect the out-of-control signal. Furthermore, the proposed chart become more sensitive for the larger number of quality characteristics p. This is confirmed by the value of ARL 1 for such condition. Table 5 shows the ARLs of the proposed chart with n = 20 for the various number of quality characteristics p. Similar to the previous case, it can be seen that the larger shift in the process, the faster the proposed chart to detect the out-of-control signal. Furthermore, the proposed chart become more sensitive for the larger number of quality characteristics p. Meanwhile, the performance of the proposed chart to detect a shift in the covariance matrix for n = 30 is presented in Table 6. The ability of the proposed chart to detect shift is increased as the larger number of sample size n and number of quality characteristics p. Table 7 shows the ARLs of the proposed chart with n = 50 for the various number of quality characteristics. For this case, the proposed chart shows great performance in detecting the shift in the process. This is shown by the proposed chart become very sensitive for the larger number of quality characteristics p. Moreover, the performance of the proposed chart to detect a shift in the covariance matrix for n = 100 is presented in Table 8. The oversensitive performance of the proposed chart to detect the small number of a shift in the covariance matrix is found for this case. It can be seen that the value of ARL 1 for this case is 1.00 except p = 3 for only 0.1 shift in the covariance matrix. Figure 2 shows the performance comparison of the proposed chart with a number of quality characteristics p = 3 for various value of sample size n. It can be seen from the figure that the value of ARL 1 for n = 100 is very small indicated that the proposed chart is very sensitive to detect the shift for such condition. In general, the proposed chart becomes more sensitive as the number of sample size increase.

Level of significance sarathkumar
In this subsection, the performance of the proposed chart is evaluated for αÞ0:00237: Simulation studies are conducted using a number of sample n = 10, number of quality characteristic p = 3, 4, 5, 7, and 10, as well as α ¼ 0:005; 0:01; and 0:05: Table 9 tabulated the performance of the proposed chart for α ¼ 0:005 with n = 10 for the various number of quality characteristics. The ARL0 is written as the bold number while the value inside bracket below the ARL0 is the estimated KDE control limit. It can be seen that for α ¼ 0:005, the ARL 0 produced is around 200. These facts prove that estimated ARL 0 produce by proposed control limit has a similar value to ARL 0 in the theory. The theoretical ARL 0 is equal to 1 α or in case is 1 0:005 % 200:According to the table, it can be concluded that higher number of quality characteristic p smaller value of ARL 1 . In other words the larger quality characteristics the more sensitive the proposed chart to detect shift in the covariance matrix.
Tables 10 and 11 report the performance of the proposed chart for α ¼ 0:01 and 0:05, respectively. Similar to the previous case, the estimated ARL 0 is around 100 for α ¼ 0:01 and around 20 for α ¼ 0:05: The pattern is same as the previous case, the proposed chart has a better performance for the larger quality characteristics.

Discussion
The previous sections present the simulation result of the control limit and performance of the proposed chart. From these results, there are several notes that can be used as further discussion. First, the control limit of the proposed chart always shows the consistent ARL 0 value at about 370 for α ¼ 0:00273. It also has the consistent ARL 0 for αÞ0:00273 by producing value at about 1 α : This fact indicated that the proposed KDE method in calculating the control limit has a great ability to obtain the approximation value of the control limit. This happens due to the KDE method employed can capture all changes on the distribution of statistics Tr and estimated the control limit based on the empirical density of the statistics proposed.
Second, the proposed chart has an outstanding ability to detect a shift in the covariance matrix, especially for the larger number of quality characteristics p and sample size n. This may happen due to the ability of the proposed statistics to detect small changes in the covariance matrix. By  this result, it can be said that the proposed chart has the great ability to detect the shift in the higher dimensional data.

Illustrative example
A numerical example is given in this section in order to illustrate the operation of the Tr(R 2 ) control chart with the KDE control limit. In this section, we use the ZA fertilizer production dataset in carbonation step to illustrate the use of the proposed chart. There are 44 samples with each of them has 12 observations. The quality characteristics measured in this dataset include CO 2 (g/L), NH 3 (g/L), the ratio of CO 2 and NH 3 , specific weight (kg/L), as well as temperature (°C). It can be seen that the quality characteristics are recorded with different measures.
First, by treating the first 20 samples as the in-control process, the control limit is estimated using the KDE method. Further, the estimated KDE control limit, which equals 14.1633, is used to monitor the full dataset. Figure 3 shows the monitoring process of the ZA fertilizer production dataset. Moreover, the conventional GV in Figure 4 also employed to monitor this dataset as the benchmark of the proposed chart.
According to the figures, it can be seen that the proposed chart can detect the changes in the covariance matrix by declaring three out-of-control samples. Meanwhile, the conventional GV chart did not detect any out of control signal. Furthermore, the conventional method produces very small and similar statistics. This fact can be proved that the proposed chart is more sensitive to detect changes in the covariance matrix than the conventional one. Also, this is confirmed the second drawback of the conventional method that is the multivariate variability becomes small due to the quality characteristic is a linear combination of the other variables.

Conclusion
In this paper, the Tr(R 2 ) control chart based on the squared correlation matrix with the trace operator is proposed in order to overcome the drawbacks of the existing multivariate variability chart. Kernel Density Estimation method is used to calculate the better control limit for the proposed chart. The simulation studies show that for the several combinations of a number of quality characteristics p and number of sample size n, the control limit using the KDE method always results in an ARL 0 at about 370 for α ¼ 0:00273. For αÞ0:00273; the proposed chart has ARL 0 near to 1 α : For the shifted process, the proposed chart demonstrated the great performance as shown by the ARL 1 value. The performance of the proposed chart becomes better for the larger number of quality characteristics and the number of sample size n in the process. The application of the proposed chart to monitor the ZA fertilizer production dataset shows that the proposed chart has a better performance to monitor the covariance matrix than the GV chart. For future research, the proposed chart will be applied to detect high dimensional big data due to its great ability for such a case. In addition, the use of the bootstrap resampling method is also appropriate for estimating the control limit of the proposed chart.