PROFILE MONITORING OF RESIDUALS CONTROL CHARTS UNDER GAMMA REGRESSION MODEL

The statistical control chart is considered one of the superlative tools in quality control. Control charts are now widely used in various areas, one of them being manufacturing processes, where they are essential instruments that give quality controllers crucial insights for maintaining productivity. The quality of a product or process can be characterized by a relationship between two or more variables, which is typically referred to as a profile. Public health surveillance is another important area in which control charts are widely used; there they are useful and reliable tools for detecting outbreaks of infectious diseases. The gamma regression model (GRM), in turn, is a popular model in medical and other fields. It is applied when the response variable is continuous, positively skewed, and well fitted by the gamma distribution. This paper presents a scheme for monitoring the profile based upon the generalized linear model (GLM) with two link functions: the identity link and the log link. Exponentially weighted moving average (EWMA) control charts are proposed using deviance residuals and Pearson residuals for detecting any disturbance in the control variable of the gamma regression model. A detailed simulation study is designed to evaluate the performance of the control charts in phase I and phase II analysis under parametric maximum likelihood estimation (MLE) using the average run length (ARL) measure. It turns out that using deviance residuals under the identity link function seems more suitable than using Pearson residuals. Also, as the sample size increases, the percentage of out-of-control (OC) samples increases, which is theoretically acceptable.


Introduction
One of the most powerful tools in statistical process control (SPC) is the statistical control chart, which rests on the idea of a graphical summary of information for monitoring and controlling the production process. [1] introduced the term «profile» to refer to the relationship between the response and one or more explanatory variables. [2] stated that the fundamental objective of profile monitoring is to check the stability of the functional relationship between the response and the explanatory variable(s) over time. Many studies have monitored different types of profiles, such as gamma profiles, Poisson profiles, and so on. In practice, control chart analyses are carried out in either phase I or phase II. In phase I, the interest is usually in checking the stability of a process and identifying possible out-of-control samples. In phase II, the interest is in detecting an out-of-control condition quickly. Probability of signal and average run length are the common criteria for evaluating the performance of control schemes in phases I and II, respectively. [3] provided a unified framework for phase I analysis of generalized linear profiles. Besides generalized linear models (GLMs), other types of models have also been used to represent profiles, such as simple linear regression [4] and multiple regression [5].
Exponentially weighted moving average (EWMA) control charts are proposed for detecting any disturbance in the control variable of the gamma regression model. The EWMA methodology was first introduced by [6], who evaluated its properties and showed that the EWMA is an effective alternative to the traditional Shewhart control chart when small shifts in the process parameters are monitored. [7] reached the same results. Various modifications and supplemental criteria have also suggested that control charts based on moving averages are effective in detecting small process changes [8]. [9] stated that Shewhart charts are slow in detecting small shifts because they use only the information contained in the last sampling point. The EWMA and cumulative sum (CUSUM) charts are time-weighted control charts that were suggested as superior alternatives to the Shewhart chart when the detection of small shifts is desired. [10] studied EWMA control charts to obtain accurate average run length (ARL) values of (x̄, s²) and EWMA control charts. [11] stated that Shewhart control charts are used to detect large shifts in a process, whereas the CUSUM chart and the EWMA chart are more efficient at detecting small to moderate shifts. [12] defined the EWMA chart as a time-weighted and powerful tool for detecting small shifts in a process parameter more rapidly than the Shewhart chart with an equal sample size. In practice, the gamma distribution can be used with great flexibility in the analysis of positive random variables. Thus, gamma regression models (GRM) are applied in a wide range of empirical applications, such as rate setting for heterogeneous insurance portfolios, which is the most important function of insurers, and hospital admissions for rare diseases [13, 14]. [15] proposed the use of GLM-based control charts for monitoring gamma-distributed response variables.
The procedure was only based on the deviance residual, which is a likelihood ratio statistic for detecting a mean shift when the shape parameter is assumed to be unchanged, and the input and output variables were related in a certain manner.
Statistical control charts are essential instruments that can impart crucial insights to quality controllers for maintaining productivity. Public health surveillance is another important area in which control charts are widely used; there they are useful and reliable tools for detecting outbreaks of infectious diseases. Therefore, research on monitoring the profile of different types of residuals in phase I and phase II analysis is relevant for an important regression model such as the GRM, which is a member of the GLM family.

Literature review and problem statement
The importance of the study relates to SPC research, in which charts are essential tools for monitoring process quality. Based on these charts, operators may immediately spot changes in quality characteristics and pursue high-quality production or service. Traditional control chart research typically assumes that quality characteristics follow a normal distribution, but non-normal response outcomes, such as discrete count data, can occasionally occur in processes. Traditional control charts cannot process non-normal data because of the distributional assumption; thus, statistical profile monitoring of non-normal response outcomes is desirable.
For this purpose, the use of residual control charts in conjunction with forecasting models to evaluate the features of production processes is analysed. The primary goal is to assess the effectiveness of exponentially weighted moving average (EWMA) charts and control charts for individual observations (CCIO) when applied to model residuals to identify outliers in autocorrelated processes. [16] designed and studied the performance of EWMA control charts that monitor the rate of road crashes. They observed that the EWMA control chart scheme is sensitive to small shifts in the process and detects shifts quickly for this type of dataset. [17] proposed two types of residuals for the gamma regression model; many link functions can be used with this model, and they chose the identity and log links for their evaluation. Residual analysis seeks to identify outliers and/or model misspecification, and it can be based on ordinary residuals, standardized variants, or deviance residuals; most residuals are based on the differences between the observed responses and the fitted conditional mean. In this regard, the authors constructed more reliable goodness-of-fit measures and measures of explained variation for gamma regression models. [18] and [19] evaluated EWMA control charts based on the residuals for autocorrelated observations, because such observations can have a significant effect on control chart performance.
[20] focused on monitoring a process that is measured by a linear function and proposed monitoring the average of the residuals for the samples with EWMA and R-charts. [21] used EWMA control charts to monitor a drilling process in order to detect chatter vibration and secure high-quality production. These control charts used the residuals obtained from an approximated autoregressive model. [22] compared the effectiveness of the Shewhart x, EWMA, and geometric moving average (GMA) residual control charts for autocorrelated observations. They showed that the EWMA charts outperformed the Shewhart x and GMA residual charts for small shifts, whereas the Shewhart x residual chart outperformed the EWMA and GMA residual charts for large shifts.
[23] designed an EWMA chart of Pearson residuals of a negative binomial regression. They found that their proposed charts performed better than EWMA charts for deviance residuals, with the notable advantage that Pearson residuals are much easier to interpret and calculate. [24] applied three well-known count data models, Poisson, negative binomial, and Conway-Maxwell-Poisson, to identify the best-fitting model for the number of crashes. Conway-Maxwell-Poisson was identified as the best-fitting model, and GLM-based EWMA and CUSUM control charts were proposed using the randomized quantile residuals and deviance residuals. Their simulation study was designed to evaluate the predictive performance of the proposed control charts against Shewhart charts. The results showed that the EWMA-type control charts have better detection ability than the CUSUM-type and Shewhart control charts under small and/or moderate shift sizes. [25] proposed GLM-based control charts for an inverse Gaussian (IG) response variable. Deviance and Pearson residuals of the IG regression were considered as plotting statistics. An example related to the yarn manufacturing industry and a simulation study were designed, and the performance of the proposed methods was compared with existing counterparts in terms of the run length properties. Moreover, run rules were implemented to improve the efficiency of the Shewhart-type GLM-based control charts under small-to-moderate shifts.
So, it is proposed to supplement the work of [17] by designing EWMA charts of the residuals of a gamma regression model. It is also proposed to design an EWMA chart of Pearson residuals, as in [23] and [25], because of the aforementioned advantages, and to evaluate its performance relative to EWMA charts for deviance residuals.
All this suggests that, when studying the control variable of two different gamma regression models, it is advisable to rely on EWMA control charts for different kinds of residuals, such as deviance and Pearson residuals, to detect any disturbance in that variable, since the EWMA chart has been reported to outperform its counterparts, especially when small and moderate shifts are to be detected, as measured by the average run length (ARL) or relative ARL (RARL).

The aim and objectives of the study
The aim of this study is to determine which of the proposed charts is more efficient than its competing control charts at detecting an out-of-control process quickly. The gamma distribution has been used to model, for example, the time between events, the size of insurance claims and rainfalls, and failure times of machine parts.
To achieve this aim, the following objectives are accomplished:
- to investigate the performance of the two different types of residuals via EWMA charts using the RARL measure for the first gamma model, with the identity link function;
- to investigate the performance of the two different types of residuals via EWMA charts using the RARL measure for the second gamma model, with the log link function;
- to compare the performance of the two different gamma models with three different sample sizes (four, five, and ten) via EWMA charts using the RARL measure;
- to compare the performance of the two different gamma models with three different numbers of samples (twenty-five, thirty, and thirty-five) via EWMA charts using the RARL measure.

1. Object and hypothesis of the study
Simulation modeling was carried out to compare the performance of two residual control charts, deviance residual control charts and Pearson residual control charts, for two gamma regression models. Their performance was evaluated under the parametric maximum likelihood estimation (MLE) method using two types of control chart designs: x-bar/s charts in phase I analysis and EWMA control charts in phase II analysis, with the average run length (ARL) as the measure. In the simulation study, data are generated and two different gamma regression models are fitted by MLE, one for each of two link functions: the identity link and the log link. Two types of residuals, Pearson residuals and deviance residuals, are then extracted. After that, the residuals are monitored using the x-bar/s chart in phase I analysis and the EWMA chart in phase II analysis. Finally, the ARL and RARL are computed for the EWMA control charts.

2. Generalized linear models
The term generalized linear model was introduced by [26], who extended the scoring method to maximum likelihood estimation in exponential families. [27] proposed GLMs as a class of regression models appropriate for non-normally distributed response variables. These models are based on probability distributions with an unknown location parameter ϑ that belong to the exponential family. The probability density function of the exponential family is most often written as in (1):

$$ f(y; \vartheta, \varphi) = \exp\left\{ \frac{y\vartheta - b(\vartheta)}{a(\varphi)} + c(y, \varphi) \right\} \tag{1} $$

for some specific functions a(.), b(.), and c(.). If φ is known, then the family is termed the linear exponential family and ϑ is the natural or canonical parameter. From (1), E(y) = b′(ϑ), where b′(ϑ) = db(ϑ)/dϑ, and var(y) = a(φ)b′′(ϑ).
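As a worked illustration of the exponential-family form exp{[yϑ − b(ϑ)]/a(φ) + c(y, φ)}, the gamma density can be rewritten in that form. The sketch below assumes a mean-shape parametrization with mean μ and shape θ, and writes the canonical parameter as ϑ and the dispersion as φ; these symbol choices are assumptions made here for illustration, not notation taken from the cited works.

```latex
% Gamma density with mean \mu > 0 and shape \theta > 0 (assumed parametrization)
f(y;\mu,\theta)
  = \frac{1}{\Gamma(\theta)}\left(\frac{\theta}{\mu}\right)^{\theta} y^{\theta-1} e^{-\theta y/\mu}
  = \exp\left\{ \frac{y\left(-\tfrac{1}{\mu}\right) - \log\mu}{1/\theta}
      + \theta\log\theta + (\theta-1)\log y - \log\Gamma(\theta) \right\},
  \qquad y > 0.

% Matching terms with \exp\{[y\vartheta - b(\vartheta)]/a(\varphi) + c(y,\varphi)\}:
\vartheta = -\frac{1}{\mu}, \qquad
b(\vartheta) = -\log(-\vartheta) = \log\mu, \qquad
a(\varphi) = \varphi = \frac{1}{\theta}.

% Moments recovered from b(\cdot) and a(\cdot):
E(y) = b'(\vartheta) = -\frac{1}{\vartheta} = \mu, \qquad
\operatorname{var}(y) = a(\varphi)\, b''(\vartheta)
  = \frac{1}{\theta}\cdot\frac{1}{\vartheta^{2}} = \frac{\mu^{2}}{\theta}.
```

Under these assumptions the variance grows with the square of the mean, which is why gamma regression suits positive, right-skewed responses whose spread increases with their level.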
GLMs are structured around three components:
a) the systematic component: this specifies the explanatory variables as a set of linear predictors. The linear predictor, denoted by η, is the non-random component, given by

$$ \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, $$

or, in matrix notation, η = Xβ;
b) the random component: this identifies the probability distribution of the response variable, in which the y belong to the exponential family as independent random variables having the same form of distribution. Apart from the normal, other distributions such as the binomial, Poisson, and gamma can be handled;
c) the link function: this specifies the relationship between the systematic component and the expected value of the random component. It says how the expected value of the response relates to the linear predictor of explanatory variables. The link function can be expressed as

$$ g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, $$

where the β's are unknown coefficients and the x's are control variables.
The application of GLMs for modelling process data allows for better precision of estimates, nonlinear relationships between the variables, and prediction of their behaviour. The systematic component that makes up the regression model is the structure of control variables as a linear sum η, and the relationship between variables in a GLM is expressed by a known function g(.), called the link function, which is a fixed link between the mean of the response variable and the linear combination of the explanatory variables (the linear predictor η). [28] studied the gamma distribution, which can be used for regression models with more flexibility than other models, such as the exponential and Poisson, and proposed a GRM in which the dependent variable is gamma distributed and its mean is related to a set of regressors through a link function that can be the identity, the inverse, or the logarithm function. The model also includes a shape parameter, which may be constant or dependent on a set of regressors through a link function. GRMs in which both the mean and shape parameters follow regression structures and are allowed to depend on covariates are proposed in [29] under both classic and Bayesian approaches.

3. Gamma regression models
The gamma distribution can be viewed as a generalization of the exponential distribution with mean 1/ψ, ψ>0. An exponential random variable with mean 1/ψ represents the waiting time until the first event occurs, where events are generated by a Poisson process with rate ψ, while the gamma random variable y represents the waiting time until the θ-th event occurs (for integer θ). The probability density function of y is given by:

$$ f(y; \theta, \psi) = \frac{\psi^{\theta}}{\Gamma(\theta)}\, y^{\theta-1} e^{-\psi y}\, I_{(0,\infty)}(y), $$

where θ, ψ>0, Γ(·) denotes the gamma function, and I(·) is the indicator function.

Gamma Regression Residuals
Most residuals are based on the differences between the observed responses and the fitted conditional mean. [30] mentioned that the analysis of residuals plays a major role in predictive model formation. When numerical-valued functions are fitted to sample data, all the information about lack of fit is contained in the residuals, which can be used to provide necessary feedback on the modelling process. The residuals from a fitted model can be plotted to help detect unequal variances and relationships over time. Several references on residual analysis can be found in [31].
Residuals may be employed as a measure for model selection, play a significant role in model diagnostics and the assessment of fit, and are used for testing the validity of the assumptions of statistical models. For example, residuals are used to verify homoscedasticity, linearity of effects, normality, and independence of the errors. In classical linear models, residuals are usually standardized so that they become scale free and have the same precision, which makes it more convenient to compare residuals at various locations in the region of experimentation. There are different types of residuals, such as classical residuals, Pearson residuals, and deviance residuals.

1. Pearson Residuals
Residuals in GLMs were first discussed by [32], though ostensibly in connection with logistic regression models. Pearson residuals are the most commonly used measures of overall fit for GLMs and can be used to check the model for each observation. [33] defined Pearson residuals in the GRM as follows:

$$ r_i^{P} = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{\operatorname{Var}}(y_i)}}, $$

where μ̂ i is the fitted mean value and the denominator is the square root of the fitted variance function of y i . For a binomial response the fitted variance is n i π i (1-π i ), whereas for a gamma response with mean μ i and shape θ i it is μ i ²/θ i .

2. Deviance residuals
[17] defined an alternative residual based on the deviance or likelihood ratio, which for the GRM, up to scaling by the shape parameter, is given by:

$$ r_i^{D} = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{2\left[\frac{y_i - \hat{\mu}_i}{\hat{\mu}_i} - \log\frac{y_i}{\hat{\mu}_i}\right]}. $$

Two statistics used to assess the goodness of fit of GLMs are the deviance and the Pearson chi-squared statistic χ². Linear models use raw residuals for testing and model diagnostics, whereas GLMs provide several structures for residuals, such as the Pearson, deviance, and likelihood residuals. The two common types of residuals are the components of these two statistics. Other definitions of residuals in GLMs have been proposed in [34].
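The two residual types discussed above can be sketched numerically. The following is a minimal illustration in Python, assuming a gamma response with fitted mean `mu` and shape `shape`, so that the variance is mu²/shape; the function names and the scaling of the deviance residual by the shape are assumptions made for this sketch and are not taken from the cited works. Setting `shape=1.0` gives the unscaled versions.

```python
import math


def pearson_residual(y, mu, shape=1.0):
    """Pearson residual (y - mu) / sqrt(Var(y)), with gamma variance mu^2 / shape."""
    return (y - mu) / math.sqrt(mu * mu / shape)


def deviance_residual(y, mu, shape=1.0):
    """Signed square root of the gamma unit deviance, scaled by the shape (assumption)."""
    unit_deviance = 2.0 * ((y - mu) / mu - math.log(y / mu))
    sign = 1.0 if y >= mu else -1.0
    # max(..., 0.0) guards against tiny negative values caused by rounding
    return sign * math.sqrt(shape * max(unit_deviance, 0.0))


if __name__ == "__main__":
    # Illustrative values only: fitted mean 50 and shape 4
    for y in (20.0, 50.0, 80.0):
        print(y, pearson_residual(y, 50.0, 4.0), deviance_residual(y, 50.0, 4.0))
```

Both residuals are zero when the observation equals the fitted mean and share the sign of y − mu, but the deviance residual is less stretched than the Pearson residual for large positive deviations, which reflects the right skew of the gamma distribution.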

1. Generating the data
To conduct the simulation study, the regressors X i , i = 1, 2, 3, were generated from uniform distributions on the intervals (0, 30), (0, 15), and (10, 20), respectively. A gamma regression model was then built with two different link functions: the identity link function and the log link function [17]. Different sample sizes, ranging from moderate to large, were chosen: 125, 150, 175, 250, 300, and 350. The steps are described in more detail below.

First Step: Mean Structure Function with Identity Link Function:
a) the gamma regression model with mean and shape structures given by formulas (6) and (7), respectively, is considered, where (6) relates the variable of interest Y to the regressors X of the mean structure, and (7) relates the shape parameter to the regressors Z of the shape structure through a log link function;
b) the values y i were generated from a gamma distribution with mean and shape parameters given by μ i = 15+2x 2i +3x 3i and θ i = exp(0.2+0.1x 2i +0.3x 3i ).
Second Step: Mean Structure Function with Log Link Function:
a) the gamma regression model with mean and shape structures given by (7) and (8) is considered;
b) the values y i were generated from a gamma distribution with the corresponding mean and shape parameters, where -5, 0.2, and -0.03 are the values of the θ 0 , θ 1 , and θ 2 coefficients, respectively.
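The first generation step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, assuming that a gamma variable with mean μ and shape θ is drawn as shape θ and scale μ/θ; the function name, the seed, and the returned tuple layout are hypothetical choices for this sketch, and only the identity-link step is shown because its mean and shape expressions are stated explicitly in the text.

```python
import math
import random


def simulate_identity_model(n_obs, seed=0):
    """Draw (x2, x3, mu, theta, y) tuples for the identity-link gamma model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        x2 = rng.uniform(0.0, 15.0)    # X2 ~ U(0, 15)
        x3 = rng.uniform(10.0, 20.0)   # X3 ~ U(10, 20)
        mu = 15.0 + 2.0 * x2 + 3.0 * x3                   # mean structure
        theta = math.exp(0.2 + 0.1 * x2 + 0.3 * x3)       # shape structure (log link)
        # gammavariate takes (shape, scale); scale = mean / shape gives E(y) = mu
        y = rng.gammavariate(theta, mu / theta)
        rows.append((x2, x3, mu, theta, y))
    return rows


if __name__ == "__main__":
    for row in simulate_identity_model(5, seed=1):
        print(row)
```

With these coefficients the shape parameter is large over the whole regressor range, so the simulated responses stay close to their means; the relative standard deviation of each y is 1/√θ.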

2. Design of the exponentially weighted moving average control chart (Phase II analysis)
To construct an EWMA chart, two design parameters are needed: l, the multiple of σ zi used in the control limits, and λ, the weighting or smoothing constant.
The EWMA chart is constructed in the following steps:
a) compute the parameter values from historical data through the X-bar/S chart, namely the target or process mean (μ 0 ) and the estimated sigma (σ̂), using the S-bar method;
b) calculate the EWMA statistic (Z i ) and the control limits (UCL, LCL) with the combination (l = 2.962, λ = 0.2) for each sample (i) [8];
c) plot the statistic (Z i ) for each sample, with the control limits in place;
d) declare the process to be in-control if LCL (i) ≤ Z i ≤ UCL (i) ; otherwise, declare the process to be OC.
If the process is declared OC, the run length is the number of subgroups counted up to the signal, i.e., the number of subgroups for which the process remains in-control (IC) before it is declared to be OC. The ARL for the EWMA control charts is then computed as the average of these run lengths.
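The steps above can be sketched as follows. This is a minimal illustration in Python, assuming the standard EWMA recursion Z_i = λx_i + (1 − λ)Z_{i−1} started at the target mean, with time-varying limits μ0 ± l·σ·sqrt(λ/(2 − λ)·[1 − (1 − λ)^(2i)]); here `sigma` is assumed to be the standard deviation of the plotted statistic (for subgroup means, the estimated sigma divided by the square root of the subgroup size), and the function and argument names are hypothetical.

```python
import math


def ewma_chart(values, mu0, sigma, lam=0.2, l_mult=2.962):
    """Return a list of (Z_i, LCL_i, UCL_i, out_of_control) for each sample i."""
    z_prev = mu0  # the recursion is started at the target mean
    points = []
    for i, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z_prev
        # Exact (time-varying) variance factor of the EWMA statistic
        factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        half_width = l_mult * sigma * math.sqrt(factor)
        lcl, ucl = mu0 - half_width, mu0 + half_width
        points.append((z, lcl, ucl, not (lcl <= z <= ucl)))
        z_prev = z
    return points


def run_length(points):
    """Index (1-based) of the first out-of-control sample, or None if no signal."""
    for i, (_, _, _, out_of_control) in enumerate(points, start=1):
        if out_of_control:
            return i
    return None


if __name__ == "__main__":
    # Illustrative series: in control for three samples, then a sustained shift
    chart = ewma_chart([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], mu0=0.0, sigma=1.0)
    for z, lcl, ucl, oc in chart:
        print(round(z, 3), round(lcl, 3), round(ucl, 3), oc)
    print("run length:", run_length(chart))
```

The limits widen over the first few samples and then settle near μ0 ± l·σ·sqrt(λ/(2 − λ)), so early samples are judged against tighter limits than later ones.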

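3. 1. Number of samples equals twenty-five
Fig. 1-4 show the EWMA values for the 25 samples under the identity link with deviance residuals (ID), the identity link with Pearson residuals (IP), the log link with deviance residuals (LD), and the log link with Pearson residuals (LP), respectively. Under the identity link function, Fig. 1 showed that 16 % of samples were OC (lower) and 4 % were OC (upper), and Fig. 2 gave the same result as Fig. 1. Under the log link function, Fig. 3 showed that 4 % of samples became OC (upper) for deviance residuals, and Fig. 4 showed that 8 % of samples became OC (upper) for Pearson residuals.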

3. 2. Number of Samples Equals Thirty
Fig. 5-8 show the EWMA values for the 30 samples under ID, IP, LD, and LP, respectively.
Under the identity link function, Fig. 5 showed that 7 % of samples were OC (lower) and 3 % were OC (upper), and Fig. 6 gave the same result as Fig. 5. Under the log link function, Fig. 7 showed that for deviance residuals 7 % of samples became OC (lower) and 10 % became OC (upper), and Fig. 8 gave the same result as Fig. 7.

3. 3. Number of Samples Equals Thirty-Five
Under the identity link function, Fig. 9 showed that 31 % of all samples were OC (upper), and Fig. 10 gave the same result as Fig. 9. Under the log link function, Fig. 11 showed that 6 % of samples became OC (upper), and Fig. 12 gave the same result as Fig. 11.

4. Performance of exponentially weighted moving average charts
Tables 1 and 2 contain the results of two performance measures for the charts: the ARL and the relative ARL (RARL) for the gamma regression models. Three different sample sizes were selected: four, five, and ten. To facilitate the comparison between different methods of estimating the ARL, the RARL is defined as the ratio of the ARL of the chart under consideration to the ARL of the chart for deviance residuals under the identity link function: RARL = ARL (proposed chart) / ARL (deviance residuals under identity link function).
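The ARL and RARL calculations can be sketched as simple arithmetic. The following Python illustration assumes that the ARL is the average of simulated run lengths and that the RARL divides each chart's ARL by the ARL of the ID chart; the function names are hypothetical, and the numbers in the example are made up for illustration and are not results from Tables 1 or 2.

```python
def average_run_length(run_lengths):
    """Average of the run lengths observed over repeated simulation runs."""
    return sum(run_lengths) / len(run_lengths)


def relative_arl(arl_by_chart, baseline="ID"):
    """Divide every chart's ARL by the ARL of the baseline chart."""
    base = arl_by_chart[baseline]
    return {name: arl / base for name, arl in arl_by_chart.items()}


if __name__ == "__main__":
    # Hypothetical run lengths per chart (illustration only, not study results)
    simulated = {
        "ID": [8, 12, 10, 10],
        "IP": [9, 13, 11, 11],
        "LD": [12, 14, 11, 13],
        "LP": [7, 10, 9, 10],
    }
    arl = {name: average_run_length(rl) for name, rl in simulated.items()}
    print(arl)
    print(relative_arl(arl))
```

By construction the baseline chart has RARL equal to 1, so values below 1 indicate a shorter average run length than the ID chart and values above 1 a longer one.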
Table 2 shows the ARL of the ID, IP, LD, and LP charts relative to the ID value.
Comparing the RARL values in Table 2 for deviance and Pearson residuals under the two link functions, it can be observed that, with the identity link function, the ARL of the deviance residuals is slightly smaller than that of the Pearson residuals for all sample sizes considered (4, 5, and 10). It is also observed that the largest (worst) value of ARL was for Pearson residuals, specifically with n = 4 or 5 and for both link functions. The smallest (best) values of ARL were for Pearson residuals […]
For more clarification, the relative average run length values are shown graphically in Fig. 13-16.
From Fig. 13, it can be concluded that the worst EWMA values under the log link function, for both deviance and Pearson residuals, occurred at m = 25. From Fig. 14 […]

Discussion of the results of profile monitoring of residuals control charts under gamma regression models
The control chart monitors the average, or the centering, of the distribution of data from the process. The bottom chart monitors the range, or the width, of the distribution. Control charts for attribute data are used singly. To facilitate the comparison between the different estimation methods, the RARL was used: Table 2 showed the ARL of the ID, IP, LD, and LP charts relative to the ID value. The performance of the residual control charts was also examined through figures. From Fig. 14 […] The EWMA chart is a suitable alternative to other charts for detecting inaccuracy, especially where minor shifts are of concern. It gives a flexible tool for depicting imprecision and inaccuracy. However, detecting imprecision using EWMA charts requires specialized adjustment. [22] compared the effectiveness of the Shewhart x and EWMA residual control charts for autocorrelated observations, and that comparison was also based on the ARL.
One of the limitations of this research is its application to large sample sizes. Most previous research confirmed that profile monitoring control charts give the best results for samples of size four or five. A more thorough comparison could be carried out by examining the performance of other types of residuals, such as standardized residuals and working residuals, by using other performance measures, such as the average time to signal and the average number of units sampled until signal, and by using other fitting methods, such as the Bayesian approach and the weighted least squares method.
Some results were unsatisfactory with the identity and log link functions in the case of a large sample size, so a new link function could be sought, although this may be mathematically difficult. The bootstrap method, which has been shown to improve the performance of estimators, may also be used.
In general, it can be concluded from Table 2 that the deviance residual results are more accurate than the Pearson residual results. Also, the difference between the two gamma regression models emphasizes that EWMA is an effective alternative to the traditional Shewhart control chart when small shifts in the process parameters are monitored, which is consistent with the results of [6] and [7].
Regarding the use of GLM-based control charts for monitoring gamma-distributed response variables, one advantage of this study is the use of two types of residuals, unlike [15], which proposed monitoring only the deviance residuals in comparison with the original observations using the classical Shewhart chart. Another advantage of this study is that the residuals are monitored using the EWMA chart, which detects small and moderate shifts in a process more rapidly than the Shewhart chart with an equal sample size [11, 12, 18, 19].
In our view, the main drawback of this study, as previously mentioned, is that some results are unsatisfactory when the sample size is greater than 5. It may therefore be suitable to propose a new joint EWMA chart or to monitor other types of residuals to try to overcome this disadvantage.

1. For the first gamma model, with the identity link function, the simulation results show that the EWMA charts for deviance residuals generally gave the same percentages of OC (lower or upper) samples as the EWMA charts for Pearson residuals in most cases. In the other cases, specifically at (n = 10 & m = 30), the number of OC samples increased by one sample (lower) for deviance residuals. This is probably due to the greater efficiency of the deviance residuals over the Pearson residuals with the identity link function.
2. For the second gamma model, with the log link function, the EWMA results for deviance residuals were identical to the EWMA results for Pearson residuals in approximately 50 % of cases. In the other 50 % of cases, the number of OC samples was higher sometimes with deviance residuals and sometimes with Pearson residuals.
3. At sample sizes five and ten, the identity link function gave more OC samples than the log link function. This may indicate that using the log link function instead of the identity link function gives more accurate results with small and large sample sizes.
4. For any number of samples, the percentage of OC samples increased with the sample size under the identity link function. This did not hold under the log link function, which may also indicate that using the log link function instead of the identity link function gives more accurate results with any number of samples. Accordingly, it can be said that both flexibility and usefulness are achieved by using the log link function. Usually, the choice of link function depends on the problem and the data at hand.

Conflict of interest
The authors declare that they have no conflict of interest in relation to this research, whether financial, personal, authorship or otherwise, that could affect the research and its results presented in this paper.

Financing
The study was performed without financial support.

Data availability
Manuscript has no associated data.