Process adjustment by a Bayesian approach

Abstract: In a production or measurement situation, operators are required to correct a process using the measurement of a sample. In both cases, it is always difficult to suggest a correction from an observed deviation, because the deviation is the result of two different components: one from the set-up and one from production, the latter being considered as noise. The objective of this paper is to propose an original approach for calculating the best correction using a Bayesian approach. A correction formula is given under three assumptions for the maladjustment distribution: uniform, triangular and normal. The paper gives a graphical interpretation of these different assumptions and a discussion of the results. Based on these results, it proposes a practical rule for calculating the most likely maladjustment in the case of a normal distribution. This practical rule gives the best adjustment using a simple relation (Adjustment = K × sample mean), where K depends on the sample size, the ratio between the maladjustment and the short-term variability, and a Type I risk of large maladjustment.


Introduction
Statistical process control has always taken an interest in the detection of out-of-control situations. Since Shewhart (1931), who first proposed the eponymous control chart, many contributions have been made to detect a process that is deemed out of control. Several types of control charts have improved the Average Run Length (ARL), such as the EWMA chart proposed by Lucas and Saccucci (1990) and the CUSUM chart (Hawkins & Olwell, 1998; Page, 1962). More recently, several authors have contributed to the improvement of the concept and suggested new advances on control charts, either to reduce the size of samples (Torng, Lee, & Liao, 2009), to improve the ARL (Nezhad & Niaki, 2010), or to adapt them to specific situations such as monitoring the mean of a measurable quality characteristic X under 100% inspection (Wu, Wang, & Jiang, 2010). Process adjustment is critical for small runs, so some research has turned to the improvement of control charts for small series (Pillet, 1996).
The adjustment of the process after detecting an out-of-control situation has also been studied in parallel with the work on detection. The statistical approach to tool setting was first addressed by Grubbs (1954), who proposed a sequence of adjustments using successive approximations. More recently, Castillo (1998), Pan (2002), and Trietsch (1998) have proposed a detailed analysis and an extension of the Grubbs rule for the adjustment of machines. Recent contributions have focused on the adjustment of tools using in situ measurement (Kibe, Okada, & Mitsui, 2007) and on the use of neural networks (Liao, Xie, Zhou, & Xia, 2007; Lin & Lin, 2009).
It is not easy to find the best adjustment once the deviation from a given target has been estimated, because this deviation is the result of both maladjustment and the process spread. Yet this problem arises ever more frequently with autonomous machines under automatic monitoring. The objective of this paper is to propose an approach for estimating the best correction using a Bayesian approach.

The adjustment problem
The adjustment problem can be applied to different situations such as:
• Monitoring a production parameter,
• Controlling a gauge tool,
• Monitoring the bias on a measuring device.
Production is considered as a random process described by a variable X, depending on an adjustment parameter θ.
To simplify the writing of the mathematical expressions, the parameter θ represents the deviation from the target (which corresponds to a change of origin for the parameter). θ is unknown; this study will attempt to estimate it.
The variable X is distributed according to a distribution depending on the parameter θ (Figure 1). It represents the random deviation of production for a given setting θ:
• Mean: μ = θ (we assume that the distribution is symmetric because X is often a sample mean).
• Variance: σ²/n (if n is very small, the previous symmetry assumption can be questioned).
• A particular individual estimate of X will be written xᵢ.
If we take a sample of n individual values of the variables X₁, … Xᵢ … Xₙ, considered a priori as totally independent, then the adjustment can be determined by analysing, a posteriori, the event A:
A = {X₁ = x₁, …, Xₙ = xₙ} (1)
The adjuster aggregates the information of the sample by retaining only the average (or the median) of the sample. This simpler treatment leads to a loss of information. We will replace (1) by:
A = {X̄ = x̄} (2)
The information obtained using this sample informs us, a posteriori, of the probable values of θ.
Before analysing this sample, we had a priori ideas about the parameter θ:
• θ belongs to a domain defined as D_θ (corresponding to knowledge of the setting: set point, tracking of drifts, expertise, etc.).
• g(θ) is the probability density on this interval.
After observing the event A, a new a posteriori density can be calculated for the parameter θ, conditioned by knowledge of the event A (Delsart & Vaneecloo, 2011):
g(θ/A) = f(x̄/θ)·g(θ) / ∫_{D_θ} f(x̄/θ)·g(θ) dθ (3)
Assuming that all the individual variables Xᵢ are distributed according to the same law amounts to taking the sample over a relatively short period.
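As a minimal numerical sketch of expression (3), the a posteriori density can be evaluated on a grid for any prior g(θ). The values below (δ = 8 μm, σ = 2 μm, n = 5, x̄ = 3 μm) follow the running example of this paper; the function name is ours:

```python
import numpy as np

def posterior(theta, x_bar, n, sigma, prior):
    """Normalized a posteriori density g(theta / A) on a grid,
    proportional to f(x_bar / theta) * g(theta)."""
    dens = np.exp(-n * (x_bar - theta) ** 2 / (2 * sigma ** 2)) * prior(theta)
    step = theta[1] - theta[0]
    return dens / (dens.sum() * step)               # normalize over D_theta

# Uniform prior on D_theta = (-delta, +delta)
delta, sigma, n, x_bar = 8.0, 2.0, 5, 3.0
theta = np.linspace(-delta, delta, 2001)
g_post = posterior(theta, x_bar, n, sigma, lambda t: np.full_like(t, 1 / (2 * delta)))
theta_map = theta[np.argmax(g_post)]                # most likely maladjustment
```

With a uniform prior, the posterior mode coincides with the sample mean, which is the situation analysed next.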

θ parameter estimation
There are several possibilities for estimating the parameter θ. We propose to retain the most likely value of θ on the domain of definition D_θ (it may differ from the mean in the case of a non-symmetrical distribution). This is equivalent to finding the maximum of g(θ/A); that is to say, finding the maximum of the numerator of expression (3). It is then necessary to know the probability law of the parameter θ and the boundaries of the domain D_θ. This paper examines three assumptions for g(θ): uniform, triangular and normal distribution (Figure 2).

Best adjustment
In this case, where g(θ) is a constant, it may be omitted when looking for the most likely value of θ. This is equivalent to maximizing the function:
L(θ) = f(x̄/θ) (5)
In the case where the estimated parameter is the mean, the value of the parameter θ for which this function (5) reaches its maximum is called the "maximum likelihood estimator".
Based on the assumption that the variable X̄ = x̄ is normally distributed with mean θ and variance σ²/n:
L(θ) = (√n/(σ√2π)) exp(−n(x̄ − θ)²/(2σ²))
To find the value of θ that maximizes this function, it is easier to find the maximum of its logarithm (the logarithm function is monotone):
ln(L(θ)) = ln(√n/(σ√2π)) − n(x̄ − θ)²/(2σ²)
The first derivative gives:
d ln(L(θ))/dθ = n(x̄ − θ)/σ²
As the second derivative is negative, the function reaches a maximum for:
θ = x̄
In this particular case, with the assumption of a uniform distribution for the parameter θ, the best adjustment corresponds to the maximum likelihood estimator.
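This maximization can be checked numerically; a short sketch using the example values σ = 2 μm, n = 5 and x̄ = 3 μm (illustrative, matching the later figures):

```python
import numpy as np

# Under a uniform prior, the posterior mode is the maximum likelihood
# estimator, i.e. the sample mean.
sigma, n, x_bar = 2.0, 5, 3.0
theta = np.linspace(-8.0, 8.0, 4001)                     # domain D_theta = (-delta, +delta)
log_like = -n * (x_bar - theta) ** 2 / (2 * sigma ** 2)  # ln L(theta) up to a constant
theta_hat = theta[np.argmax(log_like)]                   # equals x_bar up to grid resolution
```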

Graphic illustration with a uniform distribution for g(θ)
Let us assume that the distribution of X̄ = x̄_ex is symmetrical in relation to θ. In addition, we will use the following change of variable: x̄_ex = θ + X_P. With X_P as the random short-term deviation, we will retain a normal marginal distribution for X_P, and for the maladjustment θ a uniform distribution on (−δ, +δ). The distribution function for the pair (θ, X_P) is the product of these two marginal densities. For a particular instance of X̄, X̄ = x̄_ex, the information on the pair (θ, X_P) is not detailed; therefore, several values of θ can correspond to the equation x̄_ex = θ + X_P. Figure 3 illustrates all the possible solutions for x̄ = 3 μm.
In the case of a uniform distribution for the setting, the best adjustment θ = x̄_ex corresponds to the maximum likelihood estimator.

Best adjustment
In this assumption, g(θ) is not a constant, so the function to maximize becomes:
L(θ) = g(θ)·f(x̄/θ)
The likelihood of a strong maladjustment decreases linearly with θ:
g(θ) = (1/δ)(1 − θ/δ) for θ ∈ [0; δ]
With the same assumptions as previously stated (the variable X̄ = x̄_ex is normally distributed with mean θ and variance σ²/n), we obtain on the half domain (0; δ):
L(θ) = (1/δ)(1 − θ/δ)·(√n/(σ√2π)) exp(−n(x̄ − θ)²/(2σ²))
By taking the logarithm (the logarithm function is monotone):
ln(L(θ)) = ln(1 − θ/δ) − n(x̄ − θ)²/(2σ²) + constant
The first derivative gives:
d ln(L(θ))/dθ = −1/(δ − θ) + n(x̄ − θ)/σ²
By cancelling this derivative, we obtain a quadratic equation whose physical solution is:
θ = ½ [(x̄ + δ) − √((δ − x̄)² + 4σ²/n)] (20)
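The closed-form solution (20) can be cross-checked against a direct numerical maximization; a sketch with the example values σ = 2 μm, δ = 8 μm, n = 5 and x̄ = 3 μm (the function name is ours):

```python
import numpy as np

def map_triangular(x_bar, sigma, n, delta):
    """Posterior mode for the triangular prior g(theta) = (1/delta)(1 - theta/delta)
    on (0, delta): the physical root of n*(x_bar - theta)*(delta - theta) = sigma**2."""
    disc = (delta - x_bar) ** 2 + 4 * sigma ** 2 / n
    return ((x_bar + delta) - np.sqrt(disc)) / 2

# Cross-check against a grid maximization of ln L(theta) on (0, delta)
sigma, n, delta, x_bar = 2.0, 5, 8.0, 3.0
theta = np.linspace(1e-6, delta - 1e-6, 200001)
log_post = np.log(1 - theta / delta) - n * (x_bar - theta) ** 2 / (2 * sigma ** 2)
numeric = theta[np.argmax(log_post)]
closed = map_triangular(x_bar, sigma, n, delta)
assert abs(numeric - closed) < 1e-3
```

Note that the suggested adjustment (about 2.84 μm here) is smaller than the sample mean of 3 μm: the triangular prior pulls the estimate towards 0.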

Graphic illustration with a triangular distribution g(θ)
We will take the same example as above with the same experimental values. Only the distribution of the pair (θ, X_P) is modified: the density is now at its maximum near the point (0, 0).
With the same sample mean of 3 μm, we obtain the picture given in Figure 4. Figure 5 gives a view from the plane containing θ and f(θ, X_P), which allows us to estimate the maximum likelihood.
In the previous example, the suggested correction differs only minimally from the sample mean for small values of the mean. However, when the experimental sample mean is large (close to δ), the difference between the sample average and the suggested correction is bigger (Figure 6).
For different values of the sample mean (from 0 to 8 μm), with σ = 4 μm and n = 5, Figure 7 gives the probability density of the maladjustment. The maximum likelihood corresponds to the peak of each curve.

Generalization in the triangular case
Equation (20) gives the most likely maladjustment for the case where x̄ > 0. To reflect a negative maladjustment, we propose the following modification:
θ = sign(x̄) · ½ [(|x̄| + δ) − √((δ − |x̄|)² + 4σ²/n)] (21)

Sample average
Equation (20) was found by cancelling the derivative. However, when x̄ tends towards 0, the limits as x̄ → 0⁺ and x̄ → 0⁻ are not identical, and both differ from 0. In this case, the maximum likelihood must be forced to 0. Rule (21) must therefore be slightly modified: whenever it returns a value whose sign is opposite to that of x̄, the suggested adjustment is forced to 0.
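The generalized rule, as we read it (sign restored from x̄, result forced to 0 when the formula crosses zero), can be sketched as:

```python
import numpy as np

def adjust_triangular(x_bar, sigma, n, delta):
    """Triangular-prior adjustment rule (21) with the near-zero modification:
    apply the half-domain formula (20) to |x_bar|, restore the sign of x_bar,
    and force the adjustment to 0 when the formula would go negative."""
    m = abs(x_bar)
    theta = ((m + delta) - np.sqrt((delta - m) ** 2 + 4 * sigma ** 2 / n)) / 2
    return np.sign(x_bar) * max(theta, 0.0)
```

With σ = 2 μm, δ = 8 μm and n = 5, any |x̄| below σ²/(nδ) = 0.1 μm yields a zero adjustment.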

Assuming a normal distribution for g(θ)
With g(θ) corresponding to a normal distribution N(0, σ_θ), reflecting an extremely low probability of a strong maladjustment, the calculation of the maximum likelihood for θ gives (Duret, 2012), keeping the same assumptions (the variable X̄ = x̄_ex is normally distributed with mean θ and variance σ²/n):
L(θ) = (1/(σ_θ√2π)) exp(−θ²/(2σ_θ²)) · (√n/(σ√2π)) exp(−n(x̄ − θ)²/(2σ²))
By taking the logarithm:
ln(L(θ)) = −θ²/(2σ_θ²) − n(x̄ − θ)²/(2σ²) + constant
The first derivative gives:
d ln(L(θ))/dθ = −θ/σ_θ² + n(x̄ − θ)/σ²
By cancelling this derivative, the solution is:
θ = x̄ · n σ_θ² / (n σ_θ² + σ²) (27)
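A quick numerical check of this shrinkage solution; the value σ_θ = 3 μm is our illustrative assumption:

```python
import numpy as np

def map_normal(x_bar, sigma, n, sigma_theta):
    """Posterior mode with a Normal(0, sigma_theta) prior on theta (Eq. 27):
    the sample mean is shrunk towards 0 by n*sigma_theta**2 / (n*sigma_theta**2 + sigma**2)."""
    return x_bar * n * sigma_theta ** 2 / (n * sigma_theta ** 2 + sigma ** 2)

# Cross-check by grid maximization of the log-posterior
sigma, n, sigma_theta, x_bar = 2.0, 5, 3.0, 3.0
theta = np.linspace(-8.0, 8.0, 400001)
log_post = -theta ** 2 / (2 * sigma_theta ** 2) - n * (x_bar - theta) ** 2 / (2 * sigma ** 2)
assert abs(theta[np.argmax(log_post)] - map_normal(x_bar, sigma, n, sigma_theta)) < 1e-3
```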

Discussion
The choice of the a priori distribution is relatively important for the calculation of the best adjustment, and the difference increases as the ratio σ/δ increases. For example, it is obvious that if the short-term variability of the production process (or of the measurement) is low compared to the variations of the parameter θ (σ << δ), the difference between a uniform distribution and a triangular one is not very significant; in that case, correcting by the value of the sample average is a good choice. On the other hand, if the maladjustment is small compared to the process spread, correcting by the full average is excessive. Using the previous example with the process spread doubled (σ = 4 μm instead of σ = 2 μm, δ = 8 μm, n = 3), the corrections are given in Figure 8.
In addition to the choice of the distribution law, the choice of the setting uncertainty is strongly involved in the calculation. Several approaches deserve further development: (1) Knowledge of the probability distribution can be obtained from observation of historical data.
(2) Determining the prior distribution can be enriched with the knowledge of the sample average at the sampling time.
The second point allows the calculation of the best adjustment to be refined. In the case of a normal distribution, Equation (27) was demonstrated. We suppose that knowing the sample mean gives us information on the maladjustment distribution: we take x̄ = z_α·σ_θ, with z_α depending on the Type I risk α, where α is the risk of a very large maladjustment.

Equation (27) becomes:
θ = x̄ / (1 + z_α² σ² / (n x̄²)) (28)
Setting r = x̄/σ (Pillet & Pairel, 2011), which means that the offset is evaluated a posteriori in short-term sigma units, this reduces to:
θ = K·x̄ with K = 1 / (1 + z_α² / (n r²)) (29)
Figure 9 indicates the proportion K of the most likely adjustment as a function of the ratio r for z_α = 1.
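Equation (29) is straightforward to tabulate; a sketch of the practical rule (the function name is ours):

```python
def k_factor(r, n, z_alpha=1.0):
    """Proportion K of the sample mean to apply as the adjustment,
    K = 1 / (1 + z_alpha**2 / (n * r**2)), with r = x_bar / sigma (Eq. 29)."""
    return 1.0 / (1.0 + z_alpha ** 2 / (n * r ** 2))
```

K grows towards 1 as the observed offset becomes large relative to the short-term sigma; for example, k_factor(1.0, 5) gives 5/6, so only about 83% of the sample mean would be corrected.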

Validation by Monte Carlo simulation
To validate the various assumptions used in this paper, it is necessary to compare the results in terms of variability for the same initial maladjustment situations. To achieve this validation, we conducted a Monte Carlo simulation under the following conditions: • Simulation of a small run of 10 workpieces (n = 1).
The assumptions tested are the following:
• Uniform distribution: M = x̄ (correction by the full sample mean),
• Triangular distribution: rule (21),
• Normal distribution: Equation (29) with z_α = 1, 2 and 3,
• A Shewhart chart as a benchmark.
The results of the simulation are the global variation (standard deviation) and the average over one million measurements. The smaller the standard deviation, the better the assumption; the closer the average is to 0, the better the assumption (see Table 1).
The best result is obtained for Equation (29) with z_α = 3; the worst result is for the uniform assumption, and only the Gaussian assumptions improve the results compared with the Shewhart benchmark.
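The validation procedure can be sketched as follows. The run count, the uniform prior on the initial maladjustment and the exact rule definitions are our assumptions for illustration, not necessarily the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(rule, runs=5000, pieces=10, sigma=2.0, delta=8.0):
    """Monte Carlo sketch of the validation: each run draws an initial
    maladjustment theta, produces `pieces` workpieces one at a time (n = 1),
    and applies the correction suggested by `rule` after each measurement.
    Returns the global standard deviation and mean of all measured deviations."""
    deviations = []
    for _ in range(runs):
        theta = rng.uniform(-delta, delta)          # initial maladjustment (assumed prior)
        for _ in range(pieces):
            x = theta + rng.normal(0.0, sigma)      # measured deviation of one workpiece
            deviations.append(x)
            theta -= rule(x)                        # adjust the setting
    d = np.asarray(deviations)
    return d.std(), d.mean()

sigma = 2.0
uniform_rule = lambda x: x                          # correct by the full measurement
gauss_rule = lambda x: x / (1 + (3.0 * sigma) ** 2 / (x ** 2 + 1e-12))  # Eq. (29), n = 1, z_alpha = 3

std_u, mean_u = simulate(uniform_rule)
std_g, mean_g = simulate(gauss_rule)
```

The returned standard deviation and mean correspond to the two criteria of Table 1; swapping `rule` allows each assumption to be compared.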

Conclusion
In this paper, we proposed a method for calculating the most probable maladjustment using three hypotheses for the a priori distribution of the maladjustment. The rules we proposed depend heavily on two arbitrary choices: • The domain of definition D_θ = ±δ for the parameter θ.
• The probability density g(θ) for the parameter θ.
Between an assumed normal distribution for g(θ) (Duret & Pillet, 2011; Pillet & Pairel, 2011) and an assumption of uniform distribution, the choice of a triangular distribution might seem a good compromise, but simulation shows the superiority of the Gaussian hypothesis with z_α = 3.
Future work should propose a method to identify the appropriate probability density and the parameter K.

Funding
The authors received no direct funding for this research.