A useful empirical Bayesian method to analyse industrial data from saturated factorial designs

Article history: Received March 1, 2012; received in revised format April 1, 2013; accepted April 12, 2013; available online April 14, 2013.

The use of saturated two-level designs is very popular, especially in industrial applications where the cost of experiments is too high. Standard classical approaches are not appropriate for analyzing data from saturated designs, since we can only obtain estimates of the main factor effects and no degrees of freedom remain to estimate the error variance. In this paper, we propose the use of empirical Bayesian procedures to obtain inferences for data from saturated designs. The proposed methodology is illustrated using a simulated data set.

© 2013 Growing Science Ltd. All rights reserved.


Introduction
Factorial designs have been extensively used by experimental researchers in many areas of interest, such as agriculture, engineering, medical research and industrial research (Fisher, 1926, 1935; Yates, 1935, 1937). Box et al. (1978) described the systematic exploration of factorial designs in industrial applications. Some important applications of factorial or fractional factorial experimental design techniques in manufacturing industries are presented in the literature. Philpott et al. (1996) employed a factorial design approach to identify key cost drivers of a process and to develop practical cost models from contract quotes. Kleijnen and Standridge (1988) used factorial designs to simulate a flexible manufacturing system. Feng et al. (2003) used a factorial design to illustrate the goodness of neural network modeling of honing surface roughness parameters defined by ISO 13565. Pei et al. (2003) employed a factorial design to reveal the main effects and the interaction effects of four factors on the quality of starting silicon wafers. Chan and Chan (2003) presented a simulation modelling and analysis of a serial production line in a printed circuit board (PCB) factory. Factorial designs have also been applied in simulation techniques to evaluate the performance of existing manufacturing systems and to find the active factors that have a great impact on current operational problems. Nazzal et al. (2006) integrated simulation modeling, factorial design and economic justification tools to build a comprehensive framework for strategic capacity expansion. Ayanso et al. (2006) used computer simulation and a full factorial experimental design to study and define an inventory rationing policy.
There are also other important applications of factorial designs in industrial engineering. Bagici and Işık (2006) employed factorial designs to investigate the surface roughness when orthogonal cutting tests were carried out on unidirectional glass fibre reinforced plastics (GFRP). Lin et al. (2007) used fractional factorial experiments to propose an efficient approach to develop a robust plasma spraying coating process. Datta and Bandyopadhyay (2008) applied factorial designs to evaluate an optimal parameter combination to obtain acceptable quality characteristics of bead geometry in submerged arc bead-on-plate weldment on mild steel plates. Nagesh and Datta (2008) proposed an integrated approach based on the use of Design of Experiments (DOE), Artificial Neural Networks (ANN) and Genetic Algorithm (GA) for modeling Gas Metal Arc Welding (GMAW) processes. Roy et al. (2010) presented a fractional factorial design approach for an inventory model (of a volume flexible manufacturing system for a deteriorating item with randomly distributed shelf life, continuous time-varying demand, and shortages over a finite time horizon) along with its practical implication. Zhang et al. (2010) investigated important operating variables in the electrochemical treatment of acrylic fiber manufacturing wastewater (AFMW) with a boron-doped diamond (BDD) electrode. Jayabal et al. (2010) used factorial design methodology to evaluate mechanical and machinability characteristics of hybrid composites in India. Erginel (2010) applied factorial design to analyse several materials and methods used for packing products in order to discover the optimum level of packing materials that minimizes damage to the product. Galanis and Manolakos (2010) employed factorial design in the development of a surface roughness model for turning of femoral heads from AISI 316L stainless steel. Amari and Mohtashami (2011) presented a multi-objective formulation of the buffer allocation problem in unreliable production lines. Factorial designs have also been used to build a meta-model for estimating production rate based on a detailed, discrete event simulation model. Savic et al. (2012) applied experimental design principles in pharmaceutical development and discussed the impact of these principles on pharmaceutical legislation. A special kind of factorial design is the saturated fractional factorial design, used in industrial applications when only a very small number of experiments is possible due to time and cost constraints.
Inferences on the effects of the different factors on the response variable of interest have been explored in the literature using different approaches. Daniel (1959) introduced the half-normal plot, a graphical (Q-Q plot) method, to identify the important factors for a response variable of interest; fractional replication was first discussed by Finney (1945). When the primary interest is only in the main effects, we can use saturated factorial designs (e.g., Plackett & Burman, 1946). The use of saturated designs has become very popular for screening factors, especially in industrial applications where the observations are usually very expensive to obtain (see, for example, Box et al., 1978; Daniel, 1959; Wu & Hamada, 2000; Cox & Reid, 2000). This is the case when the number of factors is too large or when the tests are destructive. A saturated design is a fractional factorial design in which the number of parameters in the main effects model is equal to the number of runs. In this paper, we consider saturated designs in which k = n - 1 main effects are studied in n experimental units or experiments without replicates. In such designs, all the information is used to estimate the main effect parameters, leaving no degrees of freedom to estimate the error variance. The classical analysis allows only the estimation of main effects, under the assumption that interactions are negligible. A well-known family of two-level saturated designs is due to Plackett and Burman (1946), whose designs are constructed from Hadamard matrices of order n, where n is a multiple of 4.
In this paper, we propose the use of empirical Bayesian methods to analyse data from a saturated design, since the classical approach based on least squares estimation usually permits only the estimation of the main effects. Following the classification of Miguel (2006), this study is pure research with practical applications, descriptive in its objectives and quantitative in its approach. Bertrand and Fransoo (2002) defined quantitative research in production engineering as research in which a problem is modeled with variables that present causal and quantitative relationships. In general, quantitative research uses mathematical, statistical or computational modeling (simulation); in this paper, statistical modeling is adopted.
The paper is organized as follows: in section 2, we introduce a simulated data set from a saturated two-level design to motivate our approach; in section 3, we introduce a classical analysis for data from a saturated design; in section 4, we introduce a Bayesian analysis assuming conjugate or other priors for the parameters of the model; in section 5, we introduce an empirical Bayesian approach; in section 6, we analyse the data from the saturated design introduced in section 2; finally, in section 7, we present some concluding remarks.

A simulated example
Consider a 12-run Plackett-Burman design and a simulated data set from the model

$Y = 2x_1 + x_2 + 1.5x_3 + \varepsilon$,

where $\varepsilon \sim N(0, 0.25^2)$ and

$x_1 = -1$ when factor A uses the "low" level (-), $x_1 = +1$ when factor A uses the "high" level (+).

In the same way, we obtain the codes for factors B to K. From the model used to simulate the data, we observe that factors A, B and C are active in the experiment. The simulated data are given in Table 1 (data set introduced by Baba & Gilmour, 2006). The general model for the data from a two-level saturated design is given by

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$, (1)

where $x_i$ denotes the coded level of the i-th factor, $\beta_i$ is the corresponding main effect and $\varepsilon \sim N(0, \sigma^2)$. The model can be expressed in matrix notation by

$y = X\beta + \varepsilon$, (2)

where $X$ is an $n \times p$ matrix showing the levels at which the factors are fixed, $\beta$ is a $p \times 1$ vector of parameters and $y$ is an $n \times 1$ vector of observations.
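The simulation described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used to produce Table 1: the design is built from the standard 12-run Plackett-Burman generating row, and the function names, the seed and the run order are assumptions made for the example.

```python
import numpy as np

def plackett_burman_12():
    """Return a 12-run Plackett-Burman design as a 12 x 11 array of -1/+1 codes."""
    # Standard generating row for n = 12 (+ + - + + + - - - + -).
    gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(gen, shift) for shift in range(11)]  # 11 cyclic shifts
    rows.append(-np.ones(11, dtype=int))                 # last run: all factors low
    return np.array(rows)

def simulate_response(design, rng, sigma=0.25):
    """Simulate Y = 2*x1 + x2 + 1.5*x3 + eps, with eps ~ N(0, sigma^2)."""
    x1, x2, x3 = design[:, 0], design[:, 1], design[:, 2]
    eps = rng.normal(0.0, sigma, size=design.shape[0])
    return 2.0 * x1 + x2 + 1.5 * x3 + eps

design = plackett_burman_12()            # columns correspond to factors A to K
rng = np.random.default_rng(2013)        # arbitrary seed, not taken from the paper
y = simulate_response(design, rng)

# Model matrix of Eq. (2): intercept column plus the 11 factor columns.
X = np.column_stack([np.ones(12), design])
```

Because the columns of the design are mutually orthogonal and balanced, the model matrix satisfies X'X = 12 I, which is what makes all 11 main effects and the intercept estimable from only 12 runs.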

A classical analysis
Least squares estimation yields estimates of the main effects, but there are no degrees of freedom left to estimate the error. This is a great difficulty for inference from the data of a saturated design; that is, the usual analysis of variance (ANOVA) cannot be used. Assuming that the matrix $X'X$ is nonsingular, as is the case for a Plackett-Burman design, the least squares estimates of the main effects are given by

$\hat{\beta} = (X'X)^{-1}X'y$. (3)

From the simulated data of Table 1, we get: $\hat{\beta}_0 = 0.057$; $\hat{\beta}_1 = 2.005$; $\hat{\beta}_2 = 1.028$; $\hat{\beta}_3 = 1.548$; $\hat{\beta}_4 = 0.031$; $\hat{\beta}_5 = 0.091$; $\hat{\beta}_6 = 0.071$; $\hat{\beta}_7 = 0.021$; $\hat{\beta}_8 = -0.069$; $\hat{\beta}_9 = -0.010$; $\hat{\beta}_{10} = -0.011$ and $\hat{\beta}_{11} = 0.031$. The least squares estimates were obtained using the MINITAB® software. In practical work, researchers usually rely on normal plots to decide which factors are important, but the interpretation of normal plots depends on how strongly the experimenter believes in factor sparsity. A possible alternative for analysing data from saturated two-level designs is the use of Bayesian methods.
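The following Python sketch illustrates Eq. (3) and the lack of error degrees of freedom. It uses freshly simulated responses rather than the data of Table 1, so the estimates will differ from those reported above; the seed and variable names are assumptions for the example.

```python
import numpy as np

# 12-run Plackett-Burman design from the standard generating row.
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
design = np.array([np.roll(gen, s) for s in range(11)] + [-np.ones(11, dtype=int)])
X = np.column_stack([np.ones(12), design])   # 12 x 12: intercept plus 11 factors

# Illustrative responses (not Table 1): Y = 2*x1 + x2 + 1.5*x3 + noise.
rng = np.random.default_rng(0)
y = 2.0 * design[:, 0] + design[:, 1] + 1.5 * design[:, 2] + rng.normal(0, 0.25, 12)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimates, Eq. (3)
residuals = y - X @ beta_hat
Q = float(residuals @ residuals)              # residual sum of squares
df_error = X.shape[0] - X.shape[1]            # n - p, zero for a saturated design

print("estimates:", np.round(beta_hat, 3))
print("residual sum of squares:", Q, "error degrees of freedom:", df_error)
```

Since X is square and nonsingular, the fitted values reproduce the observations exactly, so the residual sum of squares is zero up to rounding error and carries no information about the error variance.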

A Bayesian analysis
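The likelihood function for $\beta$ and $\sigma^2$ is given by

$L(\beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right\}$. (4)

Expanding the quadratic form $(y - X\beta)'(y - X\beta)$ in (4), we have

$L(\beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (\beta - \hat{\beta})'X'X(\beta - \hat{\beta}) + Q \right] \right\}$, (5)

where $\hat{\beta}$ is given by Eq. (3) and $Q = (y - X\hat{\beta})'(y - X\hat{\beta})$ is the residual sum of squares.
Note that for saturated designs we can estimate the parameter vector $\beta$, but the quantity $Q$ is always zero, which means that the error variance cannot be estimated from the likelihood and the analysis used in the general linear model is not appropriate for saturated designs. For this reason, we adopt a Bayesian approach to analyse data from a saturated design.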
Different priors could be considered to analyse data from saturated designs (see Baba and Gilmour, 2006), but since data from saturated designs provide only limited information, the interpretation of these data depends heavily on the prior assumptions.
Among the different priors considered by these authors, a conjugate prior (6), whose form follows from (5), is the Normal-Inverse Gamma distribution with hyperparameters $a > 0$, $d > 0$, $m \in R^p$ and $V$, a $p \times p$ positive definite matrix. Assuming different values for the hyperparameters of the prior (6), conjugate priors turn out to be very inflexible, since very informative priors can lead to a posterior that is very vague and centered in the wrong place (see Baba & Gilmour, 2006).
Other priors, such as finite mixtures of densities or non-conjugate priors, are also considered in the literature, but the resulting posterior summaries depend heavily on the choice of the prior for $\beta$ and $\sigma^2$. Observe that in a saturated design we are using n observations to estimate n + 1 parameters; that is, in the absence of any prior knowledge, the data do not provide any information about the error variance.

Use of empirical Bayesian methods
To get some information about the variance $\sigma^2$ of the error for data from a saturated design, let us assume the following procedure: among the g experiments considered in the saturated two-level design, we have, for each factor, g/2 values in the "high" level (+) and g/2 values in the "low" level (-). From the g/2 values in each level "+" or "-", we get the standard deviations, denoted by $s_l^{+}$ and $s_l^{-}$, $l = 1, 2, \ldots, K$.
Denoting by $y_{li}^{+}$, $i = 1, \ldots, g/2$, the g/2 values in the "high" level (+) for factor l, and by $\bar{y}_l^{+}$ their mean, we have

$s_l^{+} = \sqrt{ \frac{1}{g/2 - 1} \sum_{i=1}^{g/2} \left( y_{li}^{+} - \bar{y}_l^{+} \right)^2 }$. (7)

In the same way, denoting by $y_{li}^{-}$ the g/2 values in the "low" level (-) of factor l, and by $\bar{y}_l^{-}$ their mean,

$s_l^{-} = \sqrt{ \frac{1}{g/2 - 1} \sum_{i=1}^{g/2} \left( y_{li}^{-} - \bar{y}_l^{-} \right)^2 }$. (8)

In this way, we have K values $(s_l^{+})^2$ and K values $(s_l^{-})^2$, that is, a total of 2K sample variances. From these 2K quantities, we get the sample mean, denoted by A, and the sample variance, denoted by $B^2$, which can be used to find appropriate values for the hyperparameters of the prior distribution for the variance $\sigma^2$ of the error in Eq. (1). Assuming a gamma prior distribution for $\sigma^2$, that is,

$\pi(\sigma^2) \propto (\sigma^2)^{a-1} \exp(-b\sigma^2)$, (9)

where $E[\sigma^2] = a/b$ and $\mathrm{var}[\sigma^2] = a/b^2$, we get values for the hyperparameters a and b by solving the following equations:

$a/b = A$, $a/b^2 = B^2$. (10)
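The procedure of this section can be sketched in Python as below. The function names and the small 4-run example are hypothetical, and two choices are assumptions rather than statements from the text: the level standard deviations use the usual divisor g/2 - 1, and the sample variance of the 2K squared quantities uses divisor 2K - 1.

```python
import numpy as np

def level_sds(design, y):
    """Return the 2K standard deviations (s_l^+, s_l^-), one pair per factor."""
    sds = []
    for column in design.T:
        sds.append(np.std(y[column == 1], ddof=1))    # "high" level of the factor
        sds.append(np.std(y[column == -1], ddof=1))   # "low" level of the factor
    return np.array(sds)

def gamma_hyperparameters(sds):
    """Match Gamma(a, b) moments to the mean and variance of the squared sds."""
    variances = sds ** 2
    A = variances.mean()            # sample mean of the 2K sample variances
    B2 = variances.var(ddof=1)      # their sample variance (assumed divisor 2K - 1)
    b = A / B2                      # from a/b = A and a/b^2 = B^2
    a = A * b
    return a, b, A, B2

# Hypothetical 4-run saturated design with 3 factors and made-up responses.
design = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
y = np.array([1.0, 2.0, 4.0, 8.0])

sds = level_sds(design, y)
a, b, A, B2 = gamma_hyperparameters(sds)
print("a =", a, "b =", b)
```

Solving the two moment equations gives b = A / B^2 and a = A^2 / B^2, so the prior mean of the error variance equals the average of the within-level sample variances.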

Analysis of the data of Table 1 - Discussion
Let us consider the simulated data of Table 1 from a 12-run Plackett-Burman design with model (1) and k = 11 factors, $y = \beta_0 + \sum_{l=1}^{11} \beta_l x_l + \varepsilon$. Among the g = 12 observations, the 6 observations with factor A in the "high" level (+) are 1.7502, 2.7130, 4.4825, 1.4302, -0.5429 and 2.5350, from which we get $s_1^{+} = 1.661$ (see (7)). In the same way, the 6 observations with factor A in the "low" level (-) are 0.4377, -4.4129, -1.4384, -2.4942, 0.8998 and -4.6797, and $s_1^{-} = 2.362$ (see (8)). With the same approach, we get for the other factors B to K the values $s_l^{+}$ and $s_l^{-}$, $l = 2, 3, \ldots, 11$. Considering the squares of the 22 quantities given in Table 2, we get a sample mean A = 8.311 and a sample standard deviation B = 3.260 (that is, a sample variance $B^2 \approx 10.63$). From Eq. (10), we get the values of the hyperparameters of the gamma prior distribution given in Eq. (9) for the variance $\sigma^2$ of the error in Eq. (1), given by a = 6.4994 and b = 0.7820. Also assuming normal $N(0, 10^6)$ priors for the regression parameters $\beta_l$, $l = 0, 1, \ldots, 11$, in Eq. (1) (that is, very non-informative priors), we use Markov Chain Monte Carlo (MCMC) methods to get the posterior summaries of interest (see, for example, Chib & Greenberg, 1995, or Gelfand & Smith, 1990).
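As a quick arithmetic check of the numbers quoted in this paragraph, the short Python sketch below recomputes the two standard deviations for factor A and the hyperparameters. It assumes the divisor g/2 - 1 = 5 for the standard deviations and treats 3.260 as the standard deviation B (so that B^2 is about 10.63), which is the reading under which the reported values of a and b are reproduced.

```python
import numpy as np

# Observations for factor A quoted in the text.
high = np.array([1.7502, 2.7130, 4.4825, 1.4302, -0.5429, 2.5350])
low = np.array([0.4377, -4.4129, -1.4384, -2.4942, 0.8998, -4.6797])

s1_plus = np.std(high, ddof=1)    # reported as 1.661
s1_minus = np.std(low, ddof=1)    # reported as 2.362

# Reported summaries of the 22 squared quantities of Table 2.
A, B = 8.311, 3.260
b = A / B ** 2                    # solves a/b = A and a/b^2 = B^2
a = A * b

print(round(s1_plus, 3), round(s1_minus, 3))   # 1.661 2.362
print(round(a, 4), round(b, 4))                # 6.4994 0.782
```

The agreement with a = 6.4994 and b = 0.7820 to the reported precision supports reading 3.260 as B rather than as B^2.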
Using the WinBUGS software (Spiegelhalter et al., 2004), we simulated 5,000 Gibbs samples (taking every 10th sample) of the joint posterior distribution for $\beta$ and $\sigma^2$, after a "burn-in period" of size 5,000. Convergence of the Gibbs sampling algorithm was monitored by observing the trace plots of the simulated samples for each parameter. In Table 3, we have the posterior summaries of interest. In Table 4, we have the observed and predicted values $\hat{y} = \hat{\beta}_0 + \sum_{l=1}^{11} \hat{\beta}_l x_l$, where $\hat{\beta}_l$ are the Monte Carlo estimates of the posterior means for $\beta_l$, $l = 0, 1, \ldots, 11$, based on the 5,000 simulated Gibbs samples given in Table 3, assuming a Gamma(6.4994; 0.7820) prior for $\sigma^2$ and normal $N(0, 10^6)$ priors for $\beta_l$, $l = 0, 1, \ldots, 11$ (use of the empirical Bayesian method). Let us denote this model as "model 1". From the results of Table 4, we observe very good predictions assuming "model 1", that is, good inferences for the data from a saturated two-level design. In Table 4, we also have the predicted values considering other priors for $\sigma^2$ and $\beta_l$, also obtained with the WinBUGS software (5,000 Gibbs samples after a "burn-in period" of size 5,000). One possibility is to assume a uniform U(0, 1000) prior for $\sigma^2$ and normal $N(0, 10^6)$ priors for $\beta_l$. Let us denote this model as "model 2". A third model, denoted as "model 3", assumes a uniform U(0, 1000) prior for $\sigma^2$ and $N(\hat{\beta}_l, 0.01)$ priors for $\beta_l$, where $\hat{\beta}_l$ are the least squares estimates for $\beta_l$, $l = 0, 1, \ldots, 11$ (see section 3), that is, very informative priors for the regression parameters but a very non-informative prior for $\sigma^2$. From the results of Table 4, we observe that the predicted values assuming "model 2" and "model 3" are very different from the observed values, that is, we get bad predictions. From these results, we observe that the proposed empirical Bayesian method could be a powerful methodology for applications of saturated two-level designs. It is also important to point out that the 95% credible intervals obtained under "model 1" (empirical Bayesian model) show that the factors $x_1$, $x_2$ and $x_3$ are active in the experiment (in agreement with the true values of $\beta_1$, $\beta_2$ and $\beta_3$ given by the model used to simulate the data), with very accurate posterior means for $\beta_1$, $\beta_2$ and $\beta_3$.
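For readers without WinBUGS, the sampling scheme for "model 1" can be sketched in Python. This is not the algorithm WinBUGS runs and it is not applied to the data of Table 1; it is a hand-written Metropolis-within-Gibbs sketch under stated assumptions: the Gamma(a, b) prior is placed directly on the error variance with shape a and rate b, the value 10^6 in the normal priors is read as a variance, the responses are freshly simulated, and the function names, step size and seeds are hypothetical.

```python
import numpy as np

def log_cond_log_sigma2(theta, rss, n, a, b):
    """Log full conditional of theta = log(sigma^2), Jacobian term included."""
    sigma2 = np.exp(theta)
    return (a - n / 2.0) * theta - rss / (2.0 * sigma2) - b * sigma2

def sample_posterior(X, y, a, b, n_keep=5000, burn_in=5000, thin=10,
                     prior_var=1e6, step=0.5, seed=1):
    """Draw (beta, sigma^2) samples; the defaults mirror the chain sizes in the text."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    theta = np.log(a / b)                      # start at the prior mean of sigma^2
    draws_beta, draws_sigma2 = [], []
    for it in range(burn_in + n_keep * thin):
        sigma2 = np.exp(theta)
        # Gibbs step: beta | sigma^2, y is multivariate normal.
        cov = np.linalg.inv(XtX / sigma2 + np.eye(p) / prior_var)
        beta = rng.multivariate_normal(cov @ (Xty / sigma2), cov)
        # Random-walk Metropolis step on log(sigma^2) given beta.
        rss = float(np.sum((y - X @ beta) ** 2))
        proposal = theta + step * rng.normal()
        log_ratio = (log_cond_log_sigma2(proposal, rss, n, a, b)
                     - log_cond_log_sigma2(theta, rss, n, a, b))
        if np.log(rng.uniform()) < log_ratio:
            theta = proposal
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws_beta.append(beta)
            draws_sigma2.append(np.exp(theta))
    return np.array(draws_beta), np.array(draws_sigma2)

# Illustrative run on simulated data (not Table 1), with a short chain.
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
design = np.array([np.roll(gen, s) for s in range(11)] + [-np.ones(11, dtype=int)])
X = np.column_stack([np.ones(12), design])
rng = np.random.default_rng(0)
y = 2.0 * design[:, 0] + design[:, 1] + 1.5 * design[:, 2] + rng.normal(0, 0.25, 12)

beta_draws, sigma2_draws = sample_posterior(X, y, a=6.4994, b=0.7820,
                                            n_keep=1000, burn_in=500, thin=2)
posterior_means = beta_draws.mean(axis=0)
intervals = np.percentile(beta_draws, [2.5, 97.5], axis=0)  # 95% credible limits
```

Posterior summaries of the kind reported in Table 3 would be read from `posterior_means` and `intervals`; results from this sketch depend on the assumed parameterization and need not match those obtained with WinBUGS.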

Concluding Remarks
Saturated factorial designs have been extensively used by industrial researchers and engineers as a powerful methodology for screening factors, especially in the presence of a great number of factors. Usually, we get some information about the factors that are important for the response variable of interest using normal Q-Q plots. The interpretation of these normal plots in two-level factorial experiments can be very subjective, and in many cases it is difficult to identify the active factors for the response of interest. The empirical Bayesian approach introduced in this paper could be of great interest in applications. We also observed very good predictions and inferences for the parameters of interest using our proposed methodology.

In Table 2, we have these 22 quantities ($s_l^{+}$, $s_l^{-}$) for the 11 factors.

Table 2
Estimated quantities $s_l^{+}$ and $s_l^{-}$

Table 3
Posterior summaries (use of empirical Bayesian methods)