Nonlinear Split-Plot Design Model in Parameters Estimation using Estimated Generalized Least Square-Maximum Likelihood Estimation

This research aimed to provide a theoretical framework for intrinsically nonlinear models with two additive error terms. To achieve this, an iterative Gauss-Newton via Taylor Series expansion procedures for Estimated Generalized Least Square (EGLS) technique was adopted. This technique was applied in estimating the parameters of an intrinsically nonlinear split-plot design model where the variance components were unknown. The unknown variance components were estimated via Maximum Likelihood Estimation (MLE) method. To achieve the numerical stability in the iterative process of estimating the parameters, Householder QR decomposition was used. The results show that EGLS method presented in this research is available and applicable to estimate linear fixed, random, and mixed-effect models. However, in practical situations, the functional form of the mean in the model is often nonlinear due to the dynamics involved in the system process.


I. INTRODUCTION
A Split-Plot (SP) experiment is simply a blocked experiment. The experimental units are made of blocks for a compartment of the factors. Thus, there are two levels of experimental units. The blocks are usually called the Whole Plot (WP), and the experimental units within blocks are called split units, subplots, or SP. This implies two levels of randomization on the two levels of experimental units. One randomization determines the assignment of block-level treatments to WP. Then, as always in a blocked experiment, randomization of treatments to SP experimental units occurs within each block or WP (Montgomery, 2008;Jones & Nachtsheim, 2009). Hence, those are designed experiments that can be viewed as two combined or overlaid experiments on each other. In addition, Hinkelmann and Kempthrone (2008) put it as a superimposition of two similar or different forms of designs. Many researches have been done in estimating the parameters of the linear SP design, response surface, and optimal models respectively (Jones & Nachtsheim, 2009;Jones & Goos, 2012;Lu, Robinson, & Anderson-Cook, 2014;Lu, Anderson-Cook, & Robinson, 2011).
SP design in an experiment and its analysis are not new. It is applied in various fields outside agriculture from which it is originated from Lu, Anderson-Cook, & Robinson, 2012;Wang, Kowalski, & Vining, 2009;Myers, Montgomery, & Anderson-Cook, 2009). Jones and Nachtsheim (2009) identified that all experiments in industries were SP experimental design. This is because it contains two types of factors, which are Hard-To-Change (HTC) factor and Easy-To-Change (ETC) factor. The HTC is the WP factor, and the ETC factor is the SP and subplot factor. Milliken and Johnson (2009) stated that the SP experimental design was a multilevel design because of its hierarchal design structure. It has two units of different experimental sizes that the large experimental unit size is the WP and the small experimental unit size is the subplot or SP. Recently, Kulahci and Menon (2017) applied trellis plots to visualize multivariate data by allowing the condition during the preliminary data analysis stage of the SP experimental design data. Moreover, SP experimental design is applied to the equipment test to study three different explosive powers influenced by four different intensifiers and four different steel balls (Gao, Yang, & Shi, 2017).
Moreover, in the line of cost implication for the design, it has been found that SP experimental design is cheap regarding cost compared to other experimental design types. Despite its cheapness, it retains its statistical efficiency and adequacy compared to other randomized design of experiments. Anderson (2016) compared the power efficiency of five factors in SP design (one HTC factor at three levels and each four ETC factor at two levels) and randomized design. He found out that the SP design had greater power efficiency for all the subplot factors compared to the randomized design. However, the SP design showed great weakness in power efficiency for the WP factor compared to the completely randomized design. The interaction between the HTC and ETC factors showed great power efficiency compared to the completely randomized design. He identified this interaction power efficiency for the SP experimental design as a bonus and a viable alternative to run an experiment as a completely randomized design when there was the presence of HTC factor like temperature. The same findings were seen in the research of Anderson and Whitcomb (2014). They employed power to the right size design of the experiment.
According to Anderson-Cook, Borror, and Montgomery (2009), there should be additional consideration for different costs associated with making changes at the WP and sub-plot levels. It is observed that for a completely randomized experiment, the total number of observations can be a practical substitute for the cost. Consequently, many of the optimal design criteria use the total number of design points as the penalty to balance the decrease in prediction variance for larger designs. Furthermore, in some SP design cases, it may be prohibitively expensive to do the equipment set-up for each of the WP. Therefore, the total number of WP in the designed experiment can be the only contributor to cost in these situations. In other situations, WP may be slightly more expensive than a sub-plot. However, the total cost of the experiment is more logically to be a weighted average of the number of WP and the number of sub-plots runs.
Estimating The MINQUE and MIVQUE methods are developed to find unbiased quadratic estimators, which are invariant and minimize some matrix norm. Rasch and Masata (2006) stated that unfortunately, the solution in the most interesting cases depended on the unknown variance components. If they were replaced by estimates from the data, the solution would be neither unbiased nor quadratic any longer. However, they only identified the MINQUE and MIVQUE in variance components estimation. The same thing applied to other variance component estimation methods because in almost all cases of modeling with variance components, the population variance components were unknown. Therefore, since the estimated values obtained from the design data, which could have outliers, replacement of missing values, and others. The solution may not be unbiased for all other estimation methods. Weerakkody and Johnson (1992) presented a twostep residual-based approach for estimating the WP and subplot error variances separately. However, as identified by Ikeda, Matsuura, and Suzuki (2014), the estimator of the WP error variance in approach by Weerakkody and Johnson (1992) can be obtained only for the case of a > p (p = 1 + p1 + p2), where a is the number of runs in each WP unit.
Then, p 1 and p 2 are levels of the WP and subplot effects. This condition is strict and not practical in many situations because it is only suitable for balanced designs.
Hasegawa, Ikeda, Matsuura, and Suzuki (2010) proposed a different estimator for the WP error variance having more practical conditions than the one suggested by Weerakkody and Johnson (1992). Nevertheless, both of these approaches for estimating the two error variances can only be used in balanced designs, and their approaches are not compared to other methods such as ML, RMLE, ANOVA, and others. They cannot be used in unbalanced designs, which are frequently employed to reduce the number of experiments.
Ikeda, Matsuura, and Suzuki (2014) did a modification of the two-step residual-based method proposed by Hasegawa et al. (2010) to make it available for application in balanced and unbalanced designs. Then, they compared their method with RMLE only. They concluded that their method could be an alternative to RMLE based on their simulation results. Their alternative method was not a better estimation method because it could perform poorly under a different simulation scenario and they did not compare it with other estimation technique like MLE. However, the methods introduced by Weerakkody and Johnson (1992), Hasegawa et al. (2010) and Ikeda, Matsuura, and Suzuki (2014) were only implemented for linear balanced and unbalanced SP design models. Nonlinear modeling of SP design has attracted few researches especially in estimating the parameters of the model. Although, it follows the same procedure used in parameter estimation for nonlinear regression. Gumpertz and Rawlings (1992) stated that when the objective of fitting a nonlinear function to data from SP experimental designs, a nonlinear model with variance components (WP variance, σ 2 γ , SP variance, and σ 2 ε ) was appropriate.
Nonlinear modeling of SP data, which has two error terms, is a special modeling case of a nonlinear model with variance components. The reason is that the model contains a nonlinear function for the mean part, g(X, θ), and the random effects (WP and subplot errors) are added to the mean function. Normal nonlinear regression modeling assumes that all the observations in the data set are uncorrelated and there is only one source of random error. If they are used to fit models with over one random error term, they give standard errors for the incorrect parameter estimates and other important quantities. Therefore, if a standard nonlinear regression program is used to analyze SP design data, the single variance estimate like Mean Squared Error (MSE) will be a compromise between the two error variances, MSEa and MSEb, from the SP analysis of variance (Gumpertz & Rawlings, 1992;Knezevic, Evans, Blankenship, Van Acker, & Lindquist, 2002;Blankenship, Stroup, Evans, & Knezevic, 2003).
In this research, a theoretical presentation of an iterative Gauss-Newton via Taylor series expansion procedures for Estimated Generalized Least Square (EGLS) technique for estimating intrinsically nonlinear SPD model parameters will be done. The variance components are unknown, and they are estimated via MLE method. Householder QR decomposition technique is adopted to achieve numerical stability during the iterative process of estimating the parameters in the model.

II. METHODS
The methodological approach for this research is completely theoretical. No qualitative or quantitative data is used for estimating the intrinsically nonlinear SP design model. In achieving the objective, the use of EGLS method of parameter estimation is adopted. The EGLS is adopted due to the assumption that the SP model variance-covariance matrix is unknown.
The variance-covariance matrix is estimated using MLE. The estimated variance-covariance matrix parameter. Then, V is substituted into the model to estimate the SP model parameters. The model parameters are estimated via an iterative Gauss-Newton via Taylor Series expansion procedure. This iterative system has to be used since the partially derived system of equation does not exist in a closed form. However, to achieve an asymptotically numerically stable parameter estimate, QR decomposition by Householder (1958) is implemented.

III. RESULTS AND DISCUSSIONS
The nonlinear SP model, which has Whole Plot Error (WPE) and Sub Plot Error (SPE) are special cases of the nonlinear model with random effects. It is also called the nonlinear model with variance components. The formulated model and assumptions are given as follows. (1) Inserting the levels of the factors to be investigated (1) is as follows.
( 2) Where, y ijk is the response variable and i = 1, .... Moreover, S is as replicates or blocks and j = 1, ... Then, a levels the WP factor A and k = 1, .... Next, b levels the SP factor B, w ijk is the WP error and ε ijkl is the SP error. Meanwhile, f(x ijkl ,θ) is the nonlinear function for the mean.
There are three assumptions in this model. First, it is assumed that the WP and SP errors are random effects. Moreover, it is assumed that and .
Second, it lets to be the model parameter estimate of θ. It follows an asymptotic normal distribution with mean and variance , where F is the n × p matrix with elements . It has full column rank, p. This implies that the estimated response follows an asymptotic normal distribution with mean y 0 and variance where f x is a p × 1 vector with elements and V is the covariance matrix of the response vector.
Third, if the mean function, parameters is p and the number of random effect is r, the number of observations in the data set, n, must be at least p + r +1. It estimates all of the parameters. This implies that it is n ≥ p + r +1.
In EGLS estimation method, the covariance matrix of y is known as the Generalized Least Square (GLS) estimator, . It is to minimize the objective function With respect to θ. (Gumpertz & Rawlings, 1992).
It lets the variance-covariance matrix of the observations var (y) be written as follows.
Using spectral decomposition, it can be seen that V is positive definite if and only if there exists a non-singular matrix P such as Then, by multiplying model (4)  At this point, equation (12) has no closed form. Hence, it will be solved iteratively using the Gauss-Newton method via Taylor series expansion of at first order of (13) It shows that around x = a. Then, is the remainder term which is reasonably small if p is sufficiently large. Therefore, the researches have the equation as follows: (14) Let And for all N cases and , then equation (14) becomes (15) It shows that D 0 is the N×P derivative matrix with elements of {d ijkl×p }. This is equivalent to approximating the residuals for the model that is E(θ * ) = T -η(θ * ) by It shows that z 0 = and . Next, the researchers apply the QR decomposition by Householder (1958) to (16). This is due to its numerical stability characteristic for estimating the parameters in the model (Klotz, 2006). This is done to decompose D 0 into the product of an orthogonal matrix and an inverted matrix.
Theorem 1: If A is an m × n matrix with full column rank, A can be factored as A = QR. It shows that Q is an m × n matrix whose column vectors forms an orthonormal basis for the column space of A and R as an n × n invertible upper triangular matrix.
Proof: Let m×n matrix have columns w 1 , w 2 ,... w n m-vectors. In addition, it lets q 1 , q 2 ,...q n , q n+1 ,...q m be orthonormal m-vectors such as: Then Q is m×n with orthonormal columns, Q T Q = I. If A is a square matrix (m = n), tQ is orthogonal that is Q T Q = QQ T = I. Hence, q i is orthogonal to w 1 , w 2 ,...w n . Therefore, it shows:

This implies that A = QR
It lets A = [w 1 w 2 … w n ] and R = w • q • Therefore, equation (19) is written as (20) Equation (20) shows that R is n×n. It is upper triangular with nonzero diagonal elements and R is nonsingular. Meanwhile, diagonal elements are nonzero.
Theorem 2: If A is an m×n matrix with full column rank, and A = QR is a QR-decomposition of A, the normal system for Ax = b can be expressed as Rx = Q T b and the least squares solution can be expressed as = R -1 Q T b. Proof: Let = (A T A) -1 A T b be the best approximate solution to Ax = b. Based on the orthonormal and orthogonal property exhibited by QR-decomposition, if it is A = QR , it shows A T = R T Q T . Therefore, it is: Based on the two stated and proved theorems on QR-decomposition, the decomposition of M 0 is presented as M 0 = QR.
It shows Q as an N×N orthogonal matrix that is Q T Q = QQ T = I . Then, R is an N×P triangular matrix that R is zero below the main diagonal. The researchers write Q and R as follows: It means Q 1 is the first P columns and Q 2 is the last N -P columns of Q. Then, it shows: with R 1 a P×P upper triangular matrix with all elements greater than zero and R 2 is a (N -P)×P lower matrix of zeros. Moreover, it shows: It means that and are of dimension P×N and (N -P)×N respectively. Therefore, it is geometrically. Then, Q columns define an orthogonal or orthonormal basis for the response space with the property. Moreover, P columns cover the expectation plane. Projection onto the expectation plane is very easy if the projection is in the coordinate system given by Q (Klotz, 2006).
Next is the transformation of the response vector, which is (25) with components of (26) and (27) The projection of g onto the expectation plane is therefore simply given as in the Q coordinates and with as n the original coordinates. So, it is . This implies that can now be easily estimated using backward solving (Klotz, 2006).
In equation (28), it should be closer to y than and move to a better parameter value of .Another iteration is performed by calculating new residuals z 1 = , a new derivative matrix M 0 , and a new increase. This process is continuously repeated until convergence is obtained, and until the increment becomes so small. There is no useful change in the elements of the parameter vector.
It is expected that the new residual sum of square should be less than the initial estimate, but if it is otherwise, a small step in the direction is introduced. A step factor λ is introduced such as where λ is chosen to ensure that the new residual sum of squares is less than the initial estimate. A common method starts with λ = 1 and halves it until it is satisfied that the new residual sum of squares is less than the initial estimate. In actual practice the GLS, it is not possible to be implemented. This is because the variance-covariance matrix, V, is unknown. Therefore, an estimated V is obtained and substituted into equation (3), and the term of EGLS is used. There are many methods for estimating the variance components to substitute for V in equation (3). In this research, the procedure for MLE technique is presented.
The MLE procedure for variance components estimation from nonlinear SP design model is presented. The MLE method used for this research is an iterative method for estimating the variance components. The method follows the maximum likelihood algorithm for the linear model with variance components by Hemmerle and Hartley (1973) and the procedure presented by Gumpertz and Rawlings (1992). The mean function of is the first approximated through a first-order in Taylor series centered at as shown in equation (15). Therefore, the log-likelihood function is given as: It means is approximated by the surface and lets ln L to Γ become, It shows that z 0 = , , , and . Then, the gradient is given by: Multiplying the partial derivative first term by VV -1 and equating to zero gives the estimating equations as follows: The estimates of and are iteratively obtained at (h + 1) st iteration by substituting a prior estimate of into equation (33) to get an updated estimate of . Then, the updated and prior estimate of are substituted into equation (34) to obtain updated estimates of the variance components. These two steps are iterated till convergence is achieved. Therefore, equations (33) and (34) become (35) and (36) When the further iteration does not improve the log likelihood, the solutions to the equations may turn out to be negative. In such scenario, the negative value resets to zero before the next iteration.
This research presents the procedure and steps in estimating the parameters for a SP design model that the mean part of the model can be any nonlinear function. Then, the variance components ( ) of the model are estimated via MLE technique. This is achieved by minimizing the objective function, that the estimates of and are iteratively obtained at (h + 1) st iteration by substituting a prior estimate of to the estimating equation till convergence occurs. This is done by transforming the GLS nonlinear SPD model into an OLS nonlinear SPD model using iterative Gauss-Newton via Taylor Series expansion procedure approximated at first order. QR decomposition technique is introduced into the estimation system to achieve stability in the estimates.
The EGLS method presented in this research is available and applicable for estimating linear fixed, random and mixed-effect models. However, in practical situations, the functional form of the mean in the model is often nonlinear due to the dynamics involved in the system process. Since the parameters enter the model nonlinearly in which the model is intrinsically nonlinear, the closed form of the differentiated objective function does not exist. Hence, the use of an iterative Gauss-Newton via Taylor Series expansion procedure approximated at first order is adopted and implemented. These iterative procedures for estimating the parameters of the nonlinear SP models, statistical software such as the %NLINMIX SAS macro or SAS PROC NLMIXED can be used to handle all computations.