One missing value problem in Latin square design of any order: Exact analysis of variance

Abstract: This research proposes a simplified exact approach, based on the general linear model, for analyzing the K × K Latin square design (LSD) with one replicate and one missing value, for which ready-made mathematical formulas for the sub-variance have been lacking. Under the proposed scheme, the effects of the potential variable are determined by means of the regression sums of squares under the full and reduced-treatment models. The resulting mathematical expressions can be applied to an LSD of any order with one missing value. Moreover, the treatment, row and column sums of squares are unbiased.


PUBLIC INTEREST STATEMENT
Built on Fisher's principles, classical experimental designs (e.g. one-way ANOVA, the Latin square design (LSD), the 2-level factorial design, the fractional factorial design, and so on) are a powerful methodology for explaining causal mechanisms between independent variables and a response variable by identifying the sources of variation in the data. The LSD is of great use for analyzing one potential variable and two block variables. A single missing observation could, however, pose significant challenges to the analysis. In this research, an incomplete LSD of any order with one missing observation was analyzed using the exact approach based on the general linear model. Because no ready-made formula was available, this research paper proposes explicit mathematical formulae for the treatment sum of squares, for ease of comparison of mean squares by means of an F-test.

Introduction
In science and engineering, design of experiments (DOE) refers to the experimental situations or strategies for analysis of quantitative responses associated with the experimental units. DOE is classified into various types, including the classical DOE based on Fisher's principles, the Shainin experiment, and the Taguchi experiment. Specifically, the DOE based on Fisher's principles involves randomization, replication and blocking (Montgomery, 2008). In the design and improvement of products and production, the role of experimentation is to identify the influencing factors (determinants) of the response variable and manipulate the determinants such that the response variable outcome closely resembles the desired nominal value. In fact, Fisher's classical DOE is a form of statistical hypothesis testing under the analysis of variance (ANOVA) (Speed, 1992). Meanwhile, ANOVA is defined as a collection of statistical procedures to compare the between-group variation with the within-group variation (Montgomery, 2008).
A Latin square design (LSD) is an efficient design of experiments for three factors, in which only one factor is of primary interest (i.e. the potential variable) while the other two (the nuisance variables or factors) are blocked to restrain extraneous variability in the experimental units. The term "Latin square design" is abbreviated to "LSD" in this research. Latin letters are used to symbolize the levels of the factor of primary interest. In the LSD, the levels of the two nuisance variables are identified with the rows and columns of a two-way table; every level of the factor of primary interest appears once in each column and once in each row; and the two-factor and three-factor interaction effects are assumed to be non-existent. Like the randomized complete block design (RCBD), in which the effect of a single nuisance variable is blocked, the LSD utilizes the blocking technique to separate the variations of the nuisance variables from the experimental error. Unlike in the LSD, in the Latin rectangle design (Mead, Gilmour, & Mead, 2012) the numbers of columns and rows (blocks) are not identical for the two nuisance factors, and the Latin letters in each row (or column) can be replicated. Youden (1937) proposed the Youden square design, or the distinct Latin rectangle design, in which the number of blocks on one side is greater than that on the other side, and the number of treatments (Latin letters) is equal to the number of blocks on the former side.
In a real scientific test under certain conditions, experimenters might face a difficult situation in which a set of experimental observations is not complete. Incomplete observations commonly arise in two situations: (1) by design, when the number of experimental units (i.e. material units, articles, or subjects) is limited, and (2) by accident. In the first situation, the arrangement can be balanced or unbalanced. For instance, Youden (1937), Yates (1936), and Ai, Li, Liu, and Lin (2013) respectively proposed the Youden square design, the balanced incomplete block design (BIBD), and the balanced incomplete Latin square design (BILSD). Such a balanced arrangement makes the ANOVA easier, with simple formulae for the treatment and error sums of squares. In the second situation, which might result from poor control of some variables, the experimental readings are abnormal or not observed. Such values might therefore be removed from the set of observations, leading to an unbalanced or asymmetrical arrangement. It is important to note that there is no fixed formula for the ANOVA of an experimental design with incomplete observations. The work of Allan and Wishart (1930) seems to be the earliest paper specifically considering the analysis of the incomplete-data problem, by means of differentiation based on the overall mean. In Yates (1933) and Sirikasemsuk (2016a), the non-iterative and iterative missing plot techniques were proposed, whereby differential calculus was utilized to determine the missing experimental data with minimal error sum of squares. The estimates of the missing experimental data, however, contribute to an upward bias of the treatment sum of squares. Thus, the bias is determined and subtracted from the initial treatment sum of squares (Little & Rubin, 2002).
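The missing plot technique with bias correction described above can be sketched numerically. The expressions used below for the estimated value and for the bias are the forms commonly quoted in experimental design textbooks for a Latin square with one missing cell; they are not reproduced from this paper, and the data, the cyclic layout, and all function names are hypothetical.

```python
import numpy as np

def yates_missing_value(y, trt, r, c):
    """Estimate of the single missing cell (r, c) and the resulting bias.

    y   : K x K responses with np.nan at [r, c]
    trt : K x K integer treatment labels (0..K-1)
    """
    K = y.shape[0]
    G = np.nansum(y)                      # grand total of the observed values
    R = np.nansum(y[r, :])                # total of the affected row
    C = np.nansum(y[:, c])                # total of the affected column
    T = np.nansum(y[trt == trt[r, c]])    # total of the affected treatment
    x = (K * (R + C + T) - 2 * G) / ((K - 1) * (K - 2))
    bias = (G - R - C - (K - 1) * T) ** 2 / ((K - 1) * (K - 2)) ** 2
    return x, bias

def treatment_ss(y, trt):
    """Treatment sum of squares of a complete K x K Latin square."""
    K = y.shape[0]
    totals = np.array([y[trt == j].sum() for j in range(K)])
    return (totals ** 2).sum() / K - y.sum() ** 2 / K ** 2

# Hypothetical 4 x 4 cyclic Latin square with made-up responses.
K = 4
trt = (np.arange(K)[:, None] + np.arange(K)[None, :]) % K
rng = np.random.default_rng(3)
y = 20 + 2.0 * trt + rng.normal(scale=0.5, size=(K, K))
r, c = 1, 2
y[r, c] = np.nan                          # the accidental missing value

x, bias = yates_missing_value(y, trt, r, c)
filled = y.copy()
filled[r, c] = x
ss_tr_corrected = treatment_ss(filled, trt) - bias
```

Under these assumptions, `ss_tr_corrected` should agree with the treatment sum of squares obtained from a direct least-squares fit to the observed cells only, which is the quantity the exact approach targets.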
In Coons (1957), Cochran (1957) and Wilkinson (1958), the analysis of covariance (ANCOVA) technique was proposed for analyzing incomplete-data experimental designs. In fact, the earliest paper with a reference to the ANCOVA was Bartlett (1937). Table 1 tabulates existing methods to solve the incomplete-data experimental problems. However, the single imputation methods based on mean (or mode) substitution, listwise deletion and pairwise deletion are excluded.
Many recent research studies considered aspects of combinatorics, examples of which were the studies on the construction of the orthogonal Latin squares by Zhang (2013) and Donovan and Şule Yazıcı (2014); and the studies on the completability of the incomplete Latin squares from the partial Latin squares by Euler (2010) and Casselgren and Häggkvist (2013).
In Table 1, all the methods, except the exact approach, must estimate the missing observations. As a matter of fact, the missing observations should never be estimated because the estimated values are not experiment-based. Thus, it is advisable that the exact approach with the general linear model be adopted to solve the incomplete-data experimental design problems (Montgomery, 2008; Sirikasemsuk, 2016a). Specifically, this research proposes a simplified exact approach (the general regression significance test) for the K × K LSD with one replicate and one missing observation, where K is the order of the LSD.
The organization of this research is as follows: Section 1 is the introduction. Section 2 details the general ANOVA table for a complete LSD of K × K order and its components. Section 3 deals with a K × K LSD with one missing observation, the estimated parameter values of the full effect model and the regression sum of squares, while Section 4 concerns those of the reduced-treatment effect model and the regression sum of squares. Section 5 derives the simplified formulas of the sums of squares. The concluding remarks are provided in Section 6. The notations are provided in the Appendix.

Analysis of variance in complete K × K LSD
The full effect model of $y_{ijk}$, given the complete K × K LSD, is expressed as

$$y_{ijk} = \mu + \omega_i + \tau_j + \lambda_k + \varepsilon_{ijk}, \qquad i, j, k = 1, 2, \ldots, K \tag{1}$$

where $\mu$ is the overall mean, $\omega_i$ is the effect of the i-th row, $\tau_j$ is the effect of the j-th treatment, $\lambda_k$ is the effect of the k-th column, and $\varepsilon_{ijk}$ is independently, identically and normally distributed, i.e. $\varepsilon_{ijk} \sim N(0, \sigma^2)$. Table 2 presents an example of the LSD with K × K order whose components are summarized and tabulated using an ANOVA table, as shown in Table 3.
The sums of squares for a Latin square experiment are expressed in Equations (2)-(6).

Table 1. Existing methods to solve the incomplete-data experimental problems

| Method | Description | Author |
|---|---|---|
| Missing plot technique by minimizing the error sum of squares with non-iterative method | Differentiating the estimated parameter of the overall mean with respect to each missing value | Allan and Wishart (1930) |
| | Differentiating the error sum of squares with respect to each missing value (when only one observation is missing) | Yates (1933) |
| | General method for estimating several missing values in Latin square design | Kramer and Glass (1960) |
| | Non-iterative Rubin method | Rubin (1972) |
| Missing plot technique with iterative method | Iterative Yates method (based on the work of Allan and Wishart (1930), when more than one observation is missing) | Yates (1933) |
| | Healy-Westmacott method based on regression imputation | Healy and Westmacott (1956) |
| Exact approach with general linear model | General regression significance test | Montgomery (2008) |
| Analysis of covariance (ANCOVA) technique | A combination of regression analysis and ANOVA consisting of the covariate | Coons (1957), Cochran (1957), and Wilkinson (1958) |
| Expectation maximization algorithm (EM algorithm) | Iterative method with maximum likelihood estimation | Dempster, Laird, and Rubin (1977) |
| Multiple imputation (MI) method | A combination of raw maximum likelihood and EM method | Rubin (1987) |
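As a minimal sketch of the sums of squares of a complete K × K LSD discussed in this section, the block below uses the standard textbook forms (squared totals divided by K, minus the correction term $y_{\cdot\cdot\cdot}^2/N$ with $N = K^2$). Whether these match the paper's Equations (2)-(6) term by term is an assumption, and the function name, the cyclic layout and the data are hypothetical.

```python
import numpy as np

def lsd_sums_of_squares(y, treatment):
    """Sums of squares for a complete K x K Latin square.

    y[i, k]         : response in row i, column k
    treatment[i, k] : treatment label (0..K-1) assigned to that cell
    """
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment)
    K = y.shape[0]
    N = K * K
    correction = y.sum() ** 2 / N                       # y_...^2 / N
    row_totals = y.sum(axis=1)                          # y_i..
    col_totals = y.sum(axis=0)                          # y_..k
    trt_totals = np.array([y[treatment == j].sum() for j in range(K)])  # y_.j.
    ss = {
        "total": (y ** 2).sum() - correction,
        "row": (row_totals ** 2).sum() / K - correction,
        "column": (col_totals ** 2).sum() / K - correction,
        "treatment": (trt_totals ** 2).sum() / K - correction,
    }
    # In the complete (balanced) case the error term is obtained by subtraction.
    ss["error"] = ss["total"] - ss["row"] - ss["column"] - ss["treatment"]
    return ss

# Hypothetical 4 x 4 cyclic Latin square with made-up responses.
K = 4
treatment = (np.arange(K)[:, None] + np.arange(K)[None, :]) % K
rng = np.random.default_rng(0)
y = 20 + rng.normal(size=(K, K))
ss = lsd_sums_of_squares(y, treatment)
```

The subtraction used for the error term relies on the orthogonality of rows, columns and treatments in a complete square; it is this property that is lost once an observation is missing.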

Incomplete LSD and regression sum of squares under the full model
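For the missing-data LSD, the sums of squares in Equations (2)-(4) are invalid. The general regression significance test can instead be applied to the incomplete LSD for ANOVA. According to Montgomery (2008), the computational formulas for the sums of squares of treatments, rows, columns and errors can respectively be expressed as

$$SS_{tr} = R(\mu, \omega, \tau, \lambda) - R(\mu, \omega, \lambda) \tag{7}$$

$$SS_{row} = R(\mu, \omega, \tau, \lambda) - R(\mu, \tau, \lambda) \tag{8}$$

$$SS_{column} = R(\mu, \omega, \tau, \lambda) - R(\mu, \omega, \tau) \tag{9}$$

$$SS_{E} = \sum y_{ijk}^{2} - R(\mu, \omega, \tau, \lambda) \tag{10}$$

where $R(\mu, \omega, \tau, \lambda)$ is the regression sum of squares of the full effect model of $y_{ijk}$; $R(\mu, \tau, \lambda)$ and $R(\mu, \omega, \tau)$ are the regression sums of squares of the reduced effect models of $y_{ijk}$ in which the effects of rows and columns are omitted, respectively; $R(\mu, \omega, \lambda)$ is that of the reduced model in which the treatment effects are omitted; and the sum in Equation (10) runs over the observed values only. If one observation is missing, the degrees of freedom of $SS_{total}$ and $SS_E$ in Table 3 become $K^2 - 2$ and $K^2 - 3K + 1$, respectively.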
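The general regression significance test can be sketched numerically by fitting the full and reduced models by least squares and differencing their regression sums of squares. This is an illustration under stated assumptions, not the closed-form route developed in this paper: the helper names, the cyclic layout, the position of the missing cell and the data are all hypothetical.

```python
import numpy as np

def regression_ss(y, factors):
    """R(.) for an intercept plus the given factors: the sum of squared fitted values."""
    cols = [np.ones(len(y))]
    for f in factors:
        # One indicator (dummy) column per level of the factor.
        cols.append((f[:, None] == np.unique(f)[None, :]).astype(float))
    X = np.column_stack(cols)
    # lstsq returns a minimum-norm solution, so the rank-deficient dummy coding is fine.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return float(fitted @ fitted)

def exact_anova(y, row, col, trt):
    """Sums of squares as differences between full and reduced regression sums of squares."""
    full = regression_ss(y, [row, trt, col])
    return {
        "treatment": full - regression_ss(y, [row, col]),
        "row": full - regression_ss(y, [trt, col]),
        "column": full - regression_ss(y, [row, trt]),
        "error": float(y @ y) - full,
    }

# Hypothetical 4 x 4 cyclic Latin square, stored in long format.
K = 4
row, col = [a.ravel() for a in np.indices((K, K))]
trt = (row + col) % K
rng = np.random.default_rng(1)
y = 20 + 0.5 * row + 2.0 * trt + rng.normal(scale=0.3, size=K * K)

keep = np.ones(K * K, dtype=bool)
keep[5] = False                      # hypothetical missing observation
ss = exact_anova(y[keep], row[keep], col[keep], trt[keep])
```

No value is imputed at any point: the missing cell is simply absent from the design matrix, which is why the resulting sums of squares need no bias correction.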
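Meanwhile, the theoretical regression sum of squares of the full effect model of $y_{ijk}$ is expressed as

$$R(\mu, \omega, \tau, \lambda) = \hat{\mu}\, y_{\cdot\cdot\cdot} + \sum_{i=1}^{K} \hat{\omega}_i\, y_{i\cdot\cdot} + \sum_{j=1}^{K} \hat{\tau}_j\, y_{\cdot j\cdot} + \sum_{k=1}^{K} \hat{\lambda}_k\, y_{\cdot\cdot k} \tag{11}$$

In Sirikasemsuk (2016b), the estimated values of all parameters in Equation (11) were derived for the incomplete LSD with one missing observation, and the regression sum of squares of the full effect model of $y_{ijk}$ was expressed as Equation (12), in which $y_{sum\_m} = y_{r\cdot\cdot} + y_{\cdot m\cdot} + y_{\cdot\cdot c}$ is the sum of the totals of the row r, the treatment m and the column c that contain the missing observation.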
To find the treatment sum of squares (see Equation (7)), the treatment effects ($\tau_j$) are dropped from Equation (1), i.e. $\tau_j = 0$ for all values of j. The parameters $\mu$, $\omega_i$ and $\lambda_k$ are then estimated by $\hat{\mu}^{NT}$, $\hat{\omega}_i^{NT}$ and $\hat{\lambda}_k^{NT}$ instead of $\hat{\mu}$, $\hat{\omega}_i$ and $\hat{\lambda}_k$. Because the treatment effects of the single factor of primary interest are ignored, this linear statistical model of $y_{ijk}$ is referred to as "the reduced-treatment effect model" in this research. Thus, its regression sum of squares, $R(\mu, \omega, \lambda)$, can be expressed as Equation (13).
The estimated model parameters, i.e. $\hat{\mu}^{NT}$, $\hat{\omega}_i^{NT}$ and $\hat{\lambda}_k^{NT}$, are detailed in Section 4. It should be noted that the parameter estimates in $R(\mu, \tau, \lambda)$ and $R(\mu, \omega, \tau)$ are determined in the same way as those in $R(\mu, \omega, \lambda)$ in Section 4. The expressions of $R(\mu, \tau, \lambda)$ and $R(\mu, \omega, \tau)$, including their parameter estimates, are not demonstrated in this research.

Estimated values of all parameters and regression sum of squares under the reduced-treatment model
With the exact approach, it is necessary to find the parameter estimates of the reduced-treatment effect model before $R(\mu, \omega, \lambda)$ can be evaluated according to Equation (13). The parameter estimates for the reduced-treatment effect model can be divided into two categories. The first category refers to the parameter estimates directly influenced by the missing value, i.e. $\hat{\mu}^{NT}$, $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ (see Proposition 1), while the second category consists of the remaining parameter estimates not directly affected by the missing value, which are derived and shown in Equations (22) and (23).

Proposition 1: In the K × K LSD with one missing experimental observation, the parameter estimates $\hat{\mu}^{NT}$, $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ of the reduced-treatment effect model are given by Equations (14)-(16).

Proof. The normal equations for $\mu$, $\omega_r$ and $\lambda_c$ are Equations (17)-(19). Using Equation (17), then adding Equations (18) and (19), and rearranging, the parameter estimate of $\mu$ is expressed in Equation (14). The fitted parameters $\hat{\omega}_r^{NT}$ and $\hat{\lambda}_c^{NT}$ in Equations (15) and (16) can be easily solved from Equations (18) and (19). This completes the proof. ☐

In the second category of the reduced-treatment effect model, in which the treatment effect is ignored, the normal equations can be expressed as Equations (20) and (21), where i ≠ r and k ≠ c. The remaining parameter estimates are subsequently determined as Equations (22) and (23), where i ≠ r and k ≠ c, and $\hat{\mu}^{NT}$ in Equations (22) and (23) is given by Equation (14). It is noted that the fitted parameters $\hat{\omega}_i^{NT}$ and $\hat{\lambda}_k^{NT}$ in Equations (22) and (23) can be easily solved from Equations (20) and (21).
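As a worked sketch of how such estimates can arise, assume the usual sum-to-zero constraints $\sum_i \hat{\omega}_i^{NT} = \sum_k \hat{\lambda}_k^{NT} = 0$ (an assumption; the constraints are not stated in this section) and let the missing observation lie in row r and column c. The normal equations of the rows-plus-columns model then reduce to the forms below. Their correspondence with the equation numbers (14)-(23) is suggested by the steps of the proof of Proposition 1, not confirmed.

```latex
% Normal equations involving the missing cell (presumably Equations (17)-(19))
\begin{align*}
(K^2-1)\,\hat{\mu}^{NT} - \hat{\omega}_r^{NT} - \hat{\lambda}_c^{NT} &= y_{\cdot\cdot\cdot} \\
(K-1)\,\hat{\mu}^{NT} + (K-1)\,\hat{\omega}_r^{NT} - \hat{\lambda}_c^{NT} &= y_{r\cdot\cdot} \\
(K-1)\,\hat{\mu}^{NT} - \hat{\omega}_r^{NT} + (K-1)\,\hat{\lambda}_c^{NT} &= y_{\cdot\cdot c}
\end{align*}

% Eliminating \hat{\omega}_r^{NT} + \hat{\lambda}_c^{NT} between the first equation
% and the sum of the other two gives the estimate of the overall mean (K > 2):
\begin{align*}
\hat{\mu}^{NT} &= \frac{(K-2)\,y_{\cdot\cdot\cdot} + y_{r\cdot\cdot} + y_{\cdot\cdot c}}{K\,(K-1)^2} \\
\hat{\omega}_r^{NT} &= \frac{(K-1)\,y_{r\cdot\cdot} + y_{\cdot\cdot c} - K(K-1)\,\hat{\mu}^{NT}}{K\,(K-2)} \\
\hat{\lambda}_c^{NT} &= \frac{(K-1)\,y_{\cdot\cdot c} + y_{r\cdot\cdot} - K(K-1)\,\hat{\mu}^{NT}}{K\,(K-2)}
\end{align*}

% Rows and columns that do not contain the missing cell (i \neq r, k \neq c)
\begin{align*}
K\,\hat{\mu}^{NT} + K\,\hat{\omega}_i^{NT} = y_{i\cdot\cdot}
  &\;\Longrightarrow\; \hat{\omega}_i^{NT} = \frac{y_{i\cdot\cdot}}{K} - \hat{\mu}^{NT} \\
K\,\hat{\mu}^{NT} + K\,\hat{\lambda}_k^{NT} = y_{\cdot\cdot k}
  &\;\Longrightarrow\; \hat{\lambda}_k^{NT} = \frac{y_{\cdot\cdot k}}{K} - \hat{\mu}^{NT}
\end{align*}
```

Here all totals are taken over the observed values only, so $y_{r\cdot\cdot}$ and $y_{\cdot\cdot c}$ each contain K − 1 terms.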
Proposition 2: In the K × K LSD with one missing experimental observation, the regression sum of squares for the reduced-treatment effect model of $y_{ijk}$ can be expressed as Equation (24).

Proof. The determination of $R(\mu, \omega, \lambda)$ can be carried out in a similar fashion to that of $R(\mu, \omega, \tau, \lambda)$ in the paper of Sirikasemsuk (2016b).
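A numerical cross-check of the reduced-treatment regression sum of squares can be sketched as follows. The closed-form estimates in the first function are derived under assumed sum-to-zero constraints and are not claimed to be the paper's Equations (14)-(24); the function names and the data are hypothetical. The second function obtains the same quantity from a direct least-squares fit.

```python
import numpy as np

def reduced_model_fit(y, r, c):
    """Closed-form estimates for the rows-plus-columns model when cell (r, c)
    is missing (y[r, c] is np.nan). Sum-to-zero constraints are assumed."""
    K = y.shape[0]
    G = np.nansum(y)                    # grand total of the observed values
    row_tot = np.nansum(y, axis=1)      # y_i..
    col_tot = np.nansum(y, axis=0)      # y_..k
    mu = ((K - 2) * G + row_tot[r] + col_tot[c]) / (K * (K - 1) ** 2)
    omega = row_tot / K - mu            # valid for rows without the missing cell
    lam = col_tot / K - mu              # valid for columns without the missing cell
    # Overwrite the two effects that are directly influenced by the missing cell.
    omega[r] = ((K - 1) * row_tot[r] + col_tot[c] - K * (K - 1) * mu) / (K * (K - 2))
    lam[c] = ((K - 1) * col_tot[c] + row_tot[r] - K * (K - 1) * mu) / (K * (K - 2))
    # Regression sum of squares: each estimate times the matching observed total.
    R = mu * G + omega @ row_tot + lam @ col_tot
    return mu, omega, lam, R

def reduced_model_R_lstsq(y):
    """The same regression sum of squares from a direct least-squares fit."""
    K = y.shape[0]
    obs = ~np.isnan(y)
    rows, cols = np.nonzero(obs)
    X = np.column_stack([np.ones(rows.size), np.eye(K)[rows], np.eye(K)[cols]])
    beta, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
    fitted = X @ beta
    return float(fitted @ fitted)

# Hypothetical 5 x 5 layout with made-up responses and one missing cell.
rng = np.random.default_rng(2)
y = 20 + rng.normal(size=(5, 5))
r, c = 1, 3
y[r, c] = np.nan

mu, omega, lam, R_closed = reduced_model_fit(y, r, c)
R_lstsq = reduced_model_R_lstsq(y)
```

Agreement between `R_closed` and `R_lstsq` is expected because any solution of the normal equations yields the same regression sum of squares, whichever constraints are used to pick it.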

Sums of squares for incomplete LSD with one missing experimental data
Proposition 3: In the K × K LSD with one missing experimental observation, the sums of squares for the treatments, rows, and columns can be determined as Equations (26)-(28).

Proof. Based on Equation (7), the treatment sum of squares ($SS_{tr}$) in Equation (26) can be derived by subtracting Equation (24) in Proposition 2 from Equation (12). The row and column sums of squares are determined in the same way as $SS_{tr}$. This completes the proof. ☐

An illustrative example is given by an elongation experiment (Ott & Longnecker, 2010), which was laid out in a 5 × 5 LSD as shown in Table 4. Five different versions of the stockings (treatments) were tested by each of five investigators on five separate days.
The treatment, column, row and error sums of squares without bias can be easily calculated, as presented in Table 5. In addition, the treatment sum of squares obtained with the missing plot technique is biased and cannot be used directly in the ANOVA table, according to Ott and Longnecker (2010).

Table 5

| Quantity | Value | Reference |
|---|---|---|
| Regression sum of squares for the full model | … | See Equation (12) |
| Regression sum of squares for the reduced model | $R(\mu, \omega, \lambda)$ = 11,325.823 | See Equation (24) |
| Treatment sum of squares (version) | $SS_{tr}$ = 165.4943 (with df = 4) | See Equation (26) |
| Row sum of squares (investigator) | $SS_{row}$ = 14.3688 (with df = 4) | See Equation (27) |
| Column sum of squares (day) | $SS_{column}$ = 0.9428 (with df = 4) | See Equation (28) |
| Total sum of squares | $SS_{total}$ = 191.4000 (with df = 23) | See Equation (5) |
| Error sum of squares | $SS_{E}$ = 13.2500 (with df = 11) | See Equation (6) |
| Mean square of treatment | $MS_{tr}$ = 41.3736 | See Table 3 |
| Mean square of error | $MS_{E}$ = 0.1312 | See Table 3 |
| F test-statistic for treatment | $F_{test}$ = 315.35 | See Table 3 |
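The last step of the example, from the sums of squares to the F statistic, is plain arithmetic and can be sketched as below. The inputs are the values reported in Table 5; the variable names are hypothetical, and the mean square of error is taken as reported instead of being recomputed.

```python
# Values as reported in Table 5 (5 x 5 LSD with one missing observation).
K = 5
ss_tr, df_tr = 165.4943, K - 1
ms_e = 0.1312                        # mean square of error, as reported

# Degrees of freedom for the incomplete design, as stated in Section 3.
df_total = K ** 2 - 2
df_error = K ** 2 - 3 * K + 1

ms_tr = ss_tr / df_tr                # mean square of treatment
f_stat = ms_tr / ms_e                # F statistic for the treatment effect

print(f"df_total = {df_total}, df_error = {df_error}")
print(f"MS_tr = {ms_tr:.4f}, F = {f_stat:.2f}")
```

The statistic would then be compared with the upper critical value of the F distribution with `df_tr` and `df_error` degrees of freedom.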