A MATLAB Package for Computing Two-Level Search Design Performance Criteria

In a $2^m$ factorial design, search designs are used to search for and estimate a few non-zero interactions based on search linear models. Several criteria exist for comparing search designs, and computing them is computationally demanding. In this paper, we provide the SD package for the numerical computing environment MATLAB to compute these criteria and to check Srivastava's condition for a given design. The package is illustrated by an example.


Introduction
The topic of search designs was introduced by Srivastava (1975), who also provided a condition that a search design must satisfy, based on checking the rank of several matrices. Choosing a better search design for searching and identifying the true factorial effects, based on some criteria, is another topic in the search design literature. In this paper, we provide a package for the numerical computing environment MATLAB (The MathWorks, Inc. 2011) to check whether a design satisfies Srivastava's condition and to calculate some criteria used in comparing search designs. The criteria are the searching probability (SP), weighted searching probability (WSP), Kullback-Leibler (KL) and expected Kullback-Leibler (EKL) search criteria. The SD package is easy to use from the MATLAB command line, even for a beginner.
The paper is organized as follows. In Section 2, search designs and criteria for choosing an optimal search design are reviewed briefly. The SD package is introduced in Section 3, and an example illustrating its use is presented in Section 4.

A review of search designs and their comparison
In the following subsections, search designs and some criteria used for choosing a better search design are briefly reviewed. A comprehensive review of search designs can be found in Ghosh (1996) and Ghosh, Shirakura, and Srivastava (2007). Several authors have developed criteria for measuring and comparing the search performance of designs. These criteria are given in Shirakura, Takahashi, and Srivastava (1996), Ghosh and Teschmacher (2002) and Talebi and Esmailzadeh (2011a,b).

Search design
Consider the following linear model for the observation vector $y$ ($N \times 1$) of a $2^m$ factorial experiment:

$$y = X_1\beta_1 + X_2\beta_2 + e, \qquad \mathrm{Var}(e) = \sigma^2 I_N, \tag{1}$$

where $X_i$ ($N \times \nu_i$) are known design matrices with entries ±1, $\beta_i$ ($\nu_i \times 1$), for $i = 1, 2$, are unknown vectors of fixed parameters, $e$ ($N \times 1$) is a random vector of errors, $\sigma^2$ is the error variance and $I_N$ is the identity matrix of order $N$. We assume that at most $k$ of the $\nu_2$ elements of $\beta_2$ are nonzero, but it is unknown which ones. The problem is to estimate the vector $\beta_1$ and to search for the $k$ nonzero elements of $\beta_2$ in order to identify and estimate them. The problem was first introduced by Srivastava (1975). A design that is able to solve the search problem is called a search design, and the corresponding linear model (1) is called a search linear model. Srivastava showed that a search design must satisfy the rank condition

$$\mathrm{rank}(X_1 : X_{22}) = \nu_1 + 2k \tag{2}$$

for every submatrix $X_{22}$ ($N \times 2k$) of $X_2$. If $\sigma^2 = 0$, the rank condition (2) is necessary and sufficient. When $\sigma^2 > 0$, it is still necessary but no longer sufficient.

Searching probability
To search for the $k$ nonzero elements of $\beta_2$, Srivastava (1975) suggested computing the sum of squared errors (SSE) for all possible rival models and choosing the model with minimum SSE as the true model. However, if the SSE is stochastic, we may make a wrong decision in choosing the true model. Shirakura et al. (1996) studied the ability of a design to find the true model and defined the searching probability to measure it. They considered the least discrimination strength of a design and suggested the search criterion $SP_T(\rho)$ in (3) for a search design $T$. In (3), $A(\beta_{20}, \beta_2)$ is a set that includes all $\beta_{2i}$ in $\beta_2$ other than $\beta_{20}$. For independent and identically distributed normal errors and $k = 1$, the criterion is expressed as in (4), where $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF) and $\rho = \beta_{20}/\sigma$.
Because $SP_T(\rho)$ depends on the unknown parameter $\rho$, using it to compare two search designs over all values of $\rho$ is an enormous task and in some cases inconclusive. To overcome this problem, Ghosh and Teschmacher (2002) and Talebi and Esmailzadeh (2011b) gave criteria based on SP that are independent of $\rho$. The Ghosh and Teschmacher criterion is based on differences between the searching probability matrices (SPM) of two candidate search designs, where the entries of an SPM are the possible values of $G$ in (4). For more details see Ghosh and Teschmacher (2002).
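The minimum-SSE search rule described at the start of this subsection can be sketched for $k = 1$ as follows. This is only an illustration and not code from the SD package: the $2^3$ full factorial design, the parameter values, and the choice of $\beta_1$ as the general mean plus main effects are all assumptions made for the example, and the data are noise-free ($\sigma^2 = 0$).

```matlab
% Hypothetical 2^3 full factorial design (rows = runs, columns = factors).
D = [-1 -1 -1;  1 -1 -1; -1  1 -1;  1  1 -1;
     -1 -1  1;  1 -1  1; -1  1  1;  1  1  1];
N = size(D, 1);

% Assumption: beta_1 holds the general mean and the main effects.
X1 = [ones(N, 1), D];

% X2 holds the two-factor interaction columns (pairs 1-2, 1-3, 2-3).
pairs = nchoosek(1:3, 2);
X2 = D(:, pairs(:, 1)) .* D(:, pairs(:, 2));

% Noise-free data in which only the second interaction (factors 1 and 3) is nonzero.
beta1 = [10; 1; -2; 0.5];
y = X1 * beta1 + 3 * X2(:, 2);

% Fit every rival model y = X1*beta1 + X2(:, i)*beta2i + e and record its SSE.
nu2 = size(X2, 2);
sse = zeros(nu2, 1);
for i = 1:nu2
    Xi = [X1, X2(:, i)];
    res = y - Xi * (Xi \ y);     % least-squares residuals
    sse(i) = res' * res;
end

% The model with minimum SSE is selected as the true model.
[~, best] = min(sse);
fprintf('Selected interaction column: %d\n', best);
```

With noise-free data the true model has an SSE of essentially zero, so it is always selected. Adding random errors to `y` makes the SSE stochastic, which is the situation the searching probability is meant to quantify.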

Weighted searching probability
Talebi and Esmailzadeh (2011b) considered the weighted searching probability criterion for a design $T$, in which $T(\cdot)$ is the CDF of a Student's t random variable with $2\upsilon$ degrees of freedom and $\mu = (\upsilon/\lambda)^{1/2}$, $\lambda > 0$.
The criteria based on SP are limited to the case $k = 1$ in the search linear model (1). For the general case $k \geq 1$, Talebi and Esmailzadeh (2011a) developed two criteria based on the Kullback-Leibler distance, which are reviewed in the following subsection.

Kullback-Leibler search criterion
Talebi and Esmailzadeh (2011a) proposed the Kullback-Leibler search criterion (6) for a given search design $T$. It is defined over the set of candidate models $S = \{M_i : y = X_1\beta_1 + X_{2i}\beta_{2i} + e,\ i = 1, 2, \ldots, \binom{\nu_2}{k}\}$, where $X_{2i}$ is the $N \times k$ submatrix of $X_2$ corresponding to $\beta_{2i}$, $S_0 = S \setminus \{M_0\}$ and $M_0$ is the true model; an explicit expression is available for independent normal errors. Similar to the SP criterion, the KL criterion depends on the unknown parameter $\rho$. To overcome this problem, Talebi and Esmailzadeh (2011a) suggested the expected Kullback-Leibler (EKL) criterion.
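The size of the candidate set $S$ grows quickly with $\nu_2$ and $k$, which is one reason these criteria are costly to compute. The short sketch below enumerates the index sets of the candidate models for a hypothetical case with four factors, two-factor interactions in $X_2$ and $k = 2$; the variable names are illustrative only and do not come from the SD package.

```matlab
m = 4;                          % hypothetical number of two-level factors
k = 2;                          % assumed number of nonzero elements of beta_2
nu2 = nchoosek(m, 2);           % number of two-factor interactions (columns of X2)

% Each row lists the columns of X2 that form X_{2i} for one candidate model M_i.
models = nchoosek(1:nu2, k);
nModels = size(models, 1);      % equals nchoosek(nu2, k)

fprintf('nu2 = %d, number of candidate models = %d\n', nu2, nModels);
```

Here $\nu_2 = 6$ and there are 15 candidate models; every one of them can play the role of the true model $M_0$ and is compared with the remaining models in $S_0$.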

SD package
The SD package contains two major functions, SrCond() and SDC(). The SrCond() function checks Srivastava's condition in (2) for a given two-level design matrix with ±1 entries and finds the largest value of k for which condition (2) is satisfied. The function has two input arguments, D and fi: D is a two-level design matrix with entries ±1, and fi takes the value 2, 3 or 4 when the columns of the matrix X_2 in model (1) are 2-factor interactions, 2- and 3-factor interactions, or 2-, 3- and 4-factor interactions, respectively. If condition (2) is satisfied for some values of k ≥ 1, the output of SrCond() is the largest of them. An output of k = 0 means that the input design matrix is not a search design.
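The kind of check that SrCond() performs can be sketched directly from condition (2). The code below is not the package's implementation: it is a minimal illustration that assumes $\beta_1$ contains the general mean and the main effects, takes $X_2$ to be the two-factor interaction columns (the case fi = 2), and uses a hypothetical $2^3$ full factorial design.

```matlab
% Hypothetical 2^3 full factorial design (rows = runs, columns = factors).
D = [-1 -1 -1;  1 -1 -1; -1  1 -1;  1  1 -1;
     -1 -1  1;  1 -1  1; -1  1  1;  1  1  1];
[N, m] = size(D);

% Assumption: X1 corresponds to the general mean and the main effects.
X1 = [ones(N, 1), D];

% X2: all two-factor interaction columns.
pairs = nchoosek(1:m, 2);
X2 = D(:, pairs(:, 1)) .* D(:, pairs(:, 2));
nu1 = size(X1, 2);
nu2 = size(X2, 2);

% Find the largest k such that rank([X1, X22]) = nu1 + 2k for EVERY
% N x 2k submatrix X22 of X2. If the condition fails for some k, it also
% fails for every larger k, so the loop can stop at the first failure.
kmax = 0;
for k = 1:floor(nu2 / 2)
    subsets = nchoosek(1:nu2, 2 * k);
    ok = true;
    for s = 1:size(subsets, 1)
        if rank([X1, X2(:, subsets(s, :))]) < nu1 + 2 * k
            ok = false;
            break;
        end
    end
    if ~ok
        break;
    end
    kmax = k;
end
fprintf('Largest k satisfying the rank condition: %d\n', kmax);
```

For this small design the loop reports k = 1. The number of submatrices examined is $\binom{\nu_2}{2k}$ for each k, so the same brute-force approach becomes expensive for designs with many factors.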
The SDC() function calculates the searching probability, weighted searching probability, Kullback-Leibler and expected Kullback-Leibler search criteria for a given two-level search design. It contains four main components, among them:
SP(): calculates the searching probability in (3) for a given ρ value.
KL(): calculates the value of the Kullback-Leibler criterion in (6) for a given ρ value.
The function SDC() has six input arguments: D, a two-level search design matrix with entries ±1; k, the number of nonzero parameters in the vector β_2 of the search linear model (1); fi, defined as in the SrCond() function; rho, a k × 1 vector; and v and la, the parameters of Gamma(v, la), which are required for calculating WSP. Following the effect hierarchy principle, five-factor and higher-order interactions are not considered in this function.
The SPM() function calculates the searching probability matrix SPM, the matrix c and the vector r given in Ghosh and Teschmacher (2002), which are used to obtain their criteria for comparing search designs. The SPM() function has four input arguments: D, a two-level design matrix with entries ±1; rho, a positive real number; fi, defined as in the SrCond() function; and fname, a string containing the name of the file in which the SPM, c and r values are saved.
We also provide the function Spplot() for plotting searching probability against parameter ρ. The plot is useful for comparing two search designs based on the SP criterion. The function has three input arguments, D, rho and fi, defined as in function SDC().
To use the SD package, one basic step must be carried out in MATLAB: the folder in which the package is stored must be defined as the current directory. This can be done by using the current directory window in the MATLAB environment. After this, the user can execute SrCond(), SDC(), SPM() and Spplot() on the MATLAB command line. In the package the function combinator() given by Fig (2010) is used, so in the current directory the user should also save this function.
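The same setup can also be done from the command line. The sketch below is only a suggestion: the folder variable is hypothetical and must be replaced by the actual location of the package files, and the file check simply reports any of the functions named above that MATLAB cannot find.

```matlab
% Hypothetical location: replace pwd with the folder that holds the SD package files.
sdFolder = pwd;
cd(sdFolder);                   % make the package folder the current directory

% Functions named in this paper, plus combinator() from Fig (2010).
required = {'SrCond', 'SDC', 'SPM', 'Spplot', 'combinator'};
for i = 1:numel(required)
    % exist(..., 'file') returns 2 when a matching file is found.
    if exist(required{i}, 'file') ~= 2
        warning('%s.m was not found in the current folder or on the path.', required{i});
    end
end
```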

Illustrative example
In this section we use the design matrix given in Table 1 to illustrate the SD package functions. First, the user should enter the design matrix D on the MATLAB command line.
Executing SrCond(D, 2) produces the following message:
The maximum value of k for which Srivastava condition is satisfied = 3
This means that the design D satisfies Srivastava's condition in (2) for values of k up to 3.
The matrices SPM and c and the vector r are used for design comparison based on criteria I-III in Ghosh and Teschmacher (2002). The searching probability plot for design D, for values 0 < ρ < 3.5, is shown in Figure 1. The plot is produced by executing Spplot(D, 3.5, 3) on the MATLAB command line.
Figure 1: Plot of SP for design D.