Efficient methods for the estimation of the multinomial parameter for the two-trait group testing model

Estimation of a single Bernoulli parameter using pooled sampling is among the oldest problems in the group testing literature. To carry out such estimation, an array of efficient estimators has been introduced, covering a wide range of situations routinely encountered in applications. More recently, there has been growing interest in using group testing to simultaneously estimate the joint probabilities of two correlated traits using a multinomial model. Unfortunately, basic estimation results, such as the maximum likelihood estimator (MLE), have not been adequately addressed in the literature for such cases. In this paper, we show that finding the MLE for this problem is equivalent to maximizing a multinomial likelihood with a restricted parameter space. A solution using the EM algorithm is presented which is guaranteed to converge to the global maximizer, even on the boundary of the parameter space. Two additional closed form estimators are presented with the goal of minimizing the bias and/or mean square error. The methods are illustrated by considering an application to the joint estimation of transmission prevalence for two strains of Potato virus Y by the aphid Myzus persicae.


Introduction
Estimation of some trait in a population when the (unknown) prevalence is rare and/or only a limited number of tests can be performed is a difficult statistical problem. One approach in such cases is group testing, in which individuals are screened in pools as opposed to individually. Depending on the underlying prevalence and group size, such methods have been shown to yield large gains in efficiency (as measured by mean square error (MSE)) and, often, a reduction in the total number of tests required (see, for example, Thompson, 1962; Swallow, 1985). Applications can be found in a wide range of areas, although uses in plant and animal sciences are especially common.
The standard group testing problem, in which the goal is estimation of a single trait, has been well studied, yielding an array of efficient estimators (see, as a few examples, Burrows, 1987; Tebbs et al., 2003; Hepworth and Watson, 2009; Santos and Dorgman, 2016). More recently, however, there has been interest in the use of pooled sampling for the simultaneous estimation of two or more traits (see, for example, Hughes-Oliver and Rosenberger, 2000; Pfeiffer et al., 2002; Ding and Xiong, 2015; Tebbs et al., 2013; Warasi et al., 2016; Li et al., 2017; Hyun et al., 2018). Unfortunately, even in the case of two traits, results for small sample estimation do not exist, and basic tools such as maximum likelihood estimation have not been adequately addressed.
Conceptually, finding the maximum likelihood estimator (MLE) for the two-trait group testing problem can be expressed as a special case of maximizing a multinomial likelihood with a constrained parameter space (details are provided in the following section). Due to the restrictions on the parameter space, closed form techniques, such as those based on the invariance property of the MLE, do not work in this case (a recent work, Li et al., 2017, did report a closed form MLE based on this principle, but it can easily be checked that it yields estimates outside the parameter space). While numerical methods are possible, the restriction means that many estimates will fall on the boundary of the parameter space, and it is difficult to ensure convergence to a global maximum in such cases.
Maximization under constrained parameter spaces is a well studied problem for a variety of statistical models (see, for example, Nettleton, 1999; Jamshidian, 2004). Numerical methods specifically for the multinomial model under a range of convex constraints have been developed as well (see Grendár and Spitalský, 2017, and the references therein).
While these previous methods can be adapted to the group testing problem, our goal here is to provide a much simpler solution for this special case which is guaranteed to yield a global maximizer. We show that, when optimizing over the boundary, the problem is equivalent to a convex optimization problem with one fewer dimension. The maximization can then be carried out using a variety of methods, and we develop here an EM algorithm based approach. The resulting estimator is shown to converge to the unique global maximizer.
Two additional closed form estimators are presented as well. One, based on the method of moments, is shown to approximate the MLE very closely. The second, a shrinkage estimator based on the one-trait group testing estimator presented in Burrows (1987), is developed with the intention of reducing the MSE and/or bias of the MLE.
Numerical comparisons in terms of relative bias and MSE are presented which cover a wide range of applicable situations. The methods are further illustrated by considering two experiments in which the transmission prevalences of different strains of Potato virus Y are to be estimated simultaneously.
For use in later results, we define the closure Ψ̄p = Ψp ∪ ∂Ψp, where ∂Ψp is the boundary of the parameter space. Likewise, let X = {x : 0 ⪯ x ⪯ n}, where ⪯ denotes element-wise non-strict inequality, be the sample space of X, with interior X0 = {x : 0 ≺ x ≺ n}.
The following lemma, the proof of which is given in Appendix A, establishes the concavity of the log-likelihood function.
Lemma 1. (a) For x ∈ X 0 , the log-likelihood is strictly concave for all p ∈ Ψ θ .
(b) For all x ∈ X, the log-likelihood is concave (not necessarily strict) for all p ∈ Ψ θ .
Using standard multinomial theory, we know that, when maximizing over Ψθ, the unique MLE is given by θ̂ = x/n for x ∈ X0, and this point is the unique maximizer of the likelihood for all x ∈ X over the closure of Ψθ. To find the MLE under the restricted parameter space, Ψp, we first note that there exists a one-to-one mapping θ → p, as given in the following lemma. The proof of this lemma is found by inverting (1) and (2).
Lemma 2. The unique function h : θ → p is given by

p00 = θ00^(1/k), p10 = (θ00 + θ10)^(1/k) − θ00^(1/k), p01 = (θ00 + θ01)^(1/k) − θ00^(1/k), p11 = 1 − p00 − p10 − p01.

If we define the set Rn = {x : ((x00 + x10)/n)^(1/k) + ((x00 + x01)/n)^(1/k) − (x00/n)^(1/k) ≤ 1}, then h(x/n) ∈ Ψ̄p for all x ∈ Rn, where h is as in Lemma 2, so that such values provide the unique MLE for p by the invariance property of the MLE. Since the log-likelihood is a concave function on Ψθ, it is clear that the maximizer for all x ∉ Rn must lie in ∂Ψp. This leads to the following result.
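The mapping of Lemma 2 is easy to verify numerically. The sketch below reconstructs the pool-level probabilities θ implied by the model (a pool is negative for a trait only if all k members are negative for that trait) and inverts them term by term; the function names are ours, not the paper's:

```python
def theta_from_p(p, k):
    """Pool-level outcome probabilities theta = (t00, t10, t01, t11) for
    pools of size k, given individual-level p = (p00, p10, p01, p11).
    A pool is negative for a trait iff all k members are negative for it."""
    p00, p10, p01, p11 = p
    t00 = p00 ** k                    # every member negative for both traits
    t10 = (p00 + p10) ** k - t00      # pool positive for trait 1 only
    t01 = (p00 + p01) ** k - t00      # pool positive for trait 2 only
    t11 = 1.0 - t00 - t10 - t01       # pool positive for both traits
    return (t00, t10, t01, t11)


def h(theta, k):
    """The inverse mapping h : theta -> p of Lemma 2, obtained by
    inverting theta_from_p term by term."""
    t00, t10, t01, _ = theta
    p00 = t00 ** (1.0 / k)
    p10 = (t00 + t10) ** (1.0 / k) - p00
    p01 = (t00 + t01) ** (1.0 / k) - p00
    return (p00, p10, p01, 1.0 - p00 - p10 - p01)
```

Applying h to raw sample proportions θ̂ = x/n reproduces the invariance-type closed form; as noted above, nothing prevents the resulting p̂11 from being negative (for instance, h((0.4, 0.4, 0.2, 0.0), 10) has a negative last component), which is exactly why the constrained maximization of the following sections is needed.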
Theorem 1 (Existence and uniqueness of the MLE). (a) A necessary and sufficient condition for the maximum likelihood estimator of p ∈ Ψp to exist and be unique is that x ∈ X0 ∩ Rn. In this case, the MLE is given by p̂MLE = h(x/n), where h is as in Lemma 2. (b) As n → ∞, P(x ∈ X0 ∩ Rn) → 1, so that the MLE as given in (4) exists and is unique with probability one.
Proof. (a) This follows directly from the previous discussion using the invariance property based on the multinomial MLE. (b) Let g(x) = ((x00 + x10)/n)^(1/k) + ((x00 + x01)/n)^(1/k) − (x00/n)^(1/k). Then, by the strong law of large numbers, g(x) → (p00 + p10) + (p00 + p01) − p00 = 1 − p11 < 1 almost surely for p ∈ Ψp, so that P(x ∈ Rn) → 1. Likewise, each component of x/n converges almost surely to the corresponding positive θ value, which implies that x lies in the interior of its support with probability one for large enough n; that is, P(x ∈ X0) → 1. It follows then that P(x ∈ X0 ∩ Rn) → 1.

Maximization over the boundary
While the above result is complete for large samples, in many cases n will necessarily be small, and we will then be interested in maximizing over the closure Ψ̄p. Defining Rn = {x : ((x00 + x10)/n)^(1/k) + ((x00 + x01)/n)^(1/k) − (x00/n)^(1/k) ≤ 1}, the invariance property of the MLE and Lemma 2 are sufficient to establish p̂MLE = h(x/n) for all x ∈ Rn. To get an idea of how common boundary estimates can be, Table 1 provides values of P(x ∉ Rn) for a wide range of n with several realistic values of p and k.
From the table, we see that, when p 00 is large this probability can be quite substantial, even for large values of n and small k. While this effect seems to lessen as the probability of at least one positive trait increases, group testing is most commonly used in the context of rare traits. It is apparent, then, that the problem of boundary estimates will be present in many applications.
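For small n, the boundary probability P(x ∉ Rn) can be computed exactly by summing the pool-level multinomial pmf over all outcomes that violate the Rn condition. The following sketch does this by full enumeration (the function name is ours, and it assumes the characterization of Rn through the quantity g(x) from the proof of Theorem 1):

```python
import math

def boundary_prob(p, k, n):
    """Exact P(x not in R_n) for individual-level p = (p00, p10, p01, p11),
    pool size k, and n pools, by enumerating every outcome
    x = (x00, x10, x01, x11) of the pool-level multinomial."""
    t00 = p[0] ** k
    t10 = (p[0] + p[1]) ** k - t00
    t01 = (p[0] + p[2]) ** k - t00
    theta = (t00, t10, t01, 1.0 - t00 - t10 - t01)
    logfact = [math.lgamma(i + 1) for i in range(n + 1)]
    prob = 0.0
    for x00 in range(n + 1):
        for x10 in range(n + 1 - x00):
            for x01 in range(n + 1 - x00 - x10):
                g = (((x00 + x10) / n) ** (1 / k)
                     + ((x00 + x01) / n) ** (1 / k)
                     - (x00 / n) ** (1 / k))
                if g <= 1.0:          # x in R_n: estimate stays in the space
                    continue
                x = (x00, x10, x01, n - x00 - x10 - x01)
                logpmf = logfact[n] - sum(logfact[c] for c in x)
                logpmf += sum(c * math.log(t) for c, t in zip(x, theta) if c)
                prob += math.exp(logpmf)
    return prob
```

With a single pool (n = 1) the estimate can never leave the parameter space, while for rare traits and moderate n the probability is well away from zero, in line with the behavior described for Table 1.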
Unfortunately, despite the concavity of the log-likelihood, the maximum on the boundary will not occur at a stationary point. Furthermore, the problem as previously expressed is not guaranteed to have a unique maximizer over the boundary. As such, maximizing the likelihood over ∂Ψ p is a non-trivial optimization problem.
To proceed, the following theorem allows us to reduce the dimension of the parameter space by one, facilitating the use of convex theory results to find the maximizer.
Theorem 2. For all x ∉ Rn, the log-likelihood over Ψ̄p is maximized at a point such that p11 = 0. If x ∈ X0, or if x11 = 0 is the only zero element of x, then the log-likelihood is uniquely maximized at a point with p̂11 = 0.
Proof. For all x, the invariance property of the MLE gives the unique maximizer over Ψθ to be θ̂ = x/n, with corresponding p̂ = h(x/n), where h is as in Lemma 2. If x ∈ X0 ∩ Rn^c, then p̂11 < 0 by the definition of Rn. Furthermore, by Lemma 1 (a), the log-likelihood, ℓ, is strictly concave over this set and, by (1) and (2), each of p̂00, p̂10, and p̂01 is non-negative (since otherwise the corresponding θ̂ values would be negative, hence not in Ψθ). Suppose now that p′ ∈ Ψ̄p is the maximizer over the boundary and satisfies p′11 > 0. Then, since p̂ is the global maximizer and ℓ is strictly concave, ℓ(tp̂ + (1 − t)p′) is strictly decreasing as t → 0. However, since p̂11 < 0 < p′11, there exists a point in ∂Ψp on this line segment with t > 0, say p′′, satisfying p′′11 = 0 and ℓ(p′′) > ℓ(p′), a contradiction. If x ∉ Rn and at least one of the elements of x is equal to zero, the log-likelihood is concave but not necessarily strictly so (Lemma 1 (b)). As such, the previous analysis can be repeated with the conclusion that ℓ(p′′) ≥ ℓ(p′), so that ℓ is maximized at a point with p11 = 0, although perhaps not uniquely.
For x11 = 0, the log-likelihood is proportional to x00 log(p00^k) + x10 log((p00 + p10)^k − p00^k) + x01 log((p00 + p01)^k − p00^k). Let p be any point such that p11 > 0. It is clear that, by taking the point p′ = (p00, p10 + p11, p01, 0), the value of this function can be increased, provided x10 > 0. If instead x10 = 0 but x01 > 0, the same can be achieved by taking p′ = (p00, p10, p01 + p11, 0). Likewise, if both x10 = 0 and x01 = 0, so that x00 = n, the log-likelihood can be increased by taking p′ = (p00 + p11, p10, p01, 0). Since one of these three cases must occur, it follows that any point maximizing the log-likelihood will necessarily have p11 = 0.
As a result of Theorem 2, for values x ∉ Rn the objective function given in (3) can be expressed in terms of the simpler two-parameter model in which we seek to maximize

ℓ* ∝ x00 log(p00^k) + x10 log((p00 + p10)^k − p00^k) + x01 log((p00 + p01)^k − p00^k) + x11 log(1 − (p00 + p10)^k − (p00 + p01)^k + p00^k),   (5)

where p01 = 1 − p00 − p10. The following lemma, the proof of which is given in Appendix A, addresses the concavity of this likelihood function.

Lemma 3. (a) For x ∈ X0 ∩ Rn^c, the log-likelihood given in (5) is strictly concave for all p* ∈ Ψp*.

With this result established, for any x ∉ Rn, maximization of (5) over Ψp* can be carried out in a number of ways, with the resulting estimate, combined with p̂11 = 0, yielding a global maximum of (3), which is the desired MLE. In the following section, we develop an EM algorithm based approach to solving this simplified optimization problem.

EM algorithm
In this section we derive an EM algorithm based estimator for values x ∉ Rn, for which we know the maximizing value satisfies p11 = 0. For the complete data, we use the true underlying status of each individual in the study.
Result 1. Beginning with an initial value p*(0), the estimate from the tth iteration, t = 1, 2, 3, . . ., of the EM algorithm is given by (6)–(8).

Proof. Let Zij = (zij,10, zij,01) represent the trait status of the jth unit from the ith pool, so that Zij ∼ MN(1, p*), and set zij,00 = 1 − zij,10 − zij,01. Using Z as the complete data in the EM framework, the complete data log-likelihood is given by (9). We now proceed to calculate the E and M steps, respectively.

M-step:
Since (9) is a standard multinomial log-likelihood based on kn observations, the unique global maximizer is given by the expected count for each category divided by kn, yielding the updates (6)–(8).

Global maximum over the closure
The previous results can be combined to produce an estimator yielding a global maximizer of the likelihood over the closure, Ψ̄p. The steps for finding this estimator are given in Algorithm 1.

Algorithm 1 Global Maximizer
Beginning with an initial value p*(0), iterate p*(t), t = 1, 2, 3, . . ., as in (6)–(8) until convergence of ℓ* (call the value at convergence p*(∞)).

For x ∈ Rn, this estimator yields the unique global maximizer of the likelihood based on the invariance property of the MLE and standard multinomial theory.
For other values in the sample space, using the results in Wu (1983), the EM sequence of estimates is guaranteed to converge to the global maximizer, provided the sequence of estimates lies in the interior of the parameter space. In our numerical work, there was not a single case where the final estimate lay on the boundary of the space. To see why this is true, inspection of Rn shows that x ∉ Rn only if x10 and x01 are both nonzero. This is sufficient to guarantee the likelihood is maximized at a point with p10 and p01 both positive (otherwise the value of ℓ* would be −∞). If x00 > 0, the same theoretical guarantee can be made for p00, so that the maximum must lie in the interior and the EM algorithm will always converge to this point. If x00 = 0, it cannot be shown theoretically that p00 > 0 at the maximum, but our numerical work shows that, even in this case, the maximum tends to occur at very large values of p00 (for example, with n = 250, k = 10, and x = (100, 100, 50), we have p̂00^MLE = 0.82).
To demonstrate the global convergence property of this estimator, Table 2 gives estimates and log-likelihood values for the EM algorithm approach compared with numerical optimization of the full likelihood using the Nelder-Mead algorithm (Nelder and Mead, 1965). This was done for the fixed values k = 10, n = 42, x = (25, 10, 2) ∉ Rn, and ten starting values randomly generated on the probability simplex. We see that, for many of the starting values, both algorithms yield identical values, but that the EM algorithm based estimator is extremely consistent across all initial points. For the Nelder-Mead algorithm, however, there is quite a bit of variation, with some final estimates far from the true maximizer. While in some cases it may be possible to make a more informed decision about the starting value, the ability to bypass this step altogether, while still guaranteeing convergence to the global maximum, is a strong advantage of the EM algorithm based estimator presented here.
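Since the update formulas (6)–(8) are not reproduced above, the sketch below reconstructs the E and M steps directly from the complete-data setup of Result 1: the E-step computes the expected number of individuals of each status in a pool given the pool's outcome, and the M-step is the multinomial MLE on the expected counts. The closed forms of the conditional expectations and all function names are our own derivation under the stated model, a sketch rather than the paper's exact algorithm; pool counts are passed as the full vector (x00, x10, x01, x11).

```python
import math

def loglik_reduced(p, x, k):
    """Observed-data log-likelihood of the reduced model at
    p = (p00, p10, p01) with p11 = 0, for pool counts
    x = (x00, x10, x01, x11) and pool size k."""
    a, b, c = p
    cell = (a ** k,
            (a + b) ** k - a ** k,
            (a + c) ** k - a ** k,
            1.0 - (a + b) ** k - (a + c) ** k + a ** k)
    ll = 0.0
    for cnt, prob in zip(x, cell):
        if cnt:
            if prob <= 0.0:
                return -math.inf
            ll += cnt * math.log(prob)
    return ll

def em_boundary(x, k, p0=(1/3, 1/3, 1/3), tol=1e-12, max_iter=200000):
    """EM iterations for the boundary case x not in R_n (p11 fixed at 0)."""
    x00, x10, x01, x11 = x
    n = sum(x)
    a, b, c = p0
    for _ in range(max_iter):
        A, B = a + b, a + c
        P10, P01 = A ** k - a ** k, B ** k - a ** k
        P11 = 1.0 - A ** k - B ** k + a ** k
        # E-step: expected 00/10/01 individual counts per pool, given the
        # pool outcome, weighted by the observed pool counts.
        e00 = (x00 * k
               + x10 * k * a * (A ** (k - 1) - a ** (k - 1)) / P10
               + x01 * k * a * (B ** (k - 1) - a ** (k - 1)) / P01
               + (x11 * k * a * (1 - A ** (k - 1) - B ** (k - 1) + a ** (k - 1)) / P11 if x11 else 0.0))
        e10 = (x10 * k * b * A ** (k - 1) / P10
               + (x11 * k * b * (1 - A ** (k - 1)) / P11 if x11 else 0.0))
        e01 = (x01 * k * c * B ** (k - 1) / P01
               + (x11 * k * c * (1 - B ** (k - 1)) / P11 if x11 else 0.0))
        # M-step: multinomial MLE on the expected complete-data counts.
        a1, b1, c1 = e00 / (n * k), e10 / (n * k), e01 / (n * k)
        done = max(abs(a1 - a), abs(b1 - b), abs(c1 - c)) < tol
        a, b, c = a1, b1, c1
        if done:
            break
    return (a, b, c)
```

For the example quoted above (n = 250, k = 10, with x00 = 0 and the remaining counts (100, 100, 50)), the iteration converges to p̂00 ≈ 0.82 from any interior starting point, in agreement with the value reported in the text.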

Alternative estimators
In this section we propose two closed form estimators as alternatives to the MLE of the previous section, which requires numerical optimization.

Restricted method of moments estimator
The first estimator, a method of moments type estimator, is motivated by the result in Theorem 2: it simply truncates the value of p̂11 to 0 for values x ∉ Rn. This estimator has several advantages, most notably that it has a simple closed form and, as we will show empirically, slightly outperforms the MLE in terms of both bias and MSE in most cases. It is not hard to see that p̂RMM ∈ Ψ̄p for all x and that p̂RMM = p̂MLE ∈ Ψp for all x ∈ Rn. Further properties of this estimator are given in the following result, which is a corollary of Theorem 1. (c) p̂RMM is asymptotically normal and efficient.

Burrows' type estimator
One of the main advantages of a closed form estimator such as this is the ability to provide simple bias corrections. The problem of bias for the MLE can be generalized from the single-trait group testing case, where this issue is a major focus of the literature. It has been shown that no unbiased estimator exists for the single-trait group testing problem under a fixed sampling model, and this result can be extended directly to the two-trait model considered in this paper. The issue of bias for the two-trait case has been discussed further in the literature, where it is shown that any unbiased estimator found under an alternative sampling plan necessarily yields values outside the parameter space.
In the one-trait case, the Burrows (1987) estimator has been shown repeatedly to improve on the MLE in terms of both bias and MSE (see, for example, Hepworth and Watson (2009) and Ding and Xiong (2016)). The motivation is to find a shrinkage estimator of the form (1 − αx̄)^(1/k), where x̄ is the observed proportion of positive pools and α is optimal in the sense of removing bias of O(1/n) from the estimator. Burrows (1987) showed that this is accomplished by taking α = n/(n + η), where η = (k − 1)/(2k). In the two-trait case, it can be shown that, when the MLE exists, applying the identical shrinkage coefficient to each term in the estimator yields the same overall bias reduction. This is true since, for x ∈ Rn, each term of the MLE as given in Theorem 1 is marginally binomially distributed, so that the problem is identical to that in Burrows' original work. In cases where the MLE does not exist, we can apply the same correction to the terms of the estimator p̂RMM to get a similar approximate result on the non-truncated terms.
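Concretely, since each term of the moment-type estimator has the form (s/n)^(1/k) for a cumulative pool count s, applying the coefficient α = n/(n + η) inside the term is algebraically the same as replacing s/n by (s + η)/(n + η). The sketch below is our own arrangement of that correction (the function name is ours, and no adjustment beyond the truncation of p̂11 is attempted):

```python
def burrows_two_trait(x, k, n):
    """Burrows-type shrinkage estimate from pool counts x = (x00, x10, x01),
    where x00 pools are negative for both traits and x10 (x01) are positive
    for trait 1 (trait 2) only.  Each term (s/n)^(1/k) is replaced by
    ((s + eta)/(n + eta))^(1/k) with eta = (k - 1)/(2k); p11 is truncated
    at zero as in the RMM estimator."""
    x00, x10, x01 = x
    eta = (k - 1) / (2 * k)
    t = lambda s: ((s + eta) / (n + eta)) ** (1.0 / k)
    p00 = t(x00)
    p10 = t(x00 + x10) - p00
    p01 = t(x00 + x01) - p00
    return (p00, p10, p01, max(0.0, 1.0 - p00 - p10 - p01))
```

When k = 1, η = 0 and the estimator reduces to the raw sample proportions, consistent with the fact that all estimators coincide and are unbiased when pools are of size one.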

Numerical comparisons
In this section we provide numerical comparisons of the three estimators introduced here in terms of relative bias and MSE. For the ith component of p and an estimator p̂, the relative bias is defined to be 100(E[p̂i] − pi)/pi. Results are provided for six values of p, covering a range of realistic scenarios, from very small (p = (0.001, 0.001, 0.0001)) to moderately small (p = (0.25, 0.05, 0.15)). Larger values of the prevalence parameters are not considered here, as it would be uncommon for group testing to be used in such cases. Tables 3 and 5 give the bias and MSE calculations, respectively, for the group size k = 2, while Tables 4 and 6 give the same for k = 10. In each table, results are given for an increasing number of tests, n = 10, 25, 50, and 100.
Across all tables, we see the unsurprising fact that, overall, both the bias and MSE decrease as a function of n. For all values of p, there is a marked increase in bias as k moves from 2 to 10. It should be noted that if k = 1, all estimators would yield an identical unbiased estimator. This increase in bias is accompanied by large decreases in MSE for small p, which shows the general advantage of group testing in such cases. Of course, for larger p, k = 10 yields unacceptably large MSE values. This indicates the importance of choosing an appropriate value of k, a problem which is known to be very difficult, even in the case of estimating a single trait (for discussion of this issue, see Hughes-Oliver and Swallow, 1994;Haber and Malinovsky, 2017).
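Bias and MSE entries of the kind reported in Tables 3–6 can be approximated by direct simulation. The following sketch (function names ours; it uses the moment-type estimator with p̂11 truncated at zero and a seeded generator so the run is reproducible) estimates the percent relative bias and MSE of each component:

```python
import random

def rmm(x, k, n):
    """Moment-type estimate from pool counts x = (x00, x10, x01, x11),
    truncating p11 at zero."""
    x00, x10, x01, _ = x
    p00 = (x00 / n) ** (1 / k)
    p10 = ((x00 + x10) / n) ** (1 / k) - p00
    p01 = ((x00 + x01) / n) ** (1 / k) - p00
    return (p00, p10, p01, max(0.0, 1.0 - p00 - p10 - p01))

def simulate_bias_mse(p, k, n, reps=20000, seed=7):
    """Monte Carlo percent relative bias and MSE of rmm, component-wise,
    for individual-level p = (p00, p10, p01, p11)."""
    rng = random.Random(seed)
    t00 = p[0] ** k
    t10 = (p[0] + p[1]) ** k - t00
    t01 = (p[0] + p[2]) ** k - t00
    cum = (t00, t00 + t10, t00 + t10 + t01)   # cumulative pool-level probs
    mean = [0.0] * 4
    mse = [0.0] * 4
    for _ in range(reps):
        x = [0, 0, 0, 0]
        for _ in range(n):
            u = rng.random()
            x[0 if u < cum[0] else 1 if u < cum[1] else 2 if u < cum[2] else 3] += 1
        est = rmm(x, k, n)
        for i in range(4):
            mean[i] += est[i] / reps
            mse[i] += (est[i] - p[i]) ** 2 / reps
    rel_bias = [100.0 * (mean[i] - p[i]) / p[i] for i in range(4)]
    return rel_bias, mse
```

Exact values, as in the tables, would instead enumerate all multinomial outcomes; the simulation version is shown here because it scales to any n.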
Comparing estimators, we see that the MLE and RMM estimators yield very similar, and often identical, values for both bias and MSE. While there is some trade-off for the bias, the RMM method generally outperforms the MLE method slightly in terms of MSE. This, combined with the estimator's simple closed form expression, makes a strong argument for preferring the RMM method to the MLE method in practice, even for small sample sizes.
For the Burrows' type estimator, comparisons in terms of bias are much more difficult. This is especially true for k = 10, where the larger overall values occur with wide fluctuation across individual components of the parameter vector; with k = 10, n = 10, and p = (0.15, 0.1, 0.2), for example, the relative bias fluctuates considerably across components for both the MLE and Burrows' estimators. In both cases, the levels of bias are very high, and would likely be unacceptable in most applications. On the other hand, when p is very small, or as n increases, the Burrows' estimator does generally offer an, at least modest, bias reduction. The real advantage of the Burrows' type estimator is seen when looking at the MSE. In Table 5, the MSE values for p̂B are moderately smaller than those of the other two estimators for all scenarios considered. When k is increased to 10, as shown in Table 6, we see that this trend of slight improvement continues for the smaller values of p, which are precisely the cases for which a larger k is appropriate. As the prevalence increases, however, and the MSE values for the MLE and RMM become highly inflated, the Burrows' type estimator is able to maintain much more reasonable levels. This robustness property of p̂B is very important, as it lowers the impact of choosing a poor value for the group size k.
We close this section with two interesting observations regarding the numerical comparisons. First, for the bias calculations, we see that each estimator often yields a large positive bias for the p11 component. This is true despite the fact that these estimators truncate the corresponding value to zero for a set of sample values with non-trivial probability (as seen in Table 1). This surprising fact indicates that any attempt at reducing the bias in such cases will likely be very difficult. Second, from Table 6 we see that, for the larger two values of p, some of the MSE values for p̂B actually increase with n, rapidly approaching the values for the other two estimators. For p = (0.25, 0.05, 0.25), we see from Table 1 that the increasing MSE values correspond with rising values of P(x ∉ Rn). While the higher prevalence of boundary estimates largely explains the increasing MSE values, it is interesting that this phenomenon is seen only for the Burrows' type estimator and not the MLE or RMM methods.

Application to estimation of Potato virus Y transmission rates
Potato virus Y (PVY) is a member of the genus Potyvirus, one of the largest groups of plant viruses in the world (Mallik et al., 2012). PVY is known to infect over 15 plant species and to be transmitted by over 50 species of aphids (Gray et al., 2010). The disease can have a large economic impact due to decreased yield from a variety of symptoms, such as leaf necrosis, mosaic or vein banding, and leaf drop. Further losses occur when seeds are excluded from being sold as part of certified seed programs requiring virus incidence rates as low as 1% (Gray et al., 2010). In recent years, the ordinary strain, PVY O, has declined in prevalence relative to several new and recombinant strains (Mondal et al., 2017). Notable among such strains in the United States are PVY N:O and PVY NTN. These developments are important since various strains can present with a variety of different symptoms, and they may not be detected by the current widely available screening tests for PVY. The development of multiplex assays for this purpose has been an active area of research in the plant science literature (see, as a few examples, Avrahami-Moyal et al., 2017; Lorenzen, 2006; Mallik et al., 2012).
Why this shift in strains is occurring remains largely unknown. One possibility that has been frequently tested is that the differences can be accounted for by varying rates of transmission by aphids across strains (Mondal et al., 2017). Since examining this hypothesis requires an efficient means of carrying out prevalence estimation, we use this example to illustrate the methods presented in this paper.
As a specific example, we consider two experiments estimating the prevalence rates of two sets of PVY strains: PVY O and PVY N:O, as well as PVY O and PVY NTN. This was the motivation for two studies found in Mondal et al. (2017), which looked at transmission rates for these strains by the aphid Myzus persicae. Those studies yielded the strain prevalence estimates used in the calculations below (Gray et al., 2010).
Treating the above parameter estimates as the true values, we calculate the relative bias and MSE for each estimator presented here for various values of n and k. Included is the case of k = 1 (no group testing), for which all estimators are identical. We note that, while the studies in Mondal et al. (2017) did not use pooled sampling, there exist many group testing applications in the PVY literature (see, as just two examples, Mello et al., 2011; Fletcher, 2012). Figures 1 and 2 give plots of the relative bias and log(MSE), respectively, for k increasing from 1 to 25 and various values of n for the first experiment. We use log(MSE) here to account for the rapid increases in scale as k grows. Figures 3 and 4 give similar plots for the second experiment. From Figure 1, we see that, with the exception of n = 25, the relative bias is reasonably close to zero for all values of k. For the largest number of tests, n = 250, the bias has all but disappeared. Even for n = 25 with small k (less than or equal to 10), the level of bias is sufficiently small for all estimators. We note that, in this and all subsequent plots, the MLE and RMM values are indistinguishable, which is unsurprising given the results in the previous section.
In Figure 2, we see that log(MSE) has a similar parabolic shape for each component of the parameter vector and all n. In this particular case, it appears that the same group size, k = 10, minimizes the MSE simultaneously for each parameter and number of tests. By carefully noting the change in scale on the y-axis, it is clear that the MSE decreases appreciably with n. The values across estimators are similar for most scenarios, with the Burrows' type estimator appearing to offer a slight improvement in MSE. For the PVY O component, which is the largest among the three, we see that Burrows' estimator is much more robust to the choice of group size, although this advantage decreases with n.

In Figure 3, where the underlying prevalence is larger, we see much more variation in the relative bias among estimators and across k. Now, only for the small values k = 1 or k = 2 is the bias near zero. In this case, due to the large value of p_O+NTN, as k increases the probability of each test being positive for both strains of PVY goes to one. As such, we see, for all estimators, the relative bias for PVY O and PVY NTN go to −100 (since the estimates for each converge to zero). For PVY O+NTN, the MLE and RMM estimates approach 100 × (1/0.178 − 1) = 461.8. This is another illustration of the importance of choosing an appropriate pool size, so as to avoid the problem of drawing all positive groups (for an excellent discussion of this problem in the one-trait estimation case, see Hepworth and Watson, 2009). While the Burrows' type estimator ameliorates this somewhat by shrinking the estimate of p_O+NTN towards zero, it does nothing to help the zero estimates for the other parameters.
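The limiting value quoted above is easy to check: if every pool tests positive for both traits, the estimate of the doubly-positive probability degenerates to one, giving a relative bias of 100(1/p11 − 1) percent (a small arithmetic check; the function name is ours):

```python
def all_positive_limit_rel_bias(p11_true):
    """Percent relative bias of the doubly-positive component when its
    estimate degenerates to one (every pool positive for both traits)."""
    return 100.0 * (1.0 / p11_true - 1.0)
```

With p_O+NTN = 0.178 this gives 461.8, matching the value reported above.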
For the MSE, Figure 4 indicates that, for the MLE and RMM estimators, the minimum value is attained at k = 2 for all values of n and each parameter. For the Burrows' type estimator, this is true for the first two parameters, but not for p_O+NTN, for which the MSE appears to continue decreasing as a function of k (with the exception of n = 25, where the minimum occurs at k = 20). Of course, when considering all three parameters together, the choice of k = 2 appears ideal, even for the Burrows' type estimator. As in Figure 2, the Burrows' type estimator outperforms the others, slightly for small k and to a significant degree for k between 10 and 20.

While it is important to look at performance for each component of the parameter vector, the lack of a single index value makes direct quantitative comparison difficult. To address this, we consider the average absolute relative bias and average MSE, where the mean is taken across the three elements of the parameter vector. These values are provided in Tables 7 and 8 for each of the scenarios considered above for the first and second experiments, respectively.

Table 7 shows that, for all estimators, the smallest average MSE value occurs when k = 10. This adds support to what was observed in Figure 2. With this group size, the average MSE is 4.4 times smaller when n = 25 than the value when group testing is not used. For n = 250, this factor decreases to 4.1, which is still a very large gain in efficiency. This is achieved with only a small increase in the bias, with, for example, the Burrows' type estimator yielding relative bias values of 2.636 for n = 25 and 0.001 when n = 250. This shows a clear advantage to using group testing in similar applications. Even if the ideal group size of 10 were not chosen, we still see a large benefit for all estimators when k = 2 or k = 5.
For the Burrows' type estimator, even if too large a group size, such as k = 25, were chosen, the result is still much better than when group testing is not used. For example, with n = 50 and k = 25, the Burrows' type estimator yields an average MSE 2.6 times smaller than that achieved without group testing, and an average absolute relative bias of only 5.3. Of course, if the MLE were used instead in the same scenario, the average MSE would be nearly double that of the non-group testing case, showing again the benefit of the Burrows' type estimator.
In Table 8 we again see, similar to what was indicated in Figure 4, that the best choice of group size for the second experiment is k = 2. In this case, however, the gains relative to the non-grouping case are much more modest, with, for example, the average MSE for the Burrows' type estimator only 1.2 times smaller when n = 25.

Discussion
In this paper, we have addressed the problem of prevalence estimation for two traits simultaneously using group testing methods. While a solution for large samples is straightforward (as shown in Theorem 1), the high probability of the MLE lying on the boundary in small sample cases requires more careful consideration. We have shown (see Table 1) that, depending on the underlying prevalence, this problem can be substantial, even for sample sizes much larger than would be feasible in many applications. The problem of finding the MLE has been shown to be a special case of maximizing a multinomial likelihood with a restricted parameter space. While this is, in general, a difficult problem, we showed that, whenever a closed form maximizer does not exist, the optimization problem can be expressed as a simpler problem with one fewer dimension. This was used to develop an estimation approach based on the EM algorithm which is simple both conceptually and computationally. More importantly, the resulting estimator is guaranteed to converge to the global maximizer for all values in the sample space.
In addition, we have provided a second estimator, based on the method of moments, which very closely approximates the MLE, but has the advantage of a closed form expression. This, in turn, was used to develop a third estimator based on the shrinkage estimator for one-trait group testing estimation presented in Burrows (1987).
Among the three estimators, numerical comparisons showed that none uniformly outperforms the others in terms of relative bias and MSE. Still, the Burrows' type estimator does generally offer some advantage in terms of MSE, and has the added benefit of being robust to poor specification of the group size. As such, this estimator can be recommended as the best choice in most cases, including those similar to the PVY strain prevalence estimation application considered here.
One important extension that we have not considered here is estimation when tests are subject to misclassification. Among many other examples, group testing studies incorporating testing errors can be found in Tu et al. (1995), Liu et al. (2012), and Zhang et al. (2014). While very common in medical studies, this issue appears to be less commonly considered in plant science areas such as the PVY application presented here. For example, none of the previously cited papers in the PVY literature (Gray et al., 2010;Mello et al., 2011;Mondal et al., 2017) included misclassification parameters in their statistical models, although exceptions do exist (see, as one example, Liu et al., 2011).
A second area which has not been considered here is how best to choose the group size k. In the one-trait estimation case, this has proven to be a very difficult problem, with solutions requiring either reasonably precise prior knowledge of the true parameter value (e.g. Swallow, 1985) or adaptive approaches (e.g. Hughes-Oliver and Swallow, 1994). For two traits, an adaptive approach based on optimal design theory has been given (Hughes-Oliver and Rosenberger, 2000), but it relies heavily on large sample assumptions. Numerical studies or theoretical results for constructing locally optimal designs based on prior information have, to our knowledge, not been done. Further research is necessary to provide reasonable small sample solutions to this problem in order to realize the full benefits of group testing for a wide range of applications.