Comment on “Hypothesis testing by convex optimization”

P. Rigollet

With the growing size of problems at hand, convexity has become preponderant in modern statistics. Indeed, convex relaxations of NP-hard problems have been successfully employed in a variety of statistical problems such as classification [2, 16], linear regression [7, 5], matrix estimation [8, 12], graphical models [15, 9] or sparse principal component analysis (PCA) [10, 4]. The paper “Hypothesis testing by convex optimization” by Alexander Goldenshluger, Anatoli Juditsky and Arkadi Nemirovski, hereafter denoted by GJN, brings a new perspective on the role of convexity in a fundamental statistical problem: composite hypothesis testing. The role of this problem is illustrated in the light of several interesting applications in Section 4 of GJN. One of the key insights in GJN is that there exists a pair of distributions, one in each of the composite hypotheses, on which the statistician should focus her efforts. Indeed, Theorem 2.1(ii) guarantees that any test that is optimal for this simple hypothesis problem is also near optimal for the composite hypothesis problem. Moreover, this pair can be found by solving a convex optimization problem. While convexity does not necessarily imply tractability, the convex problem considered here may become simple to the point that closed-form solutions exist even though no succinct description of the hypothesis sets may be known. This point is illustrated below.

Unlike the papers cited above, where the original problem to be solved is non-convex, GJN assume that the hypotheses are convex (or finite unions of convex sets). Hereafter, we investigate the performance of the proposed test when convexity is artificial and arises as a relaxation of a non-convex problem.

Let us consider two examples that fall under the umbrella of combinatorial testing problems [1]. Such problems are defined as follows. Assume that one observes a Gaussian random vector $X \sim \mathcal{N}(\mu, I_n)$ for some $\mu \in \mathbb{R}^n$. Let $p \in \{0, 1\}^n$ be a sparsity pattern [17]. Given a class $\mathcal{P} \subset \{0, 1\}^n$ of sparsity patterns, we are interested in the following hypothesis testing problem:
\[
H_0 : \mu = 0 \quad \text{vs.} \quad H_1 : \mu \in \lambda\mathcal{P},
\]
where $\lambda\mathcal{P} = \{\lambda p : p \in \mathcal{P}\}$ for some $\lambda > 0$. The question is: how large should $\lambda$ be in order to test with a pre-specified small risk? Here, the risk of a test is defined as in GJN.
Several classes of sparsity patterns can be considered [1], but perhaps two of them have more direct statistical relevance. The first one is the class of $k$-sparse vectors, defined as
\[
\mathcal{P}_1^n = \Big\{p \in \{0, 1\}^n : \sum_{i=1}^n p_i = k\Big\}.
\]
The problem then becomes the detection of sparse means, which has applications in various problems including signal processing and steganography. To describe the second problem, assume that $n = d^2$ and fix an arbitrary bijection $T : \mathbb{R}^n \to \mathbb{R}^{d \times d}$ onto the space of $d \times d$ real matrices. The class $\mathcal{P}_2^n$ of $k$-clusters (or cliques) is defined as
\[
\mathcal{P}_2^n = \big\{p \in \{0, 1\}^n : T(p) = qq^\top,\ q \in \mathcal{P}_1^d\big\}.
\]
In other words, these are the sparsity patterns $p$ such that $T(p)$ is the adjacency matrix of a clique of size $k$ in an otherwise empty graph of size $d$. The class $\mathcal{P}_2^n$ of sparsity patterns has applications in clustering [6, 14] and sparse PCA [3, 4].
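To make the two classes concrete, the following minimal sketch enumerates them for tiny dimensions. It assumes that each $k$-sparse pattern has exactly $k$ ones and that $T$ is the row-major reshape of a vector of length $d^2$ into a $d \times d$ matrix; the function names `sparse_patterns` and `clique_patterns` are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import comb


def sparse_patterns(n, k):
    """All 0/1 vectors of length n with exactly k ones (the k-sparse class)."""
    patterns = []
    for support in combinations(range(n), k):
        p = [0] * n
        for i in support:
            p[i] = 1
        patterns.append(tuple(p))
    return patterns


def clique_patterns(d, k):
    """All p in {0,1}^(d*d) such that T(p) = q q^T for a k-sparse q.

    Assumption: T is the row-major reshape, so p is the flattened outer product.
    """
    patterns = []
    for q in sparse_patterns(d, k):
        patterns.append(tuple(q[i] * q[j] for i in range(d) for j in range(d)))
    return patterns


if __name__ == "__main__":
    # Both classes have binomial size, which grows quickly with k.
    print(len(sparse_patterns(6, 2)), comb(6, 2))
    print(len(clique_patterns(4, 2)), comb(4, 2))
```

Each clique pattern has $k^2$ nonzero entries, which is why the two classes behave differently even though both are indexed by supports of size $k$.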
These combinatorial testing problems do not fall in the category of good observation schemes as defined in GJN because the class $Y = \lambda\mathcal{P}$ is not convex. Moreover, the size of these two sets is exponential in $k$, so performing the simple hypothesis tests for all $p \in \mathcal{P}_i^n$, $i \in \{1, 2\}$, as recommended in Section 3.1 of GJN, is simply intractable. To circumvent this limitation, let us explore a convexification of the problem and study instead
\[
H_0 : \mu = 0 \quad \text{vs.} \quad H_1 : \mu \in \lambda\,\mathrm{conv}(\mathcal{P}),
\]
where $\mathrm{conv}(\mathcal{P})$ denotes the convex hull of $\mathcal{P}$, defined as the smallest convex set that contains $\mathcal{P}$. In the case of $\mathcal{P}_1^n$ and $\mathcal{P}_2^n$, these convex sets are polytopes. Even so, optimization over polytopes may not be tractable. For example, some polytopes are known not to have a description involving a small number of linear constraints [18] and are therefore not amenable to linear programming. Fortunately, the optimization problems that are required by GJN admit an explicit solution in these two specific cases. Indeed, it follows from equation (7) in GJN that a near optimal test can be found by testing $H_0 : \mu = 0$ against $H_1 : \mu = \lambda\bar\mu$, where $\bar\mu$ is the point in the polytope $\mathrm{conv}(\mathcal{P})$ with the smallest Euclidean norm. For both polytopes $\mathrm{conv}(\mathcal{P}_1^n)$ and $\mathrm{conv}(\mathcal{P}_2^n)$, such a vector can be easily computed analytically.
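A short calculation shows why the minimum-norm point is the natural target. It is given here as a sketch that uses only standard Gaussian facts; it is not a restatement of equation (7) in GJN, whose normalization of the risk may differ. Write $\Phi$ for the standard normal distribution function, $\bar\mu$ for the minimum-norm point of $\mathrm{conv}(\mathcal{P})$, and consider the test that rejects $H_0$ when $\langle \bar\mu, X \rangle > \lambda\|\bar\mu\|^2/2$, which is the likelihood ratio test of $0$ versus $\lambda\bar\mu$. Then

\[
\mathbb{P}_{0}\Big(\langle \bar\mu, X \rangle > \tfrac{\lambda}{2}\|\bar\mu\|^2\Big) = \Phi\Big(-\tfrac{\lambda}{2}\|\bar\mu\|\Big),
\]
\[
\mathbb{P}_{\lambda v}\Big(\langle \bar\mu, X \rangle \le \tfrac{\lambda}{2}\|\bar\mu\|^2\Big)
= \Phi\Big(\frac{\tfrac{\lambda}{2}\|\bar\mu\|^2 - \lambda\langle \bar\mu, v \rangle}{\|\bar\mu\|}\Big)
\le \Phi\Big(-\tfrac{\lambda}{2}\|\bar\mu\|\Big)
\quad \text{for all } v \in \mathrm{conv}(\mathcal{P}).
\]

The inequality uses $\langle \bar\mu, v \rangle \ge \|\bar\mu\|^2$ for every $v \in \mathrm{conv}(\mathcal{P})$, which is the first-order optimality condition for the minimum-norm point of a convex set. Both error probabilities are therefore controlled by the single quantity $\lambda\|\bar\mu\|$, uniformly over the convexified alternative.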
We begin with $\mathcal{P}_1^n$. In this case, it is simply the vector $\bar\mu_1 = (k/n)\mathbf{1}_n$, where $\mathbf{1}_n \in \mathbb{R}^n$ denotes the all-ones vector. Moreover, the optimal test of $0$ versus $\lambda\bar\mu_1$ has small risk as soon as $\lambda \ge C\sqrt{n}/k$ for some positive constant $C$. This rate is known to be optimal when $k \gg \sqrt{n}$ but is suboptimal for smaller values of $k$ [1].
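The threshold $\sqrt{n}/k$ can be checked numerically. Since the vector $(k/n)\mathbf{1}_n$ scaled by $\lambda$ has Euclidean norm $\lambda k/\sqrt{n}$, the likelihood ratio test of $0$ against it reduces to thresholding the coordinate sum of $X$ at $\lambda k/2$, and both of its error probabilities equal $\Phi(-\lambda k/(2\sqrt{n}))$, with $\Phi$ the standard normal distribution function. The sketch below encodes this; the function names are illustrative, and "risk" here means this common error probability, which is an assumption about normalization rather than GJN's exact definition.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sum_test(x, k, lam):
    """Reject H0 (return True) when the coordinate sum exceeds lam * k / 2."""
    return sum(x) > lam * k / 2


def sum_test_error(n, k, lam):
    """Common type I / type II error of the test of 0 versus lam * (k/n) * 1_n."""
    return STD_NORMAL.cdf(-lam * k / (2 * sqrt(n)))


def lambda_for_error(n, k, eps):
    """Smallest lam with sum_test_error(n, k, lam) <= eps, for 0 < eps < 1/2."""
    return 2 * STD_NORMAL.inv_cdf(1 - eps) * sqrt(n) / k


if __name__ == "__main__":
    # With n = 100, k = 10, lam = 2 the error is Phi(-1), about 0.1587.
    print(sum_test_error(100, 10, 2.0))
    # The required lam scales like sqrt(n) / k for a fixed target error.
    print(lambda_for_error(400, 20, 0.05))
```

In this normalization the constant $C$ is $2\Phi^{-1}(1 - \varepsilon)$ for a target error $\varepsilon$.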
In the case of $\mathcal{P}_2^n$, the minimum-norm vector $\bar\mu_2$ can likewise be computed explicitly, and the optimal test of $0$ versus $\lambda\bar\mu_2$ has small risk as soon as $\lambda \ge Cd/k^2$ for some positive constant $C$. As before, this rate is optimal if $k \gg \sqrt{d}$, but a better rate can be achieved if $k \ll \sqrt{d}$ [6, 14]. As a result, it seems that convexifying the problems in that way is too coarse for very sparse cases.
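One way to see where the $d/k^2$ threshold comes from is a symmetry argument. The following is a sketch under the assumption that $T$ merely rearranges the $n = d^2$ coordinates into a matrix, so that it preserves the Euclidean norm, and that $k, d \ge 2$; it may differ in form from the author's own expression. The set of matrices $qq^\top$ with $q$ a $k$-sparse pattern is invariant under simultaneous permutations of rows and columns, so the unique minimum-norm point of its convex hull has a constant diagonal $a$ and a constant off-diagonal $b$. Every $qq^\top$ has trace $k$ and entries summing to $k^2$, which gives

\[
a = \frac{k}{d}, \qquad b = \frac{k(k-1)}{d(d-1)}, \qquad
T(\bar\mu_2) = \frac{k}{d}\, I_d + \frac{k(k-1)}{d(d-1)}\big(\mathbf{1}_d\mathbf{1}_d^\top - I_d\big),
\]
\[
\|\bar\mu_2\|^2 = \frac{k^2}{d} + \frac{k^2(k-1)^2}{d(d-1)}
\quad\Longrightarrow\quad
\|\bar\mu_2\| \ge \frac{k(k-1)}{\sqrt{d(d-1)}} \ge \frac{k^2}{2d}.
\]

Hence $\lambda\|\bar\mu_2\|$ is bounded below by a constant as soon as $\lambda$ is at least a constant multiple of $d/k^2$.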
While the two problems appear to share the same computational limitations, they are in reality quite different from this point of view. Indeed, on the one hand, detecting a sparse mean $\mu \in \lambda\mathcal{P}_1^n$ can be done very efficiently by simply looking at the $k$ largest entries of $X$ [1]. On the other hand, a recent line of work has shown that optimal detection of $k$-clusters may not be achievable if one wishes to use a computationally efficient procedure such as the one employed in GJN. Indeed, sparse PCA [3] and sub-matrix detection [14] both have the $k$-cluster structure and are known to be intrinsically computationally hard to solve optimally if one believes in the planted clique conjecture [11, 13].
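The efficiency claim for sparse means can be illustrated as follows. One natural reading of "looking at the $k$ largest entries" is the scan statistic, the maximum of $\langle p, X \rangle$ over all $k$-sparse patterns $p$, which equals the sum of the $k$ largest entries of $X$ and so avoids enumerating all supports. This reading, the function names, and the brute-force check are assumptions made for illustration; the calibration of a rejection threshold is not specified in the text and is omitted.

```python
from heapq import nlargest
from itertools import combinations


def scan_statistic(x, k):
    """Maximum of <p, x> over k-sparse 0/1 patterns p: sum of the k largest entries."""
    return sum(nlargest(k, x))


def scan_statistic_bruteforce(x, k):
    """Same quantity by enumerating every support of size k (exponential cost)."""
    return max(sum(x[i] for i in support)
               for support in combinations(range(len(x)), k))


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.0, -0.25]
    # The two largest entries are 3.0 and 2.0.
    print(scan_statistic(x, 2), scan_statistic_bruteforce(x, 2))
```

No analogous shortcut is apparent for $k$-clusters, where the maximum runs over $k \times k$ principal submatrices rather than individual coordinates.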