Discussion of “Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation”



Introduction
We would like to first congratulate the authors for this timely survey on high-dimensional covariance matrix estimation. Over the last decade, there has been a flurry of activity in high-dimensional statistics in general, and on covariance estimation more specifically. The article provides an organized overview of the key advancements in sparse covariance matrix estimation and sparse inverse covariance matrix estimation. Specifically, it lists the parameter spaces corresponding to the different types of sparse structures commonly assumed in the literature, and provides minimax lower bounds together with estimators that attain these lower bounds for each of the parameter spaces. Furthermore, it includes discussion of two closely related problems, namely sparse principal component analysis and high-dimensional covariance testing. This expository article will, without a doubt, serve as a "go-to" reference for anyone interested in learning about high-dimensional covariance matrix estimation.
In this discussion, we concentrate on a particular assumption made in most of the literature on covariance estimation, namely the sub-Gaussian assumption. It has long been recognized by statisticians that data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that the covariance matrix estimators described in this article remain optimal. In particular, many of these estimators are based upon the sample covariance matrix, which is known to perform rather poorly when the data are corrupted with outliers, even under the classical fixed-dimensionality paradigm [1].

* Main article: 10.1214/15-EJS1081.
† This research was supported in part by NSF Career Award DMS-1321692, FRG Grant DMS-1265202, and NIH Grant 1-U54AI117924-01.

K. Balasubramanian and M. Yuan
To explore this possible pitfall and offer potential solutions, we investigate in this discussion a simple strategy of replacing the sample covariance matrix with a more robust median covariance matrix.

Median covariance matrix estimator
Let {X^(1), ..., X^(n)} be n random variables taking values in a Banach space (B, ||·||). Recall that the geometric median (cf. [2]), m̂(X^(1), X^(2), ..., X^(n)), of the n observations is defined as the point in the space that minimizes the sum of its distances to all observations:

    m̂(X^(1), X^(2), ..., X^(n)) = argmin_{μ ∈ B} Σ_{i=1}^n ||X^(i) − μ||.

We note that the minimizer on the right-hand side is uniquely defined, and therefore the above definition is valid, if the space B is separable, reflexive and strictly convex; see [3] for further discussion of the existence and other properties of m̂. Now consider employing a similar strategy for covariance matrix estimation. More specifically, assuming for simplicity that the observations have mean zero, define

    Σ̂_med = m̂(X^(1)(X^(1))^T, X^(2)(X^(2))^T, ..., X^(n)(X^(n))^T),

where we use the standard Frobenius norm ||A||_F = (tr(A^T A))^(1/2) as the metric for computing m̂. Instead of the Frobenius norm, one may also consider other metrics, for example those adapted to the manifold of symmetric positive semidefinite matrices, when computing m̂.
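As a minimal numerical sketch of this construction, the geometric median under the Frobenius norm can be computed by applying Weiszfeld's algorithm (a standard iterative scheme for the geometric median) to the vectorized rank-one matrices X^(i)(X^(i))^T. The function names and the mean-zero assumption are ours; this is an illustration, not the authors' exact implementation.

```python
import numpy as np

def geometric_median(points, tol=1e-6, max_iter=500):
    """Weiszfeld's algorithm: geometric median of the rows of `points` (n, d)."""
    mu = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iter):
        dist = np.linalg.norm(points - mu, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new - mu) < tol:
            return new
        mu = new
    return mu

def median_covariance(X):
    """Geometric median of the matrices X_i X_i^T under the Frobenius norm.

    Assumes the rows of X (n observations in R^p) have mean zero.
    """
    n, p = X.shape
    outer = np.einsum('ni,nj->nij', X, X).reshape(n, p * p)
    return geometric_median(outer).reshape(p, p)
```

Since each Weiszfeld iterate is a convex combination of the input matrices, the output inherits symmetry and positive semidefiniteness from the rank-one terms.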
Once the median covariance matrix is computed, we may impose sparse structures on it to derive estimates suitable for high-dimensional problems. To fix ideas, we focus here on the bandable case and consider a banding estimator based on Σ̂_med:

    (Σ̃_med)_ij = (Σ̂_med)_ij · 1{|i − j| ≤ k}.

We now examine the effect of using Σ̂_med as the initial estimator through a numerical experiment.
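The banding step itself is elementary: entries more than k positions away from the diagonal are set to zero. A sketch (the function name `band` is ours):

```python
import numpy as np

def band(S, k):
    """Banding operator: keep entries with |i - j| <= k, zero out the rest."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)
```

Applied to any initial estimator (sample covariance or median covariance), this yields the corresponding banded estimator with bandwidth k.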

Numerical example
To compare the median covariance matrix based banding estimator against its sample covariance matrix based counterpart, we conduct a set of simulation studies. We generated n = 50 random vectors in R^120. Of the 50 observations, a random fraction (ρ) are corrupted, and the corrupted observations are sampled from [−10, 10]^120. The remaining observations are sampled from a Gaussian distribution with an AR(1) covariance structure similar to [4], that is, Σ*_ij = η^|i−j| with η = 0.7. To gain insight into the effect of the level of corruption, we consider several values of ρ, namely ρ ∈ {0.10, 0.14, 0.18, 0.20}. We compare the sample covariance matrix based and the median covariance matrix based banding estimators in terms of spectral-norm error, fixing k = 5 for both. The results, based on 50 runs, are summarized below. The benefit of using the more robust median covariance matrix as the initial estimator is evident from this example.
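A single run of this experiment can be sketched as follows. The text does not specify the corruption distribution beyond its support, so we assume the corrupted entries are drawn i.i.d. uniform on [−10, 10]; the geometric median is computed via Weiszfeld's algorithm, and both initial estimators are banded at k = 5 before measuring the spectral-norm error against Σ*.

```python
import numpy as np

def geometric_median(points, tol=1e-6, max_iter=500):
    """Weiszfeld's algorithm for the geometric median of the rows of `points`."""
    mu = points.mean(axis=0)
    for _ in range(max_iter):
        dist = np.maximum(np.linalg.norm(points - mu, axis=1), 1e-12)
        w = 1.0 / dist
        new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new - mu) < tol:
            break
        mu = new
    return mu

def band(S, k):
    """Zero out entries of S more than k positions from the diagonal."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)

rng = np.random.default_rng(0)
n, p, k, rho, eta = 50, 120, 5, 0.20, 0.7

# AR(1) target covariance: Sigma*_ij = eta^|i-j|
idx = np.arange(p)
Sigma = eta ** np.abs(idx[:, None] - idx[None, :])

# clean Gaussian observations, with a fraction rho replaced by corruption
n_bad = int(rho * n)
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T
X[:n_bad] = rng.uniform(-10, 10, size=(n_bad, p))  # assumed corruption model

# initial estimators: sample covariance vs. median covariance
S_sample = X.T @ X / n
outer = np.einsum('ni,nj->nij', X, X).reshape(n, p * p)
S_med = geometric_median(outer).reshape(p, p)

# banded estimators, compared in spectral norm
err_sample = np.linalg.norm(band(S_sample, k) - Sigma, 2)
err_med = np.linalg.norm(band(S_med, k) - Sigma, 2)
print(f"sample-based error: {err_sample:.2f}, median-based error: {err_med:.2f}")
```

With heavy corruption, the sample covariance picks up a large inflation (the corrupted rows contribute outer products with entries on the order of 100), whereas the geometric median remains anchored near the bulk of the clean observations.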
To conclude, robustness to corrupted and/or heavy-tailed data appears to be an important practical issue when it comes to covariance matrix estimation. Our limited experiment here suggests that such robustness could potentially be gained by simply replacing the sample covariance matrix with a robust covariance matrix estimator. To what extent this is true, and what other strategies might handle heavy-tailed distributions, warrants further study.