Corrigendum to “Maximum likelihood estimation in logistic regression models with a diverging number of covariates”

Binary data with high-dimensional covariates have become increasingly common in many disciplines. In this paper we consider maximum likelihood estimation for logistic regression models with a diverging number of covariates. Under mild conditions we establish the asymptotic normality of the maximum likelihood estimate when the number of covariates p goes to infinity with the sample size n in the order of p = o(n). This remarkably improves the existing results, which only allow p to grow in an order of o(n^α) with α ∈ [1/5, 1/2] [12, 14]. A major innovation in our proof is the use of an injective function. AMS 2000 subject classifications: Primary 62F12; secondary 62J12.


Introduction
High-dimensional logistic regression models have attracted much attention recently, as binary data with a diverging number of covariates are becoming increasingly common in many disciplines. Let y be a binary response variable and x = (x_1, . . . , x_p)^⊤ be the vector of covariates whose relationship with y can be described as

logit{P(y = 1 | x)} = x^⊤ β,    (1.1)

where β = (β_1, . . . , β_p)^⊤ is a vector of unknown parameters. We are interested in the high-dimensional case where p diverges with the sample size n. Hence we may write p_n when needed to emphasize the dependence of p on n.
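As a concrete numerical illustration of model (1.1) (our own sketch, not part of the paper; the covariate and parameter values below are hypothetical):

```python
import numpy as np

def logistic_cdf(v):
    """F(v) = {1 + exp(-v)}^{-1}, the logistic distribution function."""
    return 1.0 / (1.0 + np.exp(-v))

def success_prob(x, beta):
    """P(y = 1 | x) under model (1.1): logit{P(y = 1 | x)} = x^T beta."""
    return logistic_cdf(x @ beta)

# Hypothetical values with p = 3 covariates.
x = np.array([1.0, -0.5, 2.0])
beta = np.array([0.2, 1.0, -0.3])
p1 = success_prob(x, beta)  # = F(-0.9), roughly 0.29
```

Inverting the logit link in this way is all that model (1.1) asserts: the linear predictor x^⊤β is mapped to a probability through F.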
Logistic models are standard and powerful tools for describing the relationship between a binary response variable and a set of covariates. Estimation and inference based on maximum likelihood in logistic regression have been well studied in theory and widely used in practice [6, 9–11]. Recently, logistic regression models have been applied to analyze high-dimensional data where p may diverge with n [3, 4, 7, 13]. These papers primarily focus on developing various variable selection procedures. The success of these procedures relies on certain sparsity assumptions that allow only a small number of covariates to have nonzero effects. Correspondingly, the asymptotic theories in these papers are devoted to the investigation of selection properties such as the well-known oracle property. In summary, the contribution of these works is the successful reduction of the possibly ultra-high-dimensional (p ≫ n) estimation problem to a problem of much lower dimension (p = o(n)).
Despite these developments, theoretical properties of the maximum likelihood estimator (MLE) in a general setting of high-dimensional β with p = o(n) have not yet been established. [3] offered some insights for a special case in which all the components of the true β are nonzero and all the components of its MLE are not too close to zero. [14] considered generalized estimating equation analysis of clustered binary data with a diverging number of covariates. In particular, she showed that the GEE estimator is consistent when the dimension diverges in the order of o(n^{1/2}) and that an arbitrary linear combination of it is asymptotically normal when the dimension diverges in the order of o(n^{1/3}).
Our paper aims to fill this important gap by showing the asymptotic normality of the MLE of high-dimensional β under mild conditions that only require p/n → 0. A critical step in our theoretical derivation is an innovative use of an injective function.
The rest of the paper is organized as follows. We present the asymptotic normality result for high-dimensional logistic regression models in Section 2, and give the proof in Section 3. We conclude the paper with some discussions in Section 4.

MLE with diverging dimension
Let F(v) = {1 + exp(−v)}^{−1} be the logistic distribution function. Then the log-likelihood function is

L_n(β) = Σ_{i=1}^n [ y_i x_i^⊤ β − log{1 + exp(x_i^⊤ β)} ].    (2.1)

The maximum likelihood estimator β̂_n of β is the solution to

L̇_n(β) = Σ_{i=1}^n x_i {y_i − F(x_i^⊤ β)} = 0.    (2.2)

When p is fixed, [2] studied the asymptotic normality of the maximum likelihood estimator β̂_n under mild assumptions, and [15] considered the same problem under weaker assumptions. With a diverging p_n, the problem becomes much more complicated and has not been investigated yet. Two questions need to be answered. First, are there any random variables such that (2.2) holds in probability? Second, if so, are these random variables still asymptotically normal, and under what conditions? Our theorem below addresses these two questions. The tool we use to prove existence is a local inverse function theorem developed by [1], who studied strong consistency of maximum quasi-likelihood estimators in generalized linear models, together with an idea once used in [15].
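To make the estimating equation concrete, here is a minimal Newton–Raphson sketch for solving L̇_n(β) = 0 (an illustration under a hypothetical simulated design, not the paper's implementation):

```python
import numpy as np

def logistic_cdf(v):
    return 1.0 / (1.0 + np.exp(-v))

def score(beta, X, y):
    """Score L'_n(beta) = sum_i x_i {y_i - F(x_i^T beta)}, cf. (2.2)."""
    return X.T @ (y - logistic_cdf(X @ beta))

def neg_hessian(beta, X):
    """-dL'_n/dbeta^T = sum_i F'(x_i^T beta) x_i x_i^T, with F' = F(1 - F)."""
    w = logistic_cdf(X @ beta)
    return (X * (w * (1.0 - w))[:, None]).T @ X

def logistic_mle(X, y, tol=1e-10, max_iter=50):
    """Solve L'_n(beta) = 0 by Newton-Raphson, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        step = np.linalg.solve(neg_hessian(beta, X), score(beta, X, y))
        beta += step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Hypothetical simulated data: n = 500, p = 4, bounded covariates.
rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.uniform(-1.0, 1.0, size=(n, p))
beta0 = np.array([0.5, -0.4, 0.3, 0.0])
y = (rng.uniform(size=n) < logistic_cdf(X @ beta0)).astype(float)
beta_hat = logistic_mle(X, y)  # score(beta_hat, X, y) is numerically zero
```

The concavity of (2.1) in β is what makes this iteration well behaved when the design is not separable.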
In what follows, λ_max(A) and λ_min(A) denote the maximum and minimum eigenvalues of a matrix A, respectively, and A_{·,j} and A_{j,·} the jth column and row of the matrix A. A_1 ≥ A_2 means that A_1 − A_2 is positive semi-definite for two matrices A_1 and A_2. C will be a generic constant taking different values in different places. Let ‖·‖_2 be the standard Euclidean norm on R^n.
The following conditions, Assumptions (A1) and (A2), are imposed to obtain Theorem 1.
To the best of our knowledge, condition (A1) is the weakest assumption on the order of p_n compared with the assumptions of p_n = o(n^{1/2}) or o(n^{1/3}) in the existing literature; see, e.g., [14] and the references therein. It might be very difficult to improve this order without further assumptions such as sparsity.
To bound the covariates, [14] requires sup_{i,j} |x_ij| = O(√p_n). When p_n is a constant, this bound coincides with ours in condition (A2). When p_n diverges with n, our bound is a bit more restrictive. With our bound, one can always find a positive constant c_00 < 1/2 such that (2.3) holds; for example, the right-hand side 1 − c_00 of (2.3) can always be replaced by 3/4. Equation (2.3) implies a corresponding bound for any p_n-vector v. The rest of condition (A2) bounds the eigenvalues of S_n. This is a stability assumption ensuring that S_n/n is not ill-conditioned. Such an assumption is needed for the asymptotic investigation of β̂ even in designs with a fixed number of covariates [1, 8]. Similar conditions are required in establishing the asymptotic normality of the maximum likelihood estimator for GLMs with a fixed number of covariates [see, for example, 1].
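A quick numerical sanity check of the eigenvalue part of (A2), using a hypothetical bounded i.i.d. design (our own sketch): λ_min(S_n/n) and λ_max(S_n/n) stay bounded away from zero and infinity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_n = 2000, 20
X = rng.uniform(-1.0, 1.0, size=(n, p_n))  # bounded covariates
S_n = X.T @ X                              # S_n = sum_i x_i x_i^T
eigvals = np.linalg.eigvalsh(S_n / n)
lam_min, lam_max = eigvals[0], eigvals[-1]
# For this design S_n/n concentrates around (1/3) I_{p_n}, so both extreme
# eigenvalues are close to 1/3: the design is well-conditioned.
```

A design violating this condition (e.g. duplicated or nearly collinear columns) would drive lam_min toward zero and break the stability that the theorem relies on.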
Theorem 1. Suppose Assumptions (A1)-(A2) hold. Then there exists a sequence of random variables β̂_n such that

P{L̇_n(β̂_n) = 0} → 1    (2.4)

and

u^⊤ G_n^{1/2} (β̂_n − β_0) → N(0, 1) in distribution,    (2.5)

where u is a unit p_n-vector and G_n = G_n(β_0).
The first part indicates that, with probability tending to one, there exists a solution of the equation L̇_n(β) = 0, while the second part ensures that this solution is asymptotically normal.
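A small Monte Carlo sketch of what the second part of Theorem 1 predicts (hypothetical fixed design and parameters, not from the paper): the standardized quantity u^⊤ G_n^{1/2}(β̂_n − β_0) should behave approximately like a standard normal.

```python
import numpy as np

def logistic_cdf(v):
    return 1.0 / (1.0 + np.exp(-v))

def logistic_mle(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of the score equation, from a zero start."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = logistic_cdf(X @ beta)
        H = (X * (w * (1.0 - w))[:, None]).T @ X
        step = np.linalg.solve(H, X.T @ (y - w))
        beta += step
        if np.linalg.norm(step) < tol:
            break
    return beta

rng = np.random.default_rng(2)
n, p = 400, 5
X = rng.uniform(-1.0, 1.0, size=(n, p))   # fixed bounded design
beta0 = np.array([0.5, -0.3, 0.2, 0.0, 0.4])
u = np.ones(p) / np.sqrt(p)               # a unit p-vector

# G_n(beta0) = sum_i F'(x_i^T beta0) x_i x_i^T and its symmetric square root.
f1 = logistic_cdf(X @ beta0) * (1.0 - logistic_cdf(X @ beta0))
G = (X * f1[:, None]).T @ X
vals, vecs = np.linalg.eigh(G)
G_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

z = np.empty(300)
for m in range(z.size):
    y = (rng.uniform(size=n) < logistic_cdf(X @ beta0)).astype(float)
    beta_hat = logistic_mle(X, y)
    z[m] = u @ (G_half @ (beta_hat - beta0))
# z should have mean near 0 and variance near 1.
```

Keeping the design fixed across replications mirrors the fixed-design setting of the theorem; only the binary responses are redrawn.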

Technical Lemmas
We state or prove several preliminary lemmas first. In the following, ‖·‖ always refers to the ℓ_2-norm ‖·‖_2.

Lemma 1. [5] If F is continuously differentiable in a convex interval of R, then

F(t_1) − F(t_2) = Ḟ(t̄)(t_1 − t_2)

for some t̄ between t_1 and t_2, where t_1, t_2 ∈ R.
Lemma 2. [1] Let Υ be a smooth injection from R^{p_n} to R^{p_n} with Υ(x_0) = y_0 and inf_{‖x − x_0‖ = δ} ‖Υ(x) − y_0‖ ≥ R. Then for any y with ‖y − y_0‖ ≤ R, there is an x with ‖x − x_0‖ ≤ δ such that Υ(x) = y.
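A one-dimensional numerical illustration of this local inverse result (our own toy example, not from [1]): take Υ(x) = x^3 + x, x_0 = y_0 = 0 and δ = 1, so that inf_{|x − x_0| = δ} |Υ(x) − y_0| = 2 = R; every y with |y| ≤ 2 is then attained by some x in [−1, 1].

```python
def upsilon(x):
    """A smooth injection on the real line: strictly increasing."""
    return x**3 + x

def invert_on_interval(y, lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection: find x in [lo, hi] with upsilon(x) = y.
    Valid because upsilon is increasing and upsilon(lo) <= y <= upsilon(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upsilon(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Here R = min(|upsilon(1)|, |upsilon(-1)|) = 2; pick any |y| <= R.
x_sol = invert_on_interval(1.5)  # a root of upsilon(x) = 1.5 inside [-1, 1]
```

In the proof, the same principle is applied with Υ built from the score function, which is why injectivity of that map is the key property.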
Lemma 3. Under the conditions of Theorem 1, we have

sup_{β ∈ N_n(δ)} ‖G_n^{−1/2} {Q_n(β) − Q_n(β_0)} G_n^{−1/2}‖ → 0,

where Q_n(β) = ∂L̇_n(β)/∂β^⊤ and N_n(δ) = {β : ‖G_n^{1/2}(β − β_0)‖ ≤ δ} is a neighborhood of β_0. We will show that each of the three terms in the decomposition approaches zero on N_n(δ).

Proof of Theorem 1
The proof consists of three steps. We establish the asymptotic normality of L̇_n(β_0) in the first step and prove (2.4) in the second step. In the third step, we justify that u^⊤ G_n^{1/2}(β̂_n − β_0) can be approximated by a linear combination of L̇_n(β_0), and thus complete the proof of the theorem.
Step 1. We will show that

u^⊤ G_n^{−1/2} L̇_n(β_0) → N(0, 1) in distribution.    (3.12)

Write u^⊤ G_n^{−1/2} L̇_n(β_0) = Σ_{i=1}^n ξ_i with ξ_i = u^⊤ G_n^{−1/2} x_i {y_i − F(x_i^⊤ β_0)}. It is easy to verify that E(ξ_i) = 0. It now suffices to prove Lindeberg's condition: for any ζ > 0, as n → ∞,

Σ_{i=1}^n E[ξ_i^2 1{|ξ_i| > ζ}] → 0.    (3.13)

Let a_ni = u^⊤ G_n^{−1/2} x_i. Similarly to (3.10), we can show that max_{1≤i≤n} a_ni^2 → 0; moreover, (3.10) shows that Σ_{i=1}^n a_ni^2 is bounded. Combining these with the Cauchy–Schwarz inequality and (2.3) ensures (3.13). The central limit theorem then yields (3.12).
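Numerically, with a hypothetical bounded design (our own sketch), one can verify the two facts used above: the standardization gives total variance Σ_i a_ni^2 F'(x_i^⊤ β_0) = u^⊤ u = 1 exactly, and the largest single coefficient max_i a_ni^2 is small, which is what drives the Lindeberg condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 10
X = rng.uniform(-1.0, 1.0, size=(n, p))   # bounded covariates
beta0 = rng.normal(0.0, 0.3, size=p)      # hypothetical true parameter

w = 1.0 / (1.0 + np.exp(-(X @ beta0)))
f1 = w * (1.0 - w)                         # F'(x_i^T beta0)
G = (X * f1[:, None]).T @ X                # G_n(beta0)
vals, vecs = np.linalg.eigh(G)
G_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

u = np.ones(p) / np.sqrt(p)                # a unit p-vector
a = X @ (G_inv_half @ u)                   # a_ni = u^T G_n^{-1/2} x_i

var_total = np.sum(a**2 * f1)              # = u^T G^{-1/2} G G^{-1/2} u = 1
max_a2 = np.max(a**2)                      # vanishes as n grows with p = o(n)
```

No single summand ξ_i dominates the sum, so the triangular-array central limit theorem applies.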

Discussion
In a rather general setting, we have established the asymptotic normality for maximum likelihood estimators in logistic regression models with high-dimensional covariates. We believe that the procedure can be extended to other generalized linear models and similar theoretical results may be established with straightforward derivations. One potential complication for other generalized linear models is that the response y may not be bounded as in logistic regression models.
Other possible extensions are to the Cox model, robust regression, and procedures based on quasi-likelihood functions. Further effort is needed to build similar procedures and theoretical results in these settings.