Efficient estimation for longitudinal data by combining large-dimensional moment conditions

Abstract: The quadratic inference function (QIF) approach is able to provide a consistent and efficient estimator if valid moment conditions are available. However, the QIF estimator is unstable when the dimension of the moment conditions is large compared to the sample size, due to the singularity of the estimated weighting matrix. We propose a new estimation procedure which combines all valid moment conditions optimally via a spectral decomposition of the weighting matrix. In theory, we show that the proposed method yields a consistent and efficient estimator which follows an asymptotic normal distribution. In addition, Monte Carlo studies indicate that the proposed method performs well in the sense of reducing bias and improving estimation efficiency. A real data example of Fortune 500 companies is used to compare the performance of the new method with existing methods.


Introduction
Longitudinal data arise frequently in many studies where repeated measurements from a subject are correlated. The correlated nature of longitudinal data makes it difficult to specify the full likelihood function for non-normal responses. [10] proposed the generalized estimating equation (GEE) for correlated data, which only requires the first two moments and a working correlation matrix of errors to account for correlations. Although the GEE provides a consistent estimator regardless of whether the working correlation is correctly specified or not, the estimator can be inefficient under misspecified correlation structures. [13] developed the quadratic inference function (QIF) based on the generalized method of moments [8] to achieve better estimation efficiency.
For correlated data with large cluster sizes, it is important to account for the true correlation information, since it can reduce the bias and increase the efficiency of estimation. For example, the QIF utilizing a full set of basis matrices allows one to select a flexible correlation structure, and can increase the efficiency of the estimator significantly [15]. However, this generates many moment conditions for large clusters, since the dimension of the moment conditions depends on the number of basis matrices, which in turn depends on the cluster size. This can be problematic in estimating the inverse of the sample covariance matrix of the moment conditions, which serves as the optimal weighting matrix for the QIF and plays a crucial role in achieving an efficient QIF estimator. First, the sample covariance matrix might not be of full rank when there are more moment conditions than the sample size. Second, even if the sample covariance matrix is invertible, the estimate of its inverse can be biased and highly variable. Therefore the QIF can perform poorly due to infeasible or imprecise estimation of the optimal weighting matrix.
In the generalized method of moments literature, it has been shown that over-identified moment conditions may cause poor performance in finite sample estimation [9,11]. In this paper, we are motivated by problems in longitudinal data where the dimension of the moment conditions is relatively large compared to the sample size, or the moment conditions are highly correlated. The singularity or near singularity of the weighting matrix makes the QIF estimator infeasible or unstable. To solve this problem, subset moment selection methods have been developed for large-dimensional moment conditions. [7,1,12] proposed eliminating the least informative moment conditions to reduce the overall number of moment conditions; however, this requires prior information on the moment conditions. [6,2,3] utilized penalized objective functions to select informative moment conditions; however, the underlying assumption is that most of the moment conditions are not informative. [5,4] proposed selecting moment conditions by minimizing the mean squared error of the estimator; however, their criterion requires inverting the sample covariance matrix, which could be infeasible when the dimension of the moment conditions exceeds the sample size. Moreover, most moment selection approaches incur an efficiency loss for parameter estimation, since the information from unselected moment conditions is not utilized.
We propose a new estimation procedure which combines all valid moment conditions using principal component analysis. We apply a spectral decomposition to the covariance matrix of the moment conditions and select an optimal number of linear combinations of the moment conditions through a new objective function based on a Bayesian information type of criterion [14]. This allows one to reduce the dimensionality of the valid moment conditions while retaining most of the information they contain. The proposed method performs well in the sense of reducing bias and improving the efficiency of QIF estimation, and is especially effective when the dimension of the moment conditions is high compared to the sample size. Furthermore, it is capable of incorporating a set of preselected moment conditions, in conjunction with selecting the optimal linear combinations of the remaining moment conditions. This has the advantage of preventing any information loss from moment conditions which surely should be included for estimation.
In theory, we show that the proposed criterion selects the number of principal components consistently as the sample size goes to infinity. The QIF estimator using the selected linear combinations of moment conditions is consistent and asymptotically normal. In addition, the proposed approach yields an efficient estimator in the sense that its asymptotic variance matrix reaches the minimum. Our numerical studies also confirm that selecting a subset of the moment conditions, or replacing the weighting matrix by the identity matrix, results in less accurate and less efficient estimation compared to the proposed estimator.
The paper is organized as follows. Section 2 provides the background on the quadratic inference function and the motivation of the problem. Section 3 introduces an efficient estimation approach which combines all valid moment conditions optimally, and provides the asymptotic properties of the proposed estimator. Section 4 illustrates our method with simulation studies and an application to real data. The final section gives concluding remarks and discussion. All proofs of the lemmas and theorems are provided in the Appendix.

Notation and preliminaries
Let the response variable for the ith subject be y_i = (y_{i1}, ..., y_{im_i})', where the y_i's are independent and identically distributed for i = 1, ..., n, n is the sample size and m_i is the cluster size. To simplify the notation, we first set m_i = m for all i; the unbalanced data case will be discussed in more detail in Section 3.3. The corresponding covariate matrix for the ith subject is X_i = (x_{i1}, ..., x_{im})'. For the generalized linear model, the marginal mean of y_ij is represented as mu_ij = E(y_ij | x_ij) = mu(x_ij'beta), where mu(.) is an inverse link function and beta is a p-dimensional parameter vector. [10] proposed the generalized estimating equation (GEE) as a marginal model approach for estimating beta by solving Sum_{i=1}^n mudot_i' V_i^{-1}(y_i - mu_i) = 0, where mu_i = (mu_{i1}, ..., mu_{im})', mudot_i = d mu_i / d beta', V_i = A_i^{1/2} R A_i^{1/2}, A_i is the diagonal marginal variance matrix of y_i and R is a common working correlation matrix for all subjects.
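The GEE described above can be sketched numerically. Below is a minimal illustration for the Gaussian, identity-link case, where mu_i = X_i beta and A_i = I, so the estimating equation is linear in beta and the Fisher-scoring iteration converges in one step. All function and variable names here are hypothetical, not from the paper:

```python
import numpy as np

def gee_fit(Xs, ys, R, tol=1e-8, max_iter=50):
    """Solve the GEE for an identity link (Gaussian sketch).

    Iterates beta <- beta + (sum X'R^{-1}X)^{-1} sum X'R^{-1}(y - X beta),
    i.e. V_i = A_i^{1/2} R A_i^{1/2} with A_i = I here.
    Xs: list of (m, p) covariate matrices; ys: list of (m,) responses.
    """
    p = Xs[0].shape[1]
    beta = np.zeros(p)
    Rinv = np.linalg.inv(R)
    for _ in range(max_iter):
        H = sum(X.T @ Rinv @ X for X in Xs)                    # information
        u = sum(X.T @ Rinv @ (y - X @ beta) for X, y in zip(Xs, ys))
        step = np.linalg.solve(H, u)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta
```

With the true R supplied, this corresponds to the oracle GEE estimator used as a benchmark in the simulations.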
[13] approximated the inverse of the working correlation matrix by a linear combination of basis matrices,

R^{-1} ~ a_0 I + Sum_{j=1}^q a_j B_j,   (2.1)

where I is an identity matrix, B_1, ..., B_q are basis matrices with 0 and 1 components and the a_j's are unknown coefficients. Consequently, the GEE can be approximated as a linear combination of the elements of the extended score

g_i(beta) = ( {mudot_i' A_i^{-1}(y_i - mu_i)}', {mudot_i' A_i^{-1/2} B_1 A_i^{-1/2}(y_i - mu_i)}', ..., {mudot_i' A_i^{-1/2} B_q A_i^{-1/2}(y_i - mu_i)}' )',   (2.2)

which yields the moment conditions

G_n(beta) = (1/n) Sum_{i=1}^n g_i(beta).   (2.3)

Note that g_i in (2.2) does not involve the nuisance parameters a_0, ..., a_q associated with the linear weights in (2.1). However, it is impossible to set each estimating equation in (2.2) to zero simultaneously in solving for beta, as the dimension of the moment conditions exceeds the dimension of the parameters. [13] proposed obtaining an estimator of beta by minimizing the quadratic inference function

Q_n(beta) = n G_n(beta)' V(beta)^{-1} G_n(beta),   (2.4)

where V(beta)^{-1} = E{g_i(beta)g_i(beta)'}^{-1} is a weighting matrix and V(beta) is estimated consistently by the sample covariance matrix C_n(beta) = (1/n) Sum_{i=1}^n g_i(beta)g_i(beta)'. Similar to the generalized method of moments, the QIF estimator utilizing C_n(beta) is optimal in the sense that its asymptotic variance matrix reaches the minimum among all estimators solved from the same linear class of the moment conditions given in (2.3).
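The extended score and the QIF objective can be sketched as follows. This is a minimal Gaussian identity-link version (mudot_i = X_i, A_i = I) with hypothetical names; the basis list is assumed to include the identity matrix as its first element:

```python
import numpy as np

def extended_score(beta, X, y, basis):
    """Extended score g_i(beta) for one subject (identity-link sketch).

    X : (m, p) covariates, y : (m,) response,
    basis : list of (m, m) basis matrices, identity first.
    Returns a p*(q+1)-vector stacking mudot_i' B_j (y_i - mu_i).
    """
    resid = y - X @ beta
    return np.concatenate([X.T @ (B @ resid) for B in basis])

def qif(beta, Xs, ys, basis):
    """Q_n(beta) = n * G_n' C_n^{-1} G_n with sample covariance C_n."""
    g = np.array([extended_score(beta, X, y, basis) for X, y in zip(Xs, ys)])
    gbar = g.mean(axis=0)                 # G_n(beta)
    C = g.T @ g / len(g)                  # C_n(beta)
    return len(g) * gbar @ np.linalg.solve(C, gbar)
```

Minimizing `qif` over beta (e.g. with a generic optimizer) gives the QIF estimator, provided C_n is invertible, which is exactly the issue when the number of moment conditions approaches the sample size.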

QIF with large-dimensional moment conditions
For high-dimensional clustered data, utilizing accurate correlation structures for correlated measurements is essential for improving the efficiency of regression parameter estimators and reducing the bias of the estimator. Although the GEE approach requires only a few nuisance parameters to specify a common working correlation structure, this structure does not represent the true correlation structure sufficiently well, especially when the cluster size is large. It is well-known that when the correlation structure is misspecified, the GEE estimator can be inefficient. The QIF approach is able to improve the efficiency of parameter estimation by representing the correlation structure as pre-specified basis matrices.
The pre-specified basis matrices are useful to approximate the working correlation matrix R if the inverse of the correlation structure has a linear representation as in (2.1). For example, if R corresponds to an exchangeable structure, then R^{-1} = a_0 I + a_1 B_1, where a_0 and a_1 are coefficients associated with the exchangeable correlation parameter, and B_1 is a symmetric matrix with 0 on the diagonal and 1 elsewhere. If R is the first-order autoregressive (AR1) structure, then R^{-1} = a_0 I + a_2 B_2 + a_3 B_3, where a_0, a_2 and a_3 are coefficients associated with the AR1 correlation parameter, B_2 is a symmetric matrix with 1 on the sub-diagonal entries and 0 elsewhere, and B_3 is a symmetric matrix with 1 in elements (1, 1) and (m, m) and 0 elsewhere. However, this kind of representation requires prior information about the working correlation matrix.
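The exchangeable representation can be verified numerically. The closed-form coefficients below follow from the standard inversion of an exchangeable correlation matrix; this is an illustrative check, not code from the paper:

```python
import numpy as np

m, rho = 5, 0.4
B1 = np.ones((m, m)) - np.eye(m)          # 0 on the diagonal, 1 elsewhere
R = np.eye(m) + rho * B1                  # exchangeable correlation matrix
Rinv = np.linalg.inv(R)

# standard closed-form coefficients for the exchangeable inverse
a1 = -rho / ((1 - rho) * (1 + (m - 1) * rho))
a0 = (1 + (m - 2) * rho) / ((1 - rho) * (1 + (m - 1) * rho))

# R^{-1} = a_0 I + a_1 B_1 holds exactly
assert np.allclose(Rinv, a0 * np.eye(m) + a1 * B1)
```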
Suppose the prior information on the correlation structure is unknown. We can then use a linear representation over a complete set of basis matrices, each with 1 in the (i, j) and (j, i) entries and 0 elsewhere, which can handle any form of the correlation matrix. Alternatively, the basis matrices B_j can be obtained through an eigenvector decomposition, R^{-1} ~ a_0 I + Sum_{j=1}^m a_j B_j, where B_j = e_j e_j' is the jth basis matrix and e_j is the eigenvector corresponding to the jth largest eigenvalue of the sample correlation matrix of y_i. However, this leads to the generation of many moment conditions when the cluster size is large and prior information is not provided.
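Constructing the eigenvector-based basis matrices is straightforward; a sketch with hypothetical names:

```python
import numpy as np

def eigen_basis(Y):
    """Basis matrices B_j = e_j e_j' from the sample correlation of Y.

    Y : (n, m) response matrix (rows are subjects).
    Returns the m rank-one basis matrices ordered by decreasing eigenvalue.
    """
    S = np.corrcoef(Y, rowvar=False)          # m x m sample correlation
    vals, vecs = np.linalg.eigh(S)            # eigh returns ascending order
    order = np.argsort(vals)[::-1]            # reorder to descending
    return [np.outer(vecs[:, j], vecs[:, j]) for j in order]
```

Since the eigenvectors are orthonormal, the basis matrices sum to the identity, so together with I they span every matrix that is diagonal in this eigenbasis.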
If the number of moment conditions is much larger than the number of parameters, some moment conditions could be either less informative or highly correlated. This could lead to large variability in estimating the weighting matrix C_n(beta)^{-1} and result in an unstable QIF estimator in finite samples. Moreover, if the number of moment conditions is greater than the sample size, the sample covariance matrix C_n(beta) is singular and therefore the QIF estimator is infeasible. To solve the singularity problem caused by highly overidentified moment conditions, [7,1,12] proposed to select a subset of moment conditions for parameter estimation. However, a subset selection approach may lose efficiency in parameter estimation. In the following section, we propose a new method which combines all valid moment conditions optimally and is capable of achieving high estimation efficiency.

Methodology
We first decompose a moment condition vector G n into two sets of moment conditions G n = (G n1 , G n2 ) , where G n1 could be an s-dimensional preselected moment condition vector, and G n2 are the remaining moment conditions. The preselected moment conditions are the ones which should definitely be included in the estimation. For example, in modeling an unspecified correlation structure, the first set of moment conditions in (2.3) involving the identity basis matrix should be preselected, since the moment conditions generated from any type of correlation structure always contain the one with an identity basis matrix. It is well-known that estimation efficiency can be achieved under the true correlation information. Thus, there might be a loss of estimation efficiency if the moment conditions generated from a misspecified correlation structure are selected. Note that the dimension of preselected moment conditions s is finite, and smaller than the sample size n, to avoid the singularity problem discussed in Section 2.2.
In the following development, we retain the first set of preselected moment conditions G_n1 and extract important information from the remaining large-dimensional moment conditions G_n2 for parameter estimation. We first orthogonalize G_n2 against G_n1 to distinguish the contributions of the two sets of moment conditions, where the orthogonalized moment conditions are G^o_n2 = G_n2 - V_21 V_11^{-1} G_n1, with V_21 = cov(G_n2, G_n1) and V_11 = cov(G_n1). Through orthogonalization, the two moment conditions G_n1 and G^o_n2 are no longer correlated, i.e., cov(G_n1, G^o_n2) = 0. In the second step, we reduce the dimension of G^o_n2 through a spectral decomposition to extract most of the information from G^o_n2. Specifically, we convert G^o_n2 into linearly uncorrelated moment conditions. The sample covariance matrix V^o_2 of G^o_n2 can be represented through its spectral decomposition V^o_2 = Sum_{j=1}^r lambda_j e_j e_j', where e_j is the jth eigenvector of V^o_2 corresponding to the jth largest eigenvalue lambda_j and r = p(q + 1) - s. Equivalently, the jth principal component is e_j' G^o_n2, a linear combination of G^o_n2. To reduce the dimensionality of the moment conditions G_n2, we select the first t principal components and obtain t orthogonal linear combinations of G^o_n2. That is, the reduced moment conditions G*_n, incorporating the first set of moment conditions G_n1 and the t principal components, are

G*_n = (G_n1', (U' G^o_n2)')' = T_2 T_1 G_n,   (3.1)

where U = (e_1, ..., e_t) is the matrix containing the first t eigenvectors, T_1 = (I_s, 0; -V_21 V_11^{-1}, I_r), T_2 = (I_s, 0; 0, U'), and I_s and I_r are identity matrices of dimensions s x s and r x r. Consequently, the QIF estimator based on G*_n is obtained by minimizing

Q*_n(beta) = n G*_n(beta)' V*_n(beta)^{-1} G*_n(beta),   (3.2)

where V*_n(beta) is the sample covariance matrix of G*_n(beta).
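The two-step construction above (orthogonalization, then spectral reduction) can be sketched as follows. For brevity this uses uncentered sample second-moment matrices in place of the covariances; all names are hypothetical:

```python
import numpy as np

def reduced_moments(g1, g2, t):
    """Combine moment conditions: keep g1, take the top-t principal
    components of g2 after orthogonalizing it against g1.

    g1 : (n, s) preselected per-subject moments.
    g2 : (n, r) remaining per-subject moments.
    Returns an (n, s + t) matrix of reduced moment conditions.
    """
    n = g1.shape[0]
    V11 = g1.T @ g1 / n
    V21 = g2.T @ g1 / n
    # G^o_{n2} = G_{n2} - V21 V11^{-1} G_{n1}
    g2o = g2 - g1 @ np.linalg.solve(V11, V21.T)
    V2o = g2o.T @ g2o / n
    vals, vecs = np.linalg.eigh(V2o)
    U = vecs[:, np.argsort(vals)[::-1][:t]]   # top-t eigenvectors
    return np.hstack([g1, g2o @ U])
```

By construction the first s columns are exactly uncorrelated with the remaining t columns, which is the point of the orthogonalization step.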
Note that the objective function in (3.2) can be expressed in terms of the full moment conditions G_n as Q*_n = n G_n' T_1' T_2' V*_n^{-1} T_2 T_1 G_n, which utilizes all moment conditions with a different weighting matrix T_1' T_2' V*_n^{-1} T_2 T_1 (denoted by V_n^{-1}) to capture important information, but with a much lower dimension of the sample covariance matrix V*_n relative to the sample size. Our method is still applicable when there are no preselected moment conditions G_n1, that is, when we have no prior knowledge of a subset of moment conditions that must be included for estimation. This will be discussed with an example in our simulation studies.
One important step is to select t such that most of the information from the moment conditions G_n2 is captured. We propose a Bayesian information type of criterion to select the number of principal components t by minimizing the objective function

J(t) = ||V^o_2 - Vtilde(t)||^2 + P(n, r) t,   (3.3)

where Vtilde(t) = Sum_{j=1}^t lambda_j e_j e_j', ||X||^2 = tr(X'X)/(ij) for an i x j matrix X, tr{X} is the trace of a square matrix X, and P(n, r) is a penalty factor depending on n and r. Note that the first term in (3.3) measures the difference between the sample covariance matrices of the moment conditions G^o_n2 and the t selected linear combinations of moment conditions. The second term of (3.3) is a penalty function of both n and r which ensures an appropriate convergence rate for a consistent selection of the number of principal components. This penalty term also guarantees that the number of selected principal components is always smaller than the sample size.
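A sketch of the selection criterion. Because V^o_2 - Vtilde(t) has squared norm equal to the sum of the squared discarded eigenvalues (divided by r^2), evaluating J(t) only requires an eigenvalue decomposition, never a matrix inverse. The paper's exact penalty term is not reproduced here; log(n) * r / n is an assumed placeholder:

```python
import numpy as np

def select_t(V2o, n, penalty=None):
    """BIC-type selection of the number of principal components.

    Minimizes J(t) = ||V2o - Vtilde(t)||^2 + penalty * t, where Vtilde(t)
    keeps the top-t eigenpairs and ||X||^2 = tr(X'X) / r^2 for r x r X.
    The penalty form is a placeholder, not the paper's exact choice.
    """
    r = V2o.shape[0]
    if penalty is None:
        penalty = np.log(n) * r / n
    vals = np.sort(np.linalg.eigvalsh(V2o))[::-1]       # descending
    # tail[t] = sum of squared eigenvalues discarded when keeping t components
    tail = np.cumsum(vals[::-1] ** 2)[::-1]
    J = [(tail[t] if t < r else 0.0) / r**2 + penalty * t
         for t in range(r + 1)]
    return int(np.argmin(J))
```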
The advantage of the proposed approach is that it does not require inversion of the sample covariance matrix V o 2 . This is quite critical when the dimension of moment conditions is high relative to the sample size, and the inversion of the high-dimensional covariance matrix is infeasible. Note that the proposed approach is very different from [5,4] which require the inverse of the sample covariance matrix to minimize the mean square errors.

Asymptotic properties
In this section, we provide the asymptotic properties of the proposed estimator when the number of moment conditions and the sample size both increase. We denote beta_0 as the true parameter and t_0 as the optimal number of selected principal components. The following regularity conditions are required in order to establish the asymptotic properties. Condition (C-1) states that the population moment conditions exist, and the mean-zero assumption for the estimating function g_i(beta) enables one to identify the true parameter beta_0. Condition (C-2) is required for the minimization of Q*_n(beta) in (3.2), where the parameter space is closed and bounded under (C-3). Condition (C-4) ensures that g_i(beta) satisfies a uniform weak law of large numbers, so that the difference between the average sample moments and the population moments converges in probability to zero. Condition (C-5) requires that the asymptotic covariance matrix of our estimator exists and that the eigenvalues lambda_j are sufficiently small for the components that are not selected as principal components.
We first investigate whether the minimizing criterion J(t) in (3.3) leads to consistent estimation of the covariance matrix. The following lemmas provide the asymptotic rate of convergence for the estimated covariance matrix through a consistent selection of t 0 principal components.

Lemma 1. If condition (C-5) holds, then ||Vtilde(t_0) - V^o_2|| converges in probability to 0 as n tends to infinity, where ||X|| is the norm defined by ||X||^2 = tr(X'X)/(ij) and i x j is the dimension of the matrix X.

Lemma 1 indicates that the discrepancy (in matrix norm) between the estimated covariance matrix Vtilde(t_0) and the covariance matrix V^o_2 of G^o_n2 vanishes as n tends to infinity. The following lemma shows that the number of principal components can be consistently selected based on the criterion J(t) in (3.3) when the sample size goes to infinity.

Lemma 2. Under condition (C-5), there exists a minimizer t-hat of J(t) in (3.3) such that lim_{n -> infinity} Prob(t-hat = t_0) = 1.
Note that the choice of the penalty function plays an important role in selecting the number of principal components consistently. Here the penalty term in (3.3) vanishes at an appropriate rate such that the number of linear combinations of moment conditions is consistently selected with probability tending to 1. The above lemmas ensure that the proposed criterion J(t) results in consistent estimation of the covariance matrix V^o_2 of the moment conditions G^o_n2. The following theorem provides the asymptotic normality and efficiency of the estimator beta-hat.

Theorem 1. If regularity conditions (C-1)-(C-5) hold, there exists a minimizer beta-hat of Q*_n(beta) in (3.2) which is consistent and asymptotically normal as n tends to infinity.

Theorem 1 indicates that the estimator beta-hat is consistent and asymptotically normal. This implies that asymptotically there is no efficiency loss when the number of principal components t_0 is selected based on the proposed criterion: the new weighting matrix V*_n(beta)^{-1} in Q*_n(beta) enables one to combine all valid moment conditions optimally without loss of efficiency. Furthermore, the following theorem shows that an estimator based on any subset of the moment conditions is less efficient than the one utilizing all moment conditions. In the following, we denote beta-hat_A as the estimator based on all moment conditions, and beta-hat_S as the estimator using a subset of moment conditions.

Theorem 2. Under (C-1)-(C-5), the estimator beta-hat_A is more efficient than the estimator beta-hat_S, that is, var(a' beta-hat_A) <= var(a' beta-hat_S) for any constant vector a.

The above theorem shows that higher estimation efficiency can be achieved by combining all valid moment conditions optimally. The proofs of the lemmas and theorems are provided in the Appendix.

Implementation with unbalanced data
In longitudinal studies, unbalanced data are quite common, as the cluster size m_i of the ith subject varies due to missing observations. If the measurements from unbalanced data are regarded as clustered data without considering the order of lag time, then the marginal mean of the response mu_i is an m_i-dimensional vector and the basis matrices B_j are m_i x m_i matrices for i = 1, ..., n. On the other hand, when the lag time between measurements is considered, we provide a strategy to implement the proposed method for unbalanced data using a transformation matrix for each subject.
Let M_i be an m x m_i transformation matrix for the ith subject, where m = max(m_1, ..., m_n). The matrices M_i are generated by deleting the columns of the m x m identity matrix corresponding to the missing measurements of the ith subject. We transform the unbalanced data to artificial balanced data using the transformed responses y*_i = M_i y_i and covariates X*_i = M_i X_i. The QIF estimator for unbalanced data is then obtained based on the extended score vector computed from the transformed data. Note that the estimator retains the aforementioned properties if the data are missing completely at random [15].
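The transformation matrices M_i can be generated directly from the identity matrix; a small sketch with hypothetical names:

```python
import numpy as np

def transform_matrix(observed, m):
    """m x m_i matrix M_i: the columns of I_m at the observed time indices."""
    return np.eye(m)[:, observed]

# e.g. a subject observed at times 0, 2 and 3 out of m = 5
M = transform_matrix([0, 2, 3], 5)
y_i = np.array([1.0, 2.0, 3.0])
y_star = M @ y_i   # balanced 5-vector with zeros at the missing times
```

Multiplying by M_i pads each subject's response (and covariate rows) with zeros at the missing time points, so all subjects share common m x m basis matrices.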

Continuous responses
We generate the correlated continuous response variable from the marginal model y_ij = x_ij' beta + eps_ij for i = 1, ..., n and j = 1, ..., m, where eps_i = (eps_i1, ..., eps_im)' ~ N(0, R) and beta = (beta_1, beta_2)' = (1, 1)'. The repeated responses are generated with a cluster size of m = 25, 50 or 100, and the sample size ranges from n = 50 to 500. We design a simulation setting based on a three-block diagonal correlation structure R, where the first block has a (3m/5) x (3m/5) exchangeable structure with correlation parameter 0.7, the second block has an (m/5) x (m/5) AR1 structure with correlation 0.6, and the third block has an (m/5) x (m/5) exchangeable structure with correlation 0.8. The basis matrices are obtained via an eigenvector decomposition, R^{-1} ~ a_0 I + Sum_{j=1}^m a_j B_j, where B_j = e_j e_j' and e_j is the eigenvector corresponding to the jth largest eigenvalue of the sample correlation matrix of y_i. There are a total of m + 1 basis matrices. When n = 50, the number of moment conditions, 2(m + 1), exceeds the sample size for any of the cluster sizes 25, 50 and 100. That is, the QIF estimator constructed from moment conditions using all eigenvector bases is infeasible due to the singularity problem if the inverse of the sample covariance matrix of the moment conditions is used as the weighting matrix in (2.4).
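The three-block correlation matrix of this simulation design can be constructed as follows (assuming m is divisible by 5); function names are hypothetical:

```python
import numpy as np

def exch(size, rho):
    """Exchangeable correlation matrix."""
    return (1 - rho) * np.eye(size) + rho * np.ones((size, size))

def ar1(size, rho):
    """AR1 correlation matrix with entries rho^|j - k|."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def block_corr(m):
    """Three-block R: exch(0.7) on 3m/5, then AR1(0.6) and exch(0.8) on m/5 each."""
    b = m // 5
    R = np.zeros((m, m))
    R[:3 * b, :3 * b] = exch(3 * b, 0.7)
    R[3 * b:4 * b, 3 * b:4 * b] = ar1(b, 0.6)
    R[4 * b:, 4 * b:] = exch(b, 0.8)
    return R
```

Responses can then be drawn as y_i = X_i beta + L z_i with L the Cholesky factor of R and z_i standard normal.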
We compare the performance of the proposed method to the GEE estimators under two working correlation structures, exchangeable (GEE_EX) and AR1 (GEE_AR1), based on 200 simulations. Here we suppose that the first set of moment conditions, containing the identity basis matrix, is preselected (QIF_PC1). To illustrate the importance of utilizing all valid moment conditions with a consistent weighting matrix, we also perform QIF estimation based on all valid moment conditions with the identity weighting matrix (QIF_I), and on a subset of moment conditions (QIF_Sub). In addition, we compare all these estimators with the GEE estimator using the true correlation structure, denoted as the oracle estimator. In practice, the oracle estimator cannot be achieved since the true correlation structure is unknown.
To illustrate estimation efficiency, we define the mean squared error mse(beta-hat) = (1/200) Sum_{i=1}^{200} ||beta-hat^{(i)} - beta_0||^2, where beta-hat^{(i)} is the estimator from the ith simulation, beta_0 is the true parameter, and ||.|| denotes the Euclidean norm. Figure 1 provides the mean squared errors of the estimators corresponding to various cluster sizes and sample sizes. In addition, Table 1 provides the means and standard errors of the estimators, and the ratio of the mean squared error obtained from each competing approach to the mean squared error of the proposed method (QIF_PC1).
Our simulations show that the proposed method is superior to the GEE under exchangeable and AR1 correlation structures, the QIF using the identity weighting matrix, and the QIF using a subset of moment conditions, in terms of both the standard errors and the mean squared errors of the estimators. Specifically, Figure 1 indicates that, when the sample size is 50, the mean squared errors of the proposed estimators decrease and approach those of the oracle estimator as the cluster size increases, while the mean squared errors of the GEE approach increase. The relatively low efficiency of the GEE estimator is explained by the fact that the GEE is inefficient under a misspecified working correlation structure. In contrast, the proposed method is able to improve the finite sample performance of the QIF estimator with only a small loss of efficiency relative to the oracle.
When the sample size increases to 500, the mean squared errors of the proposed method and the oracle estimator are the same regardless of the cluster size. On the other hand, the QIF using a subset of moment conditions is not able to recover estimation efficiency, as a subset selection approach fails to capture information from the remaining unselected moment conditions. For the QIF approach using the identity weighting matrix, the weighting matrix is not optimal and therefore there is a clear loss in efficiency. The GEE estimators with exchangeable or AR1 working correlations also perform poorly, with more than eight times the mean squared error of the proposed method when the cluster size is m = 100.

We also apply the proposed approach assuming that there are no preselected moment conditions, where G_n1 in (3.1) is an empty set (QIF_PC2). When the sample size is small, the resulting estimator is not as efficient as QIF_PC1, although it still outperforms the existing methods. However, the estimation efficiency of QIF_PC2 is recovered with a large sample size of n = 500.

We further investigate whether the BIC criterion selects the optimal number of principal components in finite samples. Figure 2 illustrates the mean squared errors of the QIF approach using the reduced moment conditions G*_n generated by t principal components, for t varying between 0 and 30, with n = 50 and m = 25, 50 and 100. Figure 2 shows that the minimizer of the MSE of the QIF estimator lies slightly below the number selected by the BIC when m = 25. However, when the cluster size m increases to 50 and 100, the MSE of the QIF estimator based on the BIC reaches the minimum. This indicates that the BIC criterion is quite effective in selecting the number of moment conditions when the cluster size and sample size are moderately large.

Binary responses
We also conduct simulation studies with correlated binary responses, where the covariates are x_ij = (x_ij^(1), x_ij^(2))' and beta = (beta_1, beta_2)' = (0.5, -0.5)'. We choose the sample sizes to be 50 and 500, and the cluster sizes to be 25, 50 and 100, based on 200 simulations. The R package mvtBinaryEP is used to generate the correlated binary responses with a three-block exchangeable correlation matrix. The dimensions of the blocks are (m/5) x (m/5), (3m/5) x (3m/5) and (m/5) x (m/5) respectively, and the correlation coefficients are rho = (0.8, 0.5, 0.7).
Similar to the continuous case, we compare the proposed method with the QIF based on the identity weighting matrix, the QIF based on a subset of moment conditions, and the GEE approach with two working correlation matrices. Simulation results for various cluster sizes and sample sizes are reported in Table 2 and Figure 1, which confirm that the proposed method combining all moment conditions outperforms the other methods.
When the sample size increases, Table 2 indicates that the ratio of the mean squared error of the oracle estimators to the mean squared error for the proposed approach is closer to 1. Moreover, Figure 1 shows that the proposed method provides more efficient estimation as the cluster size increases. However, this does not hold for the GEE method and the QIF approaches based on the identity weighting matrix or a subset of moment conditions even when the sample size reaches 500. The simulation results from binary responses are quite comparable to those reported for the continuous responses.

Fortune 500 data example
We use Fortune 500 data between 2000 and 2010 to illustrate the proposed approach. The 136 largest US corporations were ranked among the Global 500 in 2010, and 105 of these companies have been ranked in the Fortune 500 over 11 consecutive years. Therefore we choose the sample size as 105 with an equal cluster size of 11. For these data, we apply a log-linear model based on the employee demand equation, where the response variable is the number of employees (Employees) of each firm, and the covariates of interest are the revenue and the assets. The log-linear model is formulated as follows: log(Employees)_ij = beta_0 + beta_1 log(Revenue)_ij + beta_2 log(Assets)_ij + eps_ij for i = 1, ..., 105 and j = 1, ..., 11, where log(Employees)_ij, log(Revenue)_ij and log(Assets)_ij are the logs of the employees, the revenue and the assets of firm i in the jth year, respectively.
Through an eigenvector decomposition of the sample correlation matrix of the response, a total of 12 basis matrices are generated as R^{-1} ~ a_0 I + Sum_{j=1}^{11} a_j B_j, where B_j = e_j e_j' and e_j is the eigenvector corresponding to the jth largest eigenvalue of the sample correlation matrix. Therefore a total of 36 valid moment conditions are constructed for parameter estimation. We implement the proposed method, and compare it with the QIF using the identity weighting matrix, the QIF using a subset of moment conditions, and the GEE method with exchangeable and AR1 working correlation structures. Note that the sample covariance matrix of all available moment conditions is not applicable as the weighting matrix, since it is nearly singular due to high collinearity among some of the moment conditions. Table 3 provides the parameter estimates, the standard errors of the corresponding estimates, the Z-test statistics and the p-values. In general, the estimates obtained by the proposed method are the most sensible among the approaches compared. Specifically, the coefficients of log(Revenue) and log(Assets) from the proposed method are all positive, implying that the number of employees is positively associated with revenue and assets, with the corresponding p-values all less than 0.001. On the other hand, the coefficient of log(Revenue) from the GEE under the AR1 working correlation is insignificant (p-value = 0.252). The QIF using the identity weighting matrix and the QIF using a subset of moment conditions produce more extreme coefficient estimates for log(Revenue) and the intercept compared to the other approaches, and negative coefficients of log(Assets) with insignificant p-values. This data example confirms that the proposed method, utilizing all available moment conditions with a consistent weighting matrix, provides more interpretable estimates.

Discussion
We propose an efficient and stable QIF estimation procedure which combines all available moment conditions when the dimension of the moment conditions is large compared to the sample size. The proposed procedure utilizes a set of preselected moment conditions in addition to optimal linear combinations of the remaining moment conditions obtained through principal component analysis. The new approach allows one to reduce the dimensionality of the moment conditions while retaining most of the important information from all valid moment conditions. This is very different from existing approaches, which obtain information from a subset of moment conditions only. The performance of the QIF approach relies on selecting the number of principal components accurately, which is essential for estimation efficiency. We provide a new objective function based on a Bayesian information type of criterion. This selects the optimal number of principal components consistently and leads to desirable asymptotic properties such as consistency, asymptotic normality and efficiency for the proposed estimator. We also tried other selection criteria such as the AIC, the corrected AIC and the corrected RIC. However, our simulation studies, which are not provided here, indicate that the BIC performs better than the other selection criteria, in that the BIC selects a more accurate t_0 and leads to more efficient parameter estimation. This is because the other criteria tend to over-select the number of principal components, which leads to a loss of estimation efficiency.
In recent years, estimating the inverse of the high-dimensional covariance matrix has become increasingly important due to the rise of big data. The proposed method can also be applied in choosing an appropriate rank for a singular or nearly singular covariance matrix. This is quite useful in low-rank approximation for high-dimensional matrix problems, which has wide applications such as in data compression, large-dimensional matrix operations, recommender systems and machine learning.
where V_21 = cov{G_n2(beta), G_n1(beta)} and V_11 = cov{G_n1(beta)}. Through the orthogonalization, cov{G_n1(beta), G^o_n2(beta)} = 0. We denote G^o_n(beta) = (G_n1(beta)', G^o_n2(beta)')'. The estimator beta-hat_A is obtained by minimizing G_n(beta)' V^{-1} G_n(beta), which is equivalent to minimizing G^o_n(beta)' V^{o-1} G^o_n(beta), where V = cov{G_n(beta)} and V^o = cov{G^o_n(beta)}. The information matrix of beta-hat_A is proportional to Gdot^o_n(beta_0)' V^{o-1} Gdot^o_n(beta_0). On the other hand, we obtain beta-hat_S by minimizing G_n1(beta)' V_11^{-1} G_n1(beta), which utilizes only the first set of moment conditions. Since the estimator beta-hat_S converges to beta_0, its information matrix is proportional to Gdot_n1(beta_0)' V_11^{-1} Gdot_n1(beta_0). Considering that V^o_22 in (5.6) is a non-negative definite weighting matrix, it consequently follows from (5.6) and (5.7) that Gdot^o_n' V^{o-1} Gdot^o_n >= Gdot_n1' V_11^{-1} Gdot_n1 in the sense of the Loewner ordering. Therefore the efficiency of the estimator beta-hat_A is improved by utilizing all moment conditions instead of only a subset of them.