Abstract
We consider the strong consistency of a log-likelihood-based information criterion in a normality-assumed canonical correlation analysis between q- and p-dimensional random vectors for a high-dimensional case such that the sample size n and number of dimensions p are large but p/n is less than 1. In general, strong consistency is a stricter property than weak consistency; thus, sufficient conditions for the former do not always coincide with those for the latter. We derive the sufficient conditions for the strong consistency of this log-likelihood-based information criterion for the high-dimensional case. It is shown that the sufficient conditions for strong consistency of several criteria are the same as those for weak consistency obtained by Yanagihara et al. (J. Multivariate Anal. 157, 70–86: 2017).
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Akadémiai Kiadó, Budapest, Akaike, H. (ed.), p. 995–1010.
Akaike, H. (1974). A new look at the statistical model identification. Institute of Electrical and Electronics Engineers Transactions on Automatic Control AC-19, 716–723.
Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52, 345–370.
Fukui, K. (2015). Consistency of log-likelihood-based information criteria for selecting variables in high-dimensional canonical correlation analysis under nonnormality. Hiroshima Math. J. 45, 175–205.
Fujikoshi, Y. (1982). A test for additional information in canonical correlation analysis. Ann. Inst. Statist. Math. 34, 523–530.
Fujikoshi, Y. (1985). Selection of variables in discriminant analysis and canonical correlation analysis. North-Holland, Fujikoshi, Y. (ed.), p. 219–236.
Fujikoshi, Y., Ulyanov, V. V. and Shimizu, R. (2010). Multivariate statistics: High-dimensional and large-sample approximations. John Wiley & Sons Inc., Hoboken.
Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an autoregression. J. Roy. Statist. Soc. Ser. B 41, 190–195.
McKay, R. J. (1977). Variable selection in multivariate regression: an application of simultaneous test procedures. J. Roy. Statist. Soc. Ser. B 39, 371–380.
Nishii, R., Bai, Z. D. and Krishnaiah, P. R. (1988). Strong consistency information criterion for model selection in multivariate analysis. Hiroshima Math. J. 18, 451–462.
Oda, R. and Yanagihara, H. (2019). A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables. TR No 19–1, Statistical Research Group, Hiroshima University.
Ogura, T. (2010). A variable selection method in principal canonical correlation analysis. Comput. Statist. Data Anal. 54, 1117–1123.
Srivastava, M. S. (2002). Methods of multivariate statistics. Wiley, New York.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
Timm, N. H. (2002). Applied multivariate analysis. Springer-Verlag, New York.
Yanagihara, H., Oda, R., Hashiyama, Y. and Fujikoshi, Y. (2017). High-dimensional asymptotic behaviors of differences between the log-determinants of two Wishart matrices. J. Multivariate Anal. 157, 70–86.
Acknowledgments
The authors would like to thank the reviewers for valuable comments. Ryoya Oda was supported by a Research Fellowship for Young Scientists from the Japan Society for the Promotion of Science, #18J12123. Hirokazu Yanagihara and Yasunori Fujikoshi were partially supported by Grants-in-Aid for Scientific Research (C) from the Ministry of Education, Science, Sports, and Culture, #18K03415 and #16K00047, respectively.
Appendices
Appendix A: Proof of Lemma 1
From the assumption of Lemma 1, the following reductions can be derived:
Hence, we have
This completes the proof of Lemma 1.
Appendix B: Proof of Lemma 2
Let us take an arbitrary ε > 0, and let k be a natural number such that \(k>(2\varepsilon )^{-1}\). By using Markov’s inequality, for all δ > 0, we have
Then, since p = p(n) and \(k>(2\varepsilon )^{-1}\), it holds that \(\sum ^{\infty }_{n=1}p^{-2k\varepsilon }<\infty \) and \(\sum ^{\infty }_{n=1}n^{-1-\varepsilon }<\infty \). These two convergence results and the Borel–Cantelli lemma complete the proof of Lemma 2.
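The mechanism of this argument can be written out generically. The display below is only a sketch under assumptions that are not taken from the paper: \(Z_{n}\) is a hypothetical sequence of random variables whose 2k-th moments are bounded by a constant C that does not depend on n. It is not the exact statement of Lemma 2.

```latex
\begin{align*}
P\left(p^{-\varepsilon}|Z_{n}| > \delta\right)
  &\le \frac{E\left[|Z_{n}|^{2k}\right]}{\delta^{2k}\,p^{2k\varepsilon}}
   \le \frac{C}{\delta^{2k}}\,p^{-2k\varepsilon}
   && \text{(Markov's inequality)}, \\
\sum_{n=1}^{\infty} P\left(p^{-\varepsilon}|Z_{n}| > \delta\right)
  &\le \frac{C}{\delta^{2k}} \sum_{n=1}^{\infty} p^{-2k\varepsilon} < \infty
   && \text{for every } \delta > 0, \\
p^{-\varepsilon} Z_{n} &\to 0 \quad \text{a.s.}
   && \text{(Borel--Cantelli lemma)}.
\end{align*}
```

The summability of \(p^{-2k\varepsilon }\) is what turns a bound in probability into an almost sure statement, which is why k is chosen so that \(2k\varepsilon >1\).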
Appendix C: Proof of Theorem 1
To prove Theorem 1, we use three lemmas from Yanagihara et al. (2017) and Oda and Yanagihara (2019). Before Lemma C.1 is introduced, let \(\boldsymbol {Q}\) be an n × (n − 1) matrix satisfying \(\boldsymbol {I}_{n}-n^{-1}\boldsymbol {1}_{n}\boldsymbol {1}^{\prime }_{n}=\boldsymbol {Q}\boldsymbol {Q}^{\prime }\) and \(\boldsymbol {Q}^{\prime }\boldsymbol {Q}=\boldsymbol {I}_{n-1}\). Further, let \(\boldsymbol {X}=(\boldsymbol {x}_{1},\ldots ,\boldsymbol {x}_{n})^{\prime }\), where \(\boldsymbol {x}_{i}\) is the i-th individual from \(\boldsymbol {x}\). The following lemma is Lemma C.1 in Yanagihara et al. (2017).
Lemma C.1.
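For a subset \(j \in \mathcal {J}\), let \(\boldsymbol {\mathcal {E}}\), \(\boldsymbol {A}_{j}\), and \(\boldsymbol {B}_{j}\) be mutually independent random matrices, which are distributed according to
where \(\boldsymbol {\mathcal {E}}\) and \(\boldsymbol {B}\) are independent and do not depend on j, and \(\boldsymbol {B}_{j}\) is an \((n-1)\times q_{j}\) matrix. Then, we have
where \(\boldsymbol {P}=\boldsymbol {B}(\boldsymbol {B}^{\prime }\boldsymbol {B})^{-1}\boldsymbol {B}^{\prime }\), \(\boldsymbol {P}_{j}=\boldsymbol {B}_{j}(\boldsymbol {B}^{\prime }_{j}\boldsymbol {B}_{j})^{-1}\boldsymbol {B}^{\prime }_{j}\), and \(\boldsymbol {\varGamma }_{j}\) is defined in Eq. 3.3.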
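The matrix Q introduced before Lemma C.1 and the projection matrices in the lemma are easy to check numerically. The following is a minimal sketch, not the authors' code: the helper names `centering_factor` and `projection`, the sizes `n`, `q`, `q_j`, and the choice of taking \(\boldsymbol {B}_{j}\) as the first columns of \(\boldsymbol {B}\) are all assumptions made for illustration.

```python
import numpy as np

def centering_factor(n):
    """Return an n x (n-1) matrix Q with Q Q' = I_n - (1/n) 1 1' and Q'Q = I_{n-1}."""
    C = np.eye(n) - np.ones((n, n)) / n       # centering matrix: symmetric and idempotent
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues ascending: one 0, then n-1 ones
    return eigvecs[:, 1:]                     # drop the eigenvector belonging to eigenvalue 0

def projection(B):
    """Orthogonal projection onto the column space of B, i.e. B (B'B)^{-1} B'."""
    return B @ np.linalg.solve(B.T @ B, B.T)

rng = np.random.default_rng(0)
n, q, q_j = 8, 3, 2                           # hypothetical sizes, chosen only for the demo

Q = centering_factor(n)
B = rng.standard_normal((n - 1, q))           # stand-in for the (n-1) x q matrix B
B_j = B[:, :q_j]                              # assumption: B_j consists of columns of B
P, P_j = projection(B), projection(B_j)
I = np.eye(n - 1)

print(np.allclose(Q @ Q.T, np.eye(n) - np.ones((n, n)) / n))  # Q Q' is the centering matrix
print(np.allclose(Q.T @ Q, I))                                # Q has orthonormal columns
print(np.allclose((I - P) @ (P - P_j), 0))                    # the two pieces are orthogonal
print(round(np.trace(P - P_j)))                               # rank of P - P_j
```

Because the column space of `B_j` is contained in that of `B`, the difference `P - P_j` is itself a projection. This orthogonality is the property that Cochran's theorem relies on later in the proof of Theorem 1.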
The following lemma is given by using (23) and (B.6) in Yanagihara et al. (2017).
Lemma C.2.
For a subset \(j \in \mathcal {J}\), let U1 and U2 be independent random matrices distributed according to
Further, let W1 and W2 be random matrices distributed according to
Then, we have
where δj is defined in Eq. 3.3.
The following lemma is Lemma C.2 in Oda and Yanagihara (2019).
Lemma C.3.
Suppose that N − 4k > 0 for \(k\in \mathbb {N}\). Let u and v be independent random variables distributed according to \(u \sim \chi ^{2}(N)\) and \(v \sim \chi ^{2}(p)\). Then, we have
First, we consider the case of \(j \in \mathcal {J}_{+}\backslash \{j_{*}\}\). Denote the distinct elements of \(j\cap \bar {j}_{*}\) by \(a_{1},\ldots ,a_{q_{j}-q_{*}}\). Let \(j_{0}=j\) and \(j_{i}=j_{i-1}\backslash \{a_{i}\}\) \((1 \leq i \leq q_{j}-q_{*})\). Then, \(j_{q_{j}-q_{*}}=j_{*}\) holds, and we can express LLIC(j) − LLIC(j∗) as follows:
Then, from Lemma C.1, \(\boldsymbol {S}_{yy\cdot j_{i-1}}\) can be expressed as follows:
where \(\boldsymbol {\mathcal {E}} \sim N_{(n-1)\times p}(\boldsymbol {O}_{n-1,p},\boldsymbol {I}_{p}\otimes \boldsymbol {I}_{n-1})\), \(\boldsymbol {P}_{j_{i-1}}=\boldsymbol {B}_{j_{i-1}}(\boldsymbol {B}^{\prime }_{j_{i-1}}\boldsymbol {B}_{j_{i-1}})^{-1}\boldsymbol {B}^{\prime }_{j_{i-1}}\), \(\boldsymbol {B}_{j_{i-1}}\sim N_{(n-1)\times (q_{j}-i+1)}(\boldsymbol {O}_{n-1,q_{j}-i+1},\boldsymbol {\varSigma }_{j_{i-1}j_{i-1}}\otimes \boldsymbol {I}_{n-1})\), and \(\boldsymbol {\mathcal {E}}\) is independent of \(\boldsymbol {B}_{j_{i-1}}\). Moreover, by applying Lemma C.1 to \(\boldsymbol {S}_{yy\cdot j_{i}}\), we have
where \(\boldsymbol {P}_{j_{i}}=\boldsymbol {B}_{j_{i}}(\boldsymbol {B}^{\prime }_{j_{i}}\boldsymbol {B}_{j_{i}})^{-1}\boldsymbol {B}^{\prime }_{j_{i}}\), and \(\boldsymbol {B}_{j_{i}}\) is the \((n-1)\times (q_{j}-i)\) submatrix of \(\boldsymbol {B}_{j_{i-1}}=(\boldsymbol {B}_{j_{i}},\boldsymbol {b}_{j_{i}})\). Let
Since \((\boldsymbol {I}_{n-1}-\boldsymbol {P}_{j_{i-1}})(\boldsymbol {P}_{j_{i-1}}-\boldsymbol {P}_{j_{i}})=\boldsymbol {O}_{n-1,n-1}\) holds, we observe that \(\boldsymbol {V}_{i,1}\) and \(\boldsymbol {V}_{i,2}\) are independent, with \(\boldsymbol {V}_{i,1} \sim W_{p}(n-q_{j}+i-2,\boldsymbol {I}_{p})\) and \(\boldsymbol {V}_{i,2} \sim W_{p}(1,\boldsymbol {I}_{p})\), from a property of the Wishart distribution and Cochran’s Theorem (see, e.g., Fujikoshi et al. (2010), Theorem 2.4.2). By using Eqs. C.2, C.3, and C.4, we have
Since \(\boldsymbol {V}_{i,2} \sim W_{p}(1,\boldsymbol {I}_{p})\), we can express \(\boldsymbol {V}_{i,2}=\boldsymbol {v}_{i}\boldsymbol {v}^{\prime }_{i}\), where \(\boldsymbol {v}_{i} \sim N_{p}(\boldsymbol {0}_{p},\boldsymbol {I}_{p})\) and \(\boldsymbol {v}_{i}\) is independent of \(\boldsymbol {V}_{i,1}\). Then, Eq. C.5 is calculated as
Let \(\tilde {v}_{i}=||\boldsymbol {v}_{i}||^{2}\) and \(\tilde {u}_{i}=\left (||\boldsymbol {v}_{i}||^{-1}\boldsymbol {v}^{\prime }_{i}\boldsymbol {V}^{-1}_{i,1}\boldsymbol {v}_{i}||\boldsymbol {v}_{i}||^{-1}\right )^{-1}\). Then, from a property of the Wishart distribution (see, e.g., Fujikoshi et al. (2010), Theorem 2.3.3), we see that \(\tilde {v}_{i}\) and \(\tilde {u}_{i}\) are independent, and \(\tilde {v}_{i} \sim \chi ^{2}(p)\) and \(\tilde {u}_{i} \sim \chi ^{2}(n-p-q_{j}+i-1)\). Then, Eq. C.6 is expressed as
From Lemma C.3, by applying Eq. 3.1 in Lemma 2 to the above equation, for all ε satisfying 0 < ε ≤ 1/2, the following equation can be derived:
From the above equation, we have
Therefore, from Eqs. C.1 and C.7, we can expand \(p^{-1}\{\text {LLIC}(j)-\text {LLIC}(j_{*})\}\) as follows:
Next, we consider the case of \(j \in \mathcal {J}_{-}\). By using Lemma C.2, we have
where U1, U2, W1, and W2 are defined in Lemma C.2. Let
From a property of the Wishart distribution, we observe that \(\tilde {\boldsymbol {U}}\) and U2 are independent and \(\tilde {\boldsymbol {U}} \sim W_{q-q_{j}}(n-p-q_{j}-1,\boldsymbol {I}_{q-q_{j}})\). Then, Eq. C.9 is expressed as
By a simple calculation, we can note that \(E[||\tilde {\boldsymbol {U}}-E[\tilde {\boldsymbol {U}}]||^{4}]=O(n^{2})\), \(E[||\boldsymbol {U}_{2}\boldsymbol {U}^{\prime }_{2}-E[\boldsymbol {U}_{2}\boldsymbol {U}^{\prime }_{2}]||^{4}]=O(p^{2})\), and \(E[||\boldsymbol {W}_{1}-E[\boldsymbol {W}_{1}]||^{4}]=O(n^{2})\). Hence, we can apply Eq. 3.2 in Lemma 2 to \(\tilde {\boldsymbol {U}}\), \(\boldsymbol {U}_{2}\boldsymbol {U}^{\prime }_{2}\), \(\boldsymbol {W}_{1}\), and \(\boldsymbol {W}_{2}\). From Taylor expansion, for all δ satisfying 0 < δ < 1/4, the following equations can be derived:
Therefore, from Eqs. C.7 and C.13, we can expand \(n^{-1}\{\text {LLIC}(j)-\text {LLIC}(j_{*})\}\) as follows:
Lemma 1, together with Eqs. C.8 and C.14, completes the proof of Theorem 1.
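The rank-one step in the first case of the proof rests on two standard facts: the matrix determinant lemma, \(\log |\boldsymbol {V}_{1}+\boldsymbol {v}\boldsymbol {v}^{\prime }|-\log |\boldsymbol {V}_{1}|=\log (1+\boldsymbol {v}^{\prime }\boldsymbol {V}^{-1}_{1}\boldsymbol {v})\), and the chi-square law of \(\tilde {u}\) when \(\boldsymbol {V}_{1}\) is Wishart. The sketch below checks both numerically. The sizes `p` and `m`, the seed, and the helper `sample_u_tilde` are assumptions for illustration, with `m` playing the role of the Wishart degrees of freedom \(n-q_{j}+i-2\). Whether the elided equations take exactly this form is not confirmed by the text shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 4, 12                                   # hypothetical dimension and Wishart d.f. (m >= p)

# One draw: V1 ~ W_p(m, I_p) and V2 = v v' ~ W_p(1, I_p), independent of V1.
G = rng.standard_normal((m, p))
V1 = G.T @ G
v = rng.standard_normal(p)

v_tilde = v @ v                                # ||v||^2, chi-square with p d.f.
u_tilde = v_tilde / (v @ np.linalg.solve(V1, v))   # (v' V1^{-1} v / ||v||^2)^{-1}

# Matrix determinant lemma: log|V1 + v v'| - log|V1| = log(1 + v_tilde / u_tilde).
lhs = np.linalg.slogdet(V1 + np.outer(v, v))[1] - np.linalg.slogdet(V1)[1]
rhs = np.log1p(v_tilde / u_tilde)
print(lhs, rhs)

def sample_u_tilde(p, m, reps, rng):
    """Monte Carlo draws of u_tilde; its theoretical law is chi-square(m - p + 1)."""
    out = np.empty(reps)
    for r in range(reps):
        G = rng.standard_normal((m, p))
        v = rng.standard_normal(p)
        out[r] = (v @ v) / (v @ np.linalg.solve(G.T @ G, v))
    return out

u_samples = sample_u_tilde(p, m, 5000, rng)
print(u_samples.mean(), m - p + 1)             # sample mean vs. theoretical mean m - p + 1
```

With \(m=n-q_{j}+i-2\), the degrees of freedom \(m-p+1\) equal \(n-p-q_{j}+i-1\), which matches the distribution of \(\tilde {u}_{i}\) stated in the proof.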
Appendix D: Proof of Corollary 1
First, we derive Condition C1′ from Condition C1. When \(j \in \mathcal {J}_{+}\backslash \{j_{*}\}\), denote the distinct elements of \(j\cap \bar {j}_{*}\) by \(a_{1},\ldots ,a_{q_{j}-q_{*}}\), in the same way as in the proof of Theorem 1. Let \(j_{0}=j\) and \(j_{i}=j_{i-1}\backslash \{a_{i}\}\) \((1 \leq i \leq q_{j}-q_{*})\). Then, we have
Since \(q_{j}-q_{*}>0\), Condition C1′ follows from Condition C1 and the above equation.
Next, we derive Condition C2′ from Condition C2. When \(j \in \mathcal {J}_{-}\), let \(j_{+}=j\cup j_{*}\), and let \(k_{0}=j\), \(k_{i}=k_{i-1}\backslash \{b_{i}\}\) \((1 \leq i \leq q_{j_{*}\cap \bar {j}})\) and \(s_{0}=j_{*}\), \(s_{i}=s_{i-1}\backslash \{c_{i}\}\) \((1 \leq i \leq q_{j\cap \bar {j}_{*}})\), where \(b_{1},\ldots ,b_{q_{j_{*}\cap \bar {j}}}\) and \(c_{1},\ldots ,c_{q_{j\cap \bar {j}_{*}}}\) are the distinct elements of \(j_{*}\cap \bar {j}\) and \(j\cap \bar {j}_{*}\), respectively. Then, we have
Therefore, Condition C2′ follows from Condition C2 and the above equations.
Cite this article
Oda, R., Yanagihara, H. & Fujikoshi, Y. Strong Consistency of Log-Likelihood-Based Information Criterion in High-Dimensional Canonical Correlation Analysis. Sankhya A 83, 109–127 (2021). https://doi.org/10.1007/s13171-019-00174-3
DOI: https://doi.org/10.1007/s13171-019-00174-3
Keywords and phrases.
- Canonical correlation analysis
- High-dimensional asymptotic framework
- Strong consistency
- Variable selection