Abstract
Probabilistic graphical models are popular in a wide range of domains because they can model the conditional dependence relationships among random variables. This paper studies structure learning for exponential family graphical models with false discovery rate (FDR) control. Most existing FDR-controlled structure learning procedures have been designed for the Gaussian graphical model (GGM), and a systematic approach for more general exponential family graphical models is still lacking. We introduce a unified procedure that learns the structure of an exponential family graphical model with FDR control by applying the symmetrized data aggregation (SDA) technique through sample splitting, data screening, and information pooling. We show that the method controls the FDR asymptotically under mild conditions. Extensive simulations and two real-data examples demonstrate its effectiveness.
References
Allen, G. I., & Liu, Z. (2012). A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. In 2012 IEEE International Conference on Bioinformatics and Biomedicine (pp. 1–6). IEEE.
Allen, G. I., & Liu, Z. (2013). A local Poisson graphical model for inferring networks from sequencing data. IEEE Transactions on Nanobioscience, 12(3), 189–198.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Barber, R. F., & Candès, E. J. (2019). A knockoff filter for high-dimensional selective inference. The Annals of Statistics, 47(5), 2504–2537.
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567–607.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Bühlmann, P., & Mandozzi, J. (2014). High-dimensional variable screening and bias in subsequent inference, with an empirical comparison. Computational Statistics, 29(3), 407–430.
Cai, T., Li, H., Ma, J., et al. (2019). Differential Markov random field analysis with an application to detecting differential microbial community networks. Biometrika, 106(2), 401–416.
Cai, T., Liu, W., & Luo, X. (2011). A constrained \(\ell _1\) minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 594–607.
Chen, X., & Liu, W. (2019). Graph estimation for matrix-variate Gaussian data. Statistica Sinica, 29(1), 479–504.
Cheng, J., Li, T., Levina, E., et al. (2017). High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics, 26(2), 367–378.
d’Aspremont, A., Banerjee, O., & El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1), 56–66.
Drton, M., & Maathuis, M. H. (2017). Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4, 365–393.
Drton, M., & Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statistical Science, 22(3), 430–449.
Du, L., Guo, X., Sun, W., et al. (2021). False discovery rate control under general dependence by symmetrized data aggregation. Journal of the American Statistical Association, 1–15.
Fan, J., Feng, Y., & Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics, 3(2), 521.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 23, 1–9.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
He, Y., Zhang, X., Wang, P., et al. (2017). High dimensional Gaussian copula graphical model with FDR control. Computational Statistics & Data Analysis, 113, 457–474.
Höfling, H., & Tibshirani, R. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10(4), 883–906.
Jeon, M., Jin, I. H., Schweinberger, M., et al. (2021). Mapping unobserved item-respondent interactions: A latent space item response model with interaction map. Psychometrika, 86(2), 378–403.
Keener, R. W. (2010). Theoretical statistics: Topics for a core course. Springer.
Kouros-Mehr, H., Slorach, E. M., Sternlicht, M. D., et al. (2006). GATA-3 maintains the differentiation of the luminal cell fate in the mammary gland. Cell, 127(5), 1041–1055.
Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37(6B), 4254.
Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.
Lee, J. D., & Hastie, T. J. (2015). Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics, 24(1), 230–253.
Lee, S., Sobczyk, P., & Bogdan, M. (2019). Structure learning of Gaussian Markov random fields with false discovery rate control. Symmetry, 11(10), 1311.
Lehmann, E. L., & Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.
Li, J., & Maathuis, M. H. (2021). GGM knockoff filter: False discovery rate control for Gaussian graphical models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(3), 534–558.
Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics, 41(6), 2948–2978.
Liu, W., & Shao, Q. M. (2014). Phase transition and regularized bootstrap in large-scale \(t\)-tests with false discovery rate control. The Annals of Statistics, 42(5), 2003–2025.
Liu, H., & Wang, L. (2017). TIGER: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Electronic Journal of Statistics, 11(1), 241–294.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.
Meinshausen, N., Meier, L., & Bühlmann, P. (2009). \(P\)-values for high-dimensional regression. Journal of the American Statistical Association, 104(488), 1671–1681.
Natali, P., Nicotra, M., Sures, I., et al. (1992). Breast cancer is associated with loss of the c-kit oncogene product. International Journal of Cancer, 52(5), 713–717.
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D. (2010). High-dimensional Ising model selection using \(\ell _1\)-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319.
Ravikumar, P., Wainwright, M. J., Raskutti, G., et al. (2011). High-dimensional covariance estimation by minimizing \(l_1\)-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.
Ross-Innes, C. S., Stark, R., Teschendorff, A. E., et al. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 481(7381), 389–393.
Rothman, A. J., Bickel, P. J., Levina, E., et al. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494–515.
Salesse, S., Odoul, L., Chazée, L., et al. (2018). Elastin molecular aging promotes MDA-MB-231 breast cancer cell invasiveness. FEBS Open Bio, 8(9), 1395–1404.
Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The Journal of Machine Learning Research, 14(1), 3385–3418.
Teng, Y. H. F., Tan, W. J., Thike, A. A., et al. (2011). Mutations in the epidermal growth factor receptor (EGFR) gene in triple negative breast cancer: Possible implications for targeted therapy. Breast Cancer Research, 13(2), 1–9.
Van De Geer, S. A., & Bühlmann, P. (2009). On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3, 1360–1392.
Wasserman, L., & Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics, 37(5A), 2178.
Xia, Y., Cai, T., & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102(2), 247–266.
Xue, L., Zou, H., & Cai, T. (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. The Annals of Statistics, 40(3), 1403–1429.
Yang, E., Allen, G., Liu, Z., et al. (2012). Graphical models via generalized linear models. Advances in Neural Information Processing Systems, 25, 1–9.
Yang, E., Baker, Y., Ravikumar, P., et al. (2014). Mixed graphical models via exponential families. In Artificial Intelligence and Statistics, PMLR (pp. 1042–1050).
Yang, E., Ravikumar, P., Allen, G. I., et al. (2015). Graphical models via univariate exponential family distributions. The Journal of Machine Learning Research, 16(1), 3813–3847.
Yang, E., Ravikumar, P. K., Allen, G. I., et al. (2013). Conditional random fields via univariate exponential families. Advances in Neural Information Processing Systems, 26, 1–9.
Yu, L., Kaufmann, T., & Lederer, J. (2021). False discovery rates in biological networks. In International Conference on Artificial Intelligence and Statistics, PMLR (pp. 163–171).
Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 11, 2261–2286.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.
Zhang, Y., Duchi, J. C., & Wainwright, M. J. (2013). Communication-efficient algorithms for statistical optimization. The Journal of Machine Learning Research, 14(1), 3321–3363.
Zhang, R., Ren, Z., Celedón, J. C., et al. (2021). Inference of large modified Poisson-type graphical models: Application to RNA-seq data in childhood atopic asthma studies. The Annals of Applied Statistics, 15(2), 831–855.
Zhao, T., & Liu, H. (2014). Calibrated precision matrix estimation for high-dimensional elliptical distributions. IEEE Transactions on Information Theory, 60(12), 7874–7887.
Zhao, T., Liu, H., Roeder, K., et al. (2012). The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research, 13(1), 1059–1062.
Zheng, Z., Zhou, J., Guo, X., et al. (2018). Recovering the graphical structures via knockoffs. Procedia Computer Science, 129, 201–207.
Acknowledgements
The authors are grateful to the editor, the associate editor, and two anonymous referees for their comments that have greatly improved this paper. The first two authors contributed equally to this paper. This research was supported by National Key R&D Program of China (Grant Nos. 2022ZD0114801, 2019YFC1908502, 2022YFA1003703, 2022YFA1003802, 2022YFA1003803) and NNSF of China Grants (Nos. 12071233, 11925106, 12231011, 11931001 and 11971247).
Appendices
Appendix A: The proof of Theorem 1
Recall that
Let \(q_{0n}=|\mathcal {S}\cap E^c|\); then \(q_{0n}<p\bar{q}\). Let \(\tilde{\Phi }(x)=1-\Phi (x)\), \(G(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\ge s\mid {\mathcal D}_{1})\), \(G_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\le -s\mid {\mathcal D}_{1})\) and \(G^{-1}(y)=\inf \{s\ge 0: G(s)\le y\}\) for \(0\le y\le 1\).
To prove Theorem 1, we introduce an intermediate statistic \(\tilde{W}_{rt}\), constructed as follows. Define \(\tilde{\varvec{\theta }}_r^{(2)}={\varvec{\theta }}_r^{*}-\left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right)\). For \(1\le r\le p,\) we define \(T_{1rt}=\sqrt{n_1} \hat{\theta }_{rt}^{(1)} / {{\hat{\sigma }}}_{2rt}\) and \(\tilde{T}_{2rt}=\sqrt{n_2} \tilde{\theta }_{rt}^{(2)} /{{\hat{\sigma }}}_{2rt}\), set \(\tilde{W}^{1}_{rt}=T_{1rt}\tilde{T}_{2rt}\), and then obtain \(\tilde{W}_{rt}\) by the AND rule, as in formula (8). We first establish the symmetry property and the uniform convergence of \(\tilde{W}_{rt}\) under the null, and then show that the difference between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible; together these prove Theorem 1. For \(\tilde{W}_{rt},\) we define accordingly \(\tilde{G}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\ge s\mid {\mathcal D}_{1})\), \(\tilde{G}_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\le -s\mid {\mathcal D}_{1})\) and \(\tilde{G}^{-1}(y)=\inf \{s\ge 0: \tilde{G}(s)\le y\}\) for \(0\le y\le 1\). We begin with a lemma that establishes uniform bounds for \(\tilde{\theta }_{rt}^{(2)}\); it is the counterpart of Lemma S.8 in Du et al. (2021).
Lemma 1
Suppose Conditions 1–6 hold. Then, for \(C>4\), as \(n\rightarrow \infty\), we have
Proof
Let \(\mathcal {B}=\left\{ \max _{1 \le r \le p}\max _{ 1\le i\le n_2}\left\| \left\{ {\Upsilon }\left( {\varvec{\theta }}_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r;X\right) \right\| _{\infty } \le m_{n}\right\}\), where \(m_{n}=(n_2p\bar{q})^{1 / \varpi +\gamma } K_{n_2}\) for some small \(\gamma >0\). By Condition 6 and Markov’s inequality, \(\Pr \left( \mathcal {B}^{c}\right) \le 1 / (p\bar{q})\). Let \(x=\sqrt{C \log p\bar{q} / n_{2}}\). Conditional on the post-selection model \(\mathcal {S}=\bigcup _{r=1}^{p}\mathcal {S}_r\) and on \(\mathcal {B}\), Bernstein’s inequality (Lemma 9) yields that
holds uniformly for \(1\le r\le p\) and \(t\in \mathcal {S}_r\), where we use the fact that \(m_{n} \sqrt{\log p\bar{q} / n_2} \rightarrow 0\). The lemma is proved. \(\square\)
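The proof of Lemma 1 rests on Bernstein’s inequality (Lemma 9). As a hedged sanity check, the sketch below assumes the standard two-sided form \(\Pr (|n^{-1}\sum _{i}X_i|\ge x)\le 2\exp \{-nx^2/(2\sigma ^2+2Ax/3)\}\) (the constants in Lemma 9 may be stated differently) and compares it with a Monte Carlo tail frequency for Uniform(−1, 1) variables, for which \(A=1\) and \(\sigma ^2=1/3\). The distribution, \(n\) and \(x\) are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def bernstein_bound(n, x, sigma2, A):
    """Standard two-sided Bernstein bound on Pr(|n^{-1} sum_i X_i| >= x).

    Assumes independent, mean-zero X_i with |X_i| <= A and average variance sigma2.
    """
    return 2.0 * np.exp(-n * x**2 / (2.0 * sigma2 + 2.0 * A * x / 3.0))


def empirical_tail(n, x, reps=20_000, seed=0):
    """Monte Carlo estimate of Pr(|mean| >= x) for i.i.d. Uniform(-1, 1) draws.

    Uniform(-1, 1) is bounded by A = 1 and has variance sigma^2 = 1/3.
    """
    rng = np.random.default_rng(seed)
    means = rng.uniform(-1.0, 1.0, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(means) >= x))


n, x = 100, 0.15
emp = empirical_tail(n, x)
bnd = bernstein_bound(n, x, sigma2=1.0 / 3.0, A=1.0)
print(f"empirical tail {emp:.4f} <= Bernstein bound {bnd:.4f}")
```

The bound is loose in this example; what the proof uses is its exponential decay in \(nx^2\). With \(x\) of order \(\sqrt{\log (p\bar{q})/n_2}\), the bound becomes a negative power of \(p\bar{q}\), which is roughly what makes a union bound over the screened pairs affordable.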
The next lemma establishes the symmetry property of \(\tilde{W}_{rt}\); it is the counterpart of Lemma S.1 in Du et al. (2021).
Lemma 2
Suppose Conditions 1–6 hold, and let \(a_{p}\) be a sequence satisfying \(a_{p} / \bar{q} \rightarrow \infty\) and \(a_{p}=o(p \bar{q})\). Then, for any \(0 \le s \le \tilde{G}_{-}^{-1}(a_{p} /(p \bar{q}))\), we have
The proof of Lemma 2 relies mainly on Lemmas 1 and 10 and is similar to that of Lemma S.1 in Du et al. (2021); it is therefore omitted.
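The mechanism behind Lemma 2 can be illustrated by simulation. Conditional on \(\mathcal {D}_1\), \(T_{1rt}\) is fixed, and because the second half of the split is independent of \(\mathcal {D}_1\), the symmetry of \(\tilde{W}^{1}_{rt}=T_{1rt}\tilde{T}_{2rt}\) for a null pair reduces to the approximate symmetry of the second-half statistic about zero. The sketch below is an illustration under assumptions of our own, not the paper’s procedure: the second-half statistic is a studentized mean of centred Exp(1) draws (deliberately skewed, so symmetry holds only asymptotically), \(T_{1rt}\) is set to the hypothetical value 2.0, and the AND rule is not modelled.

```python
import numpy as np


def null_tail_frequencies(t1, s, n2=400, reps=10_000, seed=0):
    """Monte Carlo frequencies of {W >= s} and {W <= -s} for W = t1 * T2.

    t1 stands in for T_{1rt}; it is held fixed because we condition on D_1.
    T2 is a studentized mean of n2 centred Exp(1) draws: a skewed null
    statistic that is only asymptotically N(0, 1), computed from data
    independent of t1, mimicking the second half of the sample split.
    """
    rng = np.random.default_rng(seed)
    y = rng.exponential(scale=1.0, size=(reps, n2)) - 1.0  # mean zero, skewed
    t2 = np.sqrt(n2) * y.mean(axis=1) / y.std(axis=1, ddof=1)
    w = t1 * t2
    return float(np.mean(w >= s)), float(np.mean(w <= -s))


for s in (1.0, 2.0, 3.0):
    upper, lower = null_tail_frequencies(t1=2.0, s=s)
    print(f"s={s}: Pr(W >= s) ~ {upper:.4f}, Pr(W <= -s) ~ {lower:.4f}")
```

The two tail frequencies are close but not equal; increasing `n2` should shrink the gap, in line with the asymptotic (rather than exact) symmetry asserted in Lemma 2.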
The next lemma characterizes the uniform convergence of \((p\bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \ge s)\) and \((p \bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \le -s)\).
Lemma 3
Suppose Conditions 1–8 hold. For any sequence \(a_{p}\) satisfying \(a_{p} / \bar{q} \rightarrow \infty\) and \(a_{p}=o(p \bar{q})\), we have
and
Under Condition 8, Lemma 3 can be proved using the technique in Du et al. (2021); its proof is therefore omitted. Lemmas 2 and 3 establish, respectively, the symmetry property and the uniform consistency of the \(\tilde{W}_{rt}\)’s. It remains to show that the difference between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible, for which we need the following lemmas. For the \(r\)th node (\(1\le r\le p\)), define the event
where \(\delta _{\rho _r}=\min \left\{ \rho _r,\rho _rl_{-} /(4 L_r)\right\}\). We further define the event \(\xi =\bigcap _{r=1}^{p}\xi _r\).
Lemma 4
Suppose Conditions 3–6 hold. Under the event \(\xi _r\), we have,
Proof
The proof is identical to that of Lemma 6 in Zhang et al. (2013) and is therefore omitted. \(\square\)
Lemma 5
Suppose Conditions 3–6 hold. Under the event \(\xi _r\), we have
Proof
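By the integral form of Taylor’s expansion, we have
\[\nabla \mathcal {L}_{2n}\left( \hat{\varvec{\theta }}_r^{(2)}\right) =\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) +\textbf{H}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) ,\]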
where \(\textbf{H}_{n}=\int _{0}^{1} \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}+u\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) \right) du\). Then simple linear algebra yields
where \(\textbf{U}_{n}=\textbf{H}_{n}-\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right)\) and \(\textbf{V}_{n}=\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) -\Upsilon \left( \varvec{\theta }^{*}_r\right)\). Then, the claimed expansion is a direct consequence of Lemma 4 and Conditions 5–6. \(\square\)
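The expansion in Lemma 5 starts from the exact identity \(\nabla \mathcal {L}_{2n}(\hat{\theta }_r^{(2)})-\nabla \mathcal {L}_{2n}(\theta _r^{*})=\textbf{H}_{n}(\hat{\theta }_r^{(2)}-\theta _r^{*})\), where \(\textbf{H}_n\) averages the Hessian along the segment from \(\theta _r^{*}\) to \(\hat{\theta }_r^{(2)}\). The sketch below checks this identity numerically for a Poisson node-conditional loss \(\mathcal {L}(\theta )=n^{-1}\sum _i\{\exp (x_i^\top \theta )-y_ix_i^\top \theta \}\), a loss chosen here for illustration only (the paper treats general exponential families). The integral defining \(\textbf{H}_n\) is approximated by Gauss–Legendre quadrature; the data, the dimension, and the perturbed point standing in for \(\hat{\theta }_r^{(2)}\) are all hypothetical.

```python
import numpy as np


def poisson_grad_hess(theta, X, y):
    """Gradient and Hessian of L(theta) = mean(exp(X @ theta) - y * (X @ theta))."""
    eta = np.exp(X @ theta)
    n = X.shape[0]
    grad = X.T @ (eta - y) / n
    hess = (X * eta[:, None]).T @ X / n
    return grad, hess


def averaged_hessian(theta_star, theta_hat, X, y, m=30):
    """H_n = int_0^1 Hess L(theta_star + u (theta_hat - theta_star)) du via Gauss-Legendre."""
    nodes, weights = np.polynomial.legendre.leggauss(m)
    u = (nodes + 1.0) / 2.0  # map quadrature nodes from [-1, 1] to [0, 1]
    w = weights / 2.0        # rescale weights accordingly (they now sum to 1)
    delta = theta_hat - theta_star
    H = np.zeros((X.shape[1], X.shape[1]))
    for ui, wi in zip(u, w):
        _, hess = poisson_grad_hess(theta_star + ui * delta, X, y)
        H += wi * hess
    return H


rng = np.random.default_rng(1)
n, d = 500, 4
X = rng.normal(scale=0.3, size=(n, d))
theta_star = np.array([0.5, -0.3, 0.0, 0.2])
y = rng.poisson(np.exp(X @ theta_star))
theta_hat = theta_star + rng.normal(scale=0.1, size=d)  # any nearby point works

g_hat, _ = poisson_grad_hess(theta_hat, X, y)
g_star, _ = poisson_grad_hess(theta_star, X, y)
H = averaged_hessian(theta_star, theta_hat, X, y)
residual = np.max(np.abs(g_hat - g_star - H @ (theta_hat - theta_star)))
print(f"max residual of the Taylor identity: {residual:.2e}")
```

Writing \(\textbf{H}_n=\Upsilon (\theta _r^{*})+\textbf{U}_n+\textbf{V}_n\) then isolates the two perturbation terms that Lemma 4 and Conditions 5–6 control.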
Lemma 6
Suppose Conditions 1–6 hold. Under the event \(\xi =\bigcap _{r=1}^{p}\xi _r\), we have,
holds uniformly for \(1\le r\le p\) and \(t\in \mathcal {S}_r\), where \(a_{n} /\left( \bar{q}^{3 / 2} \log p\bar{q} / \sqrt{n}\right) \rightarrow \infty .\)
Proof
By Bernstein’s inequality and Conditions 5–6, we can show that \(\max _{1\le r \le p}\max _{t\in \mathcal {S}_r}|\sigma _{rt}^{-1}\hat{\sigma }_{2 rt}-1|=O_{p}(\sqrt{\log p\bar{q} / n})\). As in Lemma 1, we can show that \(\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\log p\bar{q} / n})\), and accordingly \(\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) \right\| \le \sqrt{\bar{q}}\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\bar{q} \log p\bar{q} / n})\). By arguments similar to those in the proof of Lemma 1, together with Conditions 5–6, we can also show that \(\max _{1\le r\le p}\left\| \nabla ^{2} \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) -{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\| =O_{p}(\bar{q} \sqrt{\log p\bar{q} / n})\). The assertion for \(\hat{\theta }^{(2)}_{rt}\) then follows immediately from Lemma 5. \(\square\)
Lemma 7
Under Conditions 1–6, we have \(\Pr (\xi ^{c}) \lesssim o(1 / p\bar{q})\).
Proof
Since \(\Pr (\xi _r^{c})\lesssim o(1/\bar{q})\), we have
. \(\square\)
The following Lemma establishes the distance between \(W_{rt}\) and \(\tilde{W}_{rt}\).
Lemma 8
Suppose Conditions 1–6 hold and \(\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0\) for a small \(c>0\). Then, for any \(M>0\), we have
Proof
Lemma 8 can be proved by arguments similar to those in the proofs of Lemmas S.4 and S.5 in Du et al. (2021); its proof is therefore omitted. \(\square\)
Proof of Theorem 1
To prove Theorem 1, we consider an auxiliary EFC procedure based on the statistics \(\tilde{W}_{rt}\), with the threshold \(\tilde{L}>0\) chosen as \(\tilde{L}=\inf \left\{ s>0:\frac{\#\{(r,t): \tilde{W}_{rt}\le -s\}}{\#\{(r,t): \tilde{W}_{rt}\ge s\}\vee 1}\le \alpha \right\}\). Define \(\mathcal {G}=\left\{ (r,t): \theta _{rt}=o\left( c_{n p}\right) \right\}\). As in the proof of Theorem 4 in Du et al. (2021), under the condition that \(\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0\) for a small \(c>0\), Lemma 6 implies that the absolute difference between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible for any \((r,t)\in \mathcal {G}\), while \(\tilde{W}_{rt}=W_{rt}\{1+o_p(1)\}\) for \((r,t)\in \mathcal {G}^{c}\). From Lemma 8, we conclude that
Under Conditions 1–8, and similarly to the proof of Theorem 2 of Du et al. (2021), we can show that \(\textrm{FDP}_{\tilde{W}}(\tilde{L})\) is controlled at the nominal level asymptotically. The desired result follows. \(\square\)
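The threshold \(\tilde{L}\) is an estimated-FDP rule: the number of statistics below \(-s\) serves as a proxy for the number of false discoveries above \(s\), which is justified by the symmetry of the null statistics (Lemma 2). A minimal sketch of the computation follows; the function name `sda_threshold` is hypothetical. Because both counts change only when \(s\) crosses some \(|W_{rt}|\), the search is restricted to the observed magnitudes, the usual convention in knockoff-style procedures (an assumption about how the infimum is evaluated in practice).

```python
import numpy as np


def sda_threshold(w, alpha):
    """Smallest candidate s > 0 with #{w <= -s} / max(#{w >= s}, 1) <= alpha.

    Both counts change only when s crosses some |w_j|, so the search is
    restricted to the observed magnitudes, as in knockoff-style procedures.
    Returns np.inf when no candidate works, in which case nothing is selected.
    """
    w = np.asarray(w, dtype=float)
    for s in np.sort(np.abs(w[w != 0])):
        fdp_hat = np.sum(w <= -s) / max(np.sum(w >= s), 1)
        if fdp_hat <= alpha:
            return float(s)
    return np.inf


# Toy statistics: large positive values mimic edges, small signed values nulls.
w = np.array([5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 2.0, -1.0, 1.5, 0.2])
L = sda_threshold(w, alpha=0.2)
selected = np.flatnonzero(w >= L)
print(L, selected)  # 1.0 [0 1 2 3 5 6 8]
```

At \(s=0.5\) the estimated FDP is \(2/7>0.2\), while at \(s=1\) it drops to \(1/7\), so seven pairs are selected.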
1.1 Additional lemmas
Lemma 9
(Bernstein’s inequality) Let \(X_{1}, \ldots , X_{n}\) be independent centered random variables a.s. bounded by \(A<\infty\) in absolute value. Let \(\sigma ^{2}=n^{-1}\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)\). Then for all \(x>0\),
Lemma 10
(Moderate deviation for sums of independent random variables) Suppose that \(X_{1}, \ldots , X_{n}\) are independent random variables with mean zero satisfying \(\mathbb {E}\left( |X_{i}|^{2+\delta }\right) <\infty \ (i=1,2,\ldots )\) for some \(\delta >0\). Let \(B_{n}=\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)\) and assume that \(\liminf _{n} B_{n} / n>0\). Then,
as \(n \rightarrow \infty\), uniformly in \(x\) in the range \(0 \le x \le C\left\{ 2 \log \left( 1 / L_{n}\right) \right\} ^{1 / 2}\), where \(L_{n}=B_{n}^{-1-\delta / 2} \sum _{i=1}^{n} \mathbb {E}|X_{i}|^{2+\delta }\) and \(C\) is a constant with \(0<C<1\).
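Lemma 10 is what lets Gaussian tails stand in for the tails of standardized sums. Assuming its conclusion takes the usual form \(\Pr (S_n\ge x\sqrt{B_n})/\{1-\Phi (x)\}\rightarrow 1\) with \(S_n=\sum _{i}X_i\), the sketch below estimates this ratio by Monte Carlo for centred Exp(1) summands (our choice; all their moments are finite, so \(\delta =1\) is admissible). The grid of \(x\) values is kept moderate because Monte Carlo cannot resolve very small tail probabilities; the function name is hypothetical.

```python
import math

import numpy as np


def tail_ratio(n, xs, reps=1_000_000, seed=0):
    """Monte Carlo estimate of Pr(S_n >= x sqrt(B_n)) / (1 - Phi(x)).

    S_n is a sum of n centred Exp(1) variables. A sum of n Exp(1) draws is
    Gamma(n, 1), so S_n = Gamma(n, 1) - n can be sampled directly, and B_n = n.
    """
    rng = np.random.default_rng(seed)
    s_n = rng.gamma(shape=n, scale=1.0, size=reps) - n
    ratios = []
    for x in xs:
        gauss_tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x)
        ratios.append(float(np.mean(s_n >= x * math.sqrt(n))) / gauss_tail)
    return ratios


print(tail_ratio(5000, [1.0, 1.5, 2.0]))
```

Ratios slightly above 1 at the larger values of \(x\) reflect the right skew of the exponential summands; they approach 1 as \(n\) grows.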
Appendix B: Additional simulation results
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Y., Zhang, Y. & Li, Z. Structure learning of exponential family graphical model with false discovery rate control. J. Korean Stat. Soc. 52, 554–580 (2023). https://doi.org/10.1007/s42952-023-00213-8