Structure learning of exponential family graphical model with false discovery rate control

Liu, Yanhong; Zhang, Yuhao; Li, Zhonghua

doi:10.1007/s42952-023-00213-8


Research Article
Published: 09 May 2023

Volume 52, pages 554–580, (2023)

Journal of the Korean Statistical Society


Abstract

Probabilistic graphical models enjoy great popularity in a wide range of domains due to their ability to model the conditional dependency relationships among random variables. This paper explores the structure learning for the exponential family graphical model with false discovery rate (FDR) control. Most existing FDR-controlled structure learning procedures have been designed for the Gaussian graphical model (GGM). A systematic approach for more general exponential family graphical models is still lacking. In this paper, we introduce a unified procedure to learn the structure of the exponential family graphical model with FDR control utilizing the symmetrized data aggregation (SDA) technique via sample splitting, data screening, and information pooling. We show that our method controls FDR asymptotically under some mild conditions. Extensive simulation results and two real-data examples validate the effectiveness of our method.


References

Allen, G. I., & Liu, Z. (2012). A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. In 2012 IEEE International Conference on Bioinformatics and Biomedicine (pp. 1–6). IEEE.
Allen, G. I., & Liu, Z. (2013). A local Poisson graphical model for inferring networks from sequencing data. IEEE Transactions on Nanobioscience, 12(3), 189–198.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Barber, R. F., & Candès, E. J. (2019). A knockoff filter for high-dimensional selective inference. The Annals of Statistics, 47(5), 2504–2537.
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567–607.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Bühlmann, P., & Mandozzi, J. (2014). High-dimensional variable screening and bias in subsequent inference, with an empirical comparison. Computational Statistics, 29(3), 407–430.
Cai, T., Li, H., Ma, J., et al. (2019). Differential Markov random field analysis with an application to detecting differential microbial community networks. Biometrika, 106(2), 401–416.
Cai, T., Liu, W., & Luo, X. (2011). A constrained $\ell _1$ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 594–607.
Cheng, J., Li, T., Levina, E., et al. (2017). High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics, 26(2), 367–378.
Chen, X., & Liu, W. (2019). Graph estimation for matrix-variate Gaussian data. Statistica Sinica, 29(1), 479–504.
d’Aspremont, A., Banerjee, O., & El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1), 56–66.
Drton, M., & Maathuis, M. H. (2017). Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4, 365–393.
Drton, M., & Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statistical Science, 22(3), 430–449.
Du, L., Guo, X., Sun, W., et al. (2021). False discovery rate control under general dependence by symmetrized data aggregation. Journal of the American Statistical Association, 1–15.
Fan, J., Feng, Y., & Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics, 3(2), 521.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 23, 1–9.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
He, Y., Zhang, X., Wang, P., et al. (2017). High dimensional Gaussian copula graphical model with FDR control. Computational Statistics & Data Analysis, 113, 457–474.
Höfling, H., & Tibshirani, R. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10(4), 883–906.
Jeon, M., Jin, I. H., Schweinberger, M., et al. (2021). Mapping unobserved item-respondent interactions: A latent space item response model with interaction map. Psychometrika, 86(2), 378–403.
Keener, R. W. (2010). Theoretical statistics: Topics for a core course. Springer.
Kouros-Mehr, H., Slorach, E. M., Sternlicht, M. D., et al. (2006). GATA-3 maintains the differentiation of the luminal cell fate in the mammary gland. Cell, 127(5), 1041–1055.
Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37(6B), 4254.
Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.
Lee, J. D., & Hastie, T. J. (2015). Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics, 24(1), 230–253.
Lee, S., Sobczyk, P., & Bogdan, M. (2019). Structure learning of Gaussian Markov random fields with false discovery rate control. Symmetry, 11(10), 1311.
Lehmann, E. L., & Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.
Li, J., & Maathuis, M. H. (2021). GGM knockoff filter: False discovery rate control for Gaussian graphical models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(3), 534–558.
Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics, 41(6), 2948–2978.
Liu, W., & Shao, Q. M. (2014). Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control. The Annals of Statistics, 42(5), 2003–2025.
Liu, H., & Wang, L. (2017). Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Electronic Journal of Statistics, 11(1), 241–294.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.
Meinshausen, N., Meier, L., & Bühlmann, P. (2009). $P$-values for high-dimensional regression. Journal of the American Statistical Association, 104(488), 1671–1681.
Natali, P., Nicotra, M., Sures, I., et al. (1992). Breast cancer is associated with loss of the c-kit oncogene product. International Journal of Cancer, 52(5), 713–717.
Ravikumar, P., Wainwright, M. J., & Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell _1$-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319.
Ravikumar, P., Wainwright, M. J., Raskutti, G., et al. (2011). High-dimensional covariance estimation by minimizing $l_1$-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.
Ross-Innes, C. S., Stark, R., Teschendorff, A. E., et al. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 481(7381), 389–393.
Rothman, A. J., Bickel, P. J., Levina, E., et al. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494–515.
Salesse, S., Odoul, L., Chazée, L., et al. (2018). Elastin molecular aging promotes MDA-MB-231 breast cancer cell invasiveness. FEBS Open Bio, 8(9), 1395–1404.
Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The Journal of Machine Learning Research, 14(1), 3385–3418.
Teng, Y. H. F., Tan, W. J., Thike, A. A., et al. (2011). Mutations in the epidermal growth factor receptor (EGFR) gene in triple negative breast cancer: Possible implications for targeted therapy. Breast Cancer Research, 13(2), 1–9.
Van De Geer, S. A., & Bühlmann, P. (2009). On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3, 1360–1392.
Wasserman, L., & Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics, 37(5A), 2178.
Xia, Y., Cai, T., & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102(2), 247–266.
Xue, L., Zou, H., & Cai, T. (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. The Annals of Statistics, 40(3), 1403–1429.
Yang, E., Allen, G., Liu, Z., et al. (2012). Graphical models via generalized linear models. Advances in Neural Information Processing Systems, 25, 1–9.
Yang, E., Baker, Y., Ravikumar, P., et al. (2014). Mixed graphical models via exponential families. In Artificial Intelligence and Statistics, PMLR (pp. 1042–1050).
Yang, E., Ravikumar, P., Allen, G. I., et al. (2015). Graphical models via univariate exponential family distributions. The Journal of Machine Learning Research, 16(1), 3813–3847.
Yang, E., Ravikumar, P. K., Allen, G. I., et al. (2013). Conditional random fields via univariate exponential families. Advances in Neural Information Processing Systems, 26, 1–9.
Yu, L., Kaufmann, T., & Lederer, J. (2021). False discovery rates in biological networks. In International Conference on Artificial Intelligence and Statistics, PMLR (pp. 163–171).
Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 11, 2261–2286.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.
Zhang, Y., Duchi, J. C., & Wainwright, M. J. (2013). Communication-efficient algorithms for statistical optimization. The Journal of Machine Learning Research, 14(1), 3321–3363.
Zhang, R., Ren, Z., Celedón, J. C., et al. (2021). Inference of large modified Poisson-type graphical models: Application to RNA-SEQ data in childhood atopic asthma studies. The Annals of Applied Statistics, 15(2), 831–855.
Zhao, T., & Liu, H. (2014). Calibrated precision matrix estimation for high-dimensional elliptical distributions. IEEE Transactions on Information Theory, 60(12), 7874–7887.
Zhao, T., Liu, H., Roeder, K., et al. (2012). The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research, 13(1), 1059–1062.
Zheng, Z., Zhou, J., Guo, X., et al. (2018). Recovering the graphical structures via knockoffs. Procedia Computer Science, 129, 201–207.


Acknowledgements

The authors are grateful to the editor, the associate editor, and two anonymous referees for their comments, which have greatly improved this paper. The first two authors contributed equally to this paper. This research was supported by the National Key R&D Program of China (Grant Nos. 2022ZD0114801, 2019YFC1908502, 2022YFA1003703, 2022YFA1003802, 2022YFA1003803) and NNSF of China Grants (Nos. 12071233, 11925106, 12231011, 11931001 and 11971247).

Author information

Authors and Affiliations

School of Statistics and Data Science, LPMC, LEBPS and KLMDASR, Nankai University, Tianjin, 300071, China
Yanhong Liu, Yuhao Zhang & Zhonghua Li


Corresponding author

Correspondence to Zhonghua Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The proof of Theorem 1

Recall that

$$\begin{aligned} \textrm{FDP}&=\frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{1\vee \sum _{(r,t)}\mathbb {I}(W_{rt}\ge L)}= \frac{\sum _{(r,t)}\mathbb {I}\left( W_{rt}\le - L \right) }{1\vee \sum _{(r,t)}\mathbb {I}(W_{rt}\ge L)}\times \frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{\sum _{(r,t)}\mathbb {I}\left( W_{rt}\le -L\right) }\\&\le \alpha \times \frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\le -L\right) }. \end{aligned}$$
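A brief justification of the two steps in this display may help. It is a sketch that assumes the threshold $L$ is chosen by the same rule as $\tilde{L}$ in the proof of Theorem 1 later in this appendix (the definition of $L$ itself is in the main text and is not reproduced here), so that the ratio of negative to positive counts at $L$ is at most $\alpha$:

$$\begin{aligned} \frac{\sum _{(r,t)}\mathbb {I}\left( W_{rt}\le -L\right) }{1\vee \sum _{(r,t)}\mathbb {I}(W_{rt}\ge L)}\le \alpha ,\qquad \sum _{(r,t)}\mathbb {I}\left( W_{rt}\le -L\right) \ge \sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\le -L\right) . \end{aligned}$$

The first inequality bounds the first factor by $\alpha$, and the second can only enlarge the remaining ratio, because restricting its denominator to the null pairs makes the denominator smaller. Controlling the FDP therefore reduces to showing that the ratio of the upper-tail to the lower-tail counts over $E^c$ is close to one.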

Let $q_{0n}=|\mathcal {S}\cap E^c|$, so that $q_{0n}<p\bar{q}$. Let $\tilde{\Phi }(x)=1-\Phi (x)$, $G(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\ge s\mid {\mathcal D}_{1})$, $G_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\le -s\mid {\mathcal D}_{1})$ and $G^{-1}(y)=\inf \{s\ge 0: G(s)\le y\}$ for $0\le y\le 1$.

To prove Theorem 1, we introduce an intermediate quantity $\tilde{W}_{rt}$, constructed as follows. Define $\tilde{\varvec{\theta }}_r^{(2)}={\varvec{\theta }}_r^{*}-\left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right)$. For $1\le r\le p$, define $T_{1rt}=\sqrt{n_1} \hat{\theta }_{rt}^{(1)} / {{\hat{\sigma }}}_{2rt}$, $\tilde{T}_{2rt}=\sqrt{n_2} \tilde{\theta }_{rt}^{(2)} /{{\hat{\sigma }}}_{2rt}$ and $\tilde{W}^{1}_{rt}=T_{1rt}\tilde{T}_{2rt}$, and then obtain $\tilde{W}_{rt}$ by the AND rule, as in formula (8). We first establish the symmetry property and uniform convergence of $\tilde{W}_{rt}$ under the null, and then show that the distance between $W_{rt}$ and $\tilde{W}_{rt}$ is negligible, which proves Theorem 1.

For $\tilde{W}_{rt}$, we accordingly define $\tilde{G}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\ge s\mid {\mathcal D}_{1})$, $\tilde{G}_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\le -s\mid {\mathcal D}_{1})$ and $\tilde{G}^{-1}(y)=\inf \{s\ge 0: \tilde{G}(s)\le y\}$ for $0\le y\le 1$. We first provide a lemma that establishes uniform bounds for $\tilde{\theta }_{rt}^{(2)}$; it is the counterpart of Lemma S.8 in Du et al. (2021).

Lemma 1

Suppose Conditions 1–6 hold. Then, for $C>4$, as $n\rightarrow \infty$, we have

$$\begin{aligned} \Pr \left( \max _{1\le r\le p}\max _{t\in \mathcal {S}_r}\sqrt{n_2} |\tilde{\theta }_{rt}^{(2)}-\theta _{rt}^{*}|/\sigma _{rt}>\sqrt{C\log (p\bar{q})}\right) =o(1/p\bar{q}). \end{aligned}$$

Proof

Let $\mathcal {B}=\left\{ \max _{1 \le r \le p}\max _{ 1\le i\le n_2}\left\| \left\{ {\Upsilon }\left( {\varvec{\theta }}_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}\left( \varvec{\theta }^{*}_r;\textbf{X}_{i}\right) \right\| _{\infty } \le m_{n}\right\}$, where $m_{n}=(n_2p\bar{q})^{1 / \varpi +\gamma } K_{n_2}$ for some small $\gamma >0$. By Condition 6 and Markov's inequality, $\Pr \left( \mathcal {B}^{c}\right) \le 1 / p\bar{q}$. Let $x=\sqrt{C \log p\bar{q} / n_{2}}$. Conditional on the post-selection model $\mathcal {S}=\bigcup _{r=1}^{p}\mathcal {S}_r$ and on $\mathcal {B}$, the Bernstein inequality in Lemma 9 yields that

$$\begin{aligned} \begin{aligned}&{\text {Pr}}\left( |\tilde{\theta }_{rt}^{(2)}-\theta _{rt}^{*}|/ \sigma _{rt}>x \text{ for } \text{ some } r,t \mid \mathcal {S}, \mathcal {B}\right) \\&\quad \le p\bar{q} \max _{r}\max _{t} {\text {Pr}}\left( |\textbf{e}_{t}^{\top }\left\{ \Upsilon \left( \varvec{\theta }^{*}_r\right) \right\} ^{-1}\sum _{i=1}^{n_{2}} \nabla \mathcal {L}\left( \varvec{\theta }^{*}_r; {\textbf{X}}_{i}\right) |>n_{2} \sigma _{rt} x \mid \mathcal {S}, \mathcal {B}\right) \\&\quad \le 2 p\bar{q} \max _{r}\max _{t} \exp \left\{ -\frac{n_{2}^{2} \sigma _{rt}^{2} x^{2}}{2 n_{2} \sigma _{rt}^{2} + 2 m_{n} n_{2} \sigma _{rt} x / 3}\right\} \\&\quad =o(1 / p\bar{q}) \end{aligned} \end{aligned}$$

holds uniformly for $1\le r\le p$ and $t\in \mathcal {S}_r$, where we use the fact that $m_{n} \sqrt{\log p\bar{q} / n_2} \rightarrow 0$. The lemma is proved. $\square$
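For readers who want the last step of this proof spelled out, the following sketch evaluates the Bernstein exponent at $x=\sqrt{C \log (p\bar{q}) / n_{2}}$. It assumes that $\sigma _{rt}$ is bounded away from zero and that $p\bar{q}\rightarrow \infty$; both are assumptions made here for illustration and should be checked against Conditions 1–6, which are not reproduced in this appendix.

$$\begin{aligned} \frac{n_{2}^{2} \sigma _{rt}^{2} x^{2}}{2 n_{2} \sigma _{rt}^{2} + 2 m_{n} n_{2} \sigma _{rt} x / 3}&=\frac{C\log (p\bar{q})}{2}\cdot \frac{1}{1+m_{n}x/(3\sigma _{rt})}=\frac{C\log (p\bar{q})}{2}\left\{ 1+o(1)\right\} ,\\ 2p\bar{q}\exp \left\{ -\frac{C\log (p\bar{q})}{2}\left\{ 1+o(1)\right\} \right\}&=2(p\bar{q})^{1-C/2+o(1)}. \end{aligned}$$

The $o(1)$ term comes from $m_{n}x=\sqrt{C}\, m_{n} \sqrt{\log p\bar{q} / n_2} \rightarrow 0$. Since $C>4$ gives $1-C/2<-1$, the right-hand side is $o(1/p\bar{q})$, which is where the requirement $C>4$ in the statement of Lemma 1 enters.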

The next lemma establishes the symmetry property of $\tilde{W}_{rt}$, and it is the counterpart of Lemma S.1 in Du et al. (2021).

Lemma 2

Suppose Conditions 1–6 hold. Define a sequence $a_{p}$ satisfying $a_{p} / \bar{q} \rightarrow \infty$ and $a_{p}=o(p \bar{q})$. Then for any $0 \leqslant s \leqslant \tilde{G}_{-}^{-1}(a_{p} /(p \bar{q}))$, we have

$$\begin{aligned} \frac{\tilde{G}(s)}{\tilde{G}_{-}(s)}-1 \rightarrow 0. \end{aligned}$$

The proof of Lemma 2 relies mainly on Lemmas 1 and 10; it is similar to the proof of Lemma S.1 in Du et al. (2021) and is therefore omitted.
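As an informal illustration of the symmetry that Lemma 2 formalizes, the following minimal sketch simulates null product statistics $T_1T_2$ in which the second factor is symmetric about zero and independent of the first, and compares the upper-tail and lower-tail counts. The Gaussian distributions, the number of draws, the cutoff `s`, and the function name `tail_ratio` are all assumptions made here for illustration; this is not the paper's simulation design, and the AND-rule aggregation is not reproduced.

```python
import random


def tail_ratio(n_null=200_000, s=1.0, seed=0):
    """Return #{W >= s} / #{W <= -s} for simulated null products W = T1 * T2."""
    rng = random.Random(seed)
    upper = 0
    lower = 0
    for _ in range(n_null):
        t1 = rng.gauss(0.5, 1.0)  # first-split statistic: need not be centered
        t2 = rng.gauss(0.0, 1.0)  # second-split statistic: symmetric under the null
        w = t1 * t2
        if w >= s:
            upper += 1
        elif w <= -s:
            lower += 1
    return upper / lower


ratio = tail_ratio()
print(f"upper/lower tail ratio at s=1: {ratio:.3f}")
```

In this toy setting the ratio should be close to one whatever the distribution of the first factor, which is the property that lets the lower-tail count stand in for the unobservable number of false discoveries.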

The next lemma characterizes the uniform convergence of $(p\bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \ge s)$ and $(p \bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \le -s)$.

Lemma 3

Suppose Conditions 1–8 hold. For any sequence $a_{p}$ satisfying $a_{p} / \bar{q} \rightarrow \infty$ and $a_{p}=o(p \bar{q})$, we have

$$\begin{aligned} \sup _{0 \le s \le \tilde{G}^{-1}\left( a_{p} /(p \bar{q})\right) }\left|\frac{\sum _{(r,t)\in E^c } \mathbb {I}\left( \tilde{W}_{rt} \ge s\right) }{p\bar{q} \tilde{G}(s)}-1\right|=o_{p}(1), \end{aligned}\tag{A1}$$

and

$$\begin{aligned} \sup _{0 \le s \le \tilde{G}_{-}^{-1}\left( a_{p} /(p \bar{q})\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt}\le -s\right) }{p \bar{q} \tilde{G}_{-}(s)}-1\right|=o_{p}(1). \end{aligned}\tag{A2}$$

Under Condition 8, Lemma 3 can be proved using the technique in Du et al. (2021), and thus its proof is omitted. Lemmas 2 and 3 establish, respectively, the symmetry property and the uniform consistency of the $\tilde{W}_{rt}$'s. It remains to show that the distance between $W_{rt}$ and $\tilde{W}_{rt}$ is negligible, for which we need the following lemmas. For the $r$th node ($1\le r\le p$), define the event

$$\begin{aligned} {\small \xi _r:=\left\{ \frac{1}{n}\sum _{i=1}^{n} M_r\left( \textbf{X}_{i}\right) \le 2 L_r,\left\| \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) -\Upsilon \left( \varvec{\theta }_r^{*}\right) \right\| \le \frac{\rho _r l_{-}}{2}, \left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| < \frac{(1-\rho _r) l_{-} \delta _{\rho _r}}{2}\right\} ,} \end{aligned}$$

where $\delta _{\rho _r}=\min \left\{ \rho _r,\rho _rl_{-} /(4 L_r)\right\}$. We further define the event $\xi =\bigcap _{r=1}^{p}\xi _r$.

Lemma 4

Suppose Conditions 3–6 hold. Under the event $\xi _r$, we have,

$$\begin{aligned} \left\| \hat{\varvec{\theta }}^{(2)}_r-\varvec{\theta }_r^{*}\right\| \le \frac{2\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| }{(1-\rho _r) l_{-}}. \end{aligned}$$

Proof

This lemma follows by exactly the same argument as Lemma 6 in Zhang et al. (2013), and thus its proof is omitted. $\square$

Lemma 5

Suppose Conditions 3–6 hold. Under the event $\xi _r$, we have

$$\begin{aligned}&\left\| \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}+\left\{ {\varvec{\Upsilon }}\left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| _{\infty }\\&\quad \le \frac{2}{(1-\rho _r) l_{-}^{2}}\left\| \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) -{\Upsilon }\left( \varvec{\theta }_r^{*}\right) \right\| \left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| +\frac{8 L_r}{(1-\rho _r)^{2} l_{-}^{3}}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| ^{2}. \end{aligned}$$

Proof

By the integral form of Taylor’s expansion, we have

$$\begin{aligned} 0=\nabla \mathcal {L}_{2n}(\hat{\varvec{\theta }}_r^{(2)} )=\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) +\textbf{H}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) , \end{aligned}$$

where $\textbf{H}_{n}=\int _{0}^{1} \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}+u\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) \right) du$. Then simple linear algebra yields

$$\begin{aligned} \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r=-\left\{ \varvec{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) -\left\{ \Upsilon \left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \textbf{U}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r\right) -\left\{ \varvec{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \textbf{V}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r\right) , \end{aligned}$$

where $\textbf{U}_{n}=\textbf{H}_{n}-\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right)$ and $\textbf{V}_{n}=\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) -\Upsilon \left( \varvec{\theta }^{*}_r\right)$. The claimed bound then follows directly from Lemma 4 and Conditions 5–6. $\square$
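The last step of the proof of Lemma 5 can be spelled out as follows; this is a sketch rather than the authors' own argument. Write $\varvec{\Delta }_r=\hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}$ (shorthand introduced here) and note that $\textbf{H}_{n}=\Upsilon \left( \varvec{\theta }_r^{*}\right) +\textbf{U}_{n}+\textbf{V}_{n}$. The two operator-norm bounds $\left\| \left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\right\| \le 1/l_{-}$ and $\left\| \textbf{U}_{n}\right\| \le 2L_r\left\| \varvec{\Delta }_r\right\|$ on $\xi _r$ are assumptions made for this sketch, in the spirit of the conditions in Zhang et al. (2013); they should be checked against Conditions 5–6, which are not reproduced in this appendix.

$$\begin{aligned} \left\| \varvec{\Delta }_r+\left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| _{\infty }&\le \left\| \left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\left( \textbf{U}_{n}+\textbf{V}_{n}\right) \varvec{\Delta }_r\right\| \\&\le \frac{1}{l_{-}}\left( \left\| \textbf{V}_{n}\right\| +2L_r\left\| \varvec{\Delta }_r\right\| \right) \left\| \varvec{\Delta }_r\right\| \\&\le \frac{2}{(1-\rho _r) l_{-}^{2}}\left\| \textbf{V}_{n}\right\| \left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| +\frac{8 L_r}{(1-\rho _r)^{2} l_{-}^{3}}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| ^{2}. \end{aligned}$$

The first line uses the expansion of $\hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r$ together with $\left\| \cdot \right\| _{\infty }\le \left\| \cdot \right\|$, and the last line substitutes the bound $\left\| \varvec{\Delta }_r\right\| \le 2\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| /\{(1-\rho _r) l_{-}\}$ from Lemma 4.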

Lemma 6

Suppose Conditions 1–6 hold. Under the event $\xi =\bigcap _{r=1}^{p}\xi _r$, we have,

$$\begin{aligned} \left|\frac{\sqrt{n}\left( \hat{\theta }_{ rt}^{(2)}-\theta _{rt}^{*}\right) }{\hat{\sigma }_{2 rt}}-\frac{\sqrt{n}\left( \tilde{\theta }^{(2)}_{rt}- \theta _{rt}^{*}\right) }{\sigma _{rt}}\right|= o_{p}\left( a_{n}\right) , \end{aligned}$$

uniformly for $1\le r\le p$ and $t\in \mathcal {S}_r$, where $a_{n} /\left( \bar{q}^{3 / 2} \log p\bar{q} / \sqrt{n}\right) \rightarrow \infty$.

Proof

By the Bernstein inequality and Conditions 5–6, we can show that $\mathop {\max }\nolimits _{1\le r \le p}\mathop {\max }\nolimits _{t\in \mathcal {S}_r}|\sigma _{rt}^{-1}\hat{\sigma }_{2 rt}-1|=O_{p}(\sqrt{\log p\bar{q} / n})$. Similar to Lemma 1, we can show that $\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\log p\bar{q} / n})$, and accordingly $\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) \right\| \le \sqrt{\bar{q}}\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\bar{q} \log p\bar{q} / n})$. By arguments similar to those in the proof of Lemma 1, together with Conditions 5–6, we can also show that $\max _{1\le r\le p}\left\| \nabla ^{2} \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) -{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\| =O_{p}(\bar{q} \sqrt{\log p\bar{q} / n})$. The assertion for $\hat{\theta }^{(2)}_{rt}$ then follows from Lemma 5. $\square$

Lemma 7

Under Conditions 1–6, we have $\Pr (\xi ) \lesssim o(1 / p\bar{q})$.

Proof

Since $\Pr (\xi _r)\lesssim o(1/\bar{q})$, we have

$$\begin{aligned} \Pr (\xi )=\Pr \left( \bigcap _{r=1}^{p}\xi _r\right) \lesssim o(1/p\bar{q}). \end{aligned}$$

$\square$

The following lemma controls the distance between $W_{rt}$ and $\tilde{W}_{rt}$.

Lemma 8

Suppose Conditions 1–6 hold and $\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0$ for a small $c>0$. Then, for any $M>0$, we have

$$\begin{aligned} \sup _{M \le s \le G^{-1}\left( \alpha \eta _{n} /p\bar{q}\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt} \ge s\right) }{\sum _{(r,t)\in E^c} \mathbb {I}\left( W_{rt} \ge s\right) }-1\right|&=o_{p}(1), \\ \sup _{M \le s \le G_{-}^{-1}\left( \alpha \eta _{n} /p\bar{q}\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt} \le -s\right) }{\sum _{(r,t)\in E^c} \mathbb {I}\left( W_{rt}\le -s\right) }-1\right|&=o_{p}(1). \end{aligned}$$

Proof

Lemma 8 can be proved by arguments similar to those in the proofs of Lemmas S.4 and S.5 in Du et al. (2021), and thus its proof is omitted. $\square$

Proof of Theorem 1

To prove Theorem 1, we consider another EFC procedure based on the statistics $\tilde{W}_{rt}$, with the threshold $\tilde{L}>0$ chosen as $\tilde{L}=\inf \left\{ s>0:\frac{\#\{(r,t): \tilde{W}_{rt}\le -s\}}{\#\{(r,t): \tilde{W}_{rt}\ge s\}\vee 1}\le \alpha \right\}$. Define $\mathcal {G}=\left\{ (r,t): \theta _{rt}=o\left( c_{n p}\right) \right\}$. Arguing as in the proof of Theorem 4 in Du et al. (2021), under the condition that $\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0$ for a small $c>0$, Lemma 6 implies that the absolute difference between $W_{rt}$ and $\tilde{W}_{rt}$ is negligible for any $(r,t)\in \mathcal {G}$, while for $(r,t)\in \mathcal {G}^{c}$ we have $\tilde{W}_{rt}=W_{rt}\{1+o_p(1)\}$. From Lemma 8, we conclude that

$$\begin{aligned} {\textrm{FDP}}_{\tilde{W}}(\tilde{L}):=\frac{\#\left\{ (r,t): \tilde{W}_{rt} \ge \tilde{L}, (r,t)\in E^{c}\right\} }{\#\left\{ (r,t): \tilde{W}_{rt} \ge \tilde{L}\right\} \vee 1}={\textrm{FDP}}_{W}(L)\left\{ 1+o_{p}(1)\right\} . \end{aligned}$$

Under Conditions 1–8, similar to the proof of Theorem 2 of Du et al. (2021), we can show that $\textrm{FDP}_{\tilde{W}}(\tilde{L})$ is controlled at the nominal level asymptotically. Thus the desired result follows. $\square$
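The threshold rule used for $\tilde{L}$ in the proof above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `efc_threshold` and `false_discovery_proportion` are hypothetical, the search for the infimum is restricted to the observed values $|W_{rt}|$ (a common practical convention that the source does not state), and the statistics and null labels are made-up toy values.

```python
def efc_threshold(w, alpha):
    """Smallest observed |w| value s with #{w <= -s} / max(#{w >= s}, 1) <= alpha.

    Assumption: candidates are limited to the observed absolute values.
    Returns infinity (no selections) if no candidate satisfies the rule.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for s in candidates:
        n_neg = sum(1 for x in w if x <= -s)
        n_pos = sum(1 for x in w if x >= s)
        if n_neg / max(n_pos, 1) <= alpha:
            return s
    return float("inf")


def false_discovery_proportion(w, threshold, is_null):
    """FDP of the selection {i : w[i] >= threshold}, given the true null labels."""
    selected = [i for i, x in enumerate(w) if x >= threshold]
    n_false = sum(1 for i in selected if is_null[i])
    return n_false / max(len(selected), 1)


# Toy statistics: large positive values mimic true edges, while the nulls
# scatter on both sides of zero.
w = [5.0, 4.2, 3.9, 3.1, -0.4, 2.8, 0.3, -0.2, 2.5, -3.0]
is_null = [False, False, False, False, True, True, True, True, False, True]

L = efc_threshold(w, alpha=0.2)
selected = [i for i, x in enumerate(w) if x >= L]
fdp = false_discovery_proportion(w, L, is_null)
print(L, selected, round(fdp, 3))
```

The count of statistics below $-s$ serves as an estimate of the number of nulls above $s$, which is why the symmetry and uniform-convergence lemmas are the key ingredients of the proof.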

1.1 Additional lemmas

Lemma 9

(Bernstein’s inequality) Let $X_{1}, \ldots , X_{n}$ be independent centered random variables a.s. bounded by $A<\infty$ in absolute value. Let $\sigma ^{2}=n^{-1}\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)$. Then for all $x>0$,

$$\begin{aligned} \Pr \left( \sum _{i=1}^{n} X_{i} \ge x\right) \le \exp \left( -\frac{x^{2}}{2 n \sigma ^{2}+2 A x / 3}\right) . \end{aligned}$$
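As a quick numerical sanity check of Lemma 9, the sketch below compares the Bernstein bound with a Monte Carlo estimate of the tail probability for sums of independent Rademacher ($\pm 1$) variables, for which $A=1$ and $\sigma ^{2}=1$. The choice of distribution, the values of `n` and `x`, the number of replications, and the function names are assumptions made here for illustration only.

```python
import math
import random


def bernstein_bound(x, n, sigma2, a):
    """Right-hand side of Lemma 9: exp(-x^2 / (2 n sigma^2 + 2 A x / 3))."""
    return math.exp(-x ** 2 / (2 * n * sigma2 + 2 * a * x / 3))


def empirical_tail(x, n, reps=5000, seed=1):
    """Monte Carlo estimate of Pr(sum of n Rademacher variables >= x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(rng.choice((-1.0, 1.0)) for _ in range(n))
        if total >= x:
            hits += 1
    return hits / reps


n, x = 100, 25.0
bound = bernstein_bound(x, n, sigma2=1.0, a=1.0)
freq = empirical_tail(x, n)
print(f"Bernstein bound: {bound:.4f}, empirical frequency: {freq:.4f}")
```

The bound is not tight in this example; its value in the proof of Lemma 1 is that it holds for every $n$ and decays fast enough to survive a union bound over all $p\bar{q}$ candidate pairs.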

Lemma 10

(Moderate deviation for the independent sum) Suppose that $X_{1}, \ldots , X_{n}$ are independent random variables with mean zero, satisfying $\mathbb {E}\left( |X_{i}|^{2+\delta }\right) <\infty \ (i=1,2,\ldots )$. Let $B_{n}=\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)$ and assume that $\liminf _{n} B_{n} / n>0$. Then,

$$\begin{aligned} \frac{\Pr \left( \sum _{i=1}^{n} X_{i}>x \sqrt{B_{n}}\right) }{1-\Phi (x)} \rightarrow 1 \end{aligned}$$

as $n \rightarrow \infty$ uniformly in $x$ in the domain $0 \le x \le C\left\{ 2 \log \left( 1 / L_{n}\right) \right\} ^{1 / 2}$, where $L_{n}=B_{n}^{-1-\delta / 2} \sum _{i=1}^{n} \mathbb {E}|X_{i}|^{2+\delta }$ and $C$ is a positive constant satisfying $C<1$.

Appendix B: Additional simulation results

See Tables 7 and 8.

Table 7 The empirical FDR(%)s of the EFC method with its test statistic matrix W combined by the OR rule, the significance level $\alpha =5\%,10\%,20\%$ when $(n,\delta )=(500,0.3)$ under Case (Ia)


Table 8 The empirical FDR and TDR results of the EFC, GFC_L, and the EBIC methods under the PGM setup when $n=1000,p=50$


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Liu, Y., Zhang, Y. & Li, Z. Structure learning of exponential family graphical model with false discovery rate control. J. Korean Stat. Soc. 52, 554–580 (2023). https://doi.org/10.1007/s42952-023-00213-8


Received: 09 December 2022
Accepted: 10 April 2023
Published: 09 May 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s42952-023-00213-8
