
Structure learning of exponential family graphical model with false discovery rate control

  • Research Article
  • Published:
Journal of the Korean Statistical Society

Abstract

Probabilistic graphical models enjoy great popularity in a wide range of domains due to their ability to model conditional dependency relationships among random variables. This paper studies structure learning for the exponential family graphical model with false discovery rate (FDR) control. Most existing FDR-controlled structure learning procedures have been designed for the Gaussian graphical model (GGM), and a systematic approach for more general exponential family graphical models is still lacking. In this paper, we introduce a unified procedure that learns the structure of the exponential family graphical model with FDR control by means of the symmetrized data aggregation (SDA) technique, combining sample splitting, data screening, and information pooling. We show that our method controls the FDR asymptotically under mild conditions. Extensive simulation results and two real-data examples validate the effectiveness of our method.
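For readers who want a concrete picture of the workflow, the sketch below is a minimal, hypothetical illustration of the three stages mentioned above (sample splitting, data screening, and information pooling), using \(\ell _1\)-penalized node-wise logistic regressions as a stand-in for the paper's exponential family node-conditional models. It is not the authors' implementation: the function name, penalty choices, and the unstandardized product statistic are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def efc_sketch(X, seed=0):
    """Schematic sketch of sample splitting, screening, and pooling.

    X : (n, p) binary data matrix (an Ising-type toy example); the paper's
    method covers general exponential family node-conditional models.
    Returns a (p, p) matrix of directed edge statistics W1[r, t].
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    D1, D2 = X[idx[: n // 2]], X[idx[n // 2:]]   # stage 1: sample splitting

    W1 = np.zeros((p, p))
    for r in range(p):
        y1, Z1 = D1[:, r], np.delete(D1, r, axis=1)
        y2, Z2 = D2[:, r], np.delete(D2, r, axis=1)
        # stage 2: screening on the first half keeps a candidate neighborhood S_r
        fit1 = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(Z1, y1)
        keep = np.flatnonzero(np.abs(fit1.coef_.ravel()) > 1e-8)
        if keep.size == 0:
            continue
        # low-dimensional refit on the second half, restricted to S_r
        fit2 = LogisticRegression(penalty="l2", C=1e6, solver="lbfgs").fit(Z2[:, keep], y2)
        # stage 3: pooling -- the product of the two (roughly independent) signals
        # is symmetric about zero for null edges and tends to be positive otherwise
        cols = np.delete(np.arange(p), r)[keep]
        W1[r, cols] = fit1.coef_.ravel()[keep] * fit2.coef_.ravel()
    return W1

# toy usage: independent Bernoulli noise, so no true edges
W1 = efc_sketch(np.random.default_rng(1).integers(0, 2, size=(400, 10)))
```

The standardization by \(\hat{\sigma }_{2rt}\), the AND-rule symmetrization of formula (8), and the data-driven threshold are omitted here; a sketch of the thresholding step appears after the proof of Theorem 1 in Appendix A.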


References

  • Allen, G. I., & Liu, Z. (2012). A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. In 2012 IEEE International Conference on Bioinformatics and Biomedicine (pp. 1–6). IEEE.

  • Allen, G. I., & Liu, Z. (2013). A local Poisson graphical model for inferring networks from sequencing data. IEEE Transactions on Nanobioscience, 12(3), 189–198.

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  • Barber, R. F., & Candès, E. J. (2019). A knockoff filter for high-dimensional selective inference. The Annals of Statistics, 47(5), 2504–2537.

  • Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567–607.

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.

  • Bühlmann, P., & Mandozzi, J. (2014). High-dimensional variable screening and bias in subsequent inference, with an empirical comparison. Computational Statistics, 29(3), 407–430.

  • Cai, T., Li, H., Ma, J., et al. (2019). Differential Markov random field analysis with an application to detecting differential microbial community networks. Biometrika, 106(2), 401–416.

  • Cai, T., Liu, W., & Luo, X. (2011). A constrained \(\ell _1\) minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 594–607.

  • Cheng, J., Li, T., Levina, E., et al. (2017). High-dimensional mixed graphical models. Journal of Computational and Graphical Statistics, 26(2), 367–378.

  • Chen, X., & Liu, W. (2019). Graph estimation for matrix-variate Gaussian data. Statistica Sinica, 29(1), 479–504.

  • d’Aspremont, A., Banerjee, O., & El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1), 56–66.

  • Drton, M., & Maathuis, M. H. (2017). Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4, 365–393.

  • Drton, M., & Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statistical Science, 22(3), 430–449.

  • Du, L., Guo, X., Sun, W., et al. (2021). False discovery rate control under general dependence by symmetrized data aggregation. Journal of the American Statistical Association, 1–15.

  • Fan, J., Feng, Y., & Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics, 3(2), 521.

  • Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 23, 1–9.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.

  • He, Y., Zhang, X., Wang, P., et al. (2017). High dimensional Gaussian copula graphical model with FDR control. Computational Statistics & Data Analysis, 113, 457–474.

  • Höfling, H., & Tibshirani, R. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10(4), 883–906.

  • Jeon, M., Jin, I. H., Schweinberger, M., et al. (2021). Mapping unobserved item-respondent interactions: A latent space item response model with interaction map. Psychometrika, 86(2), 378–403.

  • Keener, R. W. (2010). Theoretical statistics: Topics for a core course. Springer.

  • Kouros-Mehr, H., Slorach, E. M., Sternlicht, M. D., et al. (2006). GATA-3 maintains the differentiation of the luminal cell fate in the mammary gland. Cell, 127(5), 1041–1055.

  • Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37(6B), 4254.

  • Lauritzen, S. L. (1996). Graphical models (Vol. 17). Clarendon Press.

  • Lee, J. D., & Hastie, T. J. (2015). Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics, 24(1), 230–253.

  • Lee, S., Sobczyk, P., & Bogdan, M. (2019). Structure learning of Gaussian Markov random fields with false discovery rate control. Symmetry, 11(10), 1311.

  • Lehmann, E. L., & Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.

  • Li, J., & Maathuis, M. H. (2021). GGM knockoff filter: False discovery rate control for Gaussian graphical models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(3), 534–558.

  • Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics, 41(6), 2948–2978.

  • Liu, W., & Shao, Q. M. (2014). Phase transition and regularized bootstrap in large-scale \(t\)-tests with false discovery rate control. The Annals of Statistics, 42(5), 2003–2025.

  • Liu, H., & Wang, L. (2017). TIGER: A tuning-insensitive approach for optimally estimating Gaussian graphical models. Electronic Journal of Statistics, 11(1), 241–294.

  • Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.

  • Meinshausen, N., Meier, L., & Bühlmann, P. (2009). \(P\)-values for high-dimensional regression. Journal of the American Statistical Association, 104(488), 1671–1681.

  • Natali, P., Nicotra, M., Sures, I., et al. (1992). Breast cancer is associated with loss of the c-kit oncogene product. International Journal of Cancer, 52(5), 713–717.

  • Ravikumar, P., Wainwright, M. J., & Lafferty, J. D. (2010). High-dimensional Ising model selection using \(\ell _1\)-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319.

  • Ravikumar, P., Wainwright, M. J., Raskutti, G., et al. (2011). High-dimensional covariance estimation by minimizing \(\ell _1\)-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.

  • Ross-Innes, C. S., Stark, R., Teschendorff, A. E., et al. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 481(7381), 389–393.

  • Rothman, A. J., Bickel, P. J., Levina, E., et al. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494–515.

  • Salesse, S., Odoul, L., Chazée, L., et al. (2018). Elastin molecular aging promotes MDA-MB-231 breast cancer cell invasiveness. FEBS Open Bio, 8(9), 1395–1404.

  • Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The Journal of Machine Learning Research, 14(1), 3385–3418.

  • Teng, Y. H. F., Tan, W. J., Thike, A. A., et al. (2011). Mutations in the epidermal growth factor receptor (EGFR) gene in triple negative breast cancer: Possible implications for targeted therapy. Breast Cancer Research, 13(2), 1–9.

  • Van De Geer, S. A., & Bühlmann, P. (2009). On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3, 1360–1392.

  • Wasserman, L., & Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics, 37(5A), 2178.

  • Xia, Y., Cai, T., & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102(2), 247–266.

  • Xue, L., Zou, H., & Cai, T. (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. The Annals of Statistics, 40(3), 1403–1429.

  • Yang, E., Allen, G., Liu, Z., et al. (2012). Graphical models via generalized linear models. Advances in Neural Information Processing Systems, 25, 1–9.

  • Yang, E., Baker, Y., Ravikumar, P., et al. (2014). Mixed graphical models via exponential families. In Artificial Intelligence and Statistics, PMLR (pp. 1042–1050).

  • Yang, E., Ravikumar, P., Allen, G. I., et al. (2015). Graphical models via univariate exponential family distributions. The Journal of Machine Learning Research, 16(1), 3813–3847.

  • Yang, E., Ravikumar, P. K., Allen, G. I., et al. (2013). Conditional random fields via univariate exponential families. Advances in Neural Information Processing Systems, 26, 1–9.

  • Yu, L., Kaufmann, T., & Lederer, J. (2021). False discovery rates in biological networks. In International Conference on Artificial Intelligence and Statistics, PMLR (pp. 163–171).

  • Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 11, 2261–2286.

  • Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.

  • Zhang, Y., Duchi, J. C., & Wainwright, M. J. (2013). Communication-efficient algorithms for statistical optimization. The Journal of Machine Learning Research, 14(1), 3321–3363.

  • Zhang, R., Ren, Z., Celedón, J. C., et al. (2021). Inference of large modified Poisson-type graphical models: Application to RNA-seq data in childhood atopic asthma studies. The Annals of Applied Statistics, 15(2), 831–855.

  • Zhao, T., & Liu, H. (2014). Calibrated precision matrix estimation for high-dimensional elliptical distributions. IEEE Transactions on Information Theory, 60(12), 7874–7887.

  • Zhao, T., Liu, H., Roeder, K., et al. (2012). The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research, 13(1), 1059–1062.

  • Zheng, Z., Zhou, J., Guo, X., et al. (2018). Recovering the graphical structures via knockoffs. Procedia Computer Science, 129, 201–207.


Acknowledgements

The authors are grateful to the editor, the associate editor, and two anonymous referees for their comments that have greatly improved this paper. The first two authors contributed equally to this paper. This research was supported by the National Key R&D Program of China (Grant Nos. 2022ZD0114801, 2019YFC1908502, 2022YFA1003703, 2022YFA1003802, 2022YFA1003803) and NNSF of China Grants (Nos. 12071233, 11925106, 12231011, 11931001 and 11971247).

Author information

Corresponding author

Correspondence to Zhonghua Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The proof of Theorem 1

Recall that

$$\begin{aligned} \textrm{FDP}&=\frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{1\vee \sum _{(r,t)}\mathbb {I}(W_{rt}\ge L)}= \frac{\sum _{(r,t)}\mathbb {I}\left( W_{rt}\le - L \right) }{1\vee \sum _{(r,t)}\mathbb {I}(W_{rt}\ge L)}\times \frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{\sum _{(r,t)}\mathbb {I}\left( W_{rt}\le -L\right) }\\&\le \alpha \times \frac{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\ge L\right) }{\sum _{(r,t)\in E^c}\mathbb {I}\left( W_{rt}\le -L\right) }. \end{aligned}$$

Let \(q_{0n}=|\mathcal {S}\cap E^c|\); then \(q_{0n}<p\bar{q}\). Let \(\tilde{\Phi }(x)=1-\Phi (x)\), \(G(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\ge s\mid {\mathcal D}_{1})\), \(G_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (W_{rt}\le -s\mid {\mathcal D}_{1})\) and \(G^{-1}(y)=\inf \{s\ge 0: G(s)\le y\}\) for \(0\le y\le 1\). In order to prove Theorem 1, we introduce an intermediate quantity \(\tilde{W}_{rt}\), constructed as follows. Define \(\tilde{\varvec{\theta }}_r^{(2)}={\varvec{\theta }}_r^{*}-\left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right)\). For \(1\le r\le p,\) define \(T_{1rt}=\sqrt{n_1} \hat{\theta }_{rt}^{(1)} / {{\hat{\sigma }}}_{2rt}\) and \(\tilde{T}_{2rt}=\sqrt{n_2} \tilde{\theta }_{rt}^{(2)} /{{\hat{\sigma }}}_{2rt}\), set \(\tilde{W}^{1}_{rt}=T_{1rt}\tilde{T}_{2rt}\), and then obtain \(\tilde{W}_{rt}\) by the AND rule, similarly to formula (8). We first establish the symmetry property and uniform convergence of \(\tilde{W}_{rt}\) under the null, and then show that the distance between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible, which proves Theorem 1. For \(\tilde{W}_{rt},\) we accordingly define \(\tilde{G}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\ge s\mid {\mathcal D}_{1})\), \(\tilde{G}_{-}(s)=(q_{0n})^{-1}\sum _{(r,t)\in E^c}\Pr (\tilde{W}_{rt}\le -s\mid {\mathcal D}_{1})\) and \(\tilde{G}^{-1}(y)=\inf \{s\ge 0: \tilde{G}(s)\le y\}\) for \(0\le y\le 1\). We begin with a lemma that establishes uniform bounds for \(\tilde{\theta }_{rt}^{(2)}\); it is the counterpart of Lemma S.8 in Du et al. (2021).
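Before stating the lemma, and purely as an illustration (formula (8) is not reproduced in this excerpt), the snippet below shows one common AND-type way to pool directed products such as \(T_{1rt}\tilde{T}_{2rt}\) into a single symmetric edge statistic. The min-type combination is a hedged placeholder and is not claimed to be the paper's exact rule.

```python
import numpy as np

def and_rule(W1):
    """Pool directed statistics W1[r, t] and W1[t, r] into one edge statistic.

    A min-type combination is a natural AND-rule choice: the pooled statistic
    is large and positive only if BOTH directed statistics are.
    """
    iu = np.triu_indices_from(W1, k=1)
    W = np.zeros_like(W1)
    W[iu] = np.minimum(W1[iu], W1.T[iu])
    return W + W.T

# toy directed statistics for a 4-node graph
rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 4))
np.fill_diagonal(W1, 0.0)
print(and_rule(W1))
```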

Lemma 1

Suppose Conditions 1–6 hold. Then for \(C>4\), as \(n\rightarrow \infty\), we have

$$\begin{aligned} \Pr \left( \max _{1\le r\le p}\max _{t\in \mathcal {S}_r}\sqrt{n_2} |\tilde{\theta }_{rt}^{(2)}-\theta _{rt}^{*}|/\sigma _{rt}>\sqrt{C\log (p\bar{q})}\right) =o(1/p\bar{q}) \end{aligned}$$

Proof

Let \(\mathcal {B}=\left\{ \max _{1 \le r \le p}\max _{ 1\le i\le n_2}\left\| \left\{ {\Upsilon }\left( {\varvec{\theta }}_r^{*}\right) \right\} ^{-1} \nabla \mathcal {L}\left( \varvec{\theta }^{*}_r;\textbf{X}_i\right) \right\| _{\infty } \le m_{n}\right\}\), where \(m_{n}=(n_2p\bar{q})^{1 / \varpi +\gamma } K_{n_2}\) for some small \(\gamma >0\). By Condition 6 and the Markov inequality, \(\Pr \left( \mathcal {B}^{c}\right) \le 1 / p\bar{q}\). Let \(x=\sqrt{C \log (p\bar{q}) / n_{2}}\). Conditioned on the post-selection model \(\mathcal {S}=\bigcup _{r=1}^{p}\mathcal {S}_r\) and \(\mathcal {B}\), the Bernstein inequality in Lemma 9 yields that

$$\begin{aligned} \begin{aligned}&{\text {Pr}}\left( |\tilde{\theta }_{rt}^{(2)}-\theta _{rt}^{*}|/ \sigma _{rt}>x \text{ for } \text{ some } r,t \mid \mathcal {S}, \mathcal {B}\right) \\&\quad \le p\bar{q} \max _{r}\max _{t} {\text {Pr}}\left( |\textbf{e}_{t}^{\top }\left\{ \Upsilon \left( \varvec{\theta }^{*}_r\right) \right\} ^{-1}\sum _{i=1}^{n_{2}} \nabla \mathcal {L}\left( \varvec{\theta }^{*}_r; {\textbf{X}}_{i}\right) |>n_{2} \sigma _{rt} x \mid \mathcal {S}, \mathcal {B}\right) \\&\quad \le 2 p\bar{q} \max _{r}\max _{t} \exp \left\{ -\frac{n_{2}^{2} \sigma _{rt}^{2} x^{2}}{2 n_{2} \sigma _{rt}^{2} + 2 m_{n} n_{2} \sigma _{rt} x / 3}\right\} \\&\quad =o(1 / p\bar{q}) \end{aligned} \end{aligned}$$

holds uniformly for \(1\le r\le p\) and \(t\in \mathcal {S}_r\), where we use the fact that \(m_{n} \sqrt{\log p\bar{q} / n_2} \rightarrow 0\). The lemma is proved. \(\square\)

The next lemma establishes the symmetry property of \(\tilde{W}_{rt}\), and it is the counterpart of Lemma S.1 in Du et al. (2021).

Lemma 2

Suppose Conditions 1–6 hold. Define a sequence \(a_{p}\) satisfying \(a_{p} / \bar{q} \rightarrow \infty\) and \(a_{p}=o(p \bar{q})\). Then for any \(0 \le s \le \tilde{G}_{-}^{-1}(a_{p} /(p \bar{q}))\), we have

$$\begin{aligned} \frac{\tilde{G}(s)}{\tilde{G}_{-}(s)}-1 \rightarrow 0. \end{aligned}$$

The proof of Lemma 2 mainly utilizes Lemmas 1 and 10; it is similar to the proof of Lemma S.1 in Du et al. (2021) and is thus omitted.

The next lemma characterizes the uniform convergence of \((p\bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \ge s)\) and \((p \bar{q})^{-1} \sum _{(r,t)\in E^c} \mathbb {I}(\tilde{W}_{rt} \le -s)\).

Lemma 3

Suppose Conditions 1–8 hold. For any sequence \(a_{p}\) satisfying \(a_{p} / \bar{q} \rightarrow \infty\) and \(a_{p}=o(p \bar{q})\), we have

$$\begin{aligned} \sup _{0 \le s \le \tilde{G}^{-1}\left( a_{p} /(p \bar{q})\right) }\left|\frac{\sum _{(r,t)\in E^c } \mathbb {I}\left( \tilde{W}_{rt} \ge s\right) }{p\bar{q} \tilde{G}(s)}-1\right|=o_{p}(1), \end{aligned}$$
(A1)

and

$$\begin{aligned} \sup _{0 \le s \le \tilde{G}_{-}^{-1}\left( a_{p} /(p \bar{q})\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt}\le -s\right) }{p \bar{q} \tilde{G}_{-}(s)}-1\right|=o_{p}(1). \end{aligned}$$
(A2)

Under Condition 8, Lemma 3 can be proved using the technique in Du et al. (2021), and its proof is thus omitted. Lemmas 2 and 3 establish, respectively, the symmetry property and the uniform consistency of the \(\tilde{W}_{rt}\)’s. In the following, we show that the distance between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible; to this end, we need the following lemmas. For the \(r\)th node (\(1\le r\le p\)), define the event

$$\begin{aligned} {\small \xi _r:=\left\{ \frac{1}{n}\sum _{i=1}^{n} M_r\left( \textbf{X}_{i}\right) \le 2 L_r,\left\| \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) -\Upsilon \left( \varvec{\theta }_r^{*}\right) \right\| \le \frac{\rho _r l_{-}}{2}, \left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| < \frac{(1-\rho _r) l_{-} \delta _{\rho _r}}{2}\right\} ,} \end{aligned}$$

where \(\delta _{\rho _r}=\min \left\{ \rho _r,\rho _rl_{-} /(4 L_r)\right\}\). We further define the event \(\xi =\bigcap _{r=1}^{p}\xi _r\).

Lemma 4

Suppose Conditions 3–6 hold. Under the event \(\xi _r\), we have

$$\begin{aligned} \left\| \hat{\varvec{\theta }}^{(2)}_r-\varvec{\theta }_r^{*}\right\| \le \frac{2\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| }{(1-\rho _r) l_{-}}. \end{aligned}$$

Proof

This lemma follows in exactly the same way as Lemma 6 in Zhang et al. (2013), and its proof is thus omitted. \(\square\)

Lemma 5

Suppose Conditions 3–6 hold. Under the event \(\xi _r\), we have

$$\begin{aligned}&\left\| \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}+\left\{ {\varvec{\Upsilon }}\left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| _{\infty }\\&\quad \le \frac{2}{(1-\rho _r) l_{-}^{2}}\left\| \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) -{\Upsilon }\left( \varvec{\theta }_r^{*}\right) \right\| \left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| +\frac{8 L_r}{(1-\rho _r)^{2} l_{-}^{3}}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) \right\| ^{2}. \end{aligned}$$

Proof

By the integral form of Taylor’s expansion, we have

$$\begin{aligned} 0=\nabla \mathcal {L}_{2n}(\hat{\varvec{\theta }}_r^{(2)} )=\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right) +\textbf{H}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) , \end{aligned}$$

where \(\textbf{H}_{n}=\int _{0}^{1} \nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}+u\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }_r^{*}\right) \right) du\). Then simple linear algebra yields

$$\begin{aligned} \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r=-\left\{ \varvec{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) -\left\{ \Upsilon \left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \textbf{U}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r\right) -\left\{ \varvec{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\} ^{-1} \textbf{V}_{n}\left( \hat{\varvec{\theta }}_r^{(2)}-\varvec{\theta }^{*}_r\right) , \end{aligned}$$

where \(\textbf{U}_{n}=\textbf{H}_{n}-\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right)\) and \(\textbf{V}_{n}=\nabla ^{2} \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) -\Upsilon \left( \varvec{\theta }^{*}_r\right)\). Then, the claimed expansion is a direct consequence of Lemma 4 and Conditions 5 and 6. \(\square\)
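As a quick numerical illustration of this expansion (not part of the original proof), one can compare a Newton-type refit with the linearized quantity \(\varvec{\theta }_r^{*}-\left\{ \Upsilon \left( \varvec{\theta }_r^{*}\right) \right\} ^{-1}\nabla \mathcal {L}_{2n}\left( \varvec{\theta }_r^{*}\right)\) in a toy logistic node-conditional model: the linearization error is of smaller order than the estimation error itself. The sample Hessian stands in for \(\Upsilon \left( \varvec{\theta }_r^{*}\right)\), and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 2000, 3
theta_star = np.array([1.0, -0.5, 0.25])
Z = rng.normal(size=(n, q))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z @ theta_star)))

def grad(theta):
    """Gradient of the average negative log-likelihood of logistic regression."""
    return Z.T @ (1.0 / (1.0 + np.exp(-Z @ theta)) - y) / n

def hess(theta):
    """Hessian of the average negative log-likelihood."""
    w = 1.0 / (1.0 + np.exp(-Z @ theta))
    return (Z * (w * (1.0 - w))[:, None]).T @ Z / n

theta_hat = np.zeros(q)              # maximum likelihood refit via Newton steps
for _ in range(25):
    theta_hat -= np.linalg.solve(hess(theta_hat), grad(theta_hat))

# one-step (linearized) quantity, with the sample Hessian standing in for Upsilon
theta_tilde = theta_star - np.linalg.solve(hess(theta_star), grad(theta_star))

print("estimation error    :", np.linalg.norm(theta_hat - theta_star))   # O(n^{-1/2})
print("linearization error :", np.linalg.norm(theta_hat - theta_tilde))  # higher order
```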

Lemma 6

Suppose Conditions 1–6 hold. Under the event \(\xi =\bigcap _{r=1}^{p}\xi _r\), we have

$$\begin{aligned} \left|\frac{\sqrt{n}\left( \hat{\theta }_{ rt}^{(2)}-\theta _{rt}^{*}\right) }{\hat{\sigma }_{2 rt}}-\frac{\sqrt{n}\left( \tilde{\theta }^{(2)}_{rt}- \theta _{rt}^{*}\right) }{\sigma _{rt}}\right|= o_{p}\left( a_{n}\right) , \end{aligned}$$

holds uniformly for \(1\le r\le p\) and \(t\in \mathcal {S}_r\), where \(a_{n} /\left( \bar{q}^{3 / 2} \log p\bar{q} / \sqrt{n}\right) \rightarrow \infty .\)

Proof

By the Bernstein inequality and Conditions 5 and 6, we can show that \(\mathop {\max }\nolimits _{1\le r \le p}\mathop {\max }\nolimits _{t\in \mathcal {S}_r}|\sigma _{rt}^{-1}\hat{\sigma }_{2 rt}-1|=O_{p}(\sqrt{\log p\bar{q} / n})\). Similarly to Lemma 1, we can show that \(\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\log p\bar{q} / n})\), and accordingly \(\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2n}\left( \varvec{\theta }^{*}_r\right) \right\| \le \sqrt{\bar{q}}\max _{1\le r\le p}\left\| \nabla \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) \right\| _{\infty }=O_{p}(\sqrt{\bar{q} \log p\bar{q} / n})\), uniformly in \(1\le r\le p\) and \(t\in \mathcal {S}_r\). By arguments similar to those in the proof of Lemma 1, together with Conditions 5 and 6, we can also show that \(\max _{1\le r\le p}\left\| \nabla ^{2} \mathcal {L}_{2 n}\left( \varvec{\theta }^{*}_r\right) -{\Upsilon }\left( \varvec{\theta }^{*}_r\right) \right\| =O_{p}(\bar{q} \sqrt{\log p\bar{q} / n})\). The assertion for \(\hat{\theta }^{(2)}_{rt}\) follows immediately from Lemma 5. \(\square\)

Lemma 7

Under Conditions 1–6, we have \(\Pr (\xi ^{c}) \lesssim o(1 / p\bar{q})\).

Proof

Since \(\Pr (\xi _r^{c})\lesssim o\{1/(p^{2}\bar{q})\}\) uniformly in \(1\le r\le p\), the union bound yields

$$\begin{aligned} \Pr (\xi ^{c})=\Pr \left( \bigcup _{r=1}^{p}\xi _r^{c}\right) \le \sum _{r=1}^{p}\Pr \left( \xi _r^{c}\right) \lesssim o(1/p\bar{q}). \end{aligned}$$

\(\square\)

The following lemma establishes that the distance between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible.

Lemma 8

Suppose Conditions 1–6 hold and \(\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0\) for a small \(c>0\). Then, for any \(M>0\), we have

$$\begin{aligned} \sup _{M \le s \le G^{-1}\left( \alpha \eta _{n} /p\bar{q}\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt} \ge s\right) }{\sum _{(r,t)\in E^c} \mathbb {I}\left( W_{rt} \ge s\right) }-1\right|= & {} o_{p}(1), \\ \sup _{M \le s \le G_{-}^{-1}\left( \alpha \eta _{n} /p\bar{q}\right) }\left|\frac{\sum _{(r,t)\in E^c} \mathbb {I}\left( \tilde{W}_{rt} \le -s\right) }{\sum _{(r,t)\in E^c} \mathbb {I}\left( W_{rt}\le -s\right) }-1\right|= & {} o_{p}(1). \end{aligned}$$

Proof

By arguments similar to those in the proofs of Lemmas S.4 and S.5 in Du et al. (2021), Lemma 8 can be proved; its proof is thus omitted. \(\square\)

Proof of Theorem 1

In order to prove Theorem 1, we consider another EFC procedure with the statistics \(\tilde{W}_{rt}\), and choose a threshold \(\tilde{L}>0\) by setting \(\tilde{L}=\inf \left\{ s>0:\frac{\#\{(r,t): \tilde{W}_{rt}\le -s\}}{\#\{(r,t): \tilde{W}_{rt}\ge s\}\vee 1}\le \alpha \right\}\). Define \(\mathcal {G}=\left\{ (r,t): \theta _{rt}=o\left( c_{n p}\right) \right\}\). Similarly to the proof of Theorem 4 in Du et al. (2021), for any \((r,t)\in \mathcal {G}\), under the condition that \(\bar{q}^{3/2}\left( \log p\bar{q}\right) ^{2+c}c_{np} \rightarrow 0\) for a small \(c>0\), the absolute difference between \(W_{rt}\) and \(\tilde{W}_{rt}\) is negligible by Lemma 6, and for \((r,t)\in \mathcal {G}^{c}\) we have \(\tilde{W}_{rt}=W_{rt}\{1+o_p(1)\}\). From Lemma 8, we conclude that

$$\begin{aligned} {\textrm{FDP}}_{\tilde{W}}(\tilde{L}):=\frac{\#\left\{ (r,t): \tilde{W}_{rt} \ge \tilde{L}, (r,t)\in E^{c}\right\} }{\#\left\{ (r,t): \tilde{W}_{rt} \ge \tilde{L}\right\} \vee 1}={\textrm{FDP}}_{W}(L)\left\{ 1+o_{p}(1)\right\} . \end{aligned}$$

Under Conditions 1–8, similarly to the proof of Theorem 2 of Du et al. (2021), we can show that \(\textrm{FDP}_{\tilde{W}}(\tilde{L})\) is asymptotically controlled at the nominal level. The desired result then follows. \(\square\)
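To make the data-driven threshold concrete, the toy Monte Carlo below is a hedged sketch (with edge statistics generated directly from a symmetric null and a right-shifted alternative, rather than from a fitted graphical model) of how \(L=\inf \left\{ s>0:\frac{\#\{(r,t): W_{rt}\le -s\}}{\#\{(r,t): W_{rt}\ge s\}\vee 1}\le \alpha \right\}\) is computed and how the resulting FDP behaves; all constants are arbitrary.

```python
import numpy as np

def sda_threshold(W, alpha):
    """Return L = inf{ s > 0 : #{W <= -s} / max(#{W >= s}, 1) <= alpha }."""
    for s in np.sort(np.abs(W)):          # the infimum is attained at some |W_{rt}|
        if s > 0 and np.sum(W <= -s) / max(np.sum(W >= s), 1) <= alpha:
            return s
    return np.inf

rng = np.random.default_rng(1)
m, m1, alpha = 5000, 200, 0.1             # m edge pairs, of which m1 are true edges
is_edge = np.zeros(m, dtype=bool)
is_edge[:m1] = True
# null statistics are symmetric about zero; true edges are shifted to the right
W = np.where(is_edge, rng.normal(3.0, 1.0, m), rng.normal(0.0, 1.0, m))

L = sda_threshold(W, alpha)
sel = W >= L
fdp = np.sum(sel & ~is_edge) / max(np.sum(sel), 1)
tdr = np.sum(sel & is_edge) / m1
print(f"threshold = {L:.3f}, FDP = {fdp:.3f}, TDR = {tdr:.3f}")
```

The key point, mirroring the symmetry arguments above, is that the left tail of the null statistics serves as an estimate of the number of false discoveries in the right tail.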

1.1 Additional lemmas

Lemma 9

(Bernstein’s inequality) Let \(X_{1}, \ldots , X_{n}\) be independent centered random variables a.s. bounded by \(A<\infty\) in absolute value. Let \(\sigma ^{2}=n^{-1}\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)\). Then for all \(x>0\),

$$\begin{aligned} \Pr \left( \sum _{i=1}^{n} X_{i} \ge x\right) \le \exp \left( -\frac{x^{2}}{2 n \sigma ^{2}+2 A x / 3}\right) . \end{aligned}$$
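As a sanity check (not needed for the proofs), the bound can be compared with Monte Carlo tail frequencies for bounded variables; the uniform distribution and the constants below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, A, reps, x = 200, 1.0, 100_000, 15.0

# X_i uniform on [-A, A]: centered, |X_i| <= A almost surely, Var(X_i) = A^2 / 3
sums = rng.uniform(-A, A, size=(reps, n)).sum(axis=1)
sigma2 = A ** 2 / 3

empirical = np.mean(sums >= x)
bernstein = np.exp(-x ** 2 / (2 * n * sigma2 + 2 * A * x / 3))
print(f"P(sum >= {x}): Monte Carlo ~ {empirical:.4f}, Bernstein bound = {bernstein:.4f}")
```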

Lemma 10

(Moderate deviations for sums of independent random variables) Suppose that \(X_{1}, \ldots , X_{n}\) are independent random variables with mean zero, satisfying \(\mathbb {E}\left( |X_{i}|^{2+\delta }\right) <\infty \ (i=1,2,\ldots )\). Let \(B_{n}=\sum _{i=1}^{n} \mathbb {E}\left( X_{i}^{2}\right)\) and assume that \(\liminf _{n} B_{n} / n>0\). Then,

$$\begin{aligned} \frac{\Pr \left( \sum _{i=1}^{n} X_{i}>x \sqrt{B_{n}}\right) }{1-\Phi (x)} \rightarrow 1 \end{aligned}$$

as \(n \rightarrow \infty\) uniformly in x in the domain \(0 \le x \le C\left\{ 2 \log \left( 1 / L_{n}\right) \right\} ^{1 / 2}\), where \(L_{n}=B_{n}^{-1-\delta / 2} \sum _{i=1}^{n} \mathbb {E}|X_{i}|^{2+\delta }\) and C is a positive constant satisfying the condition \(C<1\).

Appendix B: Additional simulation results

See Tables 7 and 8.

Table 7 The empirical FDRs (%) of the EFC method with its test statistic matrix W combined by the OR rule, at significance levels \(\alpha =5\%,10\%,20\%\), when \((n,\delta )=(500,0.3)\) under Case (Ia)
Table 8 The empirical FDR and TDR results of the EFC, GFC_L, and EBIC methods under the PGM setup when \(n=1000, p=50\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Zhang, Y. & Li, Z. Structure learning of exponential family graphical model with false discovery rate control. J. Korean Stat. Soc. 52, 554–580 (2023). https://doi.org/10.1007/s42952-023-00213-8
