DOI: 10.1145/3406325.3451006
Research Article
Open Access

Sample-optimal and efficient learning of tree Ising models

Published: 15 June 2021

ABSTRACT

We show that n-variable tree-structured Ising models can be learned computationally efficiently to within total variation distance ε from an optimal O(n ln n / ε²) samples, where O(·) hides an absolute constant which, importantly, does not depend on the model being learned: neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu algorithm [1968], using the plug-in estimator for estimating mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is ε-close to the true one in total variation distance, a guarantee called "proper learning."
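
For orientation, the Chow-Liu algorithm referenced above computes a maximum-weight spanning tree over the empirical (plug-in) pairwise mutual informations. The sketch below is a minimal Python illustration of that recipe, not the paper's code; it assumes binary samples in a NumPy array of shape (num_samples, n), and the names plug_in_mutual_information and chow_liu_tree are illustrative.

    import numpy as np
    from itertools import combinations

    def plug_in_mutual_information(x, y):
        """Empirical (plug-in) mutual information between two binary sample vectors."""
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                p_ab = np.mean((x == a) & (y == b))
                p_a, p_b = np.mean(x == a), np.mean(y == b)
                if p_ab > 0:
                    mi += p_ab * np.log(p_ab / (p_a * p_b))
        return mi

    def chow_liu_tree(samples):
        """Edges of a maximum-weight spanning tree under empirical mutual information
        (Kruskal's algorithm with a union-find over the n variables)."""
        n = samples.shape[1]
        weights = [(plug_in_mutual_information(samples[:, i], samples[:, j]), i, j)
                   for i, j in combinations(range(n), 2)]
        weights.sort(reverse=True)          # heaviest (most informative) pairs first
        parent = list(range(n))
        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]  # path halving
                u = parent[u]
            return u
        tree = []
        for _w, i, j in weights:
            ri, rj = find(i), find(j)
            if ri != rj:                    # adding (i, j) keeps the graph acyclic
                parent[ri] = rj
                tree.append((i, j))
        return tree

Fitting per-edge conditional distributions on the returned tree then yields the tree-structured model whose total variation error the paper bounds.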

Our guarantees do not follow from known results for the Chow-Liu algorithm and the ensuing literature on learning graphical models, including the very recent renaissance of algorithms for this learning challenge, which only yield asymptotic consistency results, or sample-suboptimal and/or time-inefficient algorithms, unless further assumptions are placed on the model, such as bounds on the "strengths" of the model's edges. While we establish guarantees for a widely known and simple algorithm, the analysis showing that this algorithm succeeds and is sample-optimal is quite complex. It requires a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.
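
The squared Hellinger subadditivity invoked above is, paraphrasing Daskalakis and Pan [5] informally, of the following form for two Bayesian networks P and Q over the same DAG on nodes 1, …, n with parent sets Π_i (writing P_S for the marginal of P on a coordinate set S):

    H^2(P, Q) \;\le\; \sum_{i=1}^{n} H^2\big( P_{\{i\} \cup \Pi_i},\, Q_{\{i\} \cup \Pi_i} \big)

Rooting a tree arbitrarily makes each term a per-node/per-edge quantity, which is what allows per-edge reconstruction errors to be accumulated into a global bound and then translated into total variation via the standard comparison between Hellinger and total variation distances.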

References

  1. Guy Bresler. 2015. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC).
  2. C. Chow and C. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14, 3 (1968), 462–467.
  3. C. Chow and T. Wagner. 1973. Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory 19, 3 (1973), 369–371.
  4. Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. 2019. Testing Ising models. IEEE Transactions on Information Theory 65, 11 (2019), 6829–6852.
  5. Constantinos Daskalakis and Qinxuan Pan. 2017. Square Hellinger subadditivity for Bayesian networks and its applications to identity testing. In Proceedings of the 30th Conference on Learning Theory (COLT).
  6. Constantinos Daskalakis and Qinxuan Pan. 2020. Sample-optimal and efficient learning of tree Ising models. CoRR abs/2010.14864 (2020). arXiv:2010.14864. https://arxiv.org/abs/2010.14864
  7. Luc Devroye, Abbas Mehrabian, and Tommy Reddad. 2019. The minimax learning rate of normal and Ising undirected graphical models. Electronic Journal of Statistics (2019).
  8. Linus Hamilton, Frederic Koehler, and Ankur Moitra. 2017. Information theoretic properties of Markov random fields, and their algorithmic applications. In Advances in Neural Information Processing Systems. 2463–2472.
  9. Ali Jalali, Pradeep Ravikumar, Vishvas Vasuki, and Sujay Sanghavi. 2011. On learning discrete graphical models using group-sparse regularization. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. 378–387.
  10. Adam Klivans and Raghu Meka. 2017. Learning graphical models using multiplicative weights. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC).
  11. Frederic Koehler. 2020. A note on TV learning of tree models. Personal communication. http://math.mit.edu/~fkoehler/tv_note.pdf
  12. Steffen L. Lauritzen. 1996. Graphical Models. Vol. 17. Clarendon Press.
  13. Mukund Narasimhan and Jeff A. Bilmes. 2004. PAC-learning bounded tree-width graphical models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI).
  14. Judea Pearl. 2014. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Elsevier.
  15. Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty. 2010. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics 38, 3 (2010), 1287–1319.
  16. Narayana P. Santhanam and Martin J. Wainwright. 2012. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory 58, 7 (2012), 4117–4134.
  17. Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong, and Alan S. Willsky. 2011. A large-deviation analysis of the maximum-likelihood learning of Markov tree structures. IEEE Transactions on Information Theory 57, 3 (2011), 1714–1735.
  18. Marc Vuffray, Sidhant Misra, Andrey Lokhov, and Michael Chertkov. 2016. Interaction screening: Efficient and sample-optimal learning of Ising models. In Advances in Neural Information Processing Systems. 2595–2603.
  19. Marc Vuffray, Sidhant Misra, and Andrey Y. Lokhov. 2019. Efficient learning of discrete graphical models. arXiv:1902.00600 [cs.LG].
  20. Martin J. Wainwright, Michael I. Jordan, et al. 2008. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1, 1–2 (2008), 1–305.
  21. Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. 2019. Sparse logistic regression learns all discrete pairwise graphical models. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems.
