ABSTRACT
We show that n-variable tree-structured Ising models can be learned computationally efficiently to within total variation distance ε from an optimal O(n ln n/ε²) samples, where O(·) hides an absolute constant that, importantly, does not depend on the model being learned: neither on its tree nor on the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu algorithm [1968], using the plug-in estimator for estimating mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is ε-close to the true one in total variation distance, a guarantee called “proper learning.”
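The tree-selection step of the Chow-Liu algorithm with plug-in mutual information can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the ±1 encoding of the variables, the use of Prim's algorithm for the maximum-weight spanning tree, and the `sample_chain` test helper are all assumptions made for the example. Fitting the pairwise marginals along the chosen edges, which completes the learned model, is omitted.

```python
import numpy as np


def plugin_mutual_information(x, y):
    """Plug-in estimate (in nats) of I(X;Y) for two +/-1-valued sample vectors.

    The empirical joint and marginal frequencies are substituted directly
    into the definition of mutual information.
    """
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a = np.mean(x == a)
            p_b = np.mean(y == b)
            if p_ab > 0:  # terms with zero empirical mass contribute nothing
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi


def chow_liu_tree(samples):
    """Return the n-1 edges of a maximum-weight spanning tree, where the
    weight of (i, j) is the plug-in mutual information of columns i and j.

    samples: array of shape (m, n) with entries in {-1, +1}.
    """
    _, n = samples.shape
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = plugin_mutual_information(samples[:, i], samples[:, j])
            weights[i, j] = weights[j, i] = w

    # Prim's algorithm, grown from node 0 (an arbitrary choice).
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = weights[0].copy()          # best known weight linking each node to the tree
    parent = np.zeros(n, dtype=int)   # tree endpoint achieving that weight
    edges = []
    for _ in range(n - 1):
        candidates = np.where(visited, -np.inf, best)
        v = int(np.argmax(candidates))
        edges.append((int(parent[v]), v))
        visited[v] = True
        improve = (~visited) & (weights[v] > best)
        best[improve] = weights[v][improve]
        parent[improve] = v
    return edges


def sample_chain(m, n, flip, rng):
    """Hypothetical test helper: m samples from a path-shaped tree Ising model
    with no external field, where each variable differs from its predecessor
    with probability `flip`."""
    x = np.empty((m, n), dtype=int)
    x[:, 0] = rng.choice([-1, 1], size=m)
    for j in range(1, n):
        flips = rng.random(m) < flip
        x[:, j] = np.where(flips, -x[:, j - 1], x[:, j - 1])
    return x
```

For example, `chow_liu_tree(sample_chain(5000, 4, 0.1, np.random.default_rng(0)))` returns three edges; with this many samples and such strong edges one would expect them to form the path 0-1-2-3, although the abstract's point is precisely that structure recovery is not guaranteed in general.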
Our guarantees do not follow from known results for the Chow-Liu algorithm or from the ensuing literature on learning graphical models, including the very recent renaissance of algorithms for this learning challenge. Those results yield only asymptotic consistency, or sample-suboptimal and/or time-inefficient algorithms, unless further assumptions are placed on the model, such as bounds on the “strengths” of the model’s edges. While we establish guarantees for a widely known and simple algorithm, the analysis showing that this algorithm succeeds and is sample-optimal is quite complex, requiring a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.
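To illustrate how squared Hellinger subadditivity controls error accumulation, consider only the simplest case, in which the true model P and the learned model Q share the same rooted tree. The statement below is a sketch under that assumption, using the normalization H²(P, Q) = 1 − Σₓ √(P(x)Q(x)); the notation π(v) for the parent of v is introduced here for illustration. The paper's actual analysis must also handle learned trees that differ from the true one, which is where the layered argument comes in.

```latex
% Assumption: P and Q are tree-structured on the same rooted tree;
% \pi(v) is the parent of v, and the root's term is its single-node marginal.
H^2(P, Q) \;\le\; \sum_{v \in V} H^2\!\left(P_{v,\pi(v)},\; Q_{v,\pi(v)}\right),
\qquad
d_{\mathrm{TV}}(P, Q) \;\le\; \sqrt{2}\, H(P, Q).

% Consequence: if each of the n terms is at most \epsilon^2 / (2n), then
H^2(P, Q) \;\le\; \frac{\epsilon^2}{2}
\quad\Longrightarrow\quad
d_{\mathrm{TV}}(P, Q) \;\le\; \epsilon .
```

The per-term errors add in squared Hellinger distance, so a local accuracy of order ε²/n per edge suffices for a global total variation guarantee of ε in this simplified setting.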
- Guy Bresler. 2015. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the forty-seventh annual ACM Symposium on Theory Of Computing (STOC).
- C Chow and Cong Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14, 3 (1968), 462–467.
- C Chow and T Wagner. 1973. Consistency of an estimate of tree-dependent probability distributions. IEEE Transactions on Information Theory 19, 3 (1973), 369–371.
- Constantinos Daskalakis, Nishanth Dikkala, and Gautam Kamath. 2019. Testing Ising Models. IEEE Trans. Inf. Theory 65, 11 (2019), 6829–6852.
- Constantinos Daskalakis and Qinxuan Pan. 2017. Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing. In the 30th Conference on Learning Theory (COLT).
- Constantinos Daskalakis and Qinxuan Pan. 2020. Sample-Optimal and Efficient Learning of Tree Ising models. CoRR abs/2010.14864 (2020). arXiv:2010.14864 https://arxiv.org/abs/2010.14864
- Luc Devroye, Abbas Mehrabian, and Tommy Reddad. 2019. The minimax learning rate of normal and Ising undirected graphical models. Electronic Journal of Statistics (2019).
- Linus Hamilton, Frederic Koehler, and Ankur Moitra. 2017. Information theoretic properties of Markov random fields, and their algorithmic applications. In Advances in Neural Information Processing Systems. 2463–2472.
- Ali Jalali, Pradeep Ravikumar, Vishvas Vasuki, and Sujay Sanghavi. 2011. On learning discrete graphical models using group-sparse regularization. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. 378–387.
- Adam Klivans and Raghu Meka. 2017. Learning graphical models using multiplicative weights. In Proceedings of the forty-ninth annual ACM Symposium on Theory Of Computing (STOC).
- Frederic Koehler. 2020. A Note on TV Learning of Tree Models. Personal Communication, http://math.mit.edu/~fkoehler/tv_note.pdf.
- Steffen L Lauritzen. 1996. Graphical models. Vol. 17. Clarendon Press.
- Mukund Narasimhan and Jeff A. Bilmes. 2004. PAC-learning Bounded Tree-width Graphical Models. In the 20th Conference in Uncertainty in Artificial Intelligence (UAI).
- Judea Pearl. 2014. Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier.
- Pradeep Ravikumar, Martin J Wainwright, and John D Lafferty. 2010. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics 38, 3 (2010), 1287–1319.
- Narayana P. Santhanam and Martin J. Wainwright. 2012. Information-Theoretic Limits of Selecting Binary Graphical Models in High Dimensions. IEEE Trans. Information Theory 58, 7 (2012), 4117–4134.
- Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong, and Alan S. Willsky. 2011. A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures. IEEE Trans. Information Theory 57, 3 (2011), 1714–1735.
- Marc Vuffray, Sidhant Misra, Andrey Lokhov, and Michael Chertkov. 2016. Interaction screening: Efficient and sample-optimal learning of Ising models. In Advances in Neural Information Processing Systems. 2595–2603.
- Marc Vuffray, Sidhant Misra, and Andrey Y. Lokhov. 2019. Efficient Learning of Discrete Graphical Models. arXiv:1902.00600 [cs.LG]
- Martin J Wainwright, Michael I Jordan, et al. 2008. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1, 1–2 (2008), 1–305.
- Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. 2019. Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models. In the 32nd Annual Conference on Neural Information Processing Systems.
Index Terms
- Sample-optimal and efficient learning of tree Ising models