On the computational tractability of statistical estimation on amenable graphs

Probability Theory and Related Fields

Abstract

We consider the problem of estimating a vector of discrete variables \(\theta = (\theta _1,\ldots ,\theta _n)\), based on noisy observations \(Y_{uv}\) of the pairs \((\theta _u,\theta _v)\) on the edges of a graph \(G=([n],E)\). This setting comprises a broad family of statistical estimation problems, including group synchronization on graphs, community detection, and low-rank matrix estimation. A large body of theoretical work has established sharp thresholds for weak and exact recovery, and sharp characterizations of the optimal reconstruction accuracy in such models, focusing, however, on the special case of Erdős–Rényi-type random graphs. An important finding of this line of work is the ubiquity of an information-computation gap. Namely, for many models of interest, a large gap is found between the optimal accuracy achievable by any statistical method and the optimal accuracy achieved by known polynomial-time algorithms. Moreover, it is expected in many situations that this gap is robust to small amounts of additional side information revealed about the \(\theta _i\)’s. How does the structure of the graph \(G\) affect this picture? Is the information-computation gap a general phenomenon, or does it only apply to specific families of graphs? We prove that the picture is dramatically different for graph sequences converging to amenable graphs (including, for instance, \(d\)-dimensional grids). We consider a model in which an arbitrarily small fraction of the vertex labels is revealed, and show that a linear-time local algorithm can achieve reconstruction accuracy that is arbitrarily close to the information-theoretic optimum. We contrast this with the case of random graphs. Indeed, focusing on group synchronization on random regular graphs, we prove that local algorithms cannot achieve non-trivial performance below the so-called Kesten–Stigum threshold, even when a small amount of side information is revealed.
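
To make the setting concrete, the following is a minimal sketch of one instance of the model: \(\mathbb{Z}_2\) synchronization on a two-dimensional grid with a small fraction of revealed labels. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the noise is assumed to flip each edge observation independently with probability `flip_prob`, and the estimator simply follows a single path to the nearest revealed vertex inside a ball of given radius rather than computing a local posterior. All function and parameter names are hypothetical.

```python
import random
from collections import deque


def grid_edges(side):
    """Edges of a side x side grid; vertices are (row, col) pairs."""
    edges = []
    for r in range(side):
        for c in range(side):
            if r + 1 < side:
                edges.append(((r, c), (r + 1, c)))
            if c + 1 < side:
                edges.append(((r, c), (r, c + 1)))
    return edges


def sample_model(side, flip_prob, reveal_frac, rng):
    """Draw labels theta_v in {-1, +1}, noisy edge products Y_uv, revealed labels."""
    vertices = [(r, c) for r in range(side) for c in range(side)]
    theta = {v: rng.choice((-1, 1)) for v in vertices}
    obs = {}
    for u, v in grid_edges(side):
        # Assumed noise model: the product theta_u * theta_v is flipped w.p. flip_prob.
        noise = -1 if rng.random() < flip_prob else 1
        obs[(u, v)] = obs[(v, u)] = theta[u] * theta[v] * noise
    revealed = {v: theta[v] for v in vertices if rng.random() < reveal_frac}
    return theta, obs, revealed


def local_estimate(v, obs, revealed, side, radius):
    """Guess theta_v from the nearest revealed vertex within `radius` hops.

    Breadth-first search from v, carrying the product of the observations
    along the path. Returns 0 if the ball contains no revealed vertex.
    """
    queue = deque([(v, 1, 0)])  # (vertex, product of Y along path, depth)
    seen = {v}
    while queue:
        u, prod, depth = queue.popleft()
        if u in revealed:
            return revealed[u] * prod
        if depth == radius:
            continue
        r, c = u
        for w in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= w[0] < side and 0 <= w[1] < side and w not in seen:
                seen.add(w)
                queue.append((w, prod * obs[(u, w)], depth + 1))
    return 0


if __name__ == "__main__":
    rng = random.Random(1)
    side = 30
    theta, obs, revealed = sample_model(side, flip_prob=0.05, reveal_frac=0.05, rng=rng)
    hits = sum(local_estimate(v, obs, revealed, side, radius=6) == theta[v] for v in theta)
    print("fraction of labels recovered:", hits / len(theta))
```

Each estimate only inspects a bounded neighborhood of the vertex, which is what makes the procedure local; in the noiseless case the product along any path telescopes to \(\theta _u\theta _v\), so the revealed label determines \(\theta _v\) exactly.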

Notes

  1. The careful reader will notice that this statement does not apply to the planted clique problem. If the labels of \(\varepsilon n\) random vertices are revealed (i.e., whether or not they belong to the clique), then it is easy to find planted cliques of size \(k\gg (1/\varepsilon )\log n\), i.e., far smaller than those found by the best known polynomial-time algorithms for \(\varepsilon =0\). This behavior is, however, related to the fact that, in the planted clique problem, the underlying graph is the complete graph. Hence a small amount of vertex side information can have a large impact even on local algorithms.

  2. In the physics literature, see e.g. [21], the BP accuracy is not formally defined in this way, but rather by considering an initialization that is slightly biased towards the true parameter vector. The two definitions are expected to coincide generically.

  3. For grids in \(d=2\) dimensions, the situation is expected to be similar, although the proof is more complicated. Indeed, [4] proves that a threshold exists in the case \(q=2\), and the same is expected to hold for \(q\ge 3\). For \(d=1\), no non-trivial threshold exists and weak recovery is always impossible.

  4. Unlike in the model described in the introduction, the variables \(\theta _u\) typically take values in \({\mathbb R}\), and their distribution is non-uniform. However, it is easy to reduce one case to the other. For instance, we can let \(Y_{uv} \sim \mathcal {N}(h(\theta _u)h(\theta _v),\sigma ^2_n)\), choosing the nonlinear function \(h:{\mathbb R}\rightarrow {\mathbb R}\) so that \(h(\theta _v)\sim P_0\) when \(\theta _v\sim \mathsf{Unif}([0,1])\).
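
The reduction in Note 4 can be sketched with the standard inverse-CDF construction: taking \(h\) to be the quantile function of \(P_0\) gives \(h(\theta _v)\sim P_0\) for uniform \(\theta _v\). The snippet below is a minimal illustration, assuming a hypothetical three-point prior \(P_0\) on \(\{-1,0,1\}\); the helper names `make_quantile` and `observe` and the chosen prior are assumptions made for this example only.

```python
import bisect
import random
from itertools import accumulate


def make_quantile(values, probs):
    """Return h such that h(U) ~ P_0 when U ~ Unif([0, 1]).

    P_0 is the discrete law placing mass probs[i] on values[i];
    h is its quantile function (generalized inverse CDF).
    """
    cdf = list(accumulate(probs))

    def h(t):
        # Index of the first atom whose cumulative mass exceeds t.
        i = bisect.bisect_right(cdf, t)
        return values[min(i, len(values) - 1)]  # clamp guards t == 1.0

    return h


def observe(theta_u, theta_v, h, sigma, rng):
    """Draw Y_uv ~ N(h(theta_u) * h(theta_v), sigma^2)."""
    return rng.gauss(h(theta_u) * h(theta_v), sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical prior P_0: mass 1/4 at -1, 1/2 at 0, 1/4 at +1.
    h = make_quantile([-1.0, 0.0, 1.0], [0.25, 0.5, 0.25])
    theta_u, theta_v = rng.random(), rng.random()  # uniform labels
    print("one observation:", observe(theta_u, theta_v, h, sigma=0.5, rng=rng))
```

For a continuous \(P_0\) the same idea applies with the inverse of its distribution function in place of the lookup over atoms.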

References

  1. Abbe, E.: Community detection and stochastic block models: recent developments. J. Mach. Learn. Res. 18(1), 6446–6531 (2017)

  2. Abbe, E., Boix, E.: An information-percolation bound for spin synchronization on general graphs. arXiv:1806.03227 (2018)

  3. Abbe, E., Boix, E., Ralli, P., Sandon, C.: Graph powering and spectral robustness. arXiv:1809.04818 (2018)

  4. Abbe, E., Massoulié, L., Montanari, A., Sly, A., Srivastava, N.: Group synchronization on grids. arXiv:1706.08561 (2017)

  5. Abbe, E., Sandon, C.: Proof of the achievability conjectures for the general stochastic block model. Commun. Pure Appl. Math. 71(7), 1334–1406 (2018)

  6. Achlioptas, D., Naor, A.: The two possible values of the chromatic number of a random graph. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 587–593. ACM (2004)

  7. Aldous, D., Lyons, R.: Processes on unimodular random networks. Electron. J. Probab. 12(54), 1454–1508 (2007)

  8. Alon, N., Krivelevich, M., Vu, V.H.: On the concentration of eigenvalues of random symmetric matrices. Israel J. Math. 131(1), 259–267 (2002)

  9. Amini, A.A., Wainwright, M.J.: High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Stat. 37(5B), 2877–2921 (2009)

  10. Bandeira, A.S., Banks, J., Kunisky, D., Moore, C., Wein, A.S.: Spectral planting and the hardness of refuting cuts, colorability, and communities in random graphs. arXiv:2008.12237 (2020)

  11. Banks, J., Kleinberg, R., Moore, C.: The Lovász theta function for random regular graphs and community detection in the hard regime. arXiv:1705.01194 (2017)

  12. Banks, J., Moore, C., Neeman, J., Netrapalli, P.: Information-theoretic thresholds for community detection in sparse networks. In: Conference on Learning Theory, pp. 383–416. PMLR (2016)

  13. Barak, B., Hopkins, S.B., Kelner, J., Kothari, P., Moitra, A., Potechin, A.: A nearly tight sum-of-squares lower bound for the planted clique problem. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 428–437. IEEE (2016)

  14. Barbier, J., Krzakala, F., Macris, N., Miolane, L., Zdeborová, L.: Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. Natl. Acad. Sci. 116(12), 5451–5460 (2019)

  15. Benjamini, I., Schramm, O.: Recurrence of distributional limits of finite planar graphs. Electron. J. Probab. 6, 1–13 (2001)

  16. Berthet, Q., Rigollet, P.: Complexity theoretic lower bounds for sparse principal component detection. In: Conference on Learning Theory, pp. 1046–1066 (2013)

  17. Bollobás, B.: A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur. J. Comb. 1(4), 311–316 (1980)

  18. Celentano, M., Montanari, A.: Fundamental barriers to high-dimensional regression with convex penalties. arXiv:1903.10603 (2019)

  19. Coja-Oghlan, A., Krzakala, F., Perkins, W., Zdeborová, L.: Information-theoretic thresholds from the cavity method. Adv. Math. 333, 694–795 (2018)

  20. Csiszár, I., Shields, P.C.: Information theory and statistics: A tutorial (2004)

  21. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84(6), 066106 (2011)

  22. Dembo, A., Montanari, A., Sun, N.: Factor models on locally tree-like graphs. Ann. Probab. 41(6), 4162–4213 (2013)

  23. Deshpande, Y., Abbe, E., Montanari, A.: Asymptotic mutual information for the balanced binary stochastic block model. Inform. Inference: J. IMA 6(2), 125–170 (2017)

  24. Deshpande, Y., Montanari, A.: Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time. Foundations of Computational Mathematics, 1–60 (2013)

  25. Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Probab. 10(2), 410–433 (2000)

  26. Fan, Z., Montanari, A.: How well do local algorithms solve semidefinite programs? In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 604–614. ACM (2017)

  27. Feige, U., Krauthgamer, R.: Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algor. 16(2), 195–208 (2000)

  28. Gallager, R.G.: Information Theory and Reliable Communication, vol. 2. Springer, New York (1968)

  29. Hopkins, S.B., Kothari, P.K., Potechin, A., Raghavendra, P., Schramm, T., Steurer, D.: The power of sum-of-squares for detecting hidden structures. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 720–731. IEEE (2017)

  30. Hopkins, S.B., Schramm, T., Shi, J., Steurer, D.: Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In: Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pp. 178–191. ACM (2016)

  31. Hopkins, S.B., Shi, J., Steurer, D.: Tensor principal component analysis via sum-of-square proofs. In: Conference on Learning Theory, pp. 956–1006 (2015)

  32. Hopkins, S.B., Steurer, D.: Efficient Bayesian estimation from few samples: community detection and related problems. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 379–390. IEEE (2017)

  33. Janson, S., Mossel, E.: Robust reconstruction on trees is determined by the second eigenvalue. Ann. Probab. 32(3B), 2630–2649 (2004)

  34. Jerrum, M.: Large cliques elude the Metropolis process. Random Struct. Algor. 3(4), 347–359 (1992)

  35. Johnstone, I.: High Dimensional Statistical Inference and Random Matrices. In: Proceedings of International Congress of Mathematicians, Madrid (2006)

  36. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), (2009)

  37. Kunisky, D., Wein, A.S., Bandeira, A.S.: Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. arXiv:1907.11636 (2019)

  38. Lelarge, M., Miolane, L.: Fundamental limits of symmetric low-rank matrix estimation. Probab. Theory Relat. Fields, 1–71 (2017)

  39. Lyons, R., Peres, Y.: Probability on Trees and Networks, vol. 42. Cambridge University Press, Cambridge (2017)

  40. Ma, T., Wigderson, A.: Sum-of-squares lower bounds for sparse PCA. In: Advances in Neural Information Processing Systems, pp. 1612–1620 (2015)

  41. Massoulié, L.: Community detection thresholds and the weak Ramanujan property. In: Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pp. 694–703. ACM (2014)

  42. Mézard, M., Montanari, A.: Reconstruction on trees and spin glass transition. J. Stat. Phys. 124(6), 1317–1350 (2006)

  43. Mézard, M., Montanari, A.: Information, Physics and Computation. Oxford University Press, Oxford (2009)

  44. Montanari, A.: Estimating random variables from random sparse observations. Eur. Trans. Telecom. 19, 385–403 (2008)

  45. Montanari, A., Richard, E.: A statistical model for tensor PCA. In: Advances in Neural Information Processing Systems, pp. 2897–2905 (2014)

  46. Mossel, E., Neeman, J., Sly, A.: A proof of the block model threshold conjecture. Combinatorica 38(3), 665–708 (2018)

  47. Mossel, E., Peres, Y.: Information flow on trees. Ann. Appl. Probab. 13(3), 817–844 (2003)

  48. Newman, C., Schulman, L.: Number and density of percolating clusters. J. Phys. A: Math. Gen. 14(7), 1735 (1981)

  49. Penrose, M., et al.: Random Geometric Graphs, vol. 5. Oxford University Press, Oxford (2003)

  50. Polyanskiy, Y., Wu, Y.: Application of information-percolation method to reconstruction problems on graphs. arXiv:1806.04195 (2018)

  51. Sankararaman, A., Baccelli, F.: Community detection on Euclidean random graphs. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2181–2200. SIAM (2018)

  52. Sly, A.: Reconstruction for the Potts model. Ann. Probab. 39(4), 1365–1406 (2011)

  53. Wormald, N.C.: Models of random regular graphs. London Mathematical Society Lecture Note Series, pp. 239–298 (1999)

Acknowledgements

This work was supported by NSF grants CCF-1714305, IIS-1741162 and by the ONR grant N00014-18-1-2729.

Author information

Corresponding author

Correspondence to Ahmed El Alaoui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partially supported by NSF grants CCF-1714305 and IIS-1741162 and by ONR grant N00014-18-1-2729.

About this article

Cite this article

El Alaoui, A., Montanari, A. On the computational tractability of statistical estimation on amenable graphs. Probab. Theory Relat. Fields 181, 815–864 (2021). https://doi.org/10.1007/s00440-021-01092-y
