• Open Access

Statistical Mechanics of Optimal Convex Inference in High Dimensions

Madhu Advani and Surya Ganguli
Phys. Rev. X 6, 031034 – Published 29 August 2016

Abstract

A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α = N/P → ∞. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α. We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem that minimizes the sum of a loss function, which penalizes deviations between the data and model predictions, and a regularizer, which leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramer-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot.
Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than ML and MAP while still outperforming them. For example, such optimal M-estimation algorithms can lead to as much as a 20% reduction in the amount of data to achieve the same performance relative to MAP. Moreover, we demonstrate a prediction of replica theory that no inference procedure whatsoever can outperform our optimal M-estimation procedure when signal and noise distributions are log-concave, by uncovering an equivalence between optimal M-estimation and optimal Bayesian inference in this setting. Our analysis also reveals insights into the nature of generalization and predictive power in high dimensions, information theoretic limits on compressed sensing, phase transitions in quadratic inference, and connections to central mathematical objects in convex optimization theory and random matrix theory.
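As a rough illustration of the M-estimation setup described in the abstract, the sketch below minimizes a sum of per-measurement losses plus per-parameter regularizers at a finite measurement density α = N/P. Everything specific here is an assumption made for illustration and is not taken from the article: the Gaussian inputs with variance 1/P, the Gaussian signal and noise, the quadratic loss and quadratic regularizer (which give ridge regression in closed form), and the function names. It is not the optimal M-estimator the article derives.

```python
import numpy as np

def m_estimator_objective(w, X, y, loss, reg):
    """M-estimator objective: summed loss on residuals plus summed regularizer."""
    return loss(y - X @ w).sum() + reg(w).sum()

def ridge_m_estimate(X, y, lam):
    """Closed-form minimizer for loss(r) = r**2 / 2 and reg(w) = lam * w**2 / 2.

    Setting the gradient X.T @ (X @ w - y) + lam * w to zero gives
    (X.T @ X + lam * I) w = X.T @ y.
    """
    P = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ y)

def simulate(alpha=2.0, P=200, noise_std=1.0, lam=1.0, seed=0):
    """Draw N = alpha * P noisy linear measurements and fit the ridge M-estimator."""
    rng = np.random.default_rng(seed)
    N = int(alpha * P)
    w_true = rng.standard_normal(P)               # assumed Gaussian signal
    X = rng.standard_normal((N, P)) / np.sqrt(P)  # assumed Gaussian inputs
    y = X @ w_true + noise_std * rng.standard_normal(N)
    w_hat = ridge_m_estimate(X, y, lam)
    mse = np.mean((w_hat - w_true) ** 2)          # per-parameter squared error
    return X, y, w_true, w_hat, mse

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 4.0):
        *_, mse = simulate(alpha=alpha)
        print(f"alpha = {alpha:.1f}: per-parameter squared error = {mse:.3f}")
```

Swapping in a different convex `loss` or `reg` changes the estimator but not the structure of the objective; in that case the closed-form solve would need to be replaced by a numerical convex solver.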

  • Received 22 February 2016

DOI:https://doi.org/10.1103/PhysRevX.6.031034

This article is available under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Published by the American Physical Society

Physics Subject Headings (PhySH)

  • Interdisciplinary Physics
  • Statistical Physics & Thermodynamics
  • Physics of Living Systems

Authors & Affiliations

Madhu Advani* and Surya Ganguli

  • Department of Applied Physics, Stanford University, Stanford, California 94305, USA

  • *msadvani@stanford.edu
  • sganguli@stanford.edu

Popular Summary

Remarkable new measurement technologies have created a deluge of “big data” that promises to revolutionize our understanding in fields ranging from neuroscience and systems biology to social science and history. However, achieving accurate statistical inferences from big data remains a tremendous challenge. Classical statistical theory provides firm theoretical guidance for the analysis of small (i.e., low-dimensional) data, revealing fundamental limits on the accuracy of statistical inference and the algorithms that achieve such limits. However, classical theory cannot reveal these fundamental limits or optimal algorithms for big (i.e., high-dimensional) data. Here, we reformulate high-dimensional statistical inference in the framework of the statistical physics of quenched disorder to address these fundamental issues for big data. We are accordingly able to obtain powerful generalizations of time-honored classical statistical theorems dating back to the 1940s.

Using theoretical analyses and simulations focusing on regression under diverse signal and noise distributions, we show that widely cherished inference algorithms such as maximum likelihood and maximum a posteriori inference are suboptimal for big data, and we determine the optimal algorithms that can replace these methodologies. Intriguingly, these optimal algorithms can be computationally simpler than maximum a posteriori and maximum likelihood while still outperforming them, sometimes using 20% less data to achieve the same performance. Moreover, our statistical physics theory of high-dimensional inference unifies disparate fields, including information theory, learning theory, Bayesian estimation, compressed sensing, convex optimization theory, and random matrix theory.

We anticipate that our unified physical framework will provide firm theoretical guidance and algorithmic advances in the modern age of big data, just as classical statistics guided us in the age of small data.

Issue

Vol. 6, Iss. 3 — July - September 2016
