• Open Access

Feynman-Hellmann Theorem and Signal Identification from Sample Covariance Matrices

Lucy J. Colwell, Yu Qin, Miriam Huntley, Alexander Manta, and Michael P. Brenner
Phys. Rev. X 4, 031032 – Published 27 August 2014

Abstract

A common method for extracting true correlations from large data sets is to look for variables with unusually large coefficients on those principal components with the biggest eigenvalues. Here, we show that even if the top principal components have no unusually large coefficients, large coefficients on lower principal components can still correspond to a valid signal. This contradicts the typical mathematical justification for principal component analysis, which requires that eigenvalue distributions from relevant random matrix ensembles have compact support, so that any eigenvalue above the upper threshold corresponds to signal. The new possibility arises via a mechanism based on a variant of the Feynman-Hellmann theorem, and leads to significant correlations between a signal and principal components when the underlying noise is not both independent and uncorrelated, so the eigenvalue spacing of the noise distribution can be sufficiently large. This mechanism justifies a new way of using principal component analysis and rationalizes recent empirical findings that lower principal components can have information about the signal, even if the largest ones do not.
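The "upper threshold" argument in the abstract can be illustrated with a small numerical sketch. The example below is not taken from the article: it assumes independent, unit-variance Gaussian noise, for which the relevant random matrix ensemble has the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2, and it adds a hypothetical rank-one signal of strength `theta`. All names (`mp_upper_edge`, `covariance_eigenvalues`, `theta`) are illustrative choices.

```python
import numpy as np

def mp_upper_edge(n_samples, n_vars, sigma2=1.0):
    """Marchenko-Pastur upper edge for i.i.d. noise of variance sigma2."""
    gamma = n_vars / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2

def covariance_eigenvalues(X):
    """Eigenvalues (ascending) of the sample covariance of X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / X.shape[0]
    return np.linalg.eigvalsh(C)

rng = np.random.default_rng(0)
n_samples, n_vars = 1000, 200

# Pure noise: the largest eigenvalue should sit near the upper edge.
noise = rng.standard_normal((n_samples, n_vars))
edge = mp_upper_edge(n_samples, n_vars)
noise_top = covariance_eigenvalues(noise)[-1]

# Assumed rank-one signal: population covariance I + theta * v v^T.
v = np.zeros(n_vars)
v[:10] = 1.0 / np.sqrt(10)
theta = 5.0
factor = rng.standard_normal((n_samples, 1))
spiked = noise + np.sqrt(theta) * factor * v
spiked_top = covariance_eigenvalues(spiked)[-1]

print(f"upper edge: {edge:.3f}")
print(f"top eigenvalue, noise only: {noise_top:.3f}")
print(f"top eigenvalue, with signal: {spiked_top:.3f}")
```

With a strong signal the top eigenvalue separates from the noise bulk, which is the standard justification the abstract refers to. The article's point concerns the case where the noise is not of this simple form, which this sketch does not attempt to reproduce.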

  • Received 17 September 2013

DOI:https://doi.org/10.1103/PhysRevX.4.031032

This article is available under the terms of the Creative Commons Attribution 3.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Published by the American Physical Society

Authors & Affiliations

Lucy J. Colwell¹, Yu Qin¹, Miriam Huntley¹, Alexander Manta², and Michael P. Brenner¹

  • ¹School of Engineering and Applied Sciences and Kavli Institute for Bionano Science and Technology, Harvard University, Cambridge, Massachusetts 02138, USA
  • ²Roche Diagnostics GmbH, Penzberg 82377, Germany

Popular Summary

Technological advances have made it possible to measure an ever-increasing number of variables during an experiment. Determining correlations between different variables yields insights into the system, potentially leading to predictive theoretical models. For example, the multidimensional data sets generated by the Cancer Genome Atlas allow for the detection of correlated genetic perturbations that result in cancer phenotypes.

However, when a data set measures many more quantities (e.g., the expression levels of different genes in a genome) than the number of measurements that are made, there is a chance that a measured correlation could be spurious. It is therefore necessary to develop a rigorous procedure for determining when an observed correlation is spurious and when it is statistically reliable. A common technique is to look for variables with unusually large coefficients on the principal components with the biggest eigenvalues. We show that even if the top principal components have no unusually large coefficients, large coefficients on lower principal components can still correspond to a valid signal. The mechanism that makes this possible is based on a variant of the Feynman-Hellmann theorem, which was developed for level splittings in quantum mechanics.
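For orientation, the textbook form of the Feynman-Hellmann theorem and the accompanying first-order eigenvector formula are given below. The notation is assumed rather than taken from the article, and the variant used by the authors may differ in detail.

```latex
% Assumed notation: H(\lambda) is a symmetric (Hermitian) matrix depending on a
% parameter \lambda, with nondegenerate eigenvalues E_n(\lambda) and orthonormal
% eigenvectors |\psi_n(\lambda)\rangle, with the phase convention
% \langle \psi_n | d\psi_n/d\lambda \rangle = 0.
\begin{align}
  \frac{dE_n}{d\lambda}
    &= \Big\langle \psi_n \Big| \frac{\partial H}{\partial \lambda} \Big| \psi_n \Big\rangle, \\
  \frac{d\,|\psi_n\rangle}{d\lambda}
    &= \sum_{m \neq n}
       \frac{\langle \psi_m | \, \partial H / \partial \lambda \, | \psi_n \rangle}{E_n - E_m}
       \, |\psi_m\rangle .
\end{align}
```

The spacing between eigenvalues enters through the denominator E_n - E_m, which is why level spacing controls how strongly a perturbation mixes one eigenvector with the others.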

Our findings suggest that information about the structure of true covariance between variables can be recovered by examining the component distributions of different eigenvectors (not necessarily those with the largest eigenvalues, as is the case in principal component analysis).
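One simple way to summarize the component distribution of each eigenvector is sketched below. It uses the inverse participation ratio (the sum of fourth powers of the coefficients of a unit vector), which is a generic localization measure and is an assumption here, not necessarily the statistic used in the article. The function name `eigenvector_ipr` is hypothetical.

```python
import numpy as np

def eigenvector_ipr(C):
    """Return eigenvalues (descending) and the inverse participation ratio
    of the matching unit eigenvectors of a symmetric matrix C.

    The ratio equals 1/p for a vector spread evenly over p variables and
    1 for a vector concentrated on a single variable.
    """
    evals, evecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]       # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    ipr = np.sum(evecs ** 4, axis=0)      # one value per eigenvector (column)
    return evals, ipr

# Usage: a diagonal matrix has eigenvectors concentrated on one variable each.
evals_diag, ipr_diag = eigenvector_ipr(np.diag([3.0, 2.0, 1.0]))

# Usage: the all-ones matrix has a top eigenvector spread evenly over all variables.
evals_ones, ipr_ones = eigenvector_ipr(np.ones((4, 4)))
```

Scanning this quantity across all eigenvectors, rather than only the leading ones, is in the spirit of examining eigenvectors that do not have the largest eigenvalues.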

Issue

Vol. 4, Iss. 3 — July - September 2014


