Abstract
The learning of signal directions in high-dimensional data through orthogonal decomposition or principal component analysis (PCA) has many important applications in physics and engineering disciplines, e.g., wireless communication, information theory, and econophysics. The accuracy of the orthogonal decomposition can be studied using mean-field theory. Previous analysis of data produced from a model with a single signal direction has predicted a retarded learning phase transition below which learning is not possible, i.e., if the signal is too weak or the data set is too small, then it is impossible to learn anything about the signal direction or magnitude. In this contribution we show that the result can be generalized to the case where there are multiple signal directions. Each nondegenerate signal is associated with a retarded learning transition. However, fluctuations around the mean-field solution lead to large finite-size effects unless the signal strengths are very well separated. We evaluate the one-loop contribution to the mean-field theory, which shows that signal directions are indistinguishable from one another if their corresponding population eigenvalues are separated by an amount that scales as a power of the data dimension. Numerical simulations are consistent with the analysis and show that finite-size effects can persist even for very large data sets.
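The retarded learning effect described above can be illustrated numerically. The following is a minimal sketch, not the authors' simulation code: it assumes Gaussian data with population covariance equal to the identity plus a few orthogonal rank-one signal terms, and it measures the squared overlap between each true signal direction and each leading sample eigenvector. The function name `top_overlaps`, its parameters, and the chosen dimensions and signal strengths are all illustrative assumptions.

```python
import numpy as np

def top_overlaps(dim, n_samples, strengths, seed=0):
    """Squared overlaps between true signal directions and leading PCA directions.

    Assumed model (hypothetical, for illustration): each sample is unit-variance
    Gaussian noise plus independent Gaussian signals of variance strengths[m]
    along orthonormal directions b_m, so the population covariance is
    I + sum_m strengths[m] * b_m b_m^T.
    """
    rng = np.random.default_rng(seed)
    k = len(strengths)
    # Orthonormal signal directions from a QR decomposition of a random matrix.
    directions, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    noise = rng.standard_normal((n_samples, dim))
    latent = rng.standard_normal((n_samples, k)) * np.sqrt(strengths)
    data = noise + latent @ directions.T
    # Sample covariance (the population mean is zero by construction).
    cov = data.T @ data / n_samples
    # eigh returns eigenvalues in ascending order; reverse to get the top k.
    _, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, ::-1][:, :k]
    # Entry [m, j]: squared overlap of signal m with the j-th sample eigenvector.
    return (directions.T @ top) ** 2

# One strong and one weak signal at the same data dimension and sample size.
strong = top_overlaps(200, 400, [4.0])[0, 0]
weak = top_overlaps(200, 400, [0.1])[0, 0]
print(f"strong signal overlap^2: {strong:.3f}")
print(f"weak signal overlap^2:   {weak:.3f}")
```

With these assumed settings, the strong signal should be recovered with a large overlap while the weak signal's overlap should stay close to zero. Passing two nearly equal strengths, for example `[2.0, 1.9]`, gives a rough way to explore how closely spaced signals mix between the leading sample eigenvectors at finite dimension.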
Received 18 May 2006
DOI: https://doi.org/10.1103/PhysRevE.75.016101
©2007 American Physical Society