Abstract
We consider the problem of approximating a given m × n matrix A by another matrix of specified rank k, which is smaller than m and n. The Singular Value Decomposition (SVD) can be used to find the "best" such approximation. However, it takes time polynomial in m and n, which is prohibitive for some modern applications. In this article, we develop an algorithm that is qualitatively faster, provided we may sample the entries of the matrix in accordance with a natural probability distribution. In many applications, such sampling can be done efficiently. Our main result is a randomized algorithm to find the description of a matrix D* of rank at most k so that $\|A - D^*\|_F^2 \le \min_{D,\ \mathrm{rank}(D) \le k} \|A - D\|_F^2 + \epsilon \|A\|_F^2$ holds with probability at least 1 − δ (where $\|\cdot\|_F$ is the Frobenius norm). The algorithm takes time polynomial in k, 1/ϵ, and log(1/δ) only, and is independent of m and n. In particular, this implies that in constant time, it can be determined if a given matrix of arbitrary size has a good low-rank approximation.
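The sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the "natural probability distribution"; the code below assumes it is length-squared sampling, where row i is drawn with probability proportional to its squared Euclidean norm. The function names (`length_squared_probs`, `sample_rows`, `frobenius_sq`) are hypothetical, and this covers only the sampling and rescaling, not the later step of extracting a rank-k approximation from the sample.

```python
import math
import random


def length_squared_probs(A):
    """Return p_i = |A_i|^2 / ||A||_F^2 for each row A_i (assumed distribution)."""
    row_norms_sq = [sum(x * x for x in row) for row in A]
    total = sum(row_norms_sq)  # squared Frobenius norm of A
    if total == 0:
        raise ValueError("matrix is all zeros; the distribution is undefined")
    return [r / total for r in row_norms_sq]


def sample_rows(A, s, rng=None):
    """Draw s rows i.i.d. from the length-squared distribution.

    Each picked row i is rescaled by 1 / sqrt(s * p_i), so that the
    sampled matrix S satisfies E[S^T S] = A^T A.
    """
    rng = rng or random.Random()
    p = length_squared_probs(A)
    picks = rng.choices(range(len(A)), weights=p, k=s)
    return [[x / math.sqrt(s * p[i]) for x in A[i]] for i in picks]


def frobenius_sq(M):
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(x * x for row in M for x in row)
```

With this rescaling, every sampled row has squared norm equal to the squared Frobenius norm of A divided by s, so the sample preserves the squared Frobenius norm of A exactly, whichever rows are drawn. The sample has only s rows, which is why later computation on it need not depend on m.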
Index Terms
- Fast monte-carlo algorithms for finding low-rank approximations