ABSTRACT
Approximating non-linear kernels with random feature maps has been successfully employed in large-scale data analysis, accelerating the training of kernel machines. While previous random feature mappings run in $O(ndD)$ time for $n$ training samples in $d$-dimensional space and $D$ random features, we propose a novel randomized tensor product technique, called Tensor Sketching, that approximates any polynomial kernel in $O(n(d + D \log D))$ time. We also derive both absolute and relative error bounds for our approximation, guaranteeing the reliability of the estimation algorithm. Empirically, Tensor Sketching achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach on large-scale real-world datasets.
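To make the idea concrete, here is a minimal NumPy sketch of the Tensor Sketching construction described above: each input is Count Sketched $p$ times with independent hash functions, and the sketches are combined by multiplication in the Fourier domain, so that inner products of the resulting $D$-dimensional features approximate the degree-$p$ polynomial kernel $(x^\top y)^p$. The function name, signature, and parameter choices below are illustrative, not the paper's reference implementation.

```python
import numpy as np

def tensor_sketch(X, D, p, seed=0):
    """Approximate the degree-p polynomial kernel (x.y)^p with D random features.

    X : (n, d) data matrix; returns an (n, D) feature matrix Z such that
    Z @ Z.T approximates (X @ X.T) ** p. Illustrative sketch only.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # p independent Count Sketch hash pairs: bucket h in [D], sign s in {-1, +1}
    h = rng.integers(0, D, size=(p, d))
    s = rng.choice([-1.0, 1.0], size=(p, d))

    prod = np.ones((n, D), dtype=complex)
    for i in range(p):
        # Count Sketch of every row of X under (h[i], s[i])
        C = np.zeros((n, D))
        for j in range(d):
            C[:, h[i, j]] += s[i, j] * X[:, j]
        # circular convolution of the p sketches == product in the Fourier domain,
        # giving the O(D log D) term in the running time
        prod *= np.fft.fft(C, axis=1)
    return np.real(np.fft.ifft(prod, axis=1))
```

The FFT step is what replaces the explicit $O(d^p)$ tensor product: convolving the $p$ Count Sketches costs $O(pD \log D)$ per sample instead of materializing the degree-$p$ feature expansion.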
Fast and scalable polynomial kernels via explicit feature maps