ABSTRACT
Support Vector Machines (SVMs) have been very successful in text classification. However, the standard kernels commonly used in SVMs ignore the intrinsic geometric structure of text data. It is natural to assume that documents lie on the multinomial manifold: the simplex of multinomial models equipped with the Riemannian structure induced by the Fisher information metric. We prove that the Negative Geodesic Distance (NGD) on the multinomial manifold is conditionally positive definite (cpd) and can therefore be used as a kernel in SVMs. Experiments show the NGD kernel on the multinomial manifold to be effective for text classification, significantly outperforming standard kernels defined on the ambient Euclidean space.
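To make the geometry concrete: under the Fisher information metric, the multinomial simplex is isometric to a portion of a sphere of radius 2 via θ ↦ 2√θ, so the geodesic distance has the closed form d(θ, θ′) = 2 arccos(Σᵢ √(θᵢ θ′ᵢ)), and the NGD kernel is K(θ, θ′) = −d(θ, θ′). The sketch below is an illustrative Python implementation of this closed-form formula, not code from the paper; the helper names and the toy term-frequency vectors are invented for the example.

```python
import numpy as np

def to_simplex(tf):
    """L1-normalize term-frequency rows so each lies on the multinomial simplex."""
    tf = np.asarray(tf, dtype=float)
    return tf / tf.sum(axis=1, keepdims=True)

def ngd_kernel(X, Y):
    """Negative Geodesic Distance kernel on the multinomial manifold.

    The Fisher information metric makes the simplex isometric to part of a
    sphere (via theta -> 2*sqrt(theta)), giving the closed-form geodesic
    distance d(x, y) = 2 * arccos(sum_i sqrt(x_i * y_i)).
    """
    # Bhattacharyya-style affinity; clip to [0, 1] to guard against rounding error.
    affinity = np.clip(np.sqrt(X) @ np.sqrt(Y).T, 0.0, 1.0)
    return -2.0 * np.arccos(affinity)

# Example: Gram matrix for three toy "documents" given as term-frequency vectors.
docs = to_simplex([[3, 1, 0], [1, 2, 1], [0, 1, 3]])
print(ngd_kernel(docs, docs))  # diagonal is 0 (zero distance to itself)
```

Because the kernel is conditionally positive definite rather than positive definite, in practice it would be supplied to an SVM as a precomputed Gram matrix, e.g. via LIBSVM's precomputed-kernel mode or scikit-learn's `SVC(kernel='precomputed')`.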