ABSTRACT
Support Vector Machines (SVMs) have been very successful in text classification. However, the standard kernels commonly used in SVMs ignore the intrinsic geometric structure of text data. It is natural to assume that documents lie on the multinomial manifold: the simplex of multinomial models equipped with the Riemannian structure induced by the Fisher information metric. We prove that the Negative Geodesic Distance (NGD) on the multinomial manifold is conditionally positive definite (cpd) and can therefore be used as a kernel in SVMs. Experiments show the NGD kernel on the multinomial manifold to be effective for text classification, significantly outperforming standard kernels defined on the ambient Euclidean space.
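To make the geometry concrete: under the Fisher information metric, the multinomial simplex is isometric to a portion of a sphere of radius 2 via θ ↦ 2√θ, so the geodesic distance has the closed form d(θ, θ′) = 2 arccos(Σᵢ √(θᵢ θ′ᵢ)), and the NGD kernel is K(θ, θ′) = −d(θ, θ′). The sketch below is an illustrative Python implementation of this closed-form formula, not code from the paper; the helper names and the toy term-frequency vectors are invented for the example.

```python
import numpy as np

def to_simplex(tf):
    """L1-normalize term-frequency rows so each lies on the multinomial simplex."""
    tf = np.asarray(tf, dtype=float)
    return tf / tf.sum(axis=1, keepdims=True)

def ngd_kernel(X, Y):
    """Negative Geodesic Distance kernel on the multinomial manifold.

    The Fisher information metric makes the simplex isometric to part of a
    sphere (via theta -> 2*sqrt(theta)), giving the closed-form geodesic
    distance d(x, y) = 2 * arccos(sum_i sqrt(x_i * y_i)).
    """
    # Bhattacharyya-style affinity; clip to [0, 1] to guard against rounding error.
    affinity = np.clip(np.sqrt(X) @ np.sqrt(Y).T, 0.0, 1.0)
    return -2.0 * np.arccos(affinity)

# Example: Gram matrix for three toy "documents" given as term-frequency vectors.
docs = to_simplex([[3, 1, 0], [1, 2, 1], [0, 1, 3]])
print(ngd_kernel(docs, docs))  # diagonal is 0 (zero distance to itself)
```

Because the kernel is conditionally positive definite rather than positive definite, in practice it would be supplied to an SVM as a precomputed Gram matrix, e.g. via LIBSVM's precomputed-kernel mode or scikit-learn's `SVC(kernel='precomputed')`.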