DOI: 10.1145/1076034.1076081
Article

Text classification with kernels on the multinomial manifold

Published: 15 August 2005

ABSTRACT

Support Vector Machines (SVMs) have been very successful in text classification. However, the standard kernels commonly used in SVMs ignore the intrinsic geometric structure of text data. It is natural to assume that documents lie on the multinomial manifold, which is the simplex of multinomial models equipped with the Riemannian structure induced by the Fisher information metric. We prove that the Negative Geodesic Distance (NGD) on the multinomial manifold is conditionally positive definite (cpd), and can therefore be used as a kernel in SVMs. Experiments show the NGD kernel on the multinomial manifold to be effective for text classification, significantly outperforming standard kernels on the ambient Euclidean space.
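
As a rough illustration of the kernel the abstract describes, the sketch below computes a Negative Geodesic Distance Gram matrix with NumPy. It assumes the standard closed form for the geodesic distance under the Fisher information metric on the simplex, d(p, q) = 2 arccos(sum_k sqrt(p_k q_k)), which the abstract itself does not state. The function names `to_simplex` and `ngd_kernel` and the toy term counts are hypothetical and are not taken from the paper.

```python
import numpy as np


def to_simplex(counts):
    """L1-normalize non-negative term counts so each row is a point on the simplex."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every document needs at least one term")
    return counts / totals


def ngd_kernel(P, Q):
    """Return K with K[i, j] = -2 * arccos(sum_k sqrt(P[i, k] * Q[j, k])).

    Assumption: this is the usual closed form of the Fisher geodesic
    distance between multinomial parameters, negated to act as a kernel.
    """
    # The inner sum is the Bhattacharyya coefficient between two rows.
    affinity = np.sqrt(P) @ np.sqrt(Q).T
    # Guard against rounding slightly outside [0, 1] before arccos.
    affinity = np.clip(affinity, 0.0, 1.0)
    return -2.0 * np.arccos(affinity)


# Toy term-count matrix: 4 documents over a 3-term vocabulary (made-up data).
counts = [[2, 1, 1],
          [0, 3, 1],
          [4, 0, 0],
          [0, 0, 5]]
P = to_simplex(counts)
K = ngd_kernel(P, P)
```

The resulting Gram matrix could be handed to an SVM implementation that accepts precomputed kernels. Because the kernel is only conditionally positive definite, `K` has a zero diagonal and non-positive entries, so it should not be expected to pass an ordinary positive-semidefiniteness check.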

Published in

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034

Copyright © 2005 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
