Bhattacharyya and Expected Likelihood Kernels

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

We introduce a new class of kernels between distributions. These induce a kernel on the input space between data points by associating to each datum a generative model fit to the data point individually. The kernel is then computed by integrating the product of the two generative models corresponding to two data points. This kernel permits discriminative estimation via, for instance, support vector machines, while exploiting the properties, assumptions, and invariances inherent in the choice of generative model. It satisfies Mercer’s condition and can be computed in closed form for a large class of models, including exponential family models, mixtures, hidden Markov models and Bayesian networks. For other models the kernel can be approximated by sampling methods. Experiments are shown for multinomial models in text classification and for hidden Markov models for protein sequence classification.
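
As a rough illustration of the construction described in the abstract, the sketch below fits a discrete distribution to each document's word counts and sums the product of the two fitted models raised to a power `rho`. It assumes a single-draw multinomial (categorical) model with additive smoothing. The function names `fit_multinomial` and `product_kernel`, the `smoothing` parameter, and the toy count vectors are illustrative choices and do not come from the paper. Setting `rho=0.5` gives the Bhattacharyya form, and `rho=1.0` gives the expected likelihood form.

```python
import numpy as np


def fit_multinomial(counts, smoothing=1.0):
    """Fit a categorical distribution to one datum's counts.

    `smoothing` is an assumed additive (Laplace-style) prior count that
    keeps every probability strictly positive.
    """
    counts = np.asarray(counts, dtype=float) + smoothing
    return counts / counts.sum()


def product_kernel(p, q, rho=0.5):
    """Sum over outcomes of (p_i * q_i) ** rho.

    For discrete models the integral of the product of the two
    distributions reduces to this finite sum.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p * q) ** rho))


# Toy word-count vectors over a four-word vocabulary (made-up data).
doc_a = [3, 0, 1, 2]
doc_b = [2, 1, 0, 3]

p = fit_multinomial(doc_a)
q = fit_multinomial(doc_b)

k_bhattacharyya = product_kernel(p, q, rho=0.5)
k_expected_likelihood = product_kernel(p, q, rho=1.0)
```

With `rho=0.5` the kernel of a distribution with itself is 1, since the summand reduces to `p_i`. A Gram matrix built from such values could then be passed to a kernel method that accepts precomputed kernels, such as a support vector machine.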

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jebara, T., Kondor, R. (2003). Bhattacharyya and Expected Likelihood Kernels. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_6

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive
