A Dynamic Probabilistic Model to Visualise Topic Evolution in Text Streams

Journal of Intelligent Information Systems

Abstract

We propose a novel probabilistic method, based on latent variable models, for unsupervised topographic visualisation of dynamically evolving, coherent textual information. The method can be seen as a complementary tool for topic detection and tracking applications. It exploits the a priori domain knowledge that the data stream contains relatively homogeneous temporal segments. Unlike topographic techniques previously used for static text collections, the topography in the proposed model arises from the temporal coherence of the data stream. Simulation results on toy data and an application to the analysis of Internet chat line discussions are presented by way of demonstration.

About this article

Cite this article

Kabán, A., Girolami, M.A. A Dynamic Probabilistic Model to Visualise Topic Evolution in Text Streams. Journal of Intelligent Information Systems 18, 107–125 (2002). https://doi.org/10.1023/A:1013673310093

  • DOI: https://doi.org/10.1023/A:1013673310093