Random Set to Interpret Topic Models in Terms of Ontology Concepts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10400)

Abstract

Topic modelling is a popular technique in text mining. However, discovered topic models are difficult to interpret because they are often incoherent and lack background context. Many applications require an accurate interpretation of topic models so that both users and machines can use them effectively. Taking advantage of random sets and a domain ontology, this research interprets topic models in terms of ontology concepts. The interpretation is evaluated against several baseline models on two standard datasets. The results show that the interpretation performs significantly better than the baseline models.

Notes

  1. http://svmlight.joachims.org

Acknowledgment

This research was partially supported by Grant DP140103157 from the Australian Research Council (ARC Discovery Project).

Author information

Corresponding author

Correspondence to Md Abul Bashar.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bashar, M.A., Li, Y. (2017). Random Set to Interpret Topic Models in Terms of Ontology Concepts. In: Peng, W., Alahakoon, D., Li, X. (eds) AI 2017: Advances in Artificial Intelligence. AI 2017. Lecture Notes in Computer Science, vol 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_19

  • DOI: https://doi.org/10.1007/978-3-319-63004-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63003-8

  • Online ISBN: 978-3-319-63004-5

  • eBook Packages: Computer Science (R0)