BCTM: A Topic Modeling Method Based on External Information

  • Conference paper
  • First Online:
Broadband Communications, Networks, and Systems (BROADNETS 2023)

Abstract

Topic models are widely used as intermediate algorithms for text mining and semantic analysis in natural language processing. However, most existing improvements to topic models rely on word embeddings to increase the accuracy of text modeling while ignoring external information about the text. This paper proposes BCTM (Bi-Concept Topic Model), a topic model that uses both word feature information and concept information. Building on the BTM topic model, BCTM introduces word feature information through word vectors and concept information drawn from ConceptNet to optimize topic modeling. A construction method for Bi-Concept pairs based on the ConceptNet semantic network is proposed, and the content of each text is enriched with concept information. The improved model yields a more accurate topic distribution, and, thanks to the richer feature information, it also outperforms the baseline models on short texts. Experiments show that the BCTM model proposed in this paper achieves good modeling accuracy.
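To make the Bi-Concept idea concrete, the following is a minimal sketch of how a short document might be enriched with ConceptNet concepts and then turned into BTM-style biterms. It is not the authors' implementation: the interpretation of a "Bi-Concept pair" as a biterm over concept-enriched tokens, the use of ConceptNet's public REST API, and the function names `fetch_concepts` and `build_biconcept_pairs` are all assumptions made for illustration.

```python
# Hedged sketch: Bi-Concept pair construction for a BTM-style topic model.
# Assumptions (not from the paper): documents are tokenized English text,
# concepts come from ConceptNet's public REST API, and a "Bi-Concept pair"
# is treated as an unordered pair over the concept-enriched token set.
from itertools import combinations

import requests


def fetch_concepts(word, lang="en", limit=5):
    """Query the public ConceptNet API and return up to `limit` related concept labels."""
    url = f"http://api.conceptnet.io/c/{lang}/{word}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    concepts = []
    for edge in edges:
        end = edge.get("end", {})
        label = end.get("label", "").lower()
        if end.get("language") == lang and label and label != word:
            concepts.append(label)
        if len(concepts) >= limit:
            break
    return concepts


def build_biconcept_pairs(doc_tokens, concept_lookup):
    """Enrich a short document with its words' concepts, then emit all
    unordered pairs (biterms) over the enriched token set, as BTM does
    for plain word co-occurrences."""
    enriched = set(doc_tokens)
    for tok in doc_tokens:
        enriched.update(concept_lookup.get(tok, []))
    return list(combinations(sorted(enriched), 2))


if __name__ == "__main__":
    doc = ["apple", "fruit", "juice"]
    # A cached or offline word -> concepts mapping would work equally well here.
    lookup = {w: fetch_concepts(w) for w in doc}
    for pair in build_biconcept_pairs(doc, lookup)[:10]:
        print(pair)
```

The enriched pairs would then be fed to the topic-inference stage in place of plain biterms; how BCTM weights word-embedding features during sampling is described in the paper itself and is not reproduced here.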



Acknowledgments

This work is supported by the Key Research and Development Projects of Heilongjiang Province under grant number GA21C020 and the Natural Science Foundation of Heilongjiang Province under grant number LH2021F015.

Author information

Corresponding author

Correspondence to Taiying Wan.



Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Liu, G., Wan, T., Yu, J., Zhan, K., Wang, W. (2023). BCTM: A Topic Modeling Method Based on External Information. In: Wang, W., Wu, J. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-40467-2_6

  • DOI: https://doi.org/10.1007/978-3-031-40467-2_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40466-5

  • Online ISBN: 978-3-031-40467-2

  • eBook Packages: Computer Science, Computer Science (R0)
