short-paper

Discovering Interpretable Topics by Leveraging Common Sense Knowledge

Authors:
Ismail Harrando, EURECOM, Biot, France
Raphaël Troncy, EURECOM, Biot, France

K-CAP '21: Proceedings of the 11th Knowledge Capture Conference, December 2021, Pages 265–268
https://doi.org/10.1145/3460210.3493586

Published: 02 December 2021

ABSTRACT

Traditional topic modeling approaches generally rely on document-term co-occurrence statistics to find latent topics in a collection of documents. However, relying only on such statistics can yield incoherent or hard-to-interpret results for end users in the many applications where the goal is to interpret the resulting topics (e.g. labeling documents, comparing corpora, or guiding content exploration). In this work, we propose to leverage external common sense knowledge, i.e. information from the real world beyond word co-occurrence, to find topics that are more coherent and more easily interpretable by humans. We introduce the Common Sense Topic Model (CSTM), a novel and efficient approach that augments clustering with knowledge extracted from the ConceptNet knowledge graph. We evaluate this approach on several datasets alongside commonly used models, using both automatic and human evaluation, and we show that it exhibits superior affinity to human judgement. The code for the experiments, as well as the training data and human evaluation, is available at https://github.com/D2KLab/CSTM.
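
To make the contrast between co-occurrence statistics and external knowledge concrete, here is a minimal sketch. It is not the CSTM algorithm, whose details are not given in this abstract. The `RELATED` dictionary is a hypothetical, hand-written stand-in for a knowledge graph such as ConceptNet, and the helper names `expand` and `jaccard` are illustrative assumptions. The sketch only shows how two documents with no words in common can become comparable once their terms are enriched with related concepts, which is the kind of signal a clustering step could then exploit.

```python
# Hypothetical toy graph: term -> related concepts.
# This is hand-written illustration data, NOT taken from ConceptNet.
RELATED = {
    "goalkeeper": {"football", "sport"},
    "striker": {"football", "sport"},
    "senator": {"politics", "government"},
    "ballot": {"politics", "election"},
}


def expand(tokens):
    """Return the tokens plus every concept the toy graph relates to them."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= RELATED.get(token, set())
    return expanded


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = ["goalkeeper", "saves", "penalty"]
doc_b = ["striker", "scores", "twice"]
doc_c = ["senator", "wins", "ballot"]

# Word overlap alone: the two football documents share no surface term.
plain_ab = jaccard(doc_a, doc_b)

# After expansion, the two football documents overlap on shared concepts,
# while the football and politics documents still do not.
rich_ab = jaccard(expand(doc_a), expand(doc_b))
rich_ac = jaccard(expand(doc_a), expand(doc_c))

print(plain_ab, rich_ab, rich_ac)
```

In this toy setting the similarity between the two football documents rises from zero only after expansion, whereas the unrelated pair stays at zero. A real pipeline would query an actual knowledge graph and use a proper document representation and clustering method.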

Index Terms

Discovering Interpretable Topics by Leveraging Common Sense Knowledge
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Published in
K-CAP '21: Proceedings of the 11th Knowledge Capture Conference
December 2021
300 pages
ISBN: 9781450384575
DOI: 10.1145/3460210
General Chair: Anna Lisa Gentile, IBM Research Almaden, USA
Program Chair: Rafael Gonçalves, Center for Computational Biomedicine, Harvard Medical School, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
common sense knowledge
interpretable topics
topic modeling
Acceptance Rates
Overall Acceptance Rate: 55 of 198 submissions, 28%