Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

Li, Zongxia; Mao, Andrew; Stephens, Daniel; Goel, Pranav; Walpole, Emily; Dima, Alden; Fung, Juan; Boyd-Graber, Jordan

arXiv:2401.16348 (cs)

[Submitted on 29 Jan 2024 (v1), last revised 20 Feb 2024 (this version, v2)]

Abstract: Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used; however, their validity has been questioned for neural topic models (NTMs), and they can overlook a model's benefits in real-world applications. To this end, we conduct the first evaluation of neural, supervised, and classical topic models in an interactive, task-based setting. We combine topic models with a classifier and test their ability to help humans conduct content analysis and document annotation. In simulated, real-user, and expert pilot studies, the Contextual Neural Topic Model does the best on cluster evaluation metrics and human evaluations; however, LDA is competitive with two other NTMs in our simulated experiment and user study results, contrary to what coherence scores suggest. We show that current automated metrics do not provide a complete picture of topic modeling capabilities, but the right choice of NTMs can be better than classical models on practical tasks.

Comments:	19 pages, 5 tables, 6 figures, Accepted to EACL Main Conference 2024
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2401.16348 [cs.CL]
	(or arXiv:2401.16348v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.16348

Submission history

From: Zongxia Li
[v1] Mon, 29 Jan 2024 17:54:04 UTC (5,420 KB)
[v2] Tue, 20 Feb 2024 03:10:58 UTC (7,390 KB)
