Compressing and Interpreting Word Embeddings with Latent Space Regularization and Interactive Semantics Probing

Li, Haoyu; Wang, Junpeng; Zheng, Yan; Wang, Liang; Zhang, Wei; Shen, Han-Wei

doi:10.1177/14738716221130338

arXiv:2403.16815 (cs)

[Submitted on 25 Mar 2024]

Authors: Haoyu Li, Junpeng Wang, Yan Zheng, Liang Wang, Wei Zhang, Han-Wei Shen

Abstract: Word embedding, a high-dimensional (HD) numerical representation of words generated by machine learning models, has been used for different natural language processing tasks, e.g., translation between two languages. Recently, there has been an increasing trend of transforming the HD embeddings into a latent space (e.g., via autoencoders) for further tasks, exploiting various merits the latent representations could bring. To preserve the embeddings' quality, these works often map the embeddings into an even higher-dimensional latent space, making the already complicated embeddings even less interpretable and consuming more storage space. In this work, we borrow the idea of $\beta$-VAE to regularize the HD latent space. Our regularization implicitly condenses information from the HD latent space into a much lower-dimensional space, thus compressing the embeddings. We also show that each dimension of our regularized latent space is more semantically salient, and validate our assertion by interactively probing the encoding level of user-proposed semantics in the dimensions. To this end, we design a visual analytics system to monitor the regularization process, explore the HD latent space, and interpret latent dimensions' semantics. We validate the effectiveness of our embedding regularization and interpretation approach through both quantitative and qualitative evaluations.
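To make the regularization idea in the abstract more concrete, the following is a minimal sketch of the standard $\beta$-VAE objective (reconstruction error plus a $\beta$-weighted KL term against a unit Gaussian prior). It is not the authors' implementation: the abstract does not specify their loss, architecture, or hyperparameters. The function names, the squared-error reconstruction term, the default `beta`, and the `threshold` used to call a dimension "active" are all assumptions made for illustration.

```python
import math


def kl_per_dimension(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for each latent dimension.

    mu and log_var are lists holding the encoder's mean and log-variance
    for one input (hypothetical encoder outputs).
    """
    return [0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, log_var)]


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Standard beta-VAE loss: reconstruction error + beta * KL.

    Squared error is an assumed reconstruction term; beta=4.0 is an
    arbitrary illustrative default, not a value from the paper.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    kl = sum(kl_per_dimension(mu, log_var))
    return recon + beta * kl


def active_dimensions(mu_batch, log_var_batch, threshold=0.01):
    """Indices of latent dimensions whose mean KL over a batch exceeds threshold.

    Dimensions with near-zero KL match the prior and carry little
    information, so they could in principle be dropped. The threshold
    is an assumption chosen for illustration.
    """
    n = len(mu_batch)
    dims = len(mu_batch[0])
    mean_kl = [0.0] * dims
    for mu, log_var in zip(mu_batch, log_var_batch):
        for i, k in enumerate(kl_per_dimension(mu, log_var)):
            mean_kl[i] += k / n
    return [i for i, k in enumerate(mean_kl) if k > threshold]
```

Under this formulation, a larger `beta` pushes more latent dimensions toward the prior, so the information tends to concentrate in fewer dimensions. Counting the dimensions returned by `active_dimensions` is one simple way to observe that effect during training.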

Subjects:	Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2403.16815 [cs.HC]
	(or arXiv:2403.16815v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2403.16815
Journal reference:	Information Visualization (2023), 22(1), 52-68
Related DOI:	https://doi.org/10.1177/14738716221130338

Submission history

From: Haoyu Li
[v1] Mon, 25 Mar 2024 14:38:55 UTC (5,330 KB)
