Sentence Analogies: Linguistic Regularities in Sentence Embeddings

Xunjie Zhu, Gerard de Melo


Abstract
While important properties of word vector representations have been studied extensively, far less is known about the properties of sentence vector representations. Word vectors are often evaluated by assessing to what degree they exhibit regularities with regard to relationships of the sort considered in word analogies. In this paper, we investigate to what extent commonly used sentence vector representation spaces also reflect certain kinds of regularities. We propose a number of schemes to induce evaluation data, based on lexical analogy data as well as semantic relationships between sentences. Our experiments consider a wide range of sentence embedding methods, including ones based on BERT-style contextual embeddings. We find that different models differ substantially in their ability to reflect such regularities.
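As a rough illustration of the kind of regularity the abstract refers to, the sketch below applies the vector-offset analogy test that is commonly used for word vectors (a : b :: c : ?) to sentence vectors. It is a minimal sketch under stated assumptions, not the evaluation procedure of the paper: the function names `cosine` and `solve_analogy`, the example sentences, and the hand-made 3-dimensional vectors are all hypothetical and do not come from any real embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset method.

    The target point is emb[b] - emb[a] + emb[c]; the answer is the
    candidate whose vector has the highest cosine similarity to it.
    The three query items are excluded from the candidates.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [key for key in emb if key not in (a, b, c)]
    return max(candidates, key=lambda key: cosine(emb[key], target))


# Hypothetical toy "sentence embeddings" (hand-made, for illustration only).
toy_embeddings = {
    "He is a man.": [0.0, 1.0, 0.0],
    "She is a woman.": [0.0, 0.0, 1.0],
    "He is a king.": [1.0, 1.0, 0.0],
    "She is a queen.": [1.0, 0.0, 1.0],
    "It is raining.": [0.5, 0.5, 0.5],
}

answer = solve_analogy(
    toy_embeddings, "He is a man.", "She is a woman.", "He is a king."
)
print(answer)
```

With real sentence encoders, the toy dictionary would be replaced by vectors produced by the model under study, and accuracy over many such quadruples would indicate how strongly the embedding space reflects the regularity.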
Anthology ID:
2020.coling-main.300
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3389–3400
URL:
https://aclanthology.org/2020.coling-main.300
DOI:
10.18653/v1/2020.coling-main.300
Bibkey:
Cite (ACL):
Xunjie Zhu and Gerard de Melo. 2020. Sentence Analogies: Linguistic Regularities in Sentence Embeddings. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3389–3400, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Sentence Analogies: Linguistic Regularities in Sentence Embeddings (Zhu & de Melo, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.300.pdf
Data:
MultiNLI, SNLI