Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis

Published version
Peer-reviewed

Type

Conference Object

Authors

Vulic, Ivan 
McCarthy, Diana 
Korhonen, Anna 

Abstract

We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM’s utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase annotation process which produces (i) semantic verb classes and (ii) fine-grained similarity scores for nearly 130 thousand verb pairs. We use the two types of verb data to (a) examine cross-lingual similarities and variation, and (b) evaluate the capacity of static and contextualised representation models to accurately reflect verb semantics, contrasting the performance of large language-specific pretraining models with their multilingual equivalent on semantic clustering and lexical similarity, across different domains of verb meaning. We release the data from both phases as a large-scale multilingual resource, comprising 85 verb classes and nearly 130k pairwise similarity scores, offering a wealth of possibilities for further evaluation and research on multilingual verb semantics.

Journal Title

Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)

Conference Name

28th International Conference on Computational Linguistics (COLING 2020)

Publisher

International Committee on Computational Linguistics

Sponsorship

European Research Council (648909)
ESRC (1804172)