Knowledge-Based Systems

Volume 187, January 2020, 104819

Linear transformations for cross-lingual semantic textual similarity

https://doi.org/10.1016/j.knosys.2019.06.027

Highlights

  • Linear transformations project monolingual semantic spaces into a shared space.

  • We propose a new transformation outperforming others in the cross-lingual STS task.

  • We extend unsupervised STS methods with word weighting.

  • Our approach achieves promising results on several datasets in different languages.

Abstract

Cross-lingual semantic textual similarity systems estimate the degree of meaning similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ machine translation and combine a vast number of features, making the approach strongly supervised, resource-rich, and difficult to use for poorly-resourced languages.

In this paper, we study linear transformations, which project monolingual semantic spaces into a shared space using bilingual dictionaries. We propose a novel transformation, which builds on the best ideas from prior works. We experiment with unsupervised techniques for sentence similarity based only on semantic spaces and show that they can be significantly improved by word weighting. Our transformation outperforms other methods and, together with word weighting, leads to very promising results on several datasets in different languages.

Introduction

Semantic textual similarity (STS) systems estimate the degree to which two textual fragments (e.g., sentences) are semantically similar to each other. STS systems are usually evaluated against human similarity judgments. The ability to compare two sentences in meaning is one of the core parts of natural language understanding (NLU), with applications ranging across machine translation, summarization, question answering, etc.

SemEval (International Workshop on Semantic Evaluation) has held the STS shared tasks annually since 2012. During this time, many different datasets and methods have been proposed. Early methods focused mainly on the surface form of sentences and employed various word matching algorithms [1]. Han et al. [2] added distributional word representations and WordNet, achieving the best performance at SemEval 2013. Word-alignment methods introduced by Sultan et al. [3] yielded the best correlations at SemEval 2014 and 2015. Nowadays, the best performance tends to be obtained by careful feature engineering that combines the best approaches from previous years with deep learning models [4], [5].

Lately, research in NLU is moving beyond monolingual meaning comparison. The research is motivated mainly by two factors: (a) cross-lingual semantic similarity metrics enable us to work in multilingual contexts, which is useful in many applications (cross-lingual information retrieval, machine translation, etc.), and (b) they enable the transfer of knowledge between languages, especially from resource-rich to poorly-resourced languages. In the last two years, the STS shared tasks [6], [7] have been extended with cross-lingual tracks. The best performing systems [5], [8] first employ an off-the-shelf machine translation service to translate input sentences into the same language and then apply state-of-the-art monolingual STS models. These highly-tuned approaches rely on manually annotated data and numerous resources and tools, which significantly limits their applicability to poorly-resourced languages. Unlike the prior works, we study purely unsupervised STS techniques based on word distributional-meaning representations as the only source of information.

The fundamental assumption (Distributional Hypothesis) is that two words are expected to be semantically similar if they occur in similar contexts (they are similarly distributed across the text). This hypothesis was formulated by Harris [9] several decades ago. Today it is the basis of state-of-the-art distributional semantic models [10], [11], [12]. Unsupervised methods for assembling word representations to estimate textual similarity have been proposed in [8], [13], [14]. We describe them in detail in Section 2.

Several approaches for inducing cross-lingual word representations (i.e., a unified semantic space for different languages) have been proposed in recent years, each requiring a different form of cross-lingual supervision [15]. They can be roughly divided into three categories according to the level of required alignment: (a) document-level alignments [16], (b) sentence-level alignments [17], and (c) word-level alignments [18].

We focus on the last case, where a common approach is to train monolingual semantic spaces independently of each other and then to use bilingual dictionaries to transform semantic spaces into a unified space. Most related works rely on linear transformations [18], [19], [20], [21] and profit from weak supervision. Vulić and Korhonen [22] showed that bilingual dictionaries with a few thousand word pairs are sufficient. Such dictionaries can be easily obtained for most languages. Moreover, the mapping between semantic spaces can be easily extended to a multilingual scenario (more than two languages) [23].

Most recently, the first attempts at unsupervised bilingual dictionary induction were introduced in [24], [25], [26]. These methods exploit structural similarities across monolingual semantic spaces and automatically infer the cross-lingual mapping. Word translation experiments show that the automatically induced bilingual dictionaries are of high quality.

This paper investigates linear transformations for cross-lingual STS. We see three main contributions of our work:

  • We propose a new linear transformation, which outperforms others in the cross-lingual STS task on several datasets.

  • We extend previously published methods for unsupervised STS with word weighting. This leads to significantly better results (a minimal sketch of the idea follows this list).

  • We provide a thorough comparison of several linear transformations and several methods for STS.
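The snippets here do not spell out the exact weighting scheme, so the following is only a minimal sketch of the general idea, assuming simple IDF-style weights; `space` and `weights` are hypothetical containers for illustration, not the paper's API.

```python
import numpy as np

def weighted_sentence_vector(sentence, space, weights):
    """Weighted bag-of-words sentence vector.

    `space` maps a word to its embedding (np.ndarray); `weights` maps a
    word to a scalar importance (assumed IDF-style here; the paper's
    exact scheme is given in its full text). Out-of-vocabulary words
    are skipped; assumes at least one in-vocabulary word.
    """
    vectors = [weights.get(w, 1.0) * space[w] for w in sentence if w in space]
    return np.mean(vectors, axis=0)
```

Down-weighting frequent, low-content words keeps them from dominating the averaged vector, which is a plausible reason why weighting helps otherwise unsupervised methods.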

This paper is organized as follows. In Section 2, we start with a description of STS techniques based on combining word representations. The process of learning cross-lingual word representations via linear transformations is explained in Section 3. We propose our transformation in Section 4. We show our experimental results in Section 5 and conclude in Section 6.

Section snippets

Semantic textual similarity

Let $w \in V$ denote a word, where $V$ is a vocabulary. Let $S : V \to \mathbb{R}^d$ be a semantic space, i.e., a function which projects a word $w$ into a Euclidean space of dimension $d$. The meaning of the word $w$ is represented as a real-valued vector $\mathbf{v}_w = S(w)$. We assume the bag-of-words principle and represent a sentence as a set (bag) $s = \{w \in V\}$, i.e., the word order plays no role. Note that we allow repetitions of the same word in the sentence (set). Given two sentences $s_x$ and $s_y$, the task is to estimate their semantic similarity …
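As a concrete illustration of this setup, here is a minimal sketch (not the paper's implementation) of the simplest way to turn word vectors into a sentence similarity score: sum the vectors of each bag and compare the sums by cosine similarity.

```python
import numpy as np

def sentence_vector(bag, S):
    """Sum of word vectors v_w = S(w) over a bag of words; repeated
    words contribute once per occurrence and word order is ignored."""
    return np.sum([S[w] for w in bag if w in S], axis=0)

def sts_score(s_x, s_y, S):
    """Cosine similarity of the two sentence vectors; assumes each bag
    contains at least one word covered by the semantic space S."""
    v_x, v_y = sentence_vector(s_x, S), sentence_vector(s_y, S)
    return float(v_x @ v_y / (np.linalg.norm(v_x) * np.linalg.norm(v_y)))
```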

Linear transformations between semantic spaces

A linear transformation between semantic spaces can be expressed as $S_{x \to y}(w_x) = \mathbf{T}_{x \to y} S_x(w_x)$, i.e., as a multiplication by a matrix $\mathbf{T}_{x \to y} \in \mathbb{R}^{d \times d}$. A linear transformation can perform, e.g., rotation, reflection, scaling, shearing, and column permutation [29].
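For instance, the least-squares variant of Mikolov et al. [18] fits $\mathbf{T}_{x \to y}$ on word pairs from a bilingual dictionary. A minimal numpy sketch, assuming the paired vectors are stacked as rows of X (source language) and Y (target language):

```python
import numpy as np

def fit_least_squares(X, Y):
    """Least-squares estimate of T in S_{x->y}(w_x) = T S_x(w_x).

    X, Y: (n, d) arrays whose i-th rows are the embeddings of the i-th
    dictionary pair. Solves min_T ||X T^T - Y||_F.
    """
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)  # A = T^T, shape (d, d)
    return A.T

# Projecting a source-language word vector into the target space:
#   T = fit_least_squares(X, Y)
#   v_shared = T @ v_source
```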

Current issues

Lazaridou et al. [20] identified overfitting as a problem of current linear transformations for cross-lingual spaces (including the one they introduced). Several authors extended their learning objectives with an L2 regularization term forcing the values in $\mathbf{T}$ towards zero. Xing et al. [31] and Artetxe et al. [21] experimented with orthogonality constraints on $\mathbf{T}$, forcing all vectors in $\mathbf{T}$ to be orthonormal.
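Under the orthogonality constraint the optimal map has a closed-form solution (orthogonal Procrustes). A minimal sketch, with the same row-stacked X, Y convention as above:

```python
import numpy as np

def fit_orthogonal(X, Y):
    """Solves min_T ||X T^T - Y||_F subject to T T^T = I.

    With U S V^T = svd(Y^T X), the optimum is T = U V^T. An orthogonal
    T preserves dot products and vector norms, which counteracts the
    overfitting discussed above.
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt
```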

Radovanović et al. [33] defined hubness as one of the curses of dimensionality. Lazaridou et …
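Although the snippet cuts off here, hubness itself is straightforward to quantify via k-occurrence counts: how often each target vector appears among the k nearest neighbours of the query vectors. A minimal sketch (illustrative, not from the paper):

```python
import numpy as np

def k_occurrence(Q, T, k=10):
    """k-occurrence counts: how often each row of T (targets) appears
    among the k nearest cosine neighbours of the rows of Q (queries).
    A heavily right-skewed count distribution signals hubs: vectors
    that are nearest neighbours of disproportionately many queries.
    """
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
    sims = Qn @ Tn.T                         # (n_queries, n_targets)
    topk = np.argsort(-sims, axis=1)[:, :k]  # k nearest targets per query
    counts = np.zeros(len(T), dtype=int)
    np.add.at(counts, topk.ravel(), 1)       # accumulate occurrences
    return counts
```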

Settings

We experiment with all five techniques for linear mapping (all described in Sections 3 and 4), namely, Least Squares Transformation (LS), Orthogonal Transformation (OT), Canonical Correlation Analysis (CCA), Ranking Transformation (RT), and the proposed Orthogonal Ranking Transformation (ORT). We combine word representations to estimate semantic textual similarity using all three methods described in Section 2, i.e., Linear Combination …
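Of these mappings, CCA is the one that projects both spaces into a new shared space rather than mapping one directly onto the other. A minimal sketch using scikit-learn (an assumption for illustration; the paper's implementation and hyperparameters are not given in these snippets):

```python
from sklearn.cross_decomposition import CCA

def fit_cca(X, Y, n_components=100):
    """Fit CCA on dictionary pairs (rows of X and Y) so that the two
    projections are maximally correlated; both languages are then
    compared inside the shared n_components-dimensional space."""
    cca = CCA(n_components=n_components, max_iter=1000)
    cca.fit(X, Y)
    return cca

# Usage: x_scores, y_scores = cca.transform(X, Y) yields shared-space
# coordinates; cca.transform(X_new) projects new source-side vectors.
```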

Conclusion and future work

In this paper we investigated linear transformations to create cross-lingual semantic spaces. We introduced a new transformation, which reduces the hubness in semantic spaces. We used three (previously published) approaches to combine information from word representations. We showed all three approaches can be significantly improved by word weighting. Our STS system does not require sentence similarity supervision and the only cross-lingual information is a bilingual dictionary.

We evaluated on …

Acknowledgments

This work was supported by the ERDF project “Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)” (No. CZ.02.1.01/0.0/0.0/17_048/0007267). Computational resources were provided by CESNET (LM2015042) and the CERIT Scientific Cloud (LM2015085) under the programme “Projects of Large Research, Development, and Innovations Infrastructures”. Lastly, we would like to thank the anonymous reviewers for their insightful feedback.

References (39)

  • Bär, D., et al.

    UKP: Computing semantic textual similarity by combining multiple content similarity measures

  • Han, L., et al.

    UMBC_EBIQUITY-CORE: Semantic textual similarity systems

  • Sultan, M.A., et al.

    DLS@CU: Sentence similarity from word alignment and semantic vector composition

  • Rychalska, B., et al.

    Samsung Poland NLP team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity

  • Tian, J., et al.

    ECNU at SemEval-2017 Task 1: Leverage kernel-based traditional NLP features and neural networks to build a universal model for multilingual and cross-lingual semantic textual similarity

  • Agirre, E., et al.

    SemEval-2016 Task 1: Semantic textual similarity, monolingual and cross-lingual evaluation

  • Cer, D., et al.

    SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation

  • Brychcín, T., et al.

    UWB at SemEval-2016 Task 1: Semantic textual similarity using lexical, syntactic, and semantic information

  • Harris, Z.

    Distributional structure

    Word

    (1954)
  • Mikolov, T., et al.

    Efficient estimation of word representations in vector space

    (2013)
  • Pennington, J., et al.

    GloVe: Global vectors for word representation

  • Bojanowski, P., et al.

    Enriching word vectors with subword information

    Trans. Assoc. Comput. Linguist.

    (2017)
  • Mu, J., et al.

    Representing sentences as low-rank subspaces

  • Glavaš, G., et al.

    A resource-light method for cross-lingual semantic textual similarity

    Knowl.-Based Syst.

    (2017)
  • Upadhyay, S., et al.

    Cross-lingual models of word embeddings: An empirical comparison

  • Vulić, I., et al.

    Bilingual distributed word representations from document-aligned comparable data

    J. Artificial Intelligence Res.

    (2016)
  • Levy, O., et al.

    A strong baseline for learning cross-lingual word embeddings from sentence alignments

  • Mikolov, T., et al.

    Exploiting similarities among languages for machine translation

    (2013)
  • Faruqui, M., et al.

    Improving vector space word representations using multilingual correlation
