Automated method of hyper-hyponymic verbal pairs extraction from dictionary definitions based on syntax analysis and word embeddings

This study is dedicated to the development of methods for computer-aided extraction of hyper-hyponymic verbal pairs. The methods rest on the observation that in a dictionary entry the defined verb is commonly a hyponym, and its definition is formulated through a hypernym in the infinitive. The first method extracted all infinitives having dependents. The second method additionally demanded that an extracted infinitive be the root of the syntax tree. The first method extracted almost twice as many hypernyms, whereas the second method provided higher precision. The results were post-processed with word embeddings, which improved precision without a crucial drop in the number of extracted hypernyms.


Introduction
Studying and determining semantic relations plays a special role in contemporary computational linguistics. The main domain of application of semantic relations extraction is the construction of lexicographic resources (e.g. electronic dictionaries and thesauri). Besides, different semantic relations are used in natural language processing and in language teaching resources.
The number of semantic relations is large and the number of semantic units in a language is tremendous. This means that it is almost impossible to establish all links manually; this task requires other solutions, including automated methods of semantic relations extraction. So, the following research is devoted to the elaboration of such a method for the automated extraction of Russian hyper-hyponymic verbal pairs.
One of the problems is that there are no unified precise criteria of hyponymy (in particular troponymy); its definition is very subjective and depends on individual comprehension and linguistic experience.
Elena Kotsova describes the hypernym and the hyponym as follows.
1. A hypernym:
a. is more frequent in a natural language;
b. allows more active synonymic substitution of species words in speech;
c. can play the role of a hypernym at different grades and levels of a genus-species hierarchy;
d. is simpler in morphemic structure and has no nominal motivation;
e. cannot be a word with an utterly broad meaning.
2. A hyponym:
a. has a more specific meaning that can be divided into semantic features; usually it is a monosemantic word;
b. has a seme which is important for hyponymic links;
c. is less frequent, especially specific words with professional meaning;
d. can substitute a word with genus meaning more rarely, only in its syntagmatic field;
e. has a two-seme prototypic semantic structure (hyperseme + hyposeme);
f. is in equivalent relations with other hyponyms of its hyponymic group;
g. usually has a more complex morphemic structure [Kotsova, 2010, p. 25-27].
The authors of Princeton WordNet formulate the idea of hyponymy among verbs as follows: "…the many different kinds of elaborations that distinguish a 'verb hyponym' from its superordinate have been merged into a manner relation that [Fellbaum, Miller, 1990] have dubbed troponymy (from the Greek tropos, manner or fashion). The troponymy relation between two verbs can be expressed by the formula To V1 is to V2 in some particular manner" [Fellbaum, 1993, p. 47].
For example, the verbs плестись, брести, тащиться 'to trail, to plod, to trudge' are synonyms, as they express the same notion and contain the same quantity of information. They are also interchangeable in a context. The verb идти 'to go' is their hypernym, as it has a broader meaning. It also matches all the hypernym criteria described above and fits the formula: to trail/plod/trudge is to go in some particular manner.
The relevance of our research is determined by the lack of studies dedicated to the automated extraction of semantic relations between verbs. As shown in the Related Work section, the majority of research is focused on nouns, while the methods applied to nouns in most cases do not work well with verbs. The elaborated method could facilitate many theoretical and applied tasks in linguistics, in particular the automated filling of lexical resources.
Despite such an abundance of research, studies on verbs are hard to find, as most works focus on nouns. A notable exception is [Gonçalo et al., 2010], which extracted different types of relations for four open grammatical categories (nouns, verbs, adjectives, adverbs). They obtained 58,362 pairs of synonyms for nouns and 30,180 pairs for verbs; 122,478 noun hypernyms and no verb hypernyms at all. These results and our ongoing research suggest that verbal hyponymy extraction demands special treatment, since methods developed for nouns do not transfer well to verbs.
The authors of [Goncharova, Cardenas, 2013] designed a method of extraction of the hypo-hypernymic hierarchy of verbs from domain-specific corpora. This method is based on the cognitive theory of terminology [Benitez et al., 2005] and is developed further in [Cardenas, Ramisch, 2019]. Firstly, the authors automatically extracted noun-verb-noun triples from specialized corpora of environmental science texts in English and Spanish. Secondly, they manually annotated each triple with the lexical domain of the verb and the semantic class and role of the noun. And lastly, they manually inferred the hypo-hypernymic hierarchy of the extracted verbs according to their syntactic potential: the more types of semantic subclasses of nouns a verb accepts, the higher its position in the hierarchy. The method is very different from all the aforementioned ones because it focuses on domain-specific terminology and demands much more human effort. Therefore, despite being a useful tool for the creation of domain-specific ontologies, this method is hardly applicable to common language.
To the best of our knowledge there is no other research on hypernym extraction for Russian verbs. Our research started from lexico-syntactic patterns [Hearst, 1992, 1998], i.e. specific linguistic expressions or constructions which usually include both a hyponym and a hypernym in one context, for example, <hyponym> and other <hypernym>; <hypernym> such as <hyponym>; and so on. Firstly, there was an attempt to find some typical lexico-syntactic patterns in corpus data, but it turned out to be inefficient for verbs. Even though hypernyms and hyponyms can both be found in the nearest context, we failed to discover any regular patterns in corpus data.
We manually analysed more than 400 contexts for 100 hyper-hyponymic pairs and realised that these pairs fulfil the hyponymy function very rarely: no more than 5-6 examples among our set of contexts. For example, the sentence К тому же хотелось сучить ногами, вертеться, вообще - двигаться, хотя несколько минут назад он мечтал только об одном - лечь 'In addition, he wanted to curl his toes, to spin, generally - to move, although a few minutes ago he dreamed of only one thing - to lie down' includes the hyper-hyponymic pair вертеться/двигаться 'to spin / to move', but there is no regular pattern that could be applied to other texts to find other hyper-hyponymic pairs. It also turned out that hyper-hyponymic pairs more often play the role of contextual synonyms in texts [Ogorodnikova, 2017].
Secondly, we tried to process dictionary data and find some lexico-syntactic patterns there, as dictionary entries commonly contain both a hyponym and a hypernym. Frequent and universal lexico-syntactic patterns are easily detected for nouns: <hyponym> - род/вид/разновидность/… 'class, sort, kind' <hypernym>. We failed to detect any similarly universal lexico-syntactic patterns for verbs. Nonetheless, it was noticed that in most cases a hypernym in a definition is accompanied by a repeating specifying word. We have called such words "lexical markers". Unfortunately, the lexical markers drastically differ between semantic groups of verbs. For example, such lexical markers as вверх/вниз 'up/down' are typical of verbs of movement and useless for verbs of speech. The latter are usually defined with such markers as громко / тихо, невнятно, отрывисто 'loudly / quietly, incomprehensibly, abruptly'. We manually created a list of such markers for verbs of movement, automatically extracted hyper-hyponymic pairs from six dictionaries and manually evaluated them [Antropova, Ogorodnikova, 2019].
The method based on this idea showed a moderate precision of 0.61, but its coverage depends on the list of markers, which has to be created separately for every semantic group. Manual creation of such lists is time-consuming, and the task of automating it does not seem to be much easier than the task of hyponym extraction itself.

Data
The study is mainly based on the material of dictionary definitions for verbs, which were taken from seven Russian dictionaries: 1. … (1935-1940); …; 7. The Linguistic Ontology Thesaurus RuThes. These dictionaries are available in electronic form, so they can be easily processed. Besides, they are well known in Russian linguistics and present the fullest vocabulary.
To check the effectiveness of the proposed methods we used one hundred Russian verbs. This number of test units is sufficient to estimate the methods while still allowing all results to be analysed manually. The verbs were extracted from The Explanatory Dictionary of Russian Verbs because it contains a detailed semantic classification of verbs, which allows taking into account the differences between semantic groups, as they can influence the results of our analysis. The verbs from different groups were taken proportionally to their rates in the dictionary. We then tested our methods on these verbs' definitions taken from all seven dictionaries.

Syntactic analysis of definitions
The creation of the method is possible because of the traditional construction of definitions. According to [Komarova, 1990] and [Shelov, 2003] there are some typical classes of definitions. The main distinction significant for the purpose of this research is that definitions can be extended or unextended. Extended definitions are usually based on hyponymy, meronymy, or contextual explanations (рвать - резким движением разделять на части 'to rip - to divide into parts with an abrupt movement'). Unextended definitions contain synonyms of an entry word (жульничать - плутовать, мошенничать 'to cheat - to palter, to swindle') or its derivatives referring to another entry (e.g. defining a perfective verb by a reference to its imperfective pair).
The most common type of semantic relation in verbal extended definitions is hyponymy. This observation allows us to elaborate a method of automated extraction of verbal hyper-hyponymic pairs.
So, a hypernym is usually expressed as an infinitive with dependent words. However, this rule alone extracts some noise, since an infinitive can also occur in various extending constructions. We suggested that some of this noise can be removed by adding a rule that the target infinitive should be the root of the syntax tree of the definition.
In order to implement these methods, we decided to use a UDPipe pre-trained model to obtain syntax trees for the definitions. UDPipe offers three models for the Russian language. We chose the Russian model trained on SynTagRus because it provides better quality according to its authors' estimations. On the basis of the model we created two methods of hypernym extraction from dictionary definitions: 1. "InfsWithDependants". It extracts all the infinitives having dependent words in a given definition.
2. "RootInfWithDependants". It extracts the infinitive having dependent words in a given definition only if it is the root of the syntax tree.
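The two rules can be sketched as follows, assuming the definition has already been parsed by UDPipe into CoNLL-U format. The miniature parse below is hand-made for the definition of стучать 'to knock' (simplified), not actual UDPipe output:

```python
def parse_conllu(text):
    """Read a minimal CoNLL-U fragment into token dicts (whitespace-split)."""
    tokens = []
    for line in text.strip().splitlines():
        cols = line.split()  # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "feats": cols[5], "head": int(cols[6])})
    return tokens

def infs_with_dependants(tokens):
    """Method 1: every infinitive that has at least one dependent word."""
    heads = {t["head"] for t in tokens}  # ids that govern something
    return [t["form"] for t in tokens
            if t["upos"] == "VERB" and "VerbForm=Inf" in t["feats"]
            and t["id"] in heads]

def root_inf_with_dependants(tokens):
    """Method 2: an infinitive with dependents that is also the tree root."""
    heads = {t["head"] for t in tokens}
    return [t["form"] for t in tokens
            if t["upos"] == "VERB" and "VerbForm=Inf" in t["feats"]
            and t["id"] in heads and t["head"] == 0]

# Hand-constructed toy parse: ударять is the root, впустить is an embedded
# infinitive with its own dependent (the noise case from the paper's example).
TOY = """
1 ударять   ударять   VERB _ VerbForm=Inf 0 root  _ _
2 в         в         ADP  _ _            3 case  _ _
3 дверь     дверь     NOUN _ _            1 obl   _ _
4 впустить  впустить  VERB _ VerbForm=Inf 1 xcomp _ _
5 кого-либо кто-либо  PRON _ _            4 obj   _ _
"""
toy_tokens = parse_conllu(TOY)
```

On this toy tree the first function returns both ударять and впустить, while the second keeps only the root infinitive ударять, mirroring the behaviour described for the стучать example below.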
So, the definition стучать - ударять (ударить) в дверь, окно коротким, отрывистым звуком, выражая этим просьбу впустить кого-л., куда-л. 'to knock - to hit a door, a window with short, choppy sounds wishing to let somebody in' can be processed differently. The second method discovers only one infinitive, ударять 'to hit', which is the true hypernym for стучать 'to knock', while the first one extracts two verbs, ударять and впустить 'to hit, to let in', of which впустить 'to let in' is noise.

Post-processing with word embeddings
As shown in Table 1, "InfsWithDependants" finds almost twice as many correct hypernyms as "RootInfWithDependants", whereas the latter delivers considerably higher precision. Thus, we devised a way to improve the "InfsWithDependants" results by post-processing them with word embeddings.
A word embedding is a mathematical model of a language. It is based on the idea that similar words tend to appear in similar contexts. A word embedding is a trained neural network which transforms words into vectors (or points) in some N-dimensional semantic space: if two words appear in similar contexts, the corresponding points are close to each other in the space. Figure 1 shows an example of such a representation. For the current research it is important that word embeddings also allow the calculation of a similarity measure between given words, namely the cosine similarity, which scales from 0 (least similar) to 1 (most similar). For example, according to the embedding from Figure 1, the cosine similarity of the verbs глядеть 'to gaze' and смотреть 'to look' is 0.836, of глядеть and делать 'to do' - 0.310, of глядеть and обладать 'to possess' - 0.144. Models differ from each other in the following parameters: the corpora used for training; the part-of-speech (POS) tags used to distinguish homonymic parts of speech (e.g. whether "go" is a verb or a noun); the size of the sliding window, i.e. the number of neighbouring words taken into account; and a number of other technical parameters such as the learning algorithm or dimensionality. See [Kutuzov, Kuzmenko, 2017] for details.
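As a toy illustration of how such a similarity is computed, here is the standard cosine formula over plain Python lists; the 3-dimensional vectors are invented for the sketch, whereas real embedding vectors typically have hundreds of dimensions:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two made-up 3-dimensional "verb vectors" standing in for real embeddings.
sim = cosine_similarity([0.2, 0.9, 0.1], [0.3, 0.8, 0.0])
```

Identical directions give a similarity of 1, orthogonal directions give 0, which is the scale referred to above.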
We employed pre-trained embeddings from the RusVectōrēs project. The idea of the post-processing is to drop an extracted candidate verb if its similarity to the defined verb is lower than a threshold.
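The filtering step can be sketched as below. The function takes any similarity callable, so for a real RusVectōrēs model one could pass in, e.g., a gensim `KeyedVectors.similarity` bound method; the toy similarity table here is invented for illustration:

```python
def filter_candidates(similarity, defined_verb, candidates, threshold):
    """Keep only candidate hypernyms whose similarity to the defined verb
    reaches the threshold; verbs unknown to the model are skipped."""
    kept = []
    for cand in candidates:
        try:
            score = similarity(defined_verb, cand)
        except KeyError:
            continue  # the verb is absent from the embedding model
        if score >= threshold:
            kept.append(cand)
    return kept

# Invented similarity scores standing in for a real embedding model.
TOY_SIMS = {("стучать", "ударять"): 0.7, ("стучать", "впустить"): 0.2}

def toy_similarity(a, b):
    return TOY_SIMS[(a, b)]
```

With a threshold of 0.5, the noise candidate впустить from the стучать example is dropped while the true hypernym ударять survives.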
We took the "InfsWithDependants" results for the hundred verbs as a starting point and randomly divided them into test (30) and development (70) sets. Then, for each available RusVectōrēs model we did the following. First, we removed from the development set the verbs absent from the model. Second, we performed 7-fold cross-validation on the development set in order to get a better estimation of the model and find the best threshold for it. After that, we compared all the models by their mean performance on cross-validation, chose the best one and applied it to the test set.
Figure 1 caption: The embeddings are created by a RusVectōrēs model trained on the Russian National Corpus and Wikipedia. Represented verbs: делать 'to do', работать 'to work', иметь 'to have', обладать 'to possess', мочь 'to be able to', уметь 'can', глядеть 'to gaze', видеть 'to see', смотреть 'to look', глазеть 'to stare', слышать 'to hear'.
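The threshold-fitting procedure can be sketched as k-fold cross-validation over scored candidates, choosing on each training split the threshold that maximizes precision; the similarity scores, labels and grid below are invented toy values, not the paper's actual data:

```python
def precision(pairs, threshold):
    """Share of true hypernyms among candidates kept at this threshold."""
    kept = [is_true for sim, is_true in pairs if sim >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

def best_threshold(pairs, grid):
    """Threshold from the grid that maximizes precision on the given pairs."""
    return max(grid, key=lambda t: precision(pairs, t))

def kfold(pairs, k, grid):
    """Fit a threshold on each training split, score it on the held-out fold."""
    folds = [pairs[i::k] for i in range(k)]
    results = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        t = best_threshold(train, grid)
        results.append((t, precision(folds[i], t)))
    return results

# Invented (similarity, is_true_hypernym) scores for illustration only.
toy_pairs = [(0.9, True), (0.8, True), (0.7, True),
             (0.4, False), (0.35, False), (0.3, False)]
toy_results = kfold(toy_pairs, 3, [0.0, 0.25, 0.5, 0.75])
```

The paper uses 7 folds and relies on precision when fitting thresholds; the sketch uses 3 folds only to keep the toy data small.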

Results and Discussion
"InfsWithDependants" and "RootInfWithDependants" were applied to all the definitions of the hundred verbs. The derived hypernyms were then manually marked for correctness. The evaluation results are summarized in Table 1. Obviously, "RootInfWithDependants" cannot extract more true hypernyms than "InfsWithDependants", as it simply adds one more filtering condition. Calculating actual recall is not possible because only the extracted infinitives were marked. Nonetheless, we can get some notion of the recall drop judging by the drop of the true positive rate. "InfsWithDependants" extracts almost twice as many true hypernyms as "RootInfWithDependants", but it demonstrates a significantly lower precision.
Let us consider some typical mistakes arising during the syntax analysis of the definitions. First, for the definition вертеться - совершать круговые движения; вращаться, крутиться 'to revolve - to carry out circular motions; to rotate, to spin' the model marks крутиться 'to spin' as dependent on вращаться 'to rotate', whereas they are actually homogeneous and both have no dependent words, which is typical for synonyms in definitions and facilitates distinguishing them from hypernyms. Second, a common problem may be illustrated by the definition доходить - понимая и осознавая что-либо, разбираться/разобраться в чем-либо (в каком-либо сложном вопросе, запутанном деле и т.п.) 'to see the light - to figure something out understanding or realizing it (about a challenging issue, complicated problem etc.)'. Here the model mistakenly marks the first verb of the verbal adverbial construction as the root, while it should have been the homogeneous verbs разбираться/разобраться 'to figure out'. Such mistakes might be avoided by customising the syntax model for our task; in further research this issue will be addressed.
The following example illustrates drawbacks of our methods: Заедать - подвергая что-л. (обычно какие-л. механизмы) отрицательному воздействию, зажимать/зажать, защемлять/защемить, зацеплять/зацепить какую-л. деталь, препятствуя движению, нормальному функционированию 'to jam - exposing negatively (usually some devices), to press, to squeeze, to hook a detail, so that movement or action is prevented'. Even if the syntax tree for this definition were perfect, our method cannot delineate hypernyms from synonyms when the latter have dependent words. The application of our method is also limited to extracting one-verb hypernyms only, as finding the exact boundaries of a multi-word hypernym is much more difficult. A frequent case of a multi-word hypernym is the verb совершать 'to carry out', which can collocate with different specifying supplements. For instance, the definition of the verb вертеться 'to revolve' starts with the expression совершать движения 'to carry out motions' in many dictionaries.
As mentioned earlier, we decided to post-process the results of "InfsWithDependants" in order to improve its precision. The performance of every available RusVectōrēs model on the "InfsWithDependants" results on cross-validation is shown in Table 2. It was possible to calculate recall in this case, because here we processed only the extracted hypernyms, which had been manually marked for correctness, so we knew exactly how many true hypernyms the set contained. When fitting the thresholds and choosing the best model we decided to rely on precision rather than F-score, because precision for this task does not grow monotonically with the increase of the threshold. A typical graph of precision, recall and F-score resembles Figure 2: precision grows only up to some threshold and then decreases.
This happens because the word embeddings we used do not distinguish different meanings of words, combining all the meanings of a word into a single average vector. Therefore, if at least one word of a hyper-hyponymic pair is used not in its most frequent meaning, their similarity might be rather low. Also, Figure 2 demonstrates that recall changes in a much wider span, thus having a greater impact on F-score. We chose the third model (Tayga, Universal tags, Window Size = 2) from all the models presented in Table 2 because, even though the sixth model (Araneum, No tags, Window Size = 5) has slightly higher precision, the former has significantly higher recall. We applied the chosen model with the threshold found during cross-validation to the test set and compared it with the corresponding parts of the "InfsWithDependants" and "RootInfWithDependants" results (see Table 3). In that way we managed to obtain results with a higher true positive rate and precision than those of the "RootInfWithDependants" method.

Conclusion
A preliminary linguistic analysis allowed us to conclude that dictionary definitions are worth using for verbal hyponymy extraction: in this kind of linguistic source, verbal hypernyms and hyponyms occur together more frequently than in others (e.g. corpus data).
Our previous study also allowed us to conclude that lexico-syntactic patterns, widely used for the extraction of hyper-hyponymic pairs of nouns, do not fit verbs: we were unable to find any verbal lexico-syntactic patterns either in corpora or in dictionary definitions. Therefore, some extraction methods should be developed specifically for verbs.
The study shows that syntactic analysis of definitions is a good starting point for the extraction of hyper-hyponymic verbal pairs. We developed two methods based on syntactic analysis of definitions and applied them to seven Russian dictionaries. The first method extracted all infinitives that have dependents. The second method additionally demanded that an extracted infinitive be the root of the syntax tree. The use of pre-trained word embeddings from the RusVectōrēs project improved the precision of the first syntax-based method without a crucial drop in the number of extracted true hypernyms, which allowed it to outperform the second syntax-based method in both precision and the number of extracted true hypernyms.
Nonetheless, analysis of mistakes showed that the syntax model should be customised for our task to improve the results of the developed method. We will address these issues in future research.