SPECIFIC PHRASEMES WITH ETHNONYMS AND THEIR STUDY BY CORPUS ANALYSIS


Introduction
This paper focuses on phrasemes that contain an ethnonym in both the source language (French) and the target language (Slovak). We first classify them by type of equivalence (total equivalents, partial equivalents and phrasemes without an equivalent) and then, from the point of view of general and professional language, as general or professional phrasemes. Using quantitative methods of corpus linguistics, namely relative frequency and the logDice association measure, we try to determine how frequently they occur and how specific they are in four monolingual corpora ranging from approx. 130 million to approx. 10 billion words. On the basis of the corpus analysis, we can propose which phrasemes should be ranked high in dictionary entries, whether a user looks up an individual component or the phraseme as a whole, and thus support current French and Slovak lexicography and phraseography.
The aim of the article is a corpus analysis of a specific type of phraseme, namely phrasemes containing ethnonyms in French and Slovak. The research objectives are to establish the frequency of these phrasemes in various types of texts and the extent of their specificity in each corpus.

Research methods
Because the data are heterogeneous, the sources unbalanced and their representativeness limited, we combine methods and sources, analysing both static and dynamic data. The analysis of static data draws on existing linguistic and lexicographic descriptions of the phenomena under study, e.g. studies and dictionaries. The analysis of dynamic data examines how language phenomena actually function in a corpus, which means that we rely on corpus linguistics (Hunston, 2006, p. 244) and analyse the collected data in four monolingual text corpora. Two corpora are French: I) frTenTen12, comprising approx. 10 billion words (https://the.sketchengine.co.uk/), and II) Emolex, comprising approx. 137 million words (http://emolex.u-grenoble3.fr/emoBase/). Two are Slovak: I) skTenTen11, comprising approx. 540 million words (https://the.sketchengine.co.uk/), and II) the Slovak National Corpus, version prim-7.0-public-all, comprising approx. 1 billion words (http://korpus.juls.savba.sk/). We then applied two quantitative methods of corpus linguistics: relative frequency and the logDice association measure.

Comparison of corpora
Each of the four corpora has quantitative and/or qualitative advantages over the other three. The quantitative advantages are corpus size (frTenTen12, skTenTen11, prim-7.0-public-all) and diversity of language styles (frTenTen12, skTenTen11, prim-7.0-public-all). The qualitative advantages are orthographic clean-up of texts (prim-7.0-public-all, Emolex), available functions (frTenTen12, skTenTen11, prim-7.0-public-all) and precise stylistic classification of texts (Emolex, prim-7.0-public-all). Thanks to this heterogeneity of the corpora, the collected phrasemes can be analysed more reliably.

Quantitative methods of corpus linguistics
We used two quantitative methods to analyse individual phrasemes: relative frequency and the logDice association measure. Relative frequency allows individual corpora to be compared with one another. By contrast, the values of the logDice association measure cannot be compared across corpora, only within a single corpus.

Frequency
Frequency is one of the fundamental indicators of the importance of a researched phenomenon: it shows how often the phenomenon occurs in the corpus. We differentiate between absolute frequency and relative frequency. The absolute frequency is the number of occurrences of a phenomenon in the corpus as a whole (Tables No. 1 and No. 2), i.e. how many times a phraseme occurs in a corpus; e.g. the phraseme roulette russe occurred 4,499 times in the corpus frTenTen12. The relative frequency is the absolute frequency normalised by the size of the corpus (in tokens) and expressed per 1 million tokens (Tables No. 1 and No. 2). It can be calculated according to the following formula:
$$f_{\mathrm{rel}} = \frac{f_{\mathrm{abs}}}{N} \times 1\,000\,000$$
where $f_{\mathrm{abs}}$ is the absolute frequency and $N$ is the size of the corpus in tokens. Comparing relative frequency values shows which of the researched phrasemes occur more frequently in the corpus and are thus statistically more significant than the other collected phrasemes.
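The normalisation to one million tokens can be illustrated with a minimal Python sketch. The function name is ours, and the corpus size of 10 billion is only the rounded figure quoted for frTenTen12 in this paper (given in words, not an exact token count), so the result is illustrative and need not match the value in Table No. 1.

```python
def relative_frequency(absolute_frequency: int, corpus_size_tokens: int,
                       per: int = 1_000_000) -> float:
    """Return the frequency of an item per `per` tokens of the corpus."""
    if corpus_size_tokens <= 0:
        raise ValueError("corpus size must be a positive number of tokens")
    return absolute_frequency / corpus_size_tokens * per


# roulette russe: 4,499 occurrences in frTenTen12 (figure from the text).
# ASSUMPTION: the corpus size is rounded to 10 billion for illustration.
roulette_russe = relative_frequency(4_499, 10_000_000_000)
print(f"{roulette_russe:.4f} occurrences per million tokens")
```

Because the result is expressed per million tokens, values obtained this way can be set side by side for corpora of very different sizes.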

LogDice association measure
Many association measures are currently available, such as logDice, MI-Score, MI3, log-likelihood and T-score. All of them can be used to search for collocations with the Collocations function in the corpora frTenTen12, skTenTen11 and prim-7.0-public-all. In all three corpora the logDice association measure is pre-set, and we decided to use it when analysing the collected phrasemes (Tables No. 3 and No. 4). This association measure gives a value of statistical specificity for each phraseme: the higher the value, the higher the rank.
The logDice measure depends only on the frequencies of the words x and y and on the frequency of the bigram xy; N (the size of the corpus in tokens) does not enter the calculation. Its value ranges from minus infinity up to a maximum of 14. According to P. Rychlý (2006, p. 9), "comparing two scores, plus 1 point means twice as often collocation, plus 7 points means roughly 100 times frequent collocation." Comparing logDice values within one corpus shows which of the researched phrasemes have the highest logDice value and are thus statistically more significant than the other collected phrasemes.
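As an illustration, logDice can be sketched in Python from its commonly published definition, 14 + log2(2·f(xy) / (f(x) + f(y))). The function name and the counts used below are hypothetical and chosen only to show the properties described above; they are not taken from the corpora or tables of this study.

```python
import math


def log_dice(f_x: int, f_y: int, f_xy: int) -> float:
    """logDice from the frequencies of x, y and the bigram xy.

    Corpus size N is deliberately not a parameter: the score does not use it.
    """
    if f_x <= 0 or f_y <= 0:
        raise ValueError("component frequencies must be positive")
    if f_xy == 0:
        return float("-inf")  # components never co-occur
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# HYPOTHETICAL counts, for illustration only.
weaker = log_dice(f_x=1_000, f_y=1_000, f_xy=100)
stronger = log_dice(f_x=1_000, f_y=1_000, f_xy=200)

print(round(weaker, 3), round(stronger, 3))
print(round(stronger - weaker, 3))  # doubling the bigram count adds one point
print(log_dice(500, 500, 500))      # x and y always occur together: maximum
```

The last call shows the theoretical maximum of 14, reached only when the two components never occur apart.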

General phraseology
French linguistics shows considerable diversity both in the name of the basic unit of phraseology and in its definition. The more frequent terms are locution figée, locution figurée, expression figurée and expression figée; the less frequent ones are unité phraséologique, phrasème and phraséologisme. There are also many definitions of the basic unit of phraseology, but we prefer the approach of L. Rosenbaum Franková (2010, p. 27), who compiled the defining characteristics of general phrasemes on the basis of the definitions of various francophone phraseologists such as G. Gross (1996), G. Gréciano (1993, 1997), I. González-Rey (2002) and S. Mejri (2000). She distinguishes fundamental and additional characteristics. The fundamental defining characteristics are polylexicality, stability and semantic opacity; the additional defining characteristics are figurativeness, iconicity, reproducibility, anomaly, expressiveness and untranslatability.
In Slovak phraseology, two terms dominate as names for the fundamental unit of phraseology, i.e. phraseme and phraseologism, and its definition is relatively stable. According to J. Mlacek, P. Ďurčo et al. (1995, p. 28), phraseme and phraseologism represent a "specific type of stable multi-word expressions typical of expressiveness and figurativeness, the parts of which have been fully or partially desemantized." However, J. Mlacek (2007, p. 21) adds that "individual characteristics or the groups of the characteristics of phrasemes should be considered in relation to other principles that in phraseology and its functioning are mainly determined in new literature such as the principle of language economy, the principle of motive, the principle of syncretism, the principle of the secondariness of phraseological denomination, the principle of functional separation, the principle of transposition, the principle of analogy and idiomaticity, the principle of shaping
Based on the definitions of a professional phraseme by francophone phraseologists (G. Gréciano, L. Gautier) and Slovak phraseologists (P. Ďurčo, J. Mlacek, Ľ. Mešková, M. Olejárová), L. Rosenbaum Franková (2010, p. 59) proposed the common characteristics of professional phrasemes and characterised the professional phraseme as a "phraseological unit regularly occurring in professional texts with the typical characteristics of phraseme such as figurativeness or expressiveness, but at the same time taking the denominative function of the term". Thus, among the fundamental defining characteristics of the professional phraseme she included the presence of a term and the figurativeness or imagery of the meaning of at least one part of the phraseme. However, contrary to general phrasemes (J. Mlacek, P. Ďurčo et al., 1995, p. 95), professional phrasemes "neither have such a weakened reference part of the meaning nor have they an expressive pragmatic (specifically expressive and evaluating) component of the meaning". Some professional phrasemes may look like mere terms, as if the boundary were not clear enough, but at least one component of a professional phraseme should have a figurative or metaphoric meaning.
With respect to the selection of general and professional phrasemes, our paper relies on the approach of defining characteristics according to L. Rosenbaum Franková.

So-called mixed, general-professional phrasemes
In addition to general and professional phrasemes, there is a further type, the so-called mixed phraseme. It is a phraseme with at least two meanings: the meaning of a general phraseme and also the meaning of a professional phraseme. For example, the phraseme roulette russe [Russian roulette; literally also Russian roulette] as a general phraseme means a dangerous and risky solution, and as a professional phraseme denotes a type of suicidal game with a revolver and bullets.

Phrasemes and ethnonym
We have analysed a specific type of phrasemes comprising an ethnonym, which is a noun or adjective identifying a nation, ethnicity, tribe or an inhabitant in some place. For

Equivalence of phrasemes
Equivalence is one of the fundamental issues in contrastive linguistics and translation. The equivalence of phrasemes can be considered from a qualitative or a quantitative point of view (P. Ďurčo 2012, p. 91). We have decided to analyse its qualitative aspect and we distinguish three types of equivalence: I. Total phraseological equivalence, II. Partial phraseological equivalence (where we distinguish partial equivalent phrasemes with the same ethnonym from partial equivalent phrasemes without an ethnonym), III. Phrasemes without equivalent. Our attention is on the first two types.
I. Total equivalent phrasemes comprise the same ethnonym in French and in Slovak; the compared phrasemes have the same component structure and the same motive (at a formal, lexical, semantic and figurative level). There are nineteen phrasemes of this type.
We have analysed the first two types in four corpora, i.e. both general and professional phrasemes which comprise the ethnonym in French as well as in Slovak. The first type comprises nineteen phrasemes and the second type eight, i.e. twenty-seven phrasemes in total. We have verified each of the phrasemes in all four corpora, but we discuss only those examples that we consider more difficult to understand.
Based on the analysis of the results from the first and the second type of phrasemes, we state the following with respect to French and Slovak culture: I. The first type is more than twice as large as the second type; II. Taking both types together, there are more than twice as many professional phrasemes (eighteen) as general phrasemes (seven); III. We found two so-called mixed phrasemes (the phraseme roulette russe - ruská ruleta and the phraseme douche écossaise - 1. studená sprcha, 2. škótska sprcha). As mentioned in the section So-called mixed, general-professional phrasemes, the phraseme roulette russe - ruská ruleta (Russian roulette) as a general phraseme means a dangerous and risky solution and as a professional phraseme denotes a type of suicidal game with a revolver and bullets. The phraseme douche écossaise as a general phraseme means a nasty surprise, its Slovak equivalent being studená sprcha (literally cold shower), and as a professional phraseme denotes a type of shower in which cold and hot water alternate, its Slovak equivalent being škótska sprcha (literally Scottish shower); IV. In the first type, there are about seven times as many professional phrasemes (fifteen) as general (two) or mixed (two) phrasemes; V. In the second type, there are almost twice as many general phrasemes (five) as professional phrasemes (three).
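The proportions stated above can be re-derived from the reported counts with a short Python tally. The counts are those given in the text; the zero for mixed phrasemes in the second type is our inference from 5 + 3 = 8 and is not stated explicitly.

```python
# Counts as reported in the text. ASSUMPTION: "mixed" is 0 for partial
# equivalents, inferred from 5 general + 3 professional = 8 phrasemes.
counts = {
    "total": {"professional": 15, "general": 2, "mixed": 2},
    "partial": {"professional": 3, "general": 5, "mixed": 0},
}

# Number of phrasemes per equivalence type and per phraseme kind.
per_type = {name: sum(kinds.values()) for name, kinds in counts.items()}
per_kind = {
    kind: sum(kinds[kind] for kinds in counts.values())
    for kind in ("professional", "general", "mixed")
}
overall = sum(per_type.values())

print(per_type, per_kind, overall)
# Ratios behind the statements "twice as large" and "twice as many".
print(round(per_type["total"] / per_type["partial"], 2))
print(round(per_kind["professional"] / per_kind["general"], 2))
```

The tally confirms that the figures are internally consistent: the per-type and per-kind sums both add up to the same overall number of phrasemes.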

Results linked with relative frequency
On the basis of the absolute frequency of each phraseme and the size of each corpus (in tokens), we have calculated the relative frequency of each phraseme, i.e. its frequency per 1 million tokens (Tables No. 1 and No. 2). For our assessment, the nine or ten phrasemes with the highest frequency from both types are of interest (their values are highlighted in grey in Tables No. 1 and No. 2).
On the basis of the analysis of the results from the first and the second type of phrasemes, we conclude the following for French and Slovak culture: I. If a phraseme occurs among the most frequent phrasemes in at least two corpora, we evaluate it as statistically significant with respect to its frequency. This applies to ten phrasemes: roulette russe, grippe espagnole, massage thaïlandais, gazon anglais, chiffre romain, droit romain, football américain, couteau suisse, médecine chinoise, (d'un) calme olympien; II. The vast majority of the phrasemes with the highest frequency are professional phrasemes.

Results linked with the logDice association measure
We have calculated the logDice association measure (Tables No. 3 and No. 4) from the absolute frequency of the individual components x and y of each phraseme and the absolute frequency of the bigram xy. For the evaluation, the nine phrasemes with the highest level of specificity in each corpus, from both types, are of interest (their measures of specificity are highlighted in grey in Tables No. 3 and No. 4).
On the basis of the analysis of the results from the first and the second type of phrasemes, we conclude the following for French and Slovak culture: I. If a phraseme occurs among the most specific phrasemes in at least two corpora, we evaluate it as statistically significant with respect to the measure of its specificity. This applies to thirteen phrasemes: roulette russe, grippe espagnole, massage thaïlandais, filer à l'anglaise, comme une horloge/une montre suisse, bain turc, football américain, couteau suisse, médecine chinoise, (d'un) calme olympien, c'est du chinois, c'est de l'hébreu, assis en indien; II. Approximately one half of the phrasemes with the highest logDice are general phrasemes and the other half are professional phrasemes; there is also one mixed phraseme, roulette russe.
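The ten phrasemes singled out by relative frequency and the thirteen singled out by logDice partly overlap. A small Python sketch, using only the two lists given in the text (the variable names are ours), shows how many distinct phrasemes the two measures identify together.

```python
# Phrasemes judged significant by relative frequency (ten, from the text).
by_relative_frequency = {
    "roulette russe", "grippe espagnole", "massage thaïlandais",
    "gazon anglais", "chiffre romain", "droit romain",
    "football américain", "couteau suisse", "médecine chinoise",
    "(d'un) calme olympien",
}

# Phrasemes judged significant by logDice (thirteen, from the text).
by_log_dice = {
    "roulette russe", "grippe espagnole", "massage thaïlandais",
    "filer à l'anglaise", "comme une horloge/une montre suisse",
    "bain turc", "football américain", "couteau suisse",
    "médecine chinoise", "(d'un) calme olympien", "c'est du chinois",
    "c'est de l'hébreu", "assis en indien",
}

combined = by_relative_frequency | by_log_dice   # significant by either measure
shared = by_relative_frequency & by_log_dice     # significant by both measures
only_frequency = by_relative_frequency - by_log_dice

print(len(combined), len(shared))
print(sorted(only_frequency))
```

The union contains sixteen distinct phrasemes, seven of which are significant by both measures; gazon anglais, chiffre romain and droit romain are singled out by relative frequency only.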

Conclusions
Using the quantitative methods of corpus linguistics, relative frequency and the logDice association measure, we have presented the frequency and specificity of individual phrasemes. Sixteen phrasemes show the highest values: roulette russe - ruská ruleta, grippe espagnole - španielska chrípka, massage thaïlandais - thajská masáž, filer à l'anglaise - odísť po anglicky, comme une horloge/une montre suisse - ako švajčiarske hodinky, bain turc - turecký kúpeľ, football américain - americký futbal, couteau suisse - švajčiarsky nožík, médecine chinoise - čínska medicína, (d'un) calme olympien - (s) kľudom Angličana, c'est du chinois - (to je) španielska dedina, c'est de l'hébreu - to je španielska dedina, assis en indien - turecký sed, gazon anglais - anglický trávnik, chiffre romain - rímska číslica, droit romain - rímske právo. On the basis of the corpus analysis, we propose that these sixteen phrasemes be placed at the forefront when compiling French-Slovak dictionaries, i.e. placed so that they catch the attention of a dictionary user who looks up a phraseme as a whole or one of its components. We see the usefulness of the results in supporting current lexicography and phraseography, both in general and professional French and Slovak dictionaries and in bilingual French-Slovak and Slovak-French dictionaries, where the results of this research can help in updating and compiling entries.
Furthermore, the relatively frequent phrasemes of the second type, such as c'est du chinois, c'est de l'hébreu, (d'un) calme olympien, assis en indien, clé anglaise and canne anglaise, can be used in teaching as contrastive material because they are easily interchangeable with phrasemes in the target language.

I. Dinžíková
This article studies the phrasemes comprising an ethnonym in the source language (French) as well as the target language (Slovak). This approach is contrastive and the phrasemes have been classified according to the type of equivalence (total equivalent, partial equivalent and phrasemes without equivalent). The aim of the research was to analyse 27 phrasemes with the help of the corpus linguistics method (relative frequency and logDice association measure), and four monolingual corpora (the corpora frTenTen12, skTenTen11, Emolex, prim-7.0-public-all) with approx. 130 million up to approx. 10 billion words in each of them, so it is a fairly wide range of language materials.
Firstly, we focus on the current state of French and Slovak phraseology. We present the division of phrasemes into three types: general, professional and so-called mixed (the last type being our own proposal). Then, by translating from the source language-culture into the target language-culture, we demonstrate the three basic types of phraseme equivalence, but our attention is on the first two types. Afterwards, we present quantitative methods of corpus linguistics (four