The sound of size revisited - New insights from a German-Hungarian comparative study on sound symbolism

Languages consistently display sound symbolic effects. In our study we expand size-related sound symbolic research to German and Hungarian, languages with both rounded and unrounded high front vowels as well as /ø/, so that we can separate the effect of lip rounding and vowel backness. Our subjects had to rate artificial words with CVCVCV structure. For analysis we used crossed random effect models in a way that is novel in this field. Results confirm previous findings: voiced consonants and back vowels are perceived to be larger. Additionally, we show that lip rounding increases size.


General overview
Sound symbolism can be defined as a non-arbitrary mapping between phonetic properties and the meaning of a word, such as the size or form of the object the word describes. As one of the first experiments in sound symbolism, Köhler (1929) conducted his famous test with shapes and words, in which subjects had to associate two artificial words (baluma/maluma and takete) with a spiky or a roundish drawing. In the same year, in one of Sapir's (1929) studies, 100 pairs of artificial words, such as mil/mal, had to be assigned to a small or a large referent. Köhler's study revealed very high rates of correspondence between maluma and round, and takete and spiky forms, while in Sapir's study, words that included /a/ were paired with large objects and words that included /i/ were usually assigned to small objects by the participants.
Research on size-sound mappings has shown that phonetic properties of (non)words can indicate the size of the object the (non)word describes. Experiments show that phonemes embedded in nonwords are associated with the properties of objects. Front vowels are associated with small shapes, while back vowels and voiced obstruents are linked to larger shapes (Dingemanse et al., 2016; Lupyan and Casasanto, 2015; Shinohara and Kawahara, 2010; Thompson and Estes, 2011). Furthermore, voiced obstruents are usually perceived to represent heaviness (e.g., Kawahara et al., 2018). Also, there is a relation between sounds and motion, e.g., back vowels are usually perceived as slow (Cuskley, 2013; Eddington and Nuckolls, 2019; Iwasaki et al., 2007; Saji et al., 2019; Shinohara et al., 2016). Moreover, Maglio et al. (2014) suggested a connection between vowels and the precision of mental construal of referents, where front (vs. back) vowels elicit the low (vs. high) level construal.
Our study concentrates on the relationship between size and sound.
Size mappings

Vowels
The sound symbolic effect most frequently studied and found links high front vowels with perceived smallness, on the one hand, and low back sounds with perceived large size, on the other (e.g., Sapir, 1929; Ertel, 1969; Peterfalvi, 1965; Shinohara and Kawahara, 2010; Thompson and Estes, 2011; Lupyan and Casasanto, 2015; Dingemanse et al., 2016). For example, Hoshi et al. (2019) tested the effect of front (/i/, /e/) and back vowels (/o/, /u/) for Japanese and German speakers with an Implicit Association Test. The nonwords with a CVCVCV structure were presented orally. In a complex research design, the nonwords were to be mapped to pictures of big (e.g., elephant) and small (e.g., cat) animals in congruent (nonword with back vowel paired with a large animal) and incongruent (nonword with back vowel paired with a small animal) settings, where reaction time differences were measured. Faster reaction times were expected in congruent situations. Their results confirmed that front vowels are associated with smallness, back vowels with largeness.
Results also show that cultural differences and language specific factors do not influence the perception of size compared to the tested phonetic characteristics, indicating that "semantic cross-modal associations have a biological rather than a sociocultural basis" (Hoshi et al., 2019: 25). However, the authors concentrated on vowels and did not consider consonants.
Sound symbolic effects of vowels have been demonstrated in several languages, although some effects seem to be language-specific, such as position/syllable structure (Elsen, 2017), vowel effects on the interpretation of beauty or pleasantness in Japanese (Iwasaki et al., 2007), or a relationship between vowel height and fastness in Japanese (Saji et al., 2019).
To summarize findings on vowels, we can rely on Lockwood and Dingemanse (2015). They reviewed sound symbolic literature across disciplines, analyzing behavioral, developmental, and neuroimaging research, and showed that front vowels go with small shapes, while back vowels go with large shapes. They also showed that high vowels go with small shapes and low vowels with large shapes. Their findings suggest that the effects of roundedness have not yet been distinguished from those of backness, as in English, these two features are confounded. To our knowledge, we are the first to be able to separate their effects, by investigating German and Hungarian.

Consonants
Consonants are important to sound symbolism as well. In several experiments, researchers (e.g., Klink, 2000, 2003) explored the perceived size of products as conveyed by invented brand names. They found that fictional brand names containing front vowels were associated with smaller products, while those containing back vowels were more likely to be associated with larger products. Further associations were found between fricatives and smaller products (vs. stops), and between voiceless obstruents and smaller products (vs. voiced obstruents). Some of Klink's results, however, were not confirmed in other research focusing on different languages: Duduciuc and Ivan (2014), for example, did not find any influence of stops and fricatives when Romanian speakers evaluated artificial brand names.
Still, consonants are important, as Nielsen and Rendall (2011) pointed out by showing that in the classic takete-maluma experiment, not only vowels but also consonants and consonant-vowel combinations played a crucial role in sound symbolic effects. In one study, Japanese speakers associated artificial verbs with initial voiced or voiceless consonants with the walking of big or small persons. They indicated that words starting with voiceless consonants were more even-spaced, feminine, and graceful (Kawahara et al., 2008).
However, it has been shown that English subjects were only sensitive to voiced obstruents, as there were no detected effects related to voiceless ones (Iwasaki et al., 2007). Shinohara, Uno, Kobayashi, and Odake (2017) tested the softness and the hardness of words with invented stimuli (VCVC). Japanese speakers associated voiced obstruents with hardness, while for English speakers, voiceless obstruents evoked hardness. In the case of Japanese Pokémon names, Kawahara et al. (2018) showed that voiced obstruents (/b/, /d/, /g/, /z/) imply largeness and heaviness. Kawahara and Kumagai (2019) suggested that the strong effects Japanese speakers experienced might be, in part, acquired. Shinohara et al. (2020) also showed that obstruents indicate higher acceleration, while sonorants indicate low acceleration.
The effect of voiced obstruents, however, is less clear for English speakers (Kawahara and Kumagai, 2019). In Saji et al.'s (2019: 14) study for Japanese speakers, the voiced/voiceless contrast was significant: English speakers associated speed and "energeticity" with voicing, in contrast to Japanese speakers, who associated voicing with size and weight (Saji et al., 2019: 16).
In some languages, one obstruent in a word is not enough for the effect: Brazilian Portuguese speakers need at least two voiced obstruents in a name to interpret the character as large (Godoy et al., 2019).

Possible explanations of sound symbolic effects
Several different explanations have been offered for sound symbolic phenomena. One suggestion is based on acoustic similarity. Velar and uvular sounds are related to strangling, retching, and unpleasantness, since they show acoustic similarity to the growling or snarling noises of dangerous animals. We transfer unpleasant or dangerous experiences to these sounds because of a similarity relation (e.g., Elsen, 2018; Whissell, 1999). Articulatory and kinesthetic reasons might be responsible for associations of vowels and size: when articulating /i/ and similar sounds, the oral cavity is small and the position of the tongue is high, which gives us the feeling of smallness. For /a/ and /o/ the oral cavity is open and the position of the tongue is low (e.g., Peterfalvi, 1965, 1970; Sapir, 1929). A further, neurological explanation is based on multisensory perceptions. Several authors, such as Parise and Spence (2012), have shown cross-modal associations between auditory and visual stimuli (Parise and Spence, 2012: 325). Obviously, there is intense neuronal communication between regions for seeing and listening, for example, between auditory pitch and visual size, or "between the waveform of auditory stimuli and the roundedness of visual shapes" (Parise and Spence, 2012: 326; cf. Lockwood and Dingemanse, 2015). Westbury (2005) and Kanero et al. (2014) demonstrated that Köhler's initial results investigating spiky takete and roundish maluma objects, and similar experiments (e.g., Aveyard, 2012), may have a neurological basis.
The frequency code (Ohala, 1994) assumes that high tones, vowels with high second formants (notably /i/), and high-frequency consonants are associated with high-frequency sounds, small size, sharpness, and rapid movement. Low tones, vowels with low second formants (notably /u/), and low-frequency consonants are associated with low-frequency sounds, large size, softness, and heavy, slow movements. Additionally, our experience supplies us with repeated systematic correlations. Large objects cause low sounds (e.g., when hitting the ground), so we link visual and acoustic information. The vocal tract is determined by the size of the skull; accordingly, different qualities of sound are related to body size. A low larynx results in the lengthening of the vocal tract, which leads to lower sounds. Consequently, we learn that larger animals, especially mammals, produce lower sounds. Again, we link optical and acoustic information: high acoustic frequencies correlate with high sounds, like /i/, which indicate small (animals), while low acoustic frequencies correlate with low sounds, like /a/ or /o/, signaling large (animals). Bands of formants deliver important information on the size of a mammal. The early recognition of large (thus, potentially dangerous) animals increases the chance of survival. Therefore, knowing about correlations is important. Natural selection favors sounds that implicate magnitude. To conclude, whereas low, growling, rough sounds mean that an animal is large and dangerous, whimpering, whining, high sounds imply that it is frightened and harmless (cf. Fitch, 2010; Morton, 1994; Ohala, 1994). Kawahara et al. (2018) argue that the Japanese understanding of voiced obstruents is compatible with the frequency code. As voiced obstruents show low frequencies, they evoke the impression of largeness and co-vary with some strength parameters, such as heaviness.
At least some sound symbolic effects seem to work independently of language-specific knowledge. Slavic languages very often use lexemes with /a/ for small things, but in experiments with artificial words, Russian and Ukrainian speakers judge /a/ to be big and /i/ to be small (Levickij, 2013: 87). The same was found in research with Korean-speaking subjects (e.g., Shinohara and Kawahara, 2010). Knoeferle et al. (2017) even argue that sound symbolic properties of words may not be judged according to a rigid scale (e.g., a sound refers to large or small objects), as subjects rather connect different dimensions of objects to parts of the acoustic cue. Similarly, Westbury et al. (2018) argue in favor of different weights on sound symbolic cues. Thus, not all sound symbolic effects are universal, and they are not equally strong; the sound system, as well as the phonotactics of the lexicon or of a sublexicon (e.g., Pokémon names), might be of importance.

Limitations of previous experiments
Despite the growing body of research on sound symbolism, results on sound symbolic effects are not consistent, which can be traced back to the following reasons. First, sound symbolic research is published in a variety of journals across many disciplines. This in itself results in a wide variety of methods, reflecting the methodological tools of the given (sub)discipline, which makes results hard to compare with each other and uneven in validity. Second, methodology-oriented papers are not often published in this interdisciplinary subfield, so there are only a few guidelines on how to conduct research, choose analytic tools, and interpret results. However, standardization and improvement of methods could be the key to reaching valid and comparable results across languages. Third, most experiments on sound symbolic effects have been conducted with English speakers, although, in recent years, research on Japanese has been increasing. Other languages are rather underrepresented in sound symbolic research. This makes it difficult to compare results across languages and to generalize results within or across language families. Fourth, previous research did not separate lip rounding from vowel backness, as many languages, such as English or Japanese, do not have /y/ or /ø/.
As a result, sound symbolic research is scattered across many disciplines, with nearly as many methods as papers, with the exception of the first well-known experiments (e.g., Irwin and Newland, 1940; Scheerer and Lyons, 1957; Bremner et al., 2013; Imai et al., 2015; Thompson and Estes, 2011; Lupyan and Casasanto, 2015; Dingemanse et al., 2016).

Expanding the scope: new languages, more elaborated models
The present study expands sound symbolic research to German and Hungarian by analyzing how far native speakers of German and Hungarian map front/back vowels, high/low vowels, and voiced/voiceless obstruents to the size of an object.
German is interesting because it is a Germanic fusional (inflected) language (similarly to English), so the results of research conducted with German speakers can be compared to results of research conducted in English to see whether the same effects apply for another member of the same language family. We chose Hungarian because, although it is a European language, it belongs to the Uralic languages. Hungarian is an agglutinative language using various affixes, mainly suffixes. Choosing Hungarian makes it possible to see to what extent a non-Indo-European language shows results comparable to the Indo-European ones.
Despite being part of different language families, both languages have rounded vowels that are not present in English: for German these are /y, ʏ, ø, œ/, for Hungarian /y, y:, ø, ø:/. This enables us to analyze hitherto neglected sounds in two languages of different language families: /ø/ and /y/. The additional vowels also make it possible to separate the effects of lip rounding from vowel backness.
Sound symbolic effects in Hungarian have been extensively studied since the 1950s (cf. T. Molnár, 1993). Early research in sound symbolism was connected to (functional) stylistics and the analysis of poems (cf. Fónagy, 1959; Szathmári, 1970, 1980). Fónagy and Szathmári were the pioneers who introduced the term "sound symbolism" into the relevant Hungarian discourses when they started to investigate its role in Hungarian and European poetics. Szathmári analyzed the sound symbolism phenomenon not only in poems (cf. Szathmári, 1970), but also in Hungarian folk ballads (Szathmári, 1980). Fónagy (1959, 1961) analyzed Hungarian poems and found that more /k/, /t/, /r/, and back vowels, along with /i/, are used in poems with an aggressive tone, while more /l/, /m/, and /n/ were used in poems with a tender tone (for the role of /l/ and /r/ in Hungarian poems, see also Boda and Porkoláb, 2013). Molnár (1993) tried to categorize sounds referring to eight dimensions, such as "aggressive-peaceful," "warm-cold," "light-heavy," "soft-hard," and "small-large." However, he used single letters in his experiment, and he did not randomize the stimuli. The person who collected the data pronounced the stimuli, and subjects were asked to pronounce them again. It is not described whether one or more people collected the data and what the setting was (e.g., could respondents hear other respondents pronouncing the sounds?). Furthermore, subjects rated sounds referring to the eight dimensions. These different dimensions might have interfered with perceived size ratings. Another problem was that he used long and short vowels. Long versions were always perceived as larger than the corresponding short ones. Thus, the size effect may have been due to vowel length. His experiments were not repeated later, so his findings were not verified.
Sound symbolic devices were recently analyzed in Hungarian in the edited volume of Kádár and Szilágyi (2015; cf. Székely, 2015; Fazekas, 2015; Szilágyi, 2015; Szűcs, 2015; Szili, 2015). Further sound symbolic investigations of Hungarian were carried out by Tsur (2006), who showed that Hungarian front vowels symbolize proximity, while back vowels indicate distance (e.g., itt "here" vs. ott "there"). In a recent paper, Dimény (2018) analyzed Hungarian verb structures, and she pointed out that certain sound schemas were connected to semantic components of the verb meaning.
For German, we found several studies with different research designs and questions. Müller (1935) used lexemes from languages such as Bantu, Swahili, Kâte, or Hebrew (cf. tumba, ongongolólo, lala, gogu, káta, fiti, sili, marr). His research involved 251 subjects, who did not know these languages. He wanted to see if his German-speaking subjects were able to identify meaning aspects merely on the basis of sound structure. He found astonishing similarities in the answers. Subjects described, among others, tumba as something thick, large; fiti as something small, thin, pointed; or marr as something unpleasant. Wissemann (1954) examined the creation of new names for sounds and noises. His subjects had to name different acoustic information categories. Again, the answers showed similarities. High vowels were used for high sounds. /i/, /y/, and /ø/ represented a light tone color, while high tones /u/ and /o/ were associated with dark tone colors and low tones. Stop sounds were used for a sudden, abrupt end of a noise, and fricatives for gradually starting or ending a sound. Fónagy (1959, 1961) found more /l, m, n/ in German poems with an affectionate background, but in aggressive poems, he counted more /k, t, r, g/ and back vowels. Ertel (1969) demonstrated magnitude symbolism for German-speaking subjects with the help of artificial words. Albers (2008) compared Old Egyptian and German texts and saw a connection between "pleasant-hard" mood message registers and plosives and a link between the impression of "sad" and "soft" and nasals. In a study about the motivation of fantastic names, names of plants, buildings, countries, planets, various creatures, etc. were collected (Elsen, 2008a, b). Data were based on names from 52 books written in German by German authors. Anthroponyms, such as Brin, Tik, Elim, Schti, were found to mark small, good beings, very often with the small- and well-sounding /i/. Foreign evil creatures, like orcs, demons, or vicious, reptilian-like creatures, were named Brazoragh, Rok-Gor, Ch'tuon, Xandor, Chrekt-Orn, Rrul ghargop, or An-Rukhbar, names that include many back phonemes, especially velar and uvular fricatives and vowels like /u, o, a/. In the study, 106 native speakers of German had to rate names in a questionnaire. They were given lists of names, all of which were taken from books. The answers of the subjects correlated with authors' intuitions. Finally, Elsen (2017) discusses the facilitating role of sound symbolism in language processing and acquisition (for an overview, cf. Elsen, 2016).

Our study
To ensure comparability, we used an improved variation of the data collection method of Shinohara and Kawahara (2010) and, as far as possible, the same phonemes as they did, though some modifications were necessary to ensure that the results are not biased (see 2.2 for details). For the stimuli, we selected sounds that both German and Hungarian have (see 2.2 for details).
Following Shinohara and Kawahara (2010), we decided to use mixed-effect models that are capable of separating subject-level and stimulus-level effects. We can compare our results with theirs on Japanese, Chinese, Korean, and English data, because we applied the same model. However, to exploit all opportunities of these models, we planned to use them in a way that is novel in the field of size symbolism. We planned to interpret model parameters that (to our knowledge) have not been used by other researchers but seemed promising in terms of their interpretive power. To ensure comparability and reproducibility, we aimed to give a detailed but easy-to-understand overview of our research design and modeling approach.

Method
The methodology is based on the experiment of Shinohara and Kawahara (2010). In their study, they tested English, Chinese, Japanese, and Korean native speakers with 40 stimuli. The stimuli were artificial words, doubled VC-syllables with /b, d, g, z, p, t, k, s/ and /i, u, e, o, a/ (cf. ibib). They found that three phonetic factors (the height of vowels, the backness of vowels, and the voicing in obstruents) contribute to the images of size, but to different degrees: Korean speakers showed a nonsignificant tendency to perceive voiced obstruents to be smaller in size than voiceless ones. The authors offer phonetic grounding of these size-related sound symbolic patterns.

Data
In our experiment, we worked with data from 291 German and 121 Hungarian native speakers. Subjects were students of the University of Augsburg, Germany (BA and MA students), and Eötvös Loránd University, Hungary (BA students). The study used a non-probability sampling method.
The questionnaire was based on the one used by Shinohara and Kawahara (2010). Subjects were given stimuli which were, according to the task, words of a foreign language describing the size of objects. Subjects had to guess the size of the object (a gem) that the word described on a 4-point scale (very small, relatively small, relatively large, very large).
The stimuli consisted of 56 trisyllabic artificial words with CVCVCV structure. Within each word, the three vowels were identical, as were the three consonants (e.g., papapa, füfüfü, kekeke). In order to avoid context effects (e.g., fatigue bias), 10 randomized versions of the stimuli list were used, each containing all 56 stimuli in a different order. The same 10 randomized stimuli sets were used at both data collection sites. We dropped three questionnaires that presumably contained insincere answers, judging by the (statistically unlikely) repetitive pattern of the answers throughout the questionnaire. Data were collected via paper-and-pencil questionnaires. Shinohara and Kawahara (2010) used VCVC stimuli: the consonants were four voiced obstruents /b, d, g, z/ and four corresponding voiceless obstruents /p, t, k, s/. The vowels were /i, u, e, o, a/. All of the target languages had these five vowels in common. These factors were fully crossed (2 voicing types × 4 types of obstruents × 5 vowels = 40 words). All of these words were artificial words in all the target languages.
As our two languages offer the possibility to study the rather rare sounds /y/ and /ø/, we added them to our stimuli to see to what extent they differ regarding the image of size. Consequently, we selected seven vowels: back /a, o, u/ vs. front /i, y, e, ø/; high /i, y, u/ vs. mid /e, o, ø/ vs. low /a/, where /e/ was classified as "low" in Hungarian (cf. Siptár and Törkenczy, 2000).
Additionally, we analyzed the effects of roundness. Rounded vowels in the dataset were /o, u, y, ø/ in German and /y, ø, u, o, a/ in Hungarian.
We used the same consonants as Shinohara and Kawahara (2010) except for /s, z/, as stimuli were presented in written form and German uses <z> for /ts/. To avoid confusion, we chose /f, v/ instead. Accordingly, we selected eight consonants: three voiced stops /b, d, g/, one voiced fricative /v/, three voiceless stops /p, t, k/, and one voiceless fricative /f/.
We did not use the VCVC structure of Shinohara and Kawahara (2010), because German shows syllable-final devoicing of all voiced obstruents (Auslautverhärtung), and written <igig, ebeb, adad> will always "translate" into /igik, ebep, adat/. However, we could not use CVCV items instead, because several of these two-syllable stimuli are lexicalized. In Hungarian, for example, tata and papa mean "grandpa/old man," baba means "baby," while popo, bebe, didi, and bibi are words from children's language. In German, Papa means "father, daddy." Other CVCV words are names (Pepe, Sisi, Kiki) or terms from children's language (Pipi, Popo, Kaka, Baba, Dada). For these reasons, we chose CVCVCV forms. In this way, we could ensure that Auslautverhärtung does not influence the results, and we minimized the effect of existing words, while the results still remain comparable with those of Shinohara and Kawahara (2010). The 3 factors (7 vowels, 4 consonants with 2 voicing types) were fully crossed, giving 56 words (7 × 4 × 2). All of these words are artificial words in both target languages, although, as mentioned before, some of them have a meaningful bisyllabic version.
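The crossing and the randomized list versions described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare the questionnaires: the spellings "ü" and "ö" follow the example füfüfü, while the remaining spellings (in particular how /v/ was written), the function names, and the random seed are our assumptions.

```python
import itertools
import random

# Orthographic stand-ins for /a, o, u, i, y, e, ø/ (spellings are assumed).
VOWELS = ["a", "o", "u", "i", "ü", "e", "ö"]
# Four consonant pairs, each given as (voiced, voiceless).
CONSONANT_PAIRS = [("b", "p"), ("d", "t"), ("g", "k"), ("v", "f")]


def build_stimuli():
    """Fully cross 7 vowels x 4 consonant pairs x 2 voicing values."""
    stimuli = []
    for vowel, pair, voicing in itertools.product(VOWELS, CONSONANT_PAIRS, (0, 1)):
        consonant = pair[voicing]  # 0 = voiced member, 1 = voiceless member
        stimuli.append((consonant + vowel) * 3)  # CV repeated three times
    return stimuli


def randomized_lists(stimuli, n_lists=10, seed=0):
    """Return n_lists shuffled copies, each containing every stimulus once."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    lists = []
    for _ in range(n_lists):
        order = stimuli[:]
        rng.shuffle(order)
        lists.append(order)
    return lists


STIMULI = build_stimuli()
LISTS = randomized_lists(STIMULI)
```

Each entry of `LISTS` is one questionnaire version; because every version contains the same 56 items, the design stays balanced across subjects.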

Subjects
All Hungarian subjects were native speakers of Hungarian, so we included all of them in the analysis. Of the 300 subjects from Germany, 29 had a mother tongue other than German, and 9 of them did not speak German fluently. We decided to exclude these 9 respondents from the analysis. The final dataset consisted of 121 respondents from Hungary and 291 respondents from Germany. Median age was 24 years in Hungary and 22 years in Germany. The gender distribution was less balanced: 46% of the sample was male in Hungary, as opposed to 19% in Germany.

Variables used in the analysis
Outcome variable: rating of the size of the given word on a 1-4 scale (1 = very small, 2 = relatively small, 3 = relatively large, 4 = very large).

Statistical modeling
We used Stata 13.0 to implement the models. We aimed to account for the effect of each predictor simultaneously; hence, we used a multiple linear regression model. Each respondent rated all 56 stimuli, and we needed to account for the fact that multiple ratings from the same respondent are correlated. But each word also had about 400 repeated measurements (one per respondent), and those are also likely to be correlated with each other. To account for the repeated nature of the problem, we used a linear mixed-effect model (Snijders and Bosker, 2012; Rabe-Hesketh and Skrondal, 2012; for linguistic applications, see Baayen, 2008) with word and respondent as random factors. As these factors are not nested, a crossed random effect model was used. We had balanced data (equal sample size in each person/item cell), so restricted maximum likelihood (REML) yields unbiased estimates. We used Stata's xtmixed command.
To exploit the usage of mixed-effect models, we aimed at interpreting model parameters which have large interpretive power, like intra-class correlation or proportional change in variance. As these measures are rarely mentioned in the size symbolism literature (e.g., neither discussed by Shinohara and Kawahara, 2010; nor by Knoeferle et al., 2017), we introduce them in more detail below.
In this paper, we investigate the magnitude of correlation among measurements within stimuli and within subjects. To address this question, we introduce the "empty" model. This model does not include any explanatory variable. According to the model, measurements corresponding to the same stimulus differ from the average rating by a certain value (stimulus effect). Similarly, measurements given by the same subject may differ from the average by a certain value (subject effect). Stimuli and subjects do not completely determine the measurement, as there is some further random variation in measurements. Therefore, a given rating comes out as a stimulus-level effect plus a subject-level effect plus a random measurement-level effect. As these effects are independent, the total variance of ratings can be partitioned into a variance between stimuli, a variance between subjects, and a between-measurement residual variance.
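The verbal description of the empty model can be written compactly. The following is a sketch in notation introduced here for illustration (the symbols are our assumptions, and the normality of the effects is the usual assumption of linear mixed models rather than something stated above):

```latex
% Empty crossed random effect model: rating given by subject i to stimulus j
y_{ij} = \mu + u_i + v_j + \varepsilon_{ij},
\qquad
u_i \sim N(0, \sigma^2_{Su}), \quad
v_j \sim N(0, \sigma^2_{St}), \quad
\varepsilon_{ij} \sim N(0, \sigma^2_{e})

% Independence of the three effects gives an additive variance partition
\operatorname{Var}(y_{ij}) = \sigma^2_{Su} + \sigma^2_{St} + \sigma^2_{e}
```

Here \mu is the average rating, u_i the subject effect, v_j the stimulus effect, and \varepsilon_{ij} the measurement-level residual.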
Statistical measures characterizing the effect of stimuli and subjects are the intra-class correlation coefficients at the stimulus and at the subject level (ICC_St and ICC_Su). If, for example, ICC_Su equals 0.08, we can conclude that 8% of the total variance is attributable to between-subject differences. The term correlation suggests that ICC_Su expresses the similarity in ratings of two stimuli given by the same subject.
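To make the two coefficients concrete, the sketch below estimates the three variance components from a complete subject-by-stimulus table of ratings and then forms the ICCs as shares of the total variance. It is a minimal illustration, not the analysis reported here: it uses the classical ANOVA (method-of-moments) estimators for a balanced crossed design instead of Stata's REML routine (for balanced data the two are generally expected to agree as long as no estimate is negative), and all function and key names are hypothetical.

```python
def variance_components(ratings):
    """ANOVA estimators for a balanced crossed design.

    ratings[i][j] is the rating given by subject i to stimulus j,
    with exactly one rating per subject/stimulus cell.
    """
    n, m = len(ratings), len(ratings[0])  # number of subjects, stimuli
    grand = sum(sum(row) for row in ratings) / (n * m)
    subj_means = [sum(row) / m for row in ratings]
    stim_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(m)]

    # Mean squares for subjects, stimuli, and the residual.
    ms_subj = m * sum((s - grand) ** 2 for s in subj_means) / (n - 1)
    ms_stim = n * sum((s - grand) ** 2 for s in stim_means) / (m - 1)
    ms_res = sum(
        (ratings[i][j] - subj_means[i] - stim_means[j] + grand) ** 2
        for i in range(n)
        for j in range(m)
    ) / ((n - 1) * (m - 1))

    # Solve the expected mean squares; negative estimates are set to zero.
    return {
        "subject": max((ms_subj - ms_res) / m, 0.0),
        "stimulus": max((ms_stim - ms_res) / n, 0.0),
        "residual": ms_res,
    }


def intraclass_correlations(components):
    """Share of the total variance at the stimulus and the subject level."""
    total = sum(components.values())
    if total == 0:
        raise ValueError("all ratings are identical; ICCs are undefined")
    return {
        "ICC_St": components["stimulus"] / total,
        "ICC_Su": components["subject"] / total,
    }
```

For instance, a toy table in which two subjects rate three stimuli as `[[0, 2, 4], [1, 3, 5]]` has no residual variation at all, so the whole variance is split between the stimulus and the subject level.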
Stimuli-level clustering may be attributable to phonetic features which we add to our initial model. By adjusting for the phonetic features, we may explain some of the stimuli-level variances detected in the empty model. The success of this explanation can be measured by the proportional change in variance at the stimuli level.
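As a small illustration of this measure, the proportional change in variance compares the stimulus-level variance of the empty model with the one remaining after the phonetic features are added. The function name and the numbers in the usage note are hypothetical and are not taken from our models.

```python
def proportional_change_in_variance(var_empty, var_adjusted):
    """Share of the empty model's stimulus-level variance that is
    explained once the predictors are added to the model."""
    if var_empty <= 0:
        raise ValueError("the empty-model variance must be positive")
    return (var_empty - var_adjusted) / var_empty
```

With a hypothetical stimulus-level variance of 0.20 in the empty model and 0.05 in the adjusted model, `proportional_change_in_variance(0.20, 0.05)` is about 0.75, that is, three quarters of the stimulus-level variance would be attributable to the added features.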

Hypotheses
The hypotheses for the current research are the following:
H1) Backness and height of vowels influence size ratings in both languages: back vowels are connected to larger size.
H2) Voiced consonants are connected to larger size.
H3) Rounded vowels show different results compared to their unrounded counterparts.
To test the hypotheses, we broke them down into smaller research questions, which are elaborated in detail to show how a crossed random effect model can answer them.
1. How is the size rating affected by phonetic factors (lip rounding, backness, and height of vowels, voicing of consonants)?
2. To what extent do these effects hold robustly across the two languages (are the effects similar in German and in Hungarian)? (H1+H2+H3)
3. Do the phonetic features affect rating independently from each other, or do they interact with each other by intensifying/weakening each other's effect? For example, does the effect of a back vowel on size rating depend on whether the consonant present in the same word is voiced or voiceless? (H1)
4. Are the effects we found similar to those found by Shinohara and Kawahara (2010)? (The findings are not completely comparable, see the differences in the designs above.) (H1+H2)
5. To what extent can we explain the effect of the stimulus in our models, and what proportions of the effect remain unexplained? Statistically speaking: to what extent are stimuli-level differences explained by the stimuli's phonetic characteristics (interpreting the proportional change in variances)? (H1+H2+H3)
6. To understand the effect of backness, roundness, and height more thoroughly: how is the size rating affected by vowels themselves? (H1+H3)

Results
In the empty model, the value of ICC_St is large (0.10 in Hungary, 0.28 in Germany) and it is much larger than the value of ICC_Su (0.02 in both countries). That is, there is a moderate size of clustering at the stimuli level, so stimuli are important in understanding the differences in size ratings, while the role of the subject is much less important. This result was much more pronounced at the German study site.
Regarding the effect of vowel and consonant features (Table 1), our findings are consistent with those from previous studies (Research Question 1 and 4). Backness (see the studies and reviews of, e.g., Auracher, 2017 andHoshi et al., 2019) significantly increases size rating. Regarding height, the direction of low-high contrast is also as expected (see Shinohara and Kawahara, 2010). This parameter is strongly significant in both countries. Moreover, lip rounding affects the impression of size: rounded vowels are perceived as larger than their unrounded counterparts. The low-middle contrast is not significant in either of the countries. The effect of voicing (voiced > voiceless) is also as expected. Additionally, we found that the manner of articulation (stop > fricative) may also be an influencing factor, however, its role is not significant in German.
Considering the effect sizes, vowel features have the greatest effect in both countries. They increase the rating by around 0.2-0.9 units (on the 1-4 scale). To test the equivalence of the effects estimated from the Hungarian and the German subsamples, we fitted a joint model on the pooled data. The type of subsample was a fixed factor of the model that interacted with each of the stimulus-level predictors. All of these interactions were significant at the 0.001 level, which means that the differences between the two subsamples are statistically significant. Interestingly, vowel features have a significantly greater effect in Germany than in Hungary, while consonant features behave in the opposite way: they show a stronger effect in Hungary (Research Questions 2 and 3).

Note: p-values in brackets; fixed parameters significant at the 0.001, 0.01, and 0.05 levels are denoted by ***, **, and *, respectively. The numbers in the fixed parts of the table indicate how much larger the object was rated on a 1-4 scale under the influence of the given phonetic feature. E.g., in Germany, words with back vowels were rated 0.35 larger (on a 1-4 scale) than those with front vowels.
After adjusting for phonetic features, the value of ICC St decreased substantially. In parallel, the proportional change in variance is large (77% in Hungary, 92% in Germany), which means that the vast majority of stimuli-level differences are attributable to differences in phonetic features (Research Question 5). In other words, the features of words we have chosen turned out to be important, and they cover the majority of the characteristics that affect ratings. Only a small proportion of the total stimuli-level variance remained unexplained.
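The proportional change in variance can be sketched as follows, assuming the usual definition: the relative reduction of the stimulus-level variance when moving from the empty model to the model with phonetic features. The function name and the two variance values are hypothetical; they are picked only to reproduce a value of 92% and are not the variances estimated in the study.

```python
def proportional_change_in_variance(var_empty, var_adjusted):
    """Share of the empty-model variance that disappears after adding predictors."""
    if var_empty <= 0:
        raise ValueError("empty-model variance must be positive")
    return (var_empty - var_adjusted) / var_empty


# Hypothetical stimulus-level variances before and after adding phonetic features.
pcv = proportional_change_in_variance(var_empty=0.25, var_adjusted=0.02)
print(f"{pcv:.0%}")
```

A value close to 1 means that little stimulus-level variance is left once the phonetic features are in the model.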
To answer the question whether phonetic features interact with each other, we added interaction terms to Model 1 by defining two-way interactions between the four phonetic features. None of the interaction terms were statistically significant at the level of 0.05. Therefore, we conclude that phonetic features affect size rating independently from each other (Research Question 3).
We conducted a separate analysis to measure the effect of the vowels themselves (Research Question 6). We fitted a new model by adding only the vowel type to Model 0. This model does not include other phonetic features as explanatory variables. Fig. 1 shows the model estimates: the average ratings of all the seven vowels in both languages. The error bars represent the 95% confidence interval of the difference from the reference category /a/. Although there is cross-linguistic variation, we observe some consistent patterns: for example, /i/ is rated the smallest, /o/ is rated the largest, and the rank order is almost the same in the two cases: /o>a>ø>u>e>y>i/ for Hungarian, /o>a>ø>e>u>y>i/ for German.
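One way to quantify how close the two rank orders are is Kendall's tau, which counts the vowel pairs ordered the same way in both languages. The sketch below uses only the two rank orders given in the text; the choice of Kendall's tau is an illustrative assumption and not part of the reported analysis.

```python
from itertools import combinations

# Rank orders as reported in the text, from largest to smallest rating.
hungarian = ["o", "a", "ø", "u", "e", "y", "i"]
german = ["o", "a", "ø", "e", "u", "y", "i"]


def kendall_tau(order_a, order_b):
    """Kendall's tau for two rankings of the same items (no ties)."""
    pos_a = {v: i for i, v in enumerate(order_a)}
    pos_b = {v: i for i, v in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


tau = kendall_tau(hungarian, german)
print(round(tau, 2))
```

Of the 21 vowel pairs, only /u/ and /e/ swap places between the two languages, so tau is 19/21, about 0.90.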

Discussion
In our study we investigated the influence of certain phonetic features on the perception of size with 121 Hungarian and 291 German native speakers. After entering phonetic features into the model, only a small proportion of stimuli-level variance remained unexplained, which means that we captured almost all important features of words that could affect size rating.

Vowels
The examined phonetic features have significant effects in almost every case, as expected. On the one hand, we verified the results of previous studies that backness of vowels is connected to larger size (H1 is accepted). Backness has the greatest effect in both countries. It increases the size rating by around 0.5 units (on the 1-4 scale). Its effect is much stronger than the effect of voicing. The effect of height is moderate in Germany and slight in Hungary. The results mainly correspond to those of Shinohara and Kawahara (2010).
In contrast to Shinohara and Kawahara (2010), we found that mid vowels were interpreted as larger than low vowels. This is found in other languages, too (Haynie et al., 2014). The explanation might be found in lip rounding.
The effect "low is smaller than mid" is attributable to the "new" vowel /ø/. /ø/ is definitely larger than /e/ (Fig. 1). As other studies did not test this aspect, this is a new finding. We thus obtained a more differentiated inner part of the rank order. Without /ø/, our results are as usual: /a, o/ > /e/ > /i/. Perhaps roundedness has a more pronounced effect on largeness than openness. In this respect, we found that /y/ was rated larger than /i/ in both languages. Further studies might look in more detail at how rounded and unrounded vowels are interpreted. That is, we need more studies with languages that have more rounded phonemes than /u/ and /o/ to see how widespread this effect really is. More studies could reveal more detailed results about various phonetic features and their interplay. Roundedness is significantly connected to larger size in both languages: German speakers rated rounded vowels 0.4 units larger (on the 1-4 scale) than unrounded ones, while Hungarian speakers rated rounded vowels 0.29 units larger. One of the highlights of our paper is the ability to distinguish lip rounding from vowel backness, as we detected a strong and significant effect for both features.
Rounded vowels are connected to rounded shapes through rounded lips (D'Onofrio, 2014). The reason why roundness is connected to larger size may lie in the shape of the lips forming a larger or a smaller opening (cf. D'Onofrio, 2014).

Consonants
As for the consonants, voiced ones are perceived as larger than voiceless ones, as expected. That is, H2 is accepted. The finding is in line with previous findings: for English speakers, voiced obstruents are larger than voiceless ones (Newman, 1933; Shinohara and Kawahara, 2010). As for an explanation, either an acoustic or an articulatory one is possible (Shinohara and Kawahara, 2010).
Furthermore, we found that fricatives are perceived to be larger than stops. These contrasts are present in both countries, but they are not significant in Germany. This seems to contradict Klink (2000). The different results might be explained by methodological deviations, as Klink's design differs from our research in many respects: e.g., Klink used only a handful of stimuli, did not use a fully crossed design, and did not use random effect models.
In sum, four phonetic factors contribute to the images of size: the height of vowels, the backness of vowels, lip rounding, and voicing in obstruents. The manner of articulation also contributes, but it was significant only in Hungarian. We were able to distinguish lip rounding from vowel backness with a strong and significant effect for both.
Finally, we want to point out a limitation of our study. In German, there are two a-vowels, but they are so similar on the phonetic level that most phonologists define a single phoneme /a/. Accordingly, we could only work with one low vowel. In Hungarian, we also classified /e/ as low (cf. Siptár and Törkenczy, 2000).
The results of the present study concerning vowels and obstruents can be seen as compatible with the frequency code, which can explain the overall results of the study. Sound symbolism exists, but it may be latent without being active all the time. It may be activated when further information is lacking, e.g., when interpreting unknown or artificial words (Elsen, 2017). Magnitude symbolism of vowels seems to be psycholinguistically real, independent of language and sound systems. The problem is how to detect additional frequency-based acquired effects. The research on such effects is not easy: so far, none of these possible explanations can be regarded as being the only responsible factor. Research is made more difficult by the interaction of individual effects: some might combine, some might work to different degrees with different effects.

Conclusion
Sound symbolic effects in our two language communities are comparable to Japanese, Chinese, Korean, and English results. One of the highlights of our research is the ability to separate the effect of lip rounding from vowel backness, as we detected a strong and significant effect for both features.
Vowel height, backness, roundness, and voicing of obstruents affect the image of size. Back vowels were rated larger than front vowels. Voiced obstruents were rated larger than voiceless ones. In addition, we found that lip rounding is connected to larger size. This result shows why sound symbolic research should be expanded to as many languages as possible, and why it must consider more phonetic features.

Funding
The authors did not receive any funding for the project.

Declaration of competing interest
The authors have no competing interests to declare.