Contiguity-based sound iconicity: The meaning of words resonates with phonetic properties of their immediate verbal contexts

We tested the hypothesis that phonosemantic iconicity, i.e., a motivated resonance of sound and meaning, might not only be found on the level of individual words or entire texts, but also in word combinations such that the meaning of a target word is iconically expressed, or highlighted, in the phonetic properties of its immediate verbal context. To this end, we extracted single lines from German poems that all include a word designating high or low dominance, such as large or small, strong or weak, etc. Based on insights from previous studies, we expected to find more vowels with a relatively short distance between the first two formants (low formant dispersion) in the immediate context of words expressing high physical or social dominance than in the context of words expressing low dominance. Our findings support this hypothesis, suggesting that neighboring words can form iconic dyads in which the meaning of one word is sound-iconically reflected in the phonetic properties of adjacent words. The construct of a contiguity-based phonosemantic iconicity opens many avenues for future research well beyond lines extracted from poems.


Introduction
While traditional linguistics largely keeps endorsing the hypothesis that the linguistic sign is arbitrary regarding the relation of the signifier (i.e., sound or written characters) and the signified (i.e., meaning) [1], evidence for non-arbitrary sound-meaning relations in the use of language is growing (for reviews see [2][3][4][5][6]). Several studies have tested phonosemantic congruencies between meaning and acoustic characteristics on the level of individual words [7][8][9][10][11][12][13][14] or statistical correlations between the frequency of particular phonemes and the overall emotional tone of entire texts [14][15][16][17][18][19][20][21]. The aim of the present study is to introduce a hitherto unconsidered variant of sound-meaning relations in natural language texts. In this variant, we predict sound-iconic relations between the meaning of a given word and the sound patterns of its immediately neighboring words. That is, we expect that the sound of adjacent words is phonetically an iconic simile for the meaning of a given reference word. We will refer to this kind of phonosemantic relations between adjacent words as contiguity-based sound iconicity.
Sound iconicity, also known as phonosemantics, sound symbolism, linguistic iconism, or phonological iconicity, refers to relations between sound and meaning of linguistic signs (for discussions of the terminology see [22][23][24][25]). In this article we will use the term sound iconicity to refer to systematic and universal associations of articulatory-acoustic properties of phonemes with non-acoustic attributes, such as size, shape, or affect. In this meaning, the concept of sound iconicity does not include imitative sound-meaning relations such as in the case of onomatopoeia.
Natural language use has likewise been tested for the relevance of sound iconic associations. Studies on language acquisition suggest that sound iconic words are learned more easily and faster in both early first [46][47][48][49] and second language acquisition [50][51][52][53][54]. There are also attempts to derive the selection of the best-fitting brand or product names from insights into sound iconicity [55,56]. Finally, studies on prosody in verbal interactions reported that iconic modulations, such as changes in the pitch of a voice, are not only affected by the emotional state of the speaker or the syntactic structure of an utterance but also by its content [57][58][59].
Within the research on the relevance of sound-meaning relations in natural language use, a subgroup of studies focused on relations between (mostly emotional) aspects of the content in natural language texts and the relative occurrence of phonemes with specific articulatory-acoustic characteristics. Compared to the studies referred to above, these studies have yielded a less consistent picture. Some of them found a significant relation between the emotional content of texts and their phonetic structure [14, 16-18, 21, 60, 61], whereas others did not [19,20]. Moreover, the same phonetic characteristics have been attributed to different iconic meanings. For example, a high occurrence of nasal consonants in a text has been linked to "tenderness" in one study [18] and to "melancholy" in another [16].
The present proposal of a novel type of sound-iconic relations in natural language focuses neither on individual words nor on entire texts, but on small clusters of neighboring words. Honoring long-standing assumptions that art shows a particularly strong resonance of form and content [62][63][64][65][66][67][68][69] and specifically following Jakobson's hypothesis that iconic sound-meaning relations are likely to be more frequently and more saliently found in poetic language [70], we used lines extracted from poems as the first testing ground for our hypothesis. At the same time, and just like Jakobson, we do not imply that poetic language use is categorically set apart from ordinary language use. Rather, we expected that poetry might provide first evidence for a phenomenon that is also relevant in ordinary language use, if only to a lesser degree.

Sound iconicity of magnitude
In the current study, we tested whether words that refer to either largeness or smallness differ regarding the relative occurrence of specific phonemes in the words immediately preceding and following them. The so-called sound iconicity of magnitude is a well-studied phenomenon. Previous studies have provided evidence that high-front vowels (such as /i:/ in 'heed') tend to be associated with smallness and low-back vowels (such as /ɑ:/ in 'bath') with largeness [26,35,38,40,71,72,73,74]. This cross-modal association between articulatory-acoustic characteristics of vowels and the non-acoustic property of size has been reported for participants of various mother tongues [75,76] and even for prelingual toddlers [37]. Moreover, comparative studies have found a near-universal tendency for an increased likelihood of high-front vowels in linguistic units that refer to smallness and related concepts [8,10,12] (but see [11,77,78]; for a discussion see [79]).
Sound iconicity of magnitude can also be understood as a specific kind of synaesthetic association, that is, an association across sensory modalities. The characterization of vowels as high vs. low and front vs. back refers to articulatory features related to the relative position of the tongue when pronouncing the respective vowel. These articulatory characteristics are directly related to the frequency of the vowel's first two formants [80]. Formants are the repercussions of resonance frequencies of the vocal tract and correspond to higher amplitudes in the power spectrum of vowels. The frequency of the formants is influenced by the size and shape of the vocal tract, which, in turn, varies depending on the articulatory movements, such as opening or closing the mouth, changing the position of the tongue, and rounding or spreading the lips. The frequencies of the first two formants and their relative distance are important for characterizing the distinct sound of a vowel. For example, the distance between the first and second formant, the so-called formant dispersion, is relatively wide for high-front vowels, such as /i/, and considerably narrower for low vowels, such as /a/, and back vowels, such as /u/ (Fig 1). A closer look at the specific relation between the characteristic frequencies of a vowel's first two formants suggests that the association between vowels and size corresponds to the relative formant dispersion of the respective vowels: vowels that have a narrow formant dispersion are associated with largeness, whereas vowels that have a wide formant dispersion are related to smallness.
Fig 1. Relation between frequency of the first two formants (F1 and F2) and their articulatory characteristics for three vowels. The distance between F1 and F2 indicates the formant dispersion. As can be seen, the formant dispersion is widest for the high-front vowel /i:/ and considerably narrower for the low vowel /a:/ and the back vowel /u:/.
Sound-iconic associations between formant-frequency and size are, therefore, closely related to the frequency code, a theory originally introduced by Ohala [81,82]. Ohala pointed out that, across species and languages, acoustic frequency is used to convey an impression of size. Moreover, as in most species size is directly associated with physical and social dominance, acoustic frequency can also strategically be used to intimidate or appease rival conspecifics or to gain an advantage in the competition for mates (see also [83]). According to Ohala [82], the frequency code is causal for many sound-meaning relations that are seemingly independent of a specific language, culture, or even species, including phonosemantic congruencies in the lexicon, the vocal expression of affect, the use of sound-frequency in threatening vocalization, or emotional facial expressions to display fear or anger. To give an example, facial gestures of primates that express aggression or submission involve lip movements that also affect the frequency of the voice's formants. For Ohala [82], the original motivation of these gestures is to make the voice sound darker and more intimidating when attacking a conspecific, and brighter and friendly-sounding when trying to appease a dominant opponent (p. 332-333). As the facial gestures described by Ohala are closely related to articulatory movements when pronouncing back or front vowels, the association of vowels' formant dispersion with size, strength, or dominance might well be rooted in facial expressions of emotions.
Formant dispersion, thus, might not only be associated with (directly perceivable) size, but more generally with a notion of physical and social dominance. Corroborating this hypothesis, several studies have revealed that body size, via its close relation to the length of the vocal tract, shows a negative correlation with the distance between the frequencies of the formants of an individual's voice [84]. Accordingly, formant dispersion is widely considered to be indicative of body size across various species [85][86][87][88][89][90][91], including humans [92,93] (but see [94]). A recent overview of studies on various mammal species suggests that males strategically lower their formant dispersion to appear larger and thereby gain an advantage in the competition for mates and access to resources [95]. Similarly, studies with humans have reported that the formant dispersion of the male voice is used to signal physical and social dominance in interactions [96,97], and that it predicts females' preferences regarding male voices [98][99][100][101]. Thus, it appears that the naturally occurring relation between formant dispersion and body size initiated the development of the use of formant dispersion as a communicative tool to signal physical and social dominance.
There is also evidence that these relations between sound frequency and physical and social dominance may have been adopted in sound-iconic associations with vowels. Fischer-Jørgensen [102], for example, found that the relative formant dispersion of vowels corresponds to how these vowels are assessed on a variety of scales, such as dark-bright, hard-soft, or small-big. In a more recent study [26], participants showed a significant tendency to implicitly associate front vowels with fearful body postures and back vowels with angry and aggressive behavior. As both anger [103] and fear [104,105] have been suggested to be closely related to a dominance-submissiveness system facilitating the establishment of dominance hierarchies (see also [106][107][108][109][110][111][112]), these findings again support the assumption that formant dispersion is used to signal physical and social hierarchy.
However, while studies on the relation between vowels' formant-dispersion and the concept of dominance so far mainly looked at phonosemantic congruencies within words or texts, the focus of this study will be on sound-meaning relations between contiguous words within a given text sample. That is, we expected that words in the immediate context of target words designating smallness, delicateness, weakness, or fear would contain more high-front vowels, whereas the immediate context of target words referring to largeness, roughness, strength, or anger would contain more low and back vowels.
For clarity, we will henceforth speak of target words and their contexts, with 'target word' referring to a semantically categorized word that refers either to smallness or largeness and 'context' referring to the words that occur in the immediate proximity of the target word in a given text sample. The term 'target word' here only means that we used these words as search terms to extract one-line samples from a text corpus that include verbal designators of smallness, delicateness, or weakness on the one hand and largeness, roughness, and strength, on the other. We do not imply that these words are by definition of outstanding importance for the overall semantic processing of the selected text samples. We merely hypothesized that the average formant dispersion in a target word's context should match the allocation of this target word to either side of the bipolar dominance dimension.

Materials
We selected five attributes per semantic category as target words (Table 1). Criteria used for the selection of the attributes were: (1) They refer to one of the semantic dimensions that had previously been associated with the articulatory characteristics of vowel backness and/or height; (2) they are directly associated with one of the two poles of the dominance dimension; (3) they occur at least 30 times in the text corpus used. The latter requirement was implemented to secure a minimum of text samples per target word. For the selection of the target words we used the online version of the Duden, a standard German dictionary which also provides a list of synonyms where applicable (www.duden.de/).
We first selected two target words per group that most obviously represented size and strength, i.e., klein [small] and schwach [weak] for group SMALL and groß [large] and stark [strong] for group LARGE. As it had been shown that the distinction between anger and fear lies along the dominance dimension, with anger being related to dominance and fear to submissiveness [104][105][106][107][108][109][110][111][112][113][114][115], we also added ängstlich [fearful] for SMALL and wütend [angry] for LARGE. Attempts to include further target words failed mostly because eligible words occurred far less than 30 times in the corpus. However, as most target words were sound-iconically congruent in the sense of the hypothesis (i.e., words referring to smallness and related concepts contained vowels with a high formant dispersion and words related to largeness and related concepts contained vowels with a low formant dispersion), we additionally included sound-meaning-incongruent target words, i.e., riesig [gigantic] for the group LARGE and zart [delicate] for the group SMALL, together with their respective counterparts winzig [tiny] and grob [rough]. This allowed us to control to what extent the phonetic characteristics of the target words exerted an influence on the hypothesized phonosemantic relation between target word and context. As neither winzig nor grob had a sufficient number of
For reasons of convenience we henceforth refer to the two opposite groups of attributes as SMALL vs. LARGE, although they are not limited to physical properties but also refer to a general notion of low vs. high dominance and hence represent the potency dimension of the so-called EPA model [116]. According to this model, there is a general tendency to semantically categorize concepts along the three bi-polar dimensions Evaluation (pleasant-unpleasant), Potency (dominant-submissive), and Activity (active-passive).
Further developing this model, Mehrabian and Russell proposed to designate the three dimensions as Pleasure, Dominance, and Arousal [113][114][115]. For our purposes, we equally rely on both variants of the EPA model and hence use the terms Potency and Dominance interchangeably.
We selected text samples from the online archive Freiburger Anthologie (http://freiburgeranthologie.ub.uni-freiburg.de/fa/fa.pl). The anthology comprises more than 1500 poems written between 1720 and 1890. To select the text samples, we used the stems of the attribute terms in Table 1 as search terms (e.g., 'riesig' → 'ries-'). We also included inflected forms, derivatives, and compound words, provided that these did not alter the meaning; thus, for klein (small), we also included diminutives of nouns (e.g., Glöck-lein (little bell)). For the phonetic analysis, we treated compound words as two separate words (e.g., riesenhoch → riesen + hoch [gigantically high]). To ensure that neither a specific author nor a single poem would dominate the data, we randomly selected a maximum of six samples per author and a maximum of five samples per poem for each target word. Lists of all text samples, all authors, and all titles can be found in the Supplements.
The number of occurrences in the corpus varied greatly for our target words (see Table 1). To prevent the results from being dominated by samples of target words with a relatively high frequency of occurrence, we randomly selected 30 text samples per target word using the shuffle algorithm in the NumPy package for the Python programming language. In contrast, when testing the effect of sound-meaning congruency for each individual target word, we included all samples detected in the corpus.
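The balanced draw described above can be illustrated with a minimal sketch. It assumes that the samples are available as (target word, line) pairs; the function name, the data layout, and the fixed seed are hypothetical, and Python's standard-library shuffle stands in here for the NumPy shuffle named in the text.

```python
import random
from collections import defaultdict

def balanced_sample(samples, per_word=30, seed=0):
    """Return at most `per_word` randomly chosen lines for each target word.

    `samples` is a list of (target_word, line) pairs. This layout and the
    seed are illustrative assumptions, not details taken from the study.
    """
    rng = random.Random(seed)      # fixed seed so the draw is repeatable
    by_word = defaultdict(list)
    for word, line in samples:
        by_word[word].append(line)

    reduced = {}
    for word, lines in by_word.items():
        shuffled = lines[:]        # copy, so the input list is left untouched
        rng.shuffle(shuffled)      # stand-in for the NumPy shuffle in the text
        reduced[word] = shuffled[:per_word]
    return reduced

# Toy corpus with invented placeholder lines: 100 for one word, 45 for another.
corpus = [("klein", f"line {i}") for i in range(100)]
corpus += [("stark", f"line {i}") for i in range(45)]
reduced = balanced_sample(corpus)
```

Shuffling and then truncating gives every line of a given target word the same chance of entering the reduced dataset, regardless of how often that word occurs in the corpus.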
Not only the absolute length of the text samples and the position of the target words varied in our study, but also the number of words surrounding the target words. For example, when target words appeared at the beginning or at the end of a sample, their context was limited to only one side of their textual position. In fact, a majority of the samples had only one word before or after the target word, and fewer than 10% of the samples had more than two words before and after the target word (S1 Fig). Therefore, to eliminate the distance between a context word and a target word as a confounding variable, we included no more than two words before and two words after the target words in our analysis.
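The restriction to two words on either side of the target word can be sketched as follows. The tokenization by whitespace, the function name, and the example line (which is invented and not taken from the corpus) are assumptions made purely for illustration.

```python
def context_words(tokens, target_index, window=2):
    """Return up to `window` words before and after the target word.

    The target word itself is excluded. Near the start or end of a line,
    the context simply contains fewer words on that side.
    """
    start = max(0, target_index - window)
    before = tokens[start:target_index]
    after = tokens[target_index + 1:target_index + 1 + window]
    return before + after

# Invented example line; 'kleine' (index 1) plays the role of the target word.
line = "der kleine Vogel singt im alten Wald".split()
ctx = context_words(line, 1)
```

In this example only one word precedes the target, so the context consists of that word plus the two words that follow it.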

Phonetic analysis of the context words
For the phonetic transcription, we used a web-based tool for grapheme-to-phoneme conversion. The tool was developed at the Bavarian Archive of Speech Signals in the context of the CLARIN-D project [117,118] (https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface). A manual inspection of the transcriptions suggested that the results were acceptable for the purpose of this study.
Formant dispersion was operationalized as the distance between the first two formants (dF = |F1-F2|; Table 2). To this end, Hertz frequencies for the formants were adopted from Kohler [119]. For the analysis, we converted Hertz to Mel, a psychoacoustic scale that assesses the perceptual equivalent of the physically measurable frequency [120,121]. We used the formula introduced by O'Shaughnessy [122] for the conversion (i.e., m = 2595 log10(1 + f/700), with m = value in Mel and f = frequency in Hertz). Thus, all values for formant frequency in the results section are reported in Mel. Diphthongs were categorized according to their latter vowel; in this regard, we followed Greenberg and Jenkins [30], who assessed the association between vowels and diphthongs with non-acoustic attributes. Schwa (/ə/) and the near-open central vowel /ɐ/ (German: "a-Schwa" or "low-Schwa") were not included in the analysis. We assessed the average values of the articulatory and acoustic characteristics per line for all vowels of all context words, excluding the target word.
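As an illustration of this operationalization, the following sketch converts formant frequencies to Mel and averages the formant dispersion over the vowels of a context. The formant values in the table are rough placeholders chosen for the example and are not the values from Kohler [119]; the function names are hypothetical, and the sketch assumes that dF is taken as the difference of the two Mel-converted formants.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hertz to Mel: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Placeholder (F1, F2) values in Hz for three vowels. These are illustrative
# assumptions only and do NOT reproduce the values adopted from Kohler.
FORMANTS_HZ = {
    "i:": (250.0, 2200.0),
    "a:": (750.0, 1300.0),
    "u:": (300.0, 800.0),
}

# Schwa and the near-open central vowel are left out, as stated in the text.
EXCLUDED = {"ə", "ɐ"}

def dispersion_mel(vowel):
    """Formant dispersion dF = |F1 - F2|, computed on Mel-converted values."""
    f1, f2 = FORMANTS_HZ[vowel]
    return abs(hz_to_mel(f1) - hz_to_mel(f2))

def mean_dispersion(vowels):
    """Average dF over all analysable vowels of a context, or None if empty."""
    kept = [v for v in vowels if v not in EXCLUDED]
    if not kept:
        return None
    return sum(dispersion_mel(v) for v in kept) / len(kept)

# Vowels of an invented context, in order of occurrence.
example = mean_dispersion(["i:", "ə", "a:", "u:"])
```

With these placeholder values, /i:/ yields a markedly wider dispersion than /a:/ or /u:/, which mirrors the qualitative pattern described for high-front versus low and back vowels.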

Data analysis
All statistical analyses were performed using R software version 3.4.3 [123]. In a first step, we compared the average frequencies of the first two formants (F1 and F2) for the semantic categories (SMALL vs. LARGE) in the reduced dataset of 30 samples per target word, using a multivariate analysis of variance (MANOVA). Approximations of F-values were calculated using the Pillai-Bartlett trace. We also conducted separate post hoc tests to compare the averaged frequencies of the first two formants and of the formant dispersion in the text samples for the two semantic categories. As we found the difference between the semantic categories to be most pronounced for formant dispersion, we subsequently conducted a logistic regression to test the predictive power of formant dispersion for the categorization of the target words as either SMALL or LARGE. To this end, we assessed the percentage of correctly categorized text samples per target word. We then conducted an additional logistic regression in which the target words were distinguished dependent on their phonetic characteristics (i.e., articulatory-acoustic properties of their central vowel) rather than dependent on their semantic categorization (i.e., SMALL vs. LARGE). Whereas a majority of six target words showed a phonosemantic congruence of phonetic characteristics and semantic category (i.e., klein, winzig, ängstlich, groß, stark, grob), two target words of the category SMALL (i.e., schwach, zart) and two target words of the category LARGE (i.e., riesig, wütend) did not show the phonetic properties expectable under the hypothesis of phonosemantic congruence. The two separate logistic regressions were aimed at comparing the relative influence of the semantic and phonetic features of the target words on the phonetic features of the context.
For the reduced dataset (30 samples per target word), we tested whether the data confirmed the assumption of homogeneity of variances and normality of distribution using the R software car package [124] and mvnormtest package [125]. We confirmed the homogeneity of the variances using Levene's test (F1: F[1,298 065, p > .1). Additionally, we tested the homogeneity in the covariance matrix. Results show that the ratios of variances and covariances for the semantic categories (SMALL vs. LARGE) were within an acceptable range, given the relatively large sample size (S1 Table).
Inspecting the standardized skewness, we did find indications suggesting that the distribution of the data significantly deviated from normality (S2 Table). As it has been reported that departures from normality have only marginal effects on Type 1 error rates [126], we decided to stick to the parametric MANOVA. Additionally, all results were verified through robust statistics on the ranked data using the mulrank() function in the WRS package [127,128]. In each case, the results of the nonparametric tests fully corroborated the results of the parametric tests.

Results
We first compared the mean values of the first two formants for the two semantic categories. As expected, the averaged frequency for the first formant (F1) was higher for text samples in the category LARGE than for text samples in the category SMALL; the opposite held for the second formant (F2). Consequently, the distance between the first and second formant (dF = |F1-F2|) was on average wider for samples in the category SMALL than for those in the category LARGE (Table 3).
For better visualization of the relation between the semantic category and the formant frequency, we compared the normalized values for F1, F2, and dF for the samples from the group SMALL and the group LARGE (Fig 2).
Testing the influence of the target words on the phonetic characteristics of their context, we found a highly significant effect of the semantic category on the frequency of the first two formants (V = 0.03, F[2,297] = 5.2, p < .01). To address the skewness of the distribution of the data, we also conducted a robust MANOVA on the ranked data [129,130]. Results confirmed those of the parametric test (F = 5.98, p < .01, ratio of ranks [SMALL/LARGE] for F1: 0.86 and for F2: 1.19). (For results of the post hoc analysis applying the nonparametric Wilcoxon's signed rank test, see S3 Table).
Confirming our hypothesis, the results indicate that formant dispersion is the most reliable predictor for semantic category. To test whether this held for all target words, we compared the mean values for the formant dispersion in all text samples for each target word. Whereas we used a reduced dataset with 30 samples per target word for the MANOVA, in what follows we report the results obtained for the complete dataset including all lines found in the corpus. Again, we found a relation between the semantic category of the target word and the averaged formant dispersion in its context: all target words that are related to smallness have an averaged formant dispersion above 800 Mel whereas the opposite holds for target words that are related to largeness (Fig 3; see S4 Table for exact mean values and standard errors for each target word).
To test the extent to which the averaged formant dispersion in a target word's context can accurately predict the semantic category of the target word, we conducted a logistic regression with formant dispersion as the predictor and semantic category as the outcome variable. Confirming our expectation, formant dispersion proved to be a significant predictor for the semantic category of the samples (b = .002, z = 6.174, p < .001). The resulting model explains roughly 13% of the variance (Nagelkerke's R² = .132). A chi-square test comparing the fit of the model in relation to a baseline model suggests a highly significant improvement (i.e., reduction of deviance; χ²(1) = 41.21, p < .001).
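The quantities reported for this model (slope, reduction of deviance, Nagelkerke's R²) can be illustrated with a minimal sketch on invented data. The analysis itself was run in R; the pure-Python Newton-Raphson fit below, the function names, and the toy values are assumptions for illustration only. The predictor is standardized for numerical stability, so the slope is per standard deviation rather than per Mel.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Invented, overlapping dispersion values in Mel (1 = SMALL, 0 = LARGE).
small = [700.0 + 20.0 * i for i in range(20)]
large = [600.0 + 20.0 * i for i in range(20)]
raw = small + large
y = [1] * 20 + [0] * 20

# Standardize the predictor (an assumption made for numerical stability).
mean = sum(raw) / len(raw)
sd = math.sqrt(sum((v - mean) ** 2 for v in raw) / len(raw))
x = [(v - mean) / sd for v in raw]

b0, b1 = fit_logistic(x, y)
n = len(y)
p_bar = sum(y) / n
ll_null = sum(y) * math.log(p_bar) + (n - sum(y)) * math.log(1.0 - p_bar)
ll_model = log_likelihood(x, y, b0, b1)

lr_chi2 = 2.0 * (ll_model - ll_null)            # reduction of deviance, df = 1
cox_snell = 1.0 - math.exp(-lr_chi2 / n)
nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
```

A positive slope means that wider formant dispersion in the context raises the predicted probability of the category SMALL; Nagelkerke's R² rescales the Cox-Snell value so that its maximum is 1.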
We also compared these results to a second model with formant dispersion as predictor and the target words grouped by their phonetic characteristics as outcome variable. To this end, we divided the target words into two groups depending on the formant dispersion of their central vowel (high formant dispersion: klein, winzig, ängstlich, riesig, wütend; low formant dispersion: groß, stark, grob, schwach, zart). As a majority of text samples came from target words that were phono-semantically congruent (i.e., klein, winzig, ängstlich, groß, stark, grob) we expected only marginal differences between these two models. Still, compared to the first model (semantic categorization) the predictive power of the second model (phonetic categorization), albeit highly significant (b = .001, z = 4.786, p < .001, Nagelkerke's R² = .078), clearly decreased.
Finally, using the averaged predicted probability of all text samples as a criterion for categorizing each sample as either SMALL or LARGE, we found that with only one exception ('weak'), a majority of the samples for each target word were categorized in accordance with the target word's semantic category (Fig 4). That is, the ratio of correctly vs. incorrectly categorized text samples proved to be a valid criterion for the semantic classification of the target words.
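One possible reading of this categorization step is sketched below: the mean predicted probability over all samples serves as the cut-off, and the share of correctly categorized samples is then computed per target word. This reading, the function name, the data layout, and the probabilities are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict

def share_correct(samples):
    """Share of correctly categorized samples per target word.

    `samples` is a list of (target_word, true_category, p_small) triples,
    where p_small is the model's predicted probability of SMALL. The mean
    of all predicted probabilities is used as the cut-off (an assumption).
    """
    threshold = sum(p for _, _, p in samples) / len(samples)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for word, category, p_small in samples:
        predicted = "SMALL" if p_small > threshold else "LARGE"
        totals[word] += 1
        if predicted == category:
            hits[word] += 1
    return {word: hits[word] / totals[word] for word in totals}

# Invented predicted probabilities for two target words.
toy = [
    ("klein", "SMALL", 0.70),
    ("klein", "SMALL", 0.60),
    ("klein", "SMALL", 0.40),
    ("groß", "LARGE", 0.30),
    ("groß", "LARGE", 0.45),
    ("groß", "LARGE", 0.65),
]
result = share_correct(toy)
```

A target word counts as classified in line with its semantic category when its share exceeds one half, which corresponds to the majority criterion described above.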

Discussion
The results confirm our hypothesis: phonetic properties that show a sound-iconic resonance with the meaning of our target words were indeed found to be overrepresented in the words immediately preceding and/or following these target words. The averaged distance between the frequency of the first and the second formant turned out to be significantly wider in the context of attributes that designate smallness or related concepts than in the context of attributes that connote largeness or related concepts. This relation between a word's semantic categorization with regard to the dimension of Dominance and the averaged formant dispersion in its context held for nine out of ten attributes examined. Hence the effect is fairly robust even though the concepts under consideration (e.g., small vs. delicate vs. weak vs. fearful) show substantial semantic differences. Since the abstract notion of dominance or potency, as defined in the EPA model [113][114][115], is the uniting factor that groups these attributes into one category, our results indicate that formant dispersion is not exclusively associated with size but with a more general notion of physical and social dominance. These results are in line with previous findings that the relative distance between the first two formants is associated with either dominance or submissiveness. Thus, whether deliberately or involuntarily, acoustic properties of phonemes are apparently used in a way that emphasizes abstract notions expressed in the content of the text via contiguity-based sound iconicity.
Our results are in accord with previous findings, according to which humans tend to adapt acoustic characteristics of their voice to the meaning of an utterance [57,58]. At the same time, we propose that contiguity-based sound iconicity, as reported here, is qualitatively different from iconic prosody or, more generally, from iconic gestures. According to Clark et al., iconic gestures are a specific kind of "performed depiction," i.e., they visually support, or embody, the message of an utterance [131]. Adopting this approach, one could argue that the acoustic characteristics of the vowels are a physical instantiation of the concept of largeness or smallness. However, while iconic gesture and iconic prosody use distinct channels of communication, such as hand movements or pitch modulation, the acoustic characteristics of the words under scrutiny in the present study are not highlighted in any special expressive fashion. Rather, it is only by virtue of their fairly inconspicuous co-occurrence, or contiguity, that a sound-iconic relation emerges between the meaning of one word and the sound of a neighboring word. Therefore, this special type of sound iconicity has escaped the attention of researchers longer than other types.
The so-called exemplar theory can be read to suggest a possible explanation of our results. According to this theory, the cognitive representation of concepts is structured in accordance with the perceived similarity between these concepts. Applied to language, this means that verbal tokens that are similar regarding, for example, their phonetic shape, their meaning, or their syntactic function constitute clusters which allow for comparing and categorizing new experiences [132][133][134]. The semantic interpretation of a stimulus as a representation of a certain concept would then follow from its positioning within a multi-dimensional feature space (in language, semantic space) and, consequently, from the semantic proximity of this new stimulus to previously stored experience. Thus, one could speculate that phonosemantic relations between the meaning of one word and the acoustic properties of another word might cause them to be memorized as tokens of a common cluster, which in turn could bias authors to use these words together. Accordingly, words which are perceived as semantically similar are indeed used in similar grammatical constructions [135].
However, to the best of our knowledge, the exemplar theory does not cover sound-meaning relations of the type investigated in our study. Research on similarity-based categorization of linguistic tokens has so far focused on similarity between different tokens of the same type, that is, similarity between phonetic features or similarity in meaning. In contrast, the crucial point of contiguity-based sound iconicity is precisely its emphasis on cross-fertilizations between the form and content of words on the one hand, and across sensory modalities (e.g., sound frequency and size) on the other. Thus, in their current state, exemplar theory and contiguity-based sound iconicity do not address the same issues.
Still, while we believe that contiguity-based sound iconicity is a specific type of form-content congruency in written language, it should also be noted that the averaged formant dispersion per text sample varied greatly within each category (SMALL vs. LARGE). That is, while we did find a fairly robust contiguity-based sound-meaning resonance across all samples for the two categories, the percentage of correctly categorized text samples for three target words (i.e., 'delicate', 'weak', and 'strong') was only marginally above the level of chance, and, in the case of 'weak', even contrary to our expectation.
Thus, neighboring sound qualities by no means always drive home the meaning of the words they surround. However, this lack of consistency in the relation between sound and meaning was to be expected for more than one reason. First, the mere occurrence of the attribute 'giant' in a line of poetry does not necessarily predict the message intended by the author. Given that a vowel's formant frequency is not primarily related to size but to the notion of physical and social dominance, text samples containing words like 'giant' do not necessarily call for acoustic features that imply high dominance (consider, e.g., 'a giant failure'). In other words, to improve the predictive power of formant dispersion for semantic categorization, it might be necessary to assess the meaning of the relevant word combinations rather than basing the analysis on the mere occurrence of specific words irrespective of the context. Second, the author of a text may make use of a wide variety of linguistic features other than contiguity-based sound iconicity that likewise allow for highlighting content in a non-semantic fashion. For example, rhyme, meter, and syntactic parallelism are all similarity-based features that poets frequently use to guide readers' attention and also to highlight semantic relations. Thus, contiguity-based sound iconicity is but one feature for establishing form-content relations, and this specific feature might well be out of focus or even intentionally sacrificed in favor of other linguistic features, if appropriate.
Finally, it is not clear to what extent a focus on the average, such as the averaged formant dispersion, is a good measure of phonosemantic relations. As Jakobson [70] pointed out, some words in a poem are more in the fore of the reader's mind than others (p. 19). Therefore, one could speculate that rather than treating all words equally when assessing formal linguistic characteristics, the contribution of the words should be weighted according to the relative attention they receive.
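The difference between an unweighted and an attention-weighted average can be illustrated with a minimal sketch. It assumes that formant dispersion is computed as the distance between the first two formants (F2 minus F1); the function names, the formant values, and the attention weights are hypothetical placeholders chosen for illustration only and are not taken from the data or analysis of the present study.

```python
from typing import Sequence


def formant_dispersion(f1_hz: float, f2_hz: float) -> float:
    """Distance between the first two formants of a vowel, in Hz."""
    return f2_hz - f1_hz


def mean_dispersion(dispersions: Sequence[float]) -> float:
    """Unweighted average: every vowel contributes equally."""
    if not dispersions:
        raise ValueError("at least one vowel is required")
    return sum(dispersions) / len(dispersions)


def weighted_mean_dispersion(
    dispersions: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted average: each vowel contributes in proportion to its weight."""
    if len(dispersions) != len(weights):
        raise ValueError("dispersions and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(dispersions, weights)) / total_weight


# Hypothetical (F1, F2) pairs in Hz for three vowels of one text sample.
# These are illustrative placeholders, not measurements.
vowels = [(300.0, 2300.0), (700.0, 1200.0), (400.0, 800.0)]

# Hypothetical attention weights; how such weights could be estimated
# (e.g., from stress or position in the line) is left open here.
attention = [1.0, 3.0, 1.0]

dispersions = [formant_dispersion(f1, f2) for f1, f2 in vowels]
unweighted = mean_dispersion(dispersions)
weighted = weighted_mean_dispersion(dispersions, attention)

print(f"unweighted mean dispersion: {unweighted:.1f} Hz")
print(f"weighted mean dispersion:   {weighted:.1f} Hz")
```

In this toy example, giving more weight to a vowel with low formant dispersion lowers the sample's average, so the same line could fall on a different side of a categorization threshold depending on the weighting scheme.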
The fact that most target words for SMALL and LARGE show phonosemantic properties stipulated by the hypothesis of the sound iconicity of magnitude implies a limitation for the interpretation of our results: we cannot rule out that the spreading of the respective formant dispersion to neighboring words is driven not only by the meaning of the target words, but also by their phonetic properties. It is, for example, striking that the two target words zart [delicate] and schwach [weak], which are both phonosemantically incongruent, also show a relatively weak sound-meaning relation with their neighboring words.
While the importance of phonological parallelism in poetry is unquestionable, it is not clear that it is specifically found in cases such as those that are the object of the present study. For one thing, the predictive power of the context's formant dispersion was clearly higher for the semantic categorization of the target words than for their categorization according to their phonetic characteristics. Moreover, Roman Jakobson [70] assumed that sound-meaning relations between a word and its immediate verbal context can also be functional for compensating mismatches between the phonetic characteristics of a word and its meaning (p. 373). The German word riesig [giant], for example, has /i:/ as the central vowel, which, according to our theory, would connote smallness rather than largeness. Following Jakobson, one would therefore expect a particularly high frequency of back and low vowels (i.e., vowels with a small formant dispersion) in the words surrounding riesig. In accord with Jakobson's assumption, we indeed found that the predictive accuracy of formant dispersion for samples containing the attribute groß [large] is considerably lower than that for samples containing riesig [giant]. Still, further research is needed to test the extent to which sound-iconic resonances are also found in the context of target words that by themselves do not show any congruence of sound and meaning.
Our results also raise questions regarding the functions of contiguity-based sound iconicity. We propose that a sound-iconic feedback between neighboring words might render the conceptual meaning more concrete and more salient. In animal communication, acoustic characteristics that signal dominance or submissiveness also have a strong emotional component as they are used to intimidate potential opponents, or to appease dominant rivals. In a similar vein, sound iconicity in human language use can be expected to implicitly set an emotional tone that is intended to guide, or prime, the recipient's understanding of a message. In fact, results from a recent EEG study suggest that the phonetic characteristics of a word can, relatively early in the cognitive processing of words, trigger an automatic shift of the reader's attention towards emotionally relevant content [136]. Sound iconicity might thus provide speakers and authors with means to simultaneously express meaning at various communicative levels that, at least potentially, complement and reinforce each other.
In sum, our results provide good empirical evidence for an interrelation between form and meaning in poetic language. More precisely, we found a statistically significant relation between the meaning of a word and the phonetic characteristics of its neighboring words. In light of Roman Jakobson's proposal that relations of contiguity constitute what rhetoric and poetics used to call metonymies [137], the type of sound iconicity we here propose might well be called metonymic sound iconicity.

Outlook
The phenomenon we propose is clearly in need of additional empirical evidence. First and foremost, it needs to be tested to what extent our findings based on single lines extracted from poems extend to other types of language use. Moreover, it would be interesting to investigate whether other phonosemantic relations, such as the bouba-kiki effect [28,31,138-140] or the oft-claimed association between perceived pitch and brightness [35,102,141-143], likewise come in metonymic variants in which the iconic relation is not found within a word, but is displaced onto a relation between two neighboring words.
The effect sound iconicity exerts on readers' cognitive processes should also be further investigated. Based on the assumption that analogies between form and content can foster the embodiment of abstract concepts, one would expect that text samples exhibiting sound iconicity have a higher potential to elicit appropriate physical responses to a given content.
Finally, the question arises whether or not sound iconicity is invariably about congruencies between sound and meaning. Could sound iconicity, for example, also be instrumental for creating meaningful oppositions between sound and meaning such that contradictory feelings are amalgamated within a single expression?

14. Whissell C. Phonosymbolism and the emotional nature of sounds: evidence of the preferential use of particular phonemes in texts of differing emotional tone.