Iconicity correlated with vowel harmony in Korean ideophones

This paper aims to establish connections between the following phenomena pertaining to Korean ideophonic vowel harmony: a set of vowel patterns classified (phonologically) as ‘harmonic,’ ‘neutral,’ and ‘disharmonic’; a set of ideophones classified (semantically) as onomatopoeic vs. cross-modal; and a set of form-meaning mappings classified (semiotically) as higher vs. lower in iconicity. Onomatopoeic ideophones represent sounds in the external world by linguistic sounds. To do so effectively requires taking whatever phonological and phonotactic liberties are needed. This predicts that (a) onomatopoeic ideophones will show great diversity in harmony patterns and, in contrast, (b) cross-modal ideophones that capture sensory imagery by using more abstract iconic mappings (Dingemanse et al., 2016) will have more ‘room’ to conform to vowel harmony. To test these hypotheses, the distribution of harmony patterns in onomatopoeic vs. cross-modal ideophones was examined, using a written corpus of Korean ideophonic stems. The results supported the hypotheses by revealing that onomatopoeic ideophones are skewed toward a larger proportion of disharmonic forms compared to cross-modal ideophones.

Korean ideophones exhibit stem-internal vowel harmony, which does not occur in the prosaic lexicon.
This paper focuses on the Korean ideophonic vowel-harmony system, which contains so-called 'dark' and 'light' vowels (see Section 2 for details of this language-specific vowel distinction) and restricts the co-occurrence of those two vowel-harmony types within a morpheme (Cho, 1994; J.-S. Lee, 1992; H.-M. Sohn, 1999, among others). Within the ideophonic harmony system, legitimate violations of the harmony rule occur only in the presence of 'neutral' vowels: some dark vowels act as harmony-neutral in non-initial syllables and freely allow either dark or light vowels in the preceding syllable. Forms that otherwise combine dark and light vowels within a morpheme are considered apparent violations of the system.
Acknowledging the harmony patterns that can occur in the ideophonic lexicon, this paper examines the connections between the following phenomena: a set of vowel patterns classified (phonologically) as harmonic, neutral, and disharmonic; a set of ideophones classified (semantically) as onomatopoeic vs. cross-modal;² and a set of form-meaning mappings classified (semiotically) as higher vs. lower in iconicity. Specifically, using a written corpus of Korean ideophonic stems, this paper quantitatively tests the hypothesis that onomatopoeic ideophones show diversity in harmony patterns. Because they are bound to actual sounds, they are expected to take whatever phonological and phonotactic liberties they need. This fits cross-linguistic observations that, among ideophones, those with onomatopoeic meanings tend to show the most diversity in phonology and phonotactics (Akita, 2013; Childs, 1994). In contrast, cross-modal ideophones are expected to conform to stricter vowel harmony, which within the ideophone inventory is considered unmarked, because they are not directly tied to sound.
Regarding the formulation of the hypotheses, it is not necessarily the case that all cross-modal ideophones are less iconic than all onomatopoeic ideophones. The present study resolves this issue by empirically establishing degrees of iconicity through native speakers' rating judgments on a randomly selected subset of the ideophones (see Section 3 for further details).
The paper is organized as follows. Section 2 describes the vowel harmony system in Korean ideophones, and Section 3 provides a brief introduction to the semantic subcategories of Korean ideophones and examines their associated iconicity levels on an empirical basis. Section 4 describes the corpus of Korean ideophonic stems used in the current paper, and Section 5 reports the relative proportions of onomatopoeic and cross-modal ideophones in their associations with neutral forms containing neutral /i, ɨ/ and partially neutral /u/ in non-initial syllables. The phonosemantic analysis expands to disharmonic forms containing non-neutral /a/ and harmonic forms. Section 6 discusses the results and Section 7 summarizes the paper.
² The twofold onomatopoeic/cross-modal distinction is based on the traditional semantic categorization of the Korean ideophonic lexicon, i.e., ɨjsəŋə 'phonomimes' for depiction of auditory experiences and ɨjtɛə 'phenomimes plus psychomimes' for depiction of non-auditory experiences, such as visual or tactile sensations or psychological states. A more fine-grained level of distinction is possible, for example, by subdividing cross-modal domains into motion, texture, shape, and visual appearance, as in Dingemanse et al. (2016). However, the principal aim of this paper is to examine whether ideophones of the highest iconicity display the greatest diversity with respect to the regular phonotactics of the ideophonic system. This means that the only relevant distinction is whether or not an ideophone is onomatopoeic (i.e., intra-modal). The iconicity level of different types of ideophones is examined empirically in Section 3.2.
In Middle Korean (15th-16th century), vowel harmony was active and regular throughout the entire vocabulary. The co-occurrence of the class of dark vowels (including /ɨ, u, ə/) with that of light vowels (including /o, a, ɔ/)³ was strictly prohibited both stem-internally and stem-externally. The regular harmonic system, however, was disrupted both by a number of borrowings from Chinese, which had no harmonic system, and by a historic vowel shift (Kim-Renaud, 1976, p. 397; Larsen & Heinz, 2012). Since that change, strict vowel harmony has largely disappeared from Modern Korean, leaving traces in only a few limited cases, namely, verbal suffix harmony and ideophonic harmony (Larsen & Heinz, 2012).⁴ Of those, the current paper focuses only on ideophonic harmony pertaining to monophthongs, which regularly make up the dark-light harmony classes (Larsen & Heinz, 2012); verbal suffix harmony is not discussed, since it does not predict any potential connection with iconicity. The ideophonic harmony system in Modern Korean comprises light vowels, consisting of /ɛ, (ø),⁵ a, o/, and dark vowels, consisting of /i, e, (y), ɨ, ə, u/. Several previous studies have tried to account for the harmonic groupings in Korean ideophones using various features, such as [±low] (K.-O. Kim, 1977; McCarthy, 1983; H.-S. Sohn, 1986) and [±Advanced Tongue Root] or [±Retracted Tongue Root] (Cho, 1994; J.-K. Kim, 2000; J.-S. Lee, 1992; M. Lee, 2001; Y. Lee, 1993). However, none of the proposed distinguishing features is supported by the formant frequencies of Korean monophthongs (see the detailed formant chart for F1 and F2 values of Standard Korean monophthongs spoken by female speakers in Larsen & Heinz, 2012, p. 437; see also Kwon, to appear). This indicates that the dark and light vowel sets in Modern Korean are not natural classes that can be distinguished by any widely accepted universal distinctive feature.
Reflecting this fact, this paper adopts the traditional semantic terms dark and light to refer to the two harmony classes of vowels.
These semantic terms are in fact useful for describing the connotation that each vowel class carries in the ideophonic lexicon. Korean ideophones display systematic vowel alternations by associating the light vowels with a diminutive connotation (such as lightness, smallness, and fastness) and the dark vowels with an augmentative connotation (such as heaviness, largeness, and slowness) (Cho, 1994; Finley, 2006; J.-K. Kim, 2000; Y.S. Kim, 1984; Kim-Renaud, 1976; M. Lee, 2001; McCarthy, 1983; H.-M. Sohn, 1999). Alternations occur vertically, involving a change in the high/low feature, and also diagonally, involving a change in the frontness/backness feature (K.-O. Kim, 1977; Y. Lee, 1993). This results in seven possible alternating patterns, as in (1).⁶
(1) Vowel alternating patterns [diagram not recoverable from the source text]
Tellingly, the vowel alternations, which create a series of semantic minimal pairs, occur not only in initial syllables but also in non-initial syllables, as exemplified in (2). This is because Korean ideophones are governed by a harmony rule: the vowels within a stem should agree in the semantic feature (Cho, 1994; H.-M. Sohn, 1999; Larsen & Heinz, 2012).
The position-sensitive harmony-neutral status of /i, ɨ/ has diachronic and synchronic grounds. In diachronic terms, the neutrality of /i/ in non-initial syllables is attributed to the newly appeared light /ɛ/ in initial syllables (which formed a harmonic pair with the position-insensitive neutral /i/) in the late 18th century. The neutrality of /ɨ/ is attributed to a historical merger between the light /ɔ/ and its dark counterpart /ɨ/ in non-initial syllables around the middle of the 15th century (K.-M. Lee, 1961, 1972). Synchronic evidence is found in Larsen and Heinz's (2012) corpus-based study, which shows that the neutral vowels /i, ɨ/ have a more or less equal distribution of dark and light vowels in the preceding syllables. Again according to Larsen and Heinz's study, dark /u/ in non-initial syllables also frequently follows light vowels, at a ratio of around 2:1 (464/266). Although /u/ occurs with light vowels proportionately less than the traditional neutral vowels /i, ɨ/, Larsen and Heinz claimed that it patterns closely with the neutral vowels /i/ and /ɨ/ (in terms of transparency in vowel harmony) and that it is therefore at least partially neutral. In a diachronic sense, the partial neutrality of /u/ can be traced back to the raising of /o/ to /u/ in the late 19th century (Cho, 1994; Ko, 2012; J.-S. Lee, 1992). Perhaps the varying behaviors of /u/ as a neutral, dark, or optionally neutral vowel in non-initial syllables in (4) have been influenced by the ongoing mid-vowel raising process (e.g., hoto > hotu 'walnut'; cato > catu 'plum'; Larsen & Heinz, 2012, p. 454).
However, /a/ is harmonic to a greater extent (approximately 89% of the time) than the traditional neutral vowels /i, ɨ/ and the partially neutral /u/ (Larsen & Heinz, 2012). Also, its neutrality does not have a robust historic basis.
Therefore, this paper restricts neutral vowels to /i/, /ɨ/, and /u/, that is, to those which enter relatively freely into disharmonic patterns in an ideophonic word. Vowel harmony as a pattern of alternation will not be discussed further, because the focus of this paper lies on harmony patterns within individual ideophonic forms only.⁷
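The three-way classification used throughout the paper can be made concrete with a minimal sketch. It assumes that a stem is given as a list of its monophthong vowels, that non-initial /i, ɨ, u/ are transparent, and that a stem containing both /i, ɨ/ and /u/ after a light vowel is labelled with the weaker category; the function name `classify_stem` and these tie-breaking choices are illustrative assumptions, not the coding procedure of the study.

```python
# Vowel classes as listed in the text for Modern Korean ideophones
# (monophthongs only; /y/ and /ø/ are left out of this sketch).
LIGHT = {"ɛ", "a", "o"}
DARK = {"i", "e", "ɨ", "ə", "u"}
NEUTRAL = {"i", "ɨ"}      # neutral in non-initial syllables
PARTIALLY_NEUTRAL = {"u"}  # partially neutral in non-initial syllables


def classify_stem(vowels):
    """Classify a stem by its vowel sequence (hypothetical helper).

    Returns 'harmonic', 'neutral', 'partially neutral', or 'disharmonic'.
    """
    if len(vowels) < 2:
        raise ValueError("a one-syllable stem carries no harmony information")
    for v in vowels:
        if v not in LIGHT and v not in DARK:
            raise ValueError(f"unknown vowel: {v}")

    transparent = NEUTRAL | PARTIALLY_NEUTRAL
    # Assumption: non-initial /i, ɨ, u/ are skipped when checking agreement.
    skipped = [v for v in vowels[1:] if v in transparent]
    core = [vowels[0]] + [v for v in vowels[1:] if v not in transparent]
    core_classes = {"light" if v in LIGHT else "dark" for v in core}

    if len(core_classes) > 1:
        return "disharmonic"
    if core_classes == {"dark"} or not skipped:
        return "harmonic"
    # Light core followed by at least one dark but tolerated vowel.
    if any(v in PARTIALLY_NEUTRAL for v in skipped):
        return "partially neutral"
    return "neutral"


# Vowel sequences of stems mentioned in the text.
print(classify_stem(["a", "u"]))  # katuŋ
print(classify_stem(["a", "ɨ"]))  # t'ak'ɨm
print(classify_stem(["ə", "ə"]))  # nətəl
print(classify_stem(["ə", "a"]))  # a hypothetical dark + /a/ sequence
```

Under these assumptions a dark vowel followed by /i, ɨ, u/ counts as harmonic, which matches the way harmonic and neutral stems are counted in Section 5.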
Leaving aside the difficulty of creating a binary semantic classification of ideophones (which will be discussed in empirical terms in Section 3.2), intuitively speaking, ɨjsəŋə 'depiction of sound' seems to be more (transparently) iconic than ɨjtɛə 'depiction of visual/tactile information or of mental states,' because its form-meaning associations occur within the same modality. Empirical support for such intuitive claims is found in Dingemanse et al.'s (2016) behavioral experiment, in which Dutch listeners showed better rates of correct guessing of meaning for onomatopoeic ideophones than for cross-modal ideophones, when given words from five languages (including Korean) that they did not speak.
For further assurance of the hierarchy of iconicity linked to the onomatopoeic/cross-modal distinction, the current research directly measured iconicity levels in Korean ideophones. Specifically, it used ratings from native Korean speakers on a paper-based questionnaire. The rating task partly replicates work by Perry et al. (2015), who tested the iconicity of English and Spanish words in different lexical categories, such as onomatopoeia, nouns, and adjectives (see also Vinson et al., 2008 for iconicity rating in sign language).

Participants
Thirty native Korean speakers were recruited in Seoul through the author's personal contacts. Their participation was on a voluntary basis without pay.

Materials and procedure
Participants subjectively rated the iconicity of 170 randomly selected ideophones (onomatopoeic only: 17; cross-modal only: 120; both onomatopoeic and cross-modal: 33) that form 10% of the main data for analysis (onomatopoeic: 178; cross-modal: 1,262; both: 335). The dataset, from which a representative 10% is taken, contains North Korean (29.97%; 532/1,775) as well as South Korean dialects (70.03%, 1,243/1,775). Since all of the participants are speakers of the South Korean dialect, a random selection of ideophones was made from the South Korean dialect only for the questionnaire.
Three randomized versions of the questionnaire, in the form of a Microsoft Excel spreadsheet, were randomly distributed to the participants via email. In the rating task, participants were asked to look at the words in written form and to say them aloud before making their rating (on a scale from 1 to 7, where 1 indicates that a word is not at all iconic and 7 indicates that a word is highly iconic). The instructions to the participants included a careful definition of iconicity (see Appendix A for the instructions). The estimated time for completing the questionnaire was 30 minutes or less.

Results
For the analysis of the rating results, a linear mixed-effects model was run, with the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2016) packages in R (R Core Team, 2017). The dependent variable was the iconicity rating. The independent variable, implemented as a fixed effect, was the Semantic Type (three levels: onomatopoeic, both onomatopoeic and cross-modal, and cross-modal). Participant and item served as random effects: the by-participant effects included a random intercept and a random slope for the semantic type, and the by-item effects included a random intercept. The estimated means for the three semantic types of ideophones (with examples of stimulus words), which generalize over participants and items, are shown in Table 1 below.
These results lend some support to the intuitive claim that onomatopoeic ideophones have stronger iconicity than cross-modal ideophones, by suggesting the following iconicity rank order: onomatopoeic > both onomatopoeic and cross-modal = cross-modal. Given the inferred rank order, I combined ideophones with both onomatopoeic and cross-modal meanings and purely cross-modal ideophones into a single cross-modal type, contrasted with onomatopoeic ideophones. I then tested the following hypotheses: (a) Vowel harmony governs the ideophonic lexicon; therefore, both onomatopoeic and cross-modal ideophones will be mostly harmonic or neutral. (b) Yet, since onomatopoeic ideophones are bound to actual sounds, they will be skewed toward a larger proportion of disharmonic forms. (c) Conversely, cross-modal ideophones will conform to stricter vowel harmony patterns and will disfavor disharmonic forms.
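The model structure described above can be written out as follows. This is a sketch under assumptions: treatment coding with the onomatopoeic type as the reference level and normally distributed random effects are not stated in the text, and the symbols are introduced here for illustration only. In lme4 formula notation the description would correspond to something like `rating ~ type + (1 + type | participant) + (1 | item)`, where the variable names are hypothetical.

```latex
% y_{pi}: iconicity rating given by participant p to item i
% B_i, C_i: indicators for the "both" and "cross-modal" semantic types
%           (onomatopoeic assumed as the reference level)
\[
  y_{pi} = \beta_0 + \beta_1 B_i + \beta_2 C_i
         + u_{0p} + u_{1p} B_i + u_{2p} C_i
         + w_{0i} + \varepsilon_{pi}
\]
\[
  (u_{0p}, u_{1p}, u_{2p})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
  w_{0i} \sim \mathcal{N}(0, \sigma_w^2), \qquad
  \varepsilon_{pi} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here the terms with index p are the by-participant random intercept and slopes, and the term with index i is the by-item random intercept.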

The written corpus containing a list of Korean ideophonic stems
This study used the same corpus as Larsen and Heinz's (2012) study. Although a detailed description of the corpus is found in Larsen and Heinz's paper (pp. 441-443), I address the main characteristics of the corpus that are relevant to the current study. The corpus, which is reported to contain 29,015 Korean ideophones, was developed during the compilation of pʰjocunkukətɛsacən 'the Great Dictionary of Standard Korean'.⁹ A few concerns can be raised about the accuracy of the corpus, for the following reasons (as the distributor has admitted): (a) Some words in the corpus may not possess sound-symbolic meanings (i.e., no perceptual sensory meanings), although they exhibit a pattern of sound alternations as do ideophones; and (b) some may have no entries in the Great Dictionary of Standard Korean. Despite these concerns, I do not question the accuracy of the corpus, since I found that words falling under (a) and (b) account for only 1.55% (32/2,062) of the underlying data for the current study (955 harmonic forms, 749 neutral forms containing /i/ or /ɨ/, 260 partially neutral forms containing /u/, and 98 disharmonic forms containing /a/; the details of the list are found in Section 5).¹⁰ Another issue arises from the fact that many Korean ideophonic stems are combined with a verb, hata 'do, be,' or a verbal suffix, kəlita or tɛta 'keep doing' (e.g., t'ak'ɨm-hata 'be painful' and t'ak'ɨm-kəlita 'keep being painful') (H.-M. Sohn, 1999, p. 101). As a result, the corpus contains multiple variants of a single underlying ideophonic stem. For example, it lists four variants (katuŋ-katuŋ, katuŋ-katuŋ-hata, katuŋ-kəlita, katuŋ-tɛta) built on a single stem, katuŋ 'swaying one's hips.' Among the variants, I extracted only the reduplicated forms to minimize confusion about whether or not the selected items were ideophonic.
Among those, I extracted reduplicated forms, including forms of echo reduplication, based on two- and three-syllable ideophonic stems (3,041 di-syllabic and 983 tri-syllabic types), since they are the two most frequent syllable lengths for ideophonic stems.¹³ One-syllable-based reduplicated forms, such as (6a), were not extracted from the corpus because they provide no information about vowel harmony (299 types). Four-syllable-based reduplicated forms, such as (6g), were also excluded (10 types), because they appear to be compounds consisting of two reduplicated forms of two one-syllable-based stems (e.g., cʰik- and pʰok-). In addition, two-syllable- and three-syllable-based forms of echo reduplication that show differing vowel patterns between base and reduplicant were excluded (110 out of 316 types). Examples include kalpʰaŋ-cilpʰaŋ 'cluelessly' (for a syllable change) and nɨnsil-nansil 'behaving lasciviously' (for a vowel change). As for homophonous reduplicated forms that were listed twice in the corpus (e.g., katak-katak (01) 'into strips' and katak-katak (02) 'dry and stiff state of an object that was once watery'),¹⁴ I kept only one token of each homophonous form (but attended to all of their meanings for semantic coding in the next sub-section). As a result, the total number of reduplicated ideophonic forms extracted from the corpus amounted to 4,024. This included 2,875 harmonic and 1,149 neutral/disharmonic stems. The number deviates from that in Larsen and Heinz's (2012) study, which extracted reduplicatives of di- and tri-syllabic ideophonic stems that displayed harmonic (e.g., katak-katak) and neutral/disharmonic (e.g., kaku-kakul 'winding') sequences with monophthongs (3,972 forms in total). This difference may have resulted from the exclusion of /y/ and /ø/ (cf. note 5) and the inclusion of some forms of echo reduplication in the present study.
However, given that the number of stems containing /y/ and /ø/ is 40 in Larsen and Heinz's study while the number of stems of echo reduplication is 206 in the present study, there is still a slight difference in the number of stems between them.¹⁵ The reasons for this remaining difference are unknown, and one can only speculate that it is due to counting errors in either study.
Larsen and Heinz (2012, p. 442) listed the following limitations of the corpus. First, some reduplicatives found in the corpus are not familiar to Korean speakers. To minimize this issue, I did not consider those reduplicatives that neither appeared in the dictionary nor were attested in the Kaist Concordance Program (http://semanticweb.kaist.ac.kr/research/kcp/), a web-based program for searching expressions containing a target word in the Kaist Raw Corpus (1997), which contains 70 million Korean phrases.¹⁶ Second, not all of the ideophonic reduplicatives in Korean are found in the corpus. I did not make any additions to the corpus for a future replication study.
¹³ Syllable length does not affect the frequency of occurrence of onomatopoeic versus cross-modal types in Korean ideophones if their stem bases consist of more than one syllable (cf. ideophones of the (6a) form, which have more onomatopoeic than cross-modal meanings in Korean) (Wang, 2010).
¹⁴ For polysemous items, in contrast, the corpus lists only one token. For semantic coding, I attended to all of the available meanings for each polysemous item, applying the same practice as was used for homophonous items.
¹⁵ Hong (2010) used the exact same corpus and criteria as the current study, except for the inclusion of 206 forms of echo reduplication. However, even if the 206 forms are added to Hong's (2010) list, the number of ideophonic stems containing monophthongs (3,397, comprising the original 3,191 forms plus 206 forms of echo reduplication) differs considerably from that of the current study (4,024). When the number of stems containing /y/ and /ø/ found in Larsen and Heinz's study is factored in (40) and the number of stems of echo reduplication is factored out (206), the current study (expected number: 3,857) patterns more closely with Larsen and Heinz's study than Hong's study does (expected number: 3,231).
¹⁶ Still, there remains a possibility that some of the selected reduplicatives in the corpus are unfamiliar to Korean speakers in general. If a significant portion of the data is not in the mental lexicon of language users, the implications of the results for iconicity correlated with the distribution of vowel harmony may have limited validity from the perspective of Korean synchronic grammar. However, the iconicity rating results in Section 3.2 show that only 2% of the answers were marked as "don't know the word's meaning." Therefore, I find little reason to question the representativeness of the data.
[...] Sohn, 2012), so I did not differentiate between them in the main analysis. Still, the point that needs to be addressed here is that there is a synchronic merger between /ə/ and /o/, caused by the rounding of /ə/ in standard North Korean (Kwak, 2003). On the other hand, a merger of /e/ with /ɛ/ has occurred in all areas of South Korea, due to the raising of /ɛ/ (Ingram & Park, 1997; Tsukada et al., 2005; Yang, 1996, among others). These different synchronic mergers in the two dialects create potential confounds in the data analysis, as they appear to transform the dark /ə/ into the light /o/ in North Korean and the light /ɛ/ into the dark /e/ in South Korean. To avoid altering the corpus, I retained those reduplicatives that contained the potential confounds, but considered the effects of the synchronic mergers in each dialect separately in the analysis in Section 5.

Semantic coding
According to the compilation guidelines¹⁸ for the Great Standard Korean Dictionary (http://stdweb2.korean.go.kr/main.jsp), the glosses of onomatopoeic ideophones contain the phrase 'the sound of …' or 'the sound made when conducting the action of …' in word-for-word translations. On the other hand, the glosses of cross-modal ideophones contain the phrase 'the shape/way of …', 'the state of …' or 'the feeling of …' (National Institute of the Korean Language, 2000). Following the guidelines, I checked the meanings of the reduplicatives using the dictionary and, based on those meanings, I assigned the semantic codes 'O' for onomatopoeic and 'C' for cross-modal meanings. When a reduplicative had both onomatopoeic and cross-modal meanings (e.g., tekul 'the sound or action of an object rolling'), it was assigned C, as Section 3.2 showed that its iconicity level is not significantly different from that of cross-modal ideophones. When one reduplicative received multiple identical semantic codes, they were merged into one. For example, t'ɨk'ɨm has three related cross-modal meanings: (1) Burning sensation when one suddenly touches a fire (C); (2) enthusiasm when one is under the inspiration of someone/something (C); and (3) pain when one is being beaten or pricked (C). The three Cs were merged into one C.
A limitation of semantic coding is that it could not be applied to all of the reduplicatives, for the following reasons. First, there were some reduplicatives whose definitions did not appear in the dictionary. In this case, their actual use was searched in the Kaist Concordance Program, and only those that were found in naturally occurring data received a coding (based on their meanings in the exemplified expressions). Second, there were some reduplicatives whose meanings were not sound-symbolic, such as mitʰamitʰa 'suspicious'; they were excluded from the main analysis. Third, some reduplicatives appeared to be erroneous variants of other reduplicatives in the corpus. For example, the neutral k'asil-k'asil 'rough skin or hard-grained character' was defined as an erroneous variant of the neutral k'asɨl-k'asɨl. Similarly, the neutral patɨŋ-patɨŋ 'struggle in agony or wriggle' was defined as an erroneous variant of the harmonic patoŋ-patoŋ. In this case, only the meanings of the correct reduplicatives were counted in the analysis. The correct forms were not newly added to the corpus, though, to avoid unnecessary duplication (the correct forms were already in the corpus). The specific number of each semantically non-classifiable case is reported in Appendices C-F.
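The merging rule described above can be sketched as a small function. It assumes that each dictionary sense has already been coded by hand as 'O' or 'C'; the function name `stem_code` is hypothetical and only illustrates how per-sense codes collapse into one stem-level code.

```python
def stem_code(sense_codes):
    """Collapse hand-assigned per-sense codes into one code per stem.

    'O' = onomatopoeic sense, 'C' = cross-modal sense.
    Identical codes are merged; a stem with both kinds of senses is
    assigned 'C', following the grouping motivated in Section 3.2.
    """
    codes = set(sense_codes)
    if not codes:
        raise ValueError("stem has no codable sense")
    if not codes <= {"O", "C"}:
        raise ValueError(f"unexpected code(s): {codes - {'O', 'C'}}")
    return "C" if "C" in codes else "O"


# Examples taken from the text.
print(stem_code(["C", "C", "C"]))  # t'ɨk'ɨm: three cross-modal senses
print(stem_code(["O", "C"]))       # tekul: sound or action of rolling
print(stem_code(["O"]))            # tekək: a rattling sound
```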

The distribution of non-initial /i, ɨ/ and /u/ in the data
Before a discussion of the distribution of neutral forms (i.e., forms that contain neutral /i, ɨ/ or partially neutral /u/) in onomatopoeic vs. cross-modal ideophones, I examine their neutrality by considering the proportions of the harmonic and neutral forms they produce in the current data (which include a total of 4,024 ideophonic reduplicatives in Korean). If their neutrality is strong, they should produce harmonic (i.e., the forms where the neutral vowels are preceded by dark vowels) and neutral forms (i.e., the forms where the neutral vowels are preceded by light vowels) at approximately the same ratio. Since vowel patterns in the base and reduplicant are identical in all of the forms in the dataset, I chose to consider vowel patterns in base stems only.
The result is largely in line with Larsen and Heinz's (2012) study: the number of stems in which non-initial /i/ and /ɨ/ occur with dark vowels (296 harmonic stems for /i/ and 565 harmonic stems for /ɨ/) was not much different from the number in which they occur with light vowels (244 neutral stems for /i/ and 505 neutral stems for /ɨ/).¹⁹ The vowel /u/ occurs more frequently with dark vowels (461 harmonic stems) than with light vowels (260 neutral stems). However, it is still distinct from non-initial dark vowels at a statistically significant level (p < .001) in terms of the frequency of light vowels in the preceding syllables (Larsen & Heinz, 2012, p. 449): non-initial dark /e/ and /ə/ followed dark rather than light vowels at a ratio of approximately 35:1.
Disharmonic forms that did not contain /i/, /ɨ/, or /u/ in non-initial syllables numbered 133, or 3.31% of the entire data (133/4,024), and a large number of these involved a non-initial light /a/ (98 out of 133 forms). This raises the question of whether /a/ should also be classified as (partially) neutral. However, non-initial /a/ was preceded by light vowels (598 harmonic stems) rather than dark vowels (98 disharmonic stems) at a ratio of approximately 6:1, a much stronger harmonic preference than that of the neutral vowels.
To statistically test whether /i/, /ɨ/, /u/, and /a/ were different from each other in their co-occurrences with dark and light vowels, I conducted Fisher's exact tests on pairs of vowels (Figure 1), using the R statistical software package (R Core Team, 2017). The results with Odds Ratios (OR) are shown in Table 2.
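The pairwise comparison can be illustrated with a minimal pure-Python version of a two-sided Fisher's exact test, applied to the stem counts reported in this section. This is a sketch, not the analysis script of the study: the function name is hypothetical, and the odds ratio computed here is the simple sample odds ratio, whereas R's `fisher.test` reports a conditional maximum-likelihood estimate, so values may differ slightly from those in Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value). The p-value sums the
    hypergeometric probabilities of all tables that are no more
    probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)  # small tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Counts from the text: (stems preceded by dark, stems preceded by light).
counts = {"i": (296, 244), "ɨ": (565, 505), "u": (461, 260), "a": (98, 598)}

vowels = list(counts)
for n, v1 in enumerate(vowels):
    for v2 in vowels[n + 1:]:
        odds, p = fisher_exact_two_sided(*counts[v1], *counts[v2])
        print(f"/{v1}/ vs /{v2}/: OR = {odds:.2f}, p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch free of external dependencies; for routine use a statistics library would be the more practical choice.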
Given this, the following sub-sections report connections between two semantic types of ideophones associated with different iconicity levels and stems containing non-initial /i, ɨ/, /u/, and /a/ in order. The iconicity correlated with apparently harmonic stems that do not contain any of the vowels of interest (i.e., /i, ɨ/, /u/, /a/) in non-initial syllables is measured next. Sub-sections 5.2 to 5.5 contain only the major results of a comparison. For readers who wish to see substantial detail related to justification for the selection of the relevant datasets, refer to Appendices C-F.

Iconicity correlated with stems containing the neutral /i, ɨ/
Of the 749 neutral forms containing non-initial /i/ or /ɨ/ (i.e., forms where /i/ or /ɨ/ follows light vowels), 131 forms were excluded from semantic coding (see their details in Appendix C). In brief, 14 forms were eliminated as they were listed as mistakes of forms already found in the dataset; 85 forms were eliminated as they could have been affected by the /ɛ/~/e/ merger in South Korean dialects; seven forms were eliminated as their meanings were not found in the dictionary; and 25 forms were eliminated because they instantiated partial reduplication.

Iconicity correlated with neutral stems containing the partially neutral /u/
There were 260 neutral forms containing a non-initial /u/. Of those, a total of 50 forms (in Appendix D) were eliminated before the examination of iconicity correlated with the neutral /u/ in Korean ideophones. Consequently, 210 forms remained for the semantic analysis, and of those, 206 forms (or 98.10%) represented cross-modal meanings, while four forms (or 1.90%) represented onomatopoeic meanings only (e.g., t'alk'uk-t'alk'uk 'hiccup').
²⁰ NK denotes a North Korean form.
Figure 1: Stem frequencies where the co-occurrence of a non-initial /i/, /ɨ/, /u/, and /a/ with dark and light vowels can be seen.
To sum up Sections 5.2-5.3, the proportional distributions of neutral forms with /i, ɨ/ of a strong neutrality and /u/ of a weak neutrality in onomatopoeic and cross-modal types of ideophones differ in the same direction. Specifically, the baseline proportions of the neutral and the partially neutral forms in the observed subset of ideophones are 34.82% (618/1,775) and 11.83% (210/1,775), respectively. In cross-modal ideophones, the corresponding proportions are 36.19% (578/1,597) and 12.90% (206/1,597), which remain close to the baseline. However, in onomatopoeic ideophones, the proportions of neutral (22.47%; 40/178) and partially neutral forms (2.25%; 4/178) are significantly lower than in cross-modal ideophones (p < .001, two-tailed proportion test). This implies that onomatopoeic ideophones must show a corresponding increase in one or both of the remaining harmony patterns (i.e., harmonic and disharmonic). The measurement of proportions of the disharmonic stems containing /a/ (Section 5.4) and the harmonic stems (Section 5.5) in onomatopoeic vs. cross-modal ideophones follows.
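The proportion comparison summarized above can be reproduced approximately with a pooled two-proportion z-test. This is a sketch under an assumption: the text does not say which variant of the proportion test was used, and R's `prop.test`, for instance, applies a continuity correction by default, so exact p-values may differ. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed pooled z-test for the difference of two proportions.

    Returns (z, p_value) for H0: x1/n1 and x2/n2 estimate the same rate.
    No continuity correction is applied (an assumption of this sketch).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Counts from the text: onomatopoeic first, cross-modal second.
z_neutral, p_neutral = two_proportion_z_test(40, 178, 578, 1597)
z_partial, p_partial = two_proportion_z_test(4, 178, 206, 1597)
print(f"neutral /i, ɨ/:        z = {z_neutral:.2f}, p = {p_neutral:.2g}")
print(f"partially neutral /u/: z = {z_partial:.2f}, p = {p_partial:.2g}")
```

Both comparisons yield negative z-values, reflecting the lower share of neutral and partially neutral forms among onomatopoeic ideophones.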

Lexical iconicity correlated with disharmonic forms containing /a/
There were 98 disharmonic forms containing non-initial /a/. Of those, a total of 35 forms (in Appendix E) were eliminated, but 11 forms were newly added from the list of disharmonic forms containing /i, ɨ/. These 11 forms, exemplified in (7), were moved here because they appeared to be disharmonic forms containing /a/ rather than /i, ɨ/, when the /ɛ/~/e/ merger is taken into account.

Iconicity correlated with harmonic forms that do not contain non-initial /i, ɨ/, /u/, or /a/
Harmonic forms that do not contain any of the aforementioned vowels /i, ɨ/, /u/, or /a/ in non-initial syllables amounted to 955. Of those, 82 forms (in Appendix F) were eliminated. After this elimination, 873 forms remained, and of those, 760 forms (or 87.06%) were classified as having cross-modal meanings (e.g., nətəl-nətəl 'in tatters'; k'ɛlk'ɛk-k'ɛlk'ɛk 'a sound or state of choking') while 113 forms (or 12.94%) were classified as having onomatopoeic meanings only (e.g., tekək-tekək 'a rattling sound'). Figure 2 shows the number of onomatopoeic or cross-modal stems by reference to the four categories: harmonic forms, neutral forms containing /i, ɨ/, partially neutral forms containing /u/, and disharmonic forms containing /a/.
In sum, in the subset of ideophones analyzed, cross-modal (96.68%; 1,544/1,597) and onomatopoeic ideophones (88.20%; 157/178) are mostly linked to harmonic or neutral forms. This indicates that they generally conform to the conventional phonotactics of the ideophonic lexicon. However, as a remarkable difference between them, onomatopoeic ideophones are skewed toward a larger proportion of disharmonic forms.
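The overall distribution can be tabulated from the counts reported in Sections 5.2-5.5. In the sketch below, the harmonic, neutral, and partially neutral counts are taken from the text, while the disharmonic counts are derived by subtraction from the type totals (178 and 1,597); they are an inference of this sketch, not figures quoted from the text, although their sum agrees with the 74 disharmonic forms mentioned in the Summary.

```python
# Counts per semantic type as reported in the text.
counts = {
    "onomatopoeic": {"harmonic": 113, "neutral": 40, "partially neutral": 4},
    "cross-modal": {"harmonic": 760, "neutral": 578, "partially neutral": 206},
}
totals = {"onomatopoeic": 178, "cross-modal": 1597}

# Derived (not quoted): disharmonic = total minus all other patterns.
for sem_type, by_pattern in counts.items():
    by_pattern["disharmonic"] = totals[sem_type] - sum(by_pattern.values())

for sem_type, by_pattern in counts.items():
    print(sem_type)
    for pattern, n in by_pattern.items():
        share = 100 * n / totals[sem_type]
        print(f"  {pattern:<18} {n:>5}  ({share:.2f}%)")
```

Read this way, the share of disharmonic forms among onomatopoeic ideophones is several times that among cross-modal ideophones, which is the skew described above.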

Discussion
The current corpus-based study reveals that the vowel-harmony system in Korean ideophones is associated with iconicity in a complicated manner. In general, ideophones obeyed vowel harmony. However, a closer look at the distribution of harmony patterns (i.e., harmonic, neutral, and disharmonic) in onomatopoeic vs. cross-modal ideophones reveals that the former show greater diversity in harmony patterns than the latter. This supports the hypothesis that highly iconic ideophones take the phonotactic liberties they need, so that the distribution of vowel harmony patterns (i.e., the conventional phonotactics of Korean ideophones) is skewed in onomatopoeic vs. cross-modal ideophones. In fact, onomatopoeic ideophones were relatively frequently associated with disharmonic forms containing /a/, which does not possess a legitimate neutral status. In contrast, cross-modal ideophones that represent abstract iconic mappings conformed to stricter vowel harmony patterns, which are phonologically motivated regularities.
The finding that highly iconic ideophones are relatively free from the conventional phonotactics of the ideophonic system is based on quantitative corpus data. To find further empirical evidence, it would be useful to conduct production experiments with native Korean speakers in a future study. For example, one could ask Korean speakers to produce novel ideophones for onomatopoeic and cross-modal meanings for which real ideophones do not exist in the language. One could then examine the distribution of harmonic, neutral, and disharmonic forms in the novel ideophones. If the skewed distribution of harmony patterns in onomatopoeic vs. cross-modal ideophones were also observed in the production data, the current investigation would gain strong psycholinguistic validity.

Summary
The primary aim of this study was to examine whether there is any correlation between the levels of iconicity and the degree to which ideophones conform to harmony constraints in Korean. Using a written corpus of Korean ideophonic stems, the study examined the meanings of 873 harmonic forms that do not contain any of the (potential) neutral vowels (/i/, /ɨ/, /u/, and /a/), in non-initial syllables, and of 828 neutral and 74 disharmonic forms (1,775 ideophonic stems in total). The results showed that both onomatopoeic and cross-modal ideophones were mostly harmonic or neutral. But onomatopoeic ideophones (i.e., a highly iconic type of ideophone) were skewed toward a larger proportion of disharmonic forms. This quantitatively confirms the hypothesis that high iconicity is correlated with phonotactic diversity.