Defining iconicity: An articulation-based methodology for explaining the phonological structure of ideophones

Iconicity is when linguistic units are perceived as ‘sounding like what they mean,’ so that the phonological structure of an iconic word is what begets its meaning through perceived imitation, rather than through an arbitrary semantic link. Fundamental examples are onomatopoeia, e.g., a dog’s barking: woof woof (English), wou wou (Cantonese), wan wan (Japanese), hau hau (Polish). Systematicity is often conflated with iconicity because it is also a phenomenon whereby a word begets its meaning from phonological structure, albeit through (arbitrary) statistical relationships, as opposed to perceived imitation. One example is gl- (Germanic languages), where speakers can intuit the meaning ‘light’ via knowledge of similar words, e.g., glisten, glint, glow, gleam, glimmer. This conflation of iconicity and systematicity arises from questions like ‘How can we differentiate or qualify perceived imitation from (arbitrary) statistical relationships?’ So far there is no proposal to answer this question. By drawing observations from the visual modality, this paper addresses the ambiguity between iconicity and systematicity in spoken language by proposing a methodology which explains how iconicity is achieved through perceptuo-motor analogies derived from oral articulatory gesture. We propose that the universal accessibility of articulatory gestures, together with the human ability to create (perceptuo-motor) analogy, is what in turn makes iconicity universal and thus easily learnable by speakers regardless of language background, as studies have shown. Conversely, our methodology allows one to argue which words are devoid of iconicity, seeing as such words should not be explainable in terms of articulatory gesture. We use ideophones from Chaoyang (Southern Min) to illustrate our methodology.

Thompson, Arthur Lewis and Youngah Do. 2019. Defining iconicity: An articulation-based methodology for explaining the phonological structure of ideophones. Glossa: a journal of general linguistics 4(1): 72.

Ideophones are often cited as examples of sound symbolism, which can be summed up as the relation of linguistic form to its meaning (Hinton et al. 1994). One implicit assumption underlying the term sound symbolism is that phonemes, or clusters of phonemes, map onto meaning below the word or morpheme level, thus acting as affordances which together allow the sound symbolic word to take on meaning. For example, the /ŋ/ in the English onomatopoeia /diŋ.dɔŋ/ seems to be characteristic of the reverberating echo of a bell tolling, while the alternating /i/ and /ɔ/ seem characteristic of movement or a fluctuation in pitch as the bell tolls. While various studies have worked to elicit submorphemic sound-to-meaning phoneme correspondences (McCune 1983; Maduka 1988; Oswalt 1994; Rhodes 1994; Hamano 1998; in press; Hutchins 1998; Blust 2003; Ofori 2009; Assaneo et al. 2011; Urban 2011; Akita et al. 2013; Ayalew 2013; Kwon & Round 2015; Blasi et al. 2016; Akita 2017; De Carolis et al. 2017; Strickland et al. 2017; Aryani 2018; Kawahara et al. 2018; Shih et al. 2018), the underlying mechanisms which allow for such correspondences in spoken language remain unclear (cf. Emmorey 2014 for signed language; cf. Sidhu & Pexman 2018 for overview). Specifically, it is unclear whether such correspondences are iconic (form miming meaning), and thus presumably universal, or simply systematic (form-meaning mappings).
Systematicity is defined as arbitrary and potentially language-specific patterns of sounds which exhibit a statistically consistent relationship to a group of words (Dingemanse et al. 2015: 604). The line between iconicity and systematicity may seem blurry since words embedded with iconic properties, like ideophones, sometimes exhibit systematic patterns. For example, nasal stops for many ideophones are interpreted as a systematic encoding of reverberation (Hinton et al. 1994; Hamano in press). Another example is Hamano's (1998) full-scale analysis of Japanese ideophones, which is built on the idea that systematic sound-meaning mappings are rooted in iconic properties of phonemes. However, this does not mean that systematicity is iconicity. Technically speaking, morphemes are systematic form-meaning mappings, e.g., English prefix pre-. However, there are no claims that pre- is iconic because its systematic and Latinate origins are rather straightforward (OED 2018). Yet this is not the case for phonaesthemes. Phonaesthemes are clusters of phonemes which systematically pattern to meaning, e.g., gl- as in glisten, glimmer, and glint in English. Because of this and their diachronically obscure origins (see Table 1), phonaesthemes are misconstrued as examples of iconicity at work in spoken language (Hinton et al. 1994; Waugh 1994; Hutchins 1998; Bergen 2004; Smith 2014; Kwon & Round 2015). It is assumed that phonaesthemes are statistically consistent in their form-meaning mappings because they are inherently iconic one way or another. But if gl- is in fact iconic, then one might very well ask how gl- is imitative of glistening or any property of light for that matter. And one might also ask where that leaves cross-linguistically attested onomatopoeic /g/+/l/, as in glug, gloop, glom, and gulp (see Table 1). If we decide that systematicity is statistically robust because of iconicity, then we risk muddying the definition of iconicity.
Iconicity is supposed to be rooted in universal, cognitive capabilities, not necessarily in statistical relationships. And if iconicity is not universal, then we must revise its quintessential potential for universality (Perniss et al. 2010).
To that end, as Dingemanse et al. (2015) propose, systematicity is often language-specific, not to mention arbitrary. Our study is motivated by the fact that there is no consensus on how to disentangle iconicity from systematicity. This paper offers a straightforward methodology for teasing apart systematicity and iconicity by using articulatory gestures to explain why some phonosemantic mappings are iconic and others not. Building a methodology to differentiate iconicity from systematicity is crucial in the research program of sound symbolism in order to make iconicity comparable across languages and identify which aspects are truly universal. We use ideophones from Chaoyang (Zhang 2016: 166-187), a variety of Shantou (code: shan1244), a Southern Min (Sino-Tibetan) language spoken on the south-eastern coast of Guangdong, P. R. China, to exemplify this methodology by identifying and explaining the perceptuo-motor analogies which underpin Chaoyang ideophone codas. 1 We focus on codas to illustrate our methodology because codas have been described as iconic of perceptual endings for ideophones across many unrelated languages (McCune 1983; Maduka 1988; Rhodes 1994; Oswalt 1994; Hamano 1998; in press; Hutchins 1998; Li 2007; Ofori 2009; Assaneo et al. 2011; Urban 2011; Ayalew 2013; Akita 2017; Strickland et al. 2017). 2 In order to propose a new methodology that can tease apart iconicity from systematicity, we first overview how iconicity has been identified thus far in the literature. Some linguists argue that as long as native speakers consistently rate a word as "sounding like what it means", this judgement should be sufficient for knowing what is or is not iconic, and hence what the phonosemantic correspondences are in a given language. This notion has formed the basis of a few iconicity studies (Winter et al. 2017; Aryani 2018; Sidhu & Pexman 2018).
Other linguists argue that this judgement-based methodology is problematic because one is never entirely certain of what linguistic or cognitive intuitions speakers are 'tapping into' when making such a judgement (Akita & Dingemanse 2019). Instead, they argue that iconicity is in fact grounded in imitation or imitative properties, through either analogical or perceptuo-motor means (Nuckolls 2000; Dingemanse 2012; Dingemanse et al. 2015; Hatton 2016), much like the iconicity we see in hand gestures or sign languages (McNeill 2000; Kendon 2004; Emmorey 2014; Ortega et al. 2017; Ortega 2017).
1 Chaoyang is reported to have 248 ideophones (Zhang 2016), making it an inventory comparable to reports on ideophones from other languages: Akan Twi = 190 (Ofori 2009); Pastaza Quichua = 293 (Nuckolls and Swanson 2019, Quechua Realwords Corpus); Kisi = 99 (Childs 1988); Kam = 225 (Gerner 2005); Kuhane = 66 (Mathangwane & Ndana 2014); Upper Necaxa Totonac = 145 (Beck 2008); Yakkha = 65 (Schakow 2016); Temne = 76 (Kanu 2008); Lithuanian = 44 (Wälchli 2015); Uyghur = 50 (Wang & Tang 2014); Pichi = 29 (Yakpo 2019). However, it is unclear how much explanatory power should be allocated to inventory size given that ideophones have been recently redefined as belonging to an open class (Dingemanse 2019).
2 Exemplar evidence for treating codas as endpoints comes from Upper Necaxa Totonac (Beck 2008), where ideophones depicting event endings with a potential for iterativity can undergo resyllabification so that the coda of an originally monosyllabic ideophone can be reduplicated to express that its endpoint recurs, e.g., /poŋʃ/ 'a large object striking water' (=endpoint occurs once) > /poŋ.ʃu.ʃu/ 'multiple objects falling into the water' (=endpoint occurs several times). An English equivalent might be something like sploosh-sh-sh. This coda-to-endpoint mapping is further supported by Emmorey's (2014) proposal of iconicity as structure mapping.
Phonosemantics is thus the phonological encoding of the imitation inherent to iconicity. Following on from that, we take the position that phonosemantic mappings should be explainable in terms of imitation. When it comes to communication in the visual modality, this is highly intuitive and can be easily tested for universal tendencies by simply showing a sign or gesture to someone (Drijvers & Özyürek 2017; Ortega 2017; Ortega et al. 2017; Östling et al. 2018). But additional layers of language-specific complexity in spoken languages such as phonotactics, phonological inventories, or lexical association prevent the phonosemantics from jumping out at us like the imitation seen in the visual modality. For example, plosives in coda position are assumed to encode an 'abrupt ending' to a sound or event (Hinton et al. 1994). But then there is the problem of how languages which do not allow plosives in coda position, like Japanese or Mandarin, encode abruptness, if at all. It is perhaps for these language-specific and phonological reasons that so far no methodology has been introduced to identify phonosemantics (teasing them apart from aforementioned layers of complexity) and compare them across languages (to address the question of their universal or cross-linguistic bearing). Until such a methodology is established, it will be difficult to rectify the problem of judgement-based iconicity versus iconicity as imitation, since neither can be objectively tested for the spoken modality.
To begin resolving the issue of what exactly makes a sound symbolic word iconic, and to help the field move toward a unified understanding of which affordances in the spoken modality should be classified as iconic, this paper proposes a new methodology for identifying how phonemes map to meaning through perceptuo-motor analogies. The proposed methodology relies on insights from gesture and sign language studies to create a multimodal explanation of how speech sounds are perceived as imitative.
First, §2 will provide a brief overview of the existing definitions of phonosemantics and propose a revised definition for the purposes of this study. Next, §3 will outline how iconicity is understood in gesture and sign language studies, as this provides an analogical basis for how the proposed methodology works for phonosemantics. Then, to illustrate the proposed methodological process step by step, we take ideophones (a.k.a. mimetics, expressives) in Chaoyang as our example dataset. We use ideophones to illustrate this methodology because they have been shown to be easily learnable by speakers from different language backgrounds, which speaks to the imitative nature of ideophones despite language-specific differences (Iwasaki et al. 2007a; b; Dingemanse et al. 2016; Lockwood et al. 2016). Thus, ideophones are an ideal testing ground due to their learnability based on phonological information coupled with semantic cues. §4 will show how phonosemantics can explain the imitative properties of relative and gestalt iconicity by eliminating phonotactically predictable segments and thereby extrapolating sound-meaning correspondences from imagic iconicity (i.e., onomatopoeia) based in articulatory features, e.g., [±tongue in resting position], [±closed lips], [±nasal airflow]. In this section, our methodology will be outlined in five steps. §5 will demonstrate the methodology by analysing the coda positions of Chaoyang ideophones. Finally, §6 will discuss the greater implications of this methodology in relation to learning and identifying universal properties of human perception expressed through linguistically imitative means.

Phonosemantics
Submorphemic sound-to-meaning mappings have been proposed for a number of languages (Maduka 1988; Rhodes 1994; Oswalt 1994; Waugh 1994; Hamano 1998; Blust 2003; Assaneo et al. 2011; Akita et al. 2013; Ayalew 2013; Kwon & Round 2015; Blasi et al. 2016). The general assumption behind these studies is loosely encapsulated by a hypothesis that "every phoneme is meaning-bearing", and that this meaning "is rooted in its articulation" (Diffloth 1972; 1979; cf. Dingemanse 2018 for review). This second tenet of phonosemantics has garnered support from recent studies which have found that articulation bears a relationship to imitative meaning (Oda 2000; Assaneo et al. 2011; Strickland et al. 2017; Taitz et al. 2018). However, this study does not assume that phonemes are always meaning-bearing in all phonological contexts. We follow Diffloth's approach to phonosemantics (1972; 1979), whereby we assume that phonemes are meaning-bearing within the context of the expressive lexicon (hereafter ideophone inventory) of a language. However, in light of recent research showing that ideophone inventories are subject to language-specific phonotactic constraints (Akita et al. 2013; Nasu 2015; Tsou 2017), we do not assume that all ideophone inventory phonemes in surface form are meaning-bearing. Some phonemes are realized for purely phonological reasons, e.g., to satisfy constraints with regard to syllable structure. Therefore, our methodology requires that phonotactically motivated phonemes be removed before proposing phonosemantic mappings. From here on "phonosemantic" will refer to sound-meaning mappings embedded with imitative properties, like those of ideophones, while "form-meaning" will refer to sound-meaning mappings derived through systematicity, like those of phonaesthemes. For the purposes of introducing the methodology of this paper, we will look at words belonging to the ideophone inventory of Chaoyang.
As stated above, we take the position that all ideophones should be semantically explainable through some of their articulatory properties and the perceptuo-motor analogies they afford, while prosaic (i.e., nonimitative, arbitrary) words should not (Dingemanse et al. 2015). Most prosaic words should beget their form-meaning mappings through historical roots or other lexical or systematic phonological associations. 3 This is illustrated in Table 1 below, where row 1 is purely a phonological pattern, as opposed to a meaning-bearing pattern, row 2 is a form-meaning pattern derived through historical (as opposed to imitative) processes, and row 3 is a phonosemantic pattern with articulatory grounding and attested across multiple unrelated languages (Thompson 2017).
Contrary to our theoretical position outlined above, Kwon & Round (2015) would propose that Type 2 Phonological + Meaning in Table 1 be considered iconic. According to this line of thinking, iconic sound-meaning mappings can be language-specific and are not necessarily universal. This divide in how iconicity should be identified, and its implications for the universal nature of iconicity, is our point of departure: If iconicity in the spoken modality is not always adherent to universal tendencies across unrelated languages, but is instead expressible via more language-specific means, then language-specific iconicity should still be rooted in some universal tendencies about how imitation is perceived. And, if the perception of words as imitative is derived from perceptuo-motor analogies, i.e., analogies which form meaningful links between motor skills and sound (Dingemanse 2012; Dingemanse et al. 2015), then imitative meaning should be encoded via phonosemantic mappings, since their phoneme meaning is rooted in their articulation, and articulation of sound is a type of motor skill. Following on from this, the present methodology seeks to identify and explain iconicity using the perceptuo-motor analogies created from articulatory gesture and movement. We take the position that these perceptuo-motor analogies are encoded phonemically and therefore create phonosemantic mappings. Ultimately, if the form-meaning mapping of a word cannot be grounded in the articulatory gesture, then said word should not be considered iconic, because its meaning is not explainable via perceptuo-motor analogy.
Given that the current proposal relies on the analogy of imitation through gesture, it is important to understand how iconicity is expressed through physical means in the visual modality. The visual modality is full of iconicity (McNeill 2000; Brentari 2010; Channon & van der Hulst 2011; Östling et al. 2018). We draw on the visual modality to inform this methodology because sign language and gesture researchers have clear ideas about what is iconic and what is systematic (Bellugi & Klima 1976; McNeill 2000; Kendon 2004; Emmorey 2014; Ortega 2017; Östling et al. 2018). Research in this vein of the visual modality is based on gesture (handshape, hand movement), something which is analogically relatable via oral articulatory gesture to the spoken modality. This line of reasoning seems appropriate since ideophone researchers have proposed ideophones to be oral extensions of physical or bodily gestures (Nuckolls 2000; Mihas 2013; Hatton 2016). By drawing more detailed parallels with the spoken modality in Table 1, the next section outlines how the visual modality can be understood as systematic, as systematic and meaningful, or as systematic and imitative. These parallels in turn form the basis of our current methodology.

Identifying iconicity in the visual modality
For communication in the visual modality, iconicity is predominantly understood as imitation: handshape or hand movement is perceived to share some property of its referent (McNeill 2000; Kendon 2004; Brentari 2010; Emmorey 2014), such as a flat open hand as depictive of level or even surfaces due to shared properties of the handshape and its referent. These shared properties form what Emmorey (2014) calls structure mappings in iconicity. The concept of flatness or a flat surface is structurally mapped with a flat handshape. Shared properties between a sign and its referent are fundamental when it comes to determining whether or not a linguistic unit is imitative or structure-mapping.
Examples in Table 2 are from Hong Kong Sign Language (HKSL) (illustrated with handshape font created by the Centre for Sign Linguistics and Deaf Studies, Chinese University of Hong Kong). As in row 3 of Table 2 below, for signs like PAPER, FOLD, HIGHWAY, and BOOK, the active articulator (hand) assumes a flat and open shape. This articulation is common across signs depicting similar semantic domains across unrelated sign languages (cf. Asia Signbank, SignTyp, and KISR's Kuwaiti Sign Dictionary). What causes these signs to share properties (look similar) even though these sign languages might have had little contact? The answer is imitation or, again, what Emmorey (2014) calls structure mapping. 4 Signs involving iconic properties, like those in row 3 of Table 2, have also been shown to be easier for naïve signers to learn (Ortega 2017), as opposed to the prosaic signs in rows 1 and 2. When it comes to iconic signs, this ease of learnability speaks to their imitative nature which is accessible even to those with little or no sign language background.
With this idea of shared properties creating structure mappings between handshape and meaning, some additional parallels between Table 1 above and Table 2 below can be drawn. Like the segments in the third column of Table 1, all the handshapes of Table 2 can act as submorphemic units. Row 1 of Tables 1 and 2 shows submorphemic units which (1) do not relate back to a common semantic domain (e.g., flat, sat, pat, hat, cat = ? shared meaning; NINE, REMEMBER, BRITISH, PLASTIC = ? shared meaning), and (2) thus have no potential for being classified as iconic in this case. Therefore, rows 1 of Tables 1 and 2 are purely units of phonological patterning. There is no structure mapping in these handshapes. However, if a submorphemic unit is indicative of a common semantic domain, then this could be an indication of iconicity at work. But, as seen in rows 2 of Tables 1 and 2, this is not always the case. Common semantic domains can also be expressed through systematicity, i.e., specific patterns which are associated with a meaning arbitrarily. After a little observation, it becomes evident that there is in fact no shared property between the sign and its meaning for rows 2 in Tables 1 and 2. The association between submorphemic units and meaning is purely systematic and must be learnt by rote.
The above discussion about iconicity in the visual modality is based on examples of primary or direct iconicity. In row 3 of Table 2 we have handshape matching shape meaning. What we have left out of Table 2 is when handshape maps to a meaning indirectly depicted by its shape, e.g., a flat open hand to depict begging. A mapping of this type naturally requires an additional analogical step, e.g., receiving an offering in the hands > asking for an offering > begging, which is later discussed with respect to COGNITIVE ideophones in §5.2. Note that the purpose of this section is to show that shared properties between a linguistic unit (no matter the modality) and its meaning are the key to determining whether or not phonosemantic mappings are rooted in iconicity. If a linguistic unit cannot be explained in terms of shared properties, ideally supported by cross-linguistic evidence, the outlook for iconicity should be very weak. Based on the assumptions in this chapter, we propose a methodology for identifying iconic phonosemantic mappings by applying the general form-meaning principles exemplified in Tables 1 and 2 to a phonological analysis of ideophones from Chaoyang.
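The three-way distinction drawn from the rows of Tables 1 and 2 can be read as a small decision rule. The Python sketch below is only an illustration under stated assumptions: the function name classify_pattern and its two boolean inputs are hypothetical labels introduced here, and both inputs stand for analyst judgements rather than anything computed from data. The example calls use patterns mentioned in the text.

```python
def classify_pattern(shared_semantic_domain: bool, shared_property: bool) -> str:
    """Classify a submorphemic unit along the lines of rows 1-3 of Tables 1 and 2.

    Both arguments are analyst judgements (hypothetical inputs):
    - shared_semantic_domain: do the words or signs containing the unit
      relate back to a common meaning?
    - shared_property: can the unit be explained by a property it shares
      with its referent (structure mapping, articulatory gesture)?
    """
    if not shared_semantic_domain:
        return "phonological only"          # row 1
    if not shared_property:
        return "systematic (form-meaning)"  # row 2
    return "iconic (phonosemantic)"         # row 3


# The sequence shared by flat, sat, pat, hat, cat: no common meaning.
print(classify_pattern(False, False))
# gl- in glisten, glimmer, glint: common meaning, no shared property.
print(classify_pattern(True, False))
# Flat open handshape in PAPER, FOLD, HIGHWAY, BOOK: both criteria met.
print(classify_pattern(True, True))
```

The order of the checks mirrors the argument of this section: a common semantic domain is necessary but not sufficient, and only a shared property between unit and referent licenses the label iconic.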

Theoretical assumptions
A few theoretical assumptions need to be addressed before we show how to elicit and explain iconic phonosemantic mappings. Firstly, there are three types of iconicity defined by Peirce and frequently referred to in the literature: imagic, gestalt, and relative iconicity (Dingemanse 2011). Imagic iconicity refers to sound depicting sound, as in onomatopoeia. Gestalt iconicity refers to linguistic structure mapping onto event structure, e.g., syllable coda depicts the ending of an event. This is essentially structure mapping (Emmorey 2014). Relative iconicity is when related forms are extrapolated to depict related meanings, e.g., high vowels depict an acoustic high pitch and therefore a visually small size. Research in the semantic typology of ideophones shows that ideophones cover all three types of iconicity cross-linguistically (Lu 2006; Dingemanse 2012; Van Hoey 2016). We would like to emphasize that our approach does not see these three types as mutually exclusive. Instead, we take iconicity as gestalt (phonological structure maps to the structure of the event depicted), and most iconicity is relative because it depicts more than just acoustic qualities of its referent. For example, the structure of imagic iconicity is gestalt, in that the phonological structure of an onomatopoeia somehow depicts the event structure of that sound (how long the sound is perceived to last, how the sound is perceived to begin/end, i.e., gradually or suddenly). Moreover, phonemes which appear in onomatopoeia also have the ability, through relative iconicity, to take on different meanings for non-auditory ideophones. That is to say, our methodology assumes that phonosemantic mappings can take on analogically relatable meanings in different contexts (relative) which can ultimately be explained through movement-to-movement or sound-to-sound (imagic) mappings from articulatory gestures.
Given their potential for extrapolation, it is no coincidence then that these sound-to-sound or movement-to-movement mappings are defined by Dingemanse's (2012) implicational hierarchy as the most basic and fundamental semantic categories of ideophones cross-linguistically. Finally, whether the ideophone meaning is sound-related (imagic) or not, how phonosemantic mappings are arranged or ordered in word form is in fact gestalt due to their ultimately relative nature.

Methodology: Identifying iconic sound-to-meaning mappings in the spoken modality
Drawing on previous phonosemantic studies (Hamano 1998; Akita et al. 2013; Kwon & Round 2015), as well as criticisms about their shortcomings, 5 we have devised five main steps to elicit and explain iconic phonosemantic mappings:
(1) Establish an inventory and create subgroups according to semantic relatedness.
(2) Identify phonological components (contrast, phonotactics, alternations) and establish the roots of pure phonosemantic structures.
(3) Cross-check semantic relations between phonosemantic structures using (near) minimal pairs.
(4) Identify the articulatory or gestural explanation which acts as the affordance for the perceptuo-motor analogy behind these phonosemantic structures.
(5) Take phonotactic probabilities into account, rank phonosemantic mappings according to these probabilities, and finally conduct cross-linguistic comparison to determine the validity of mappings assigned lowest rankings.
We will now detail the motivations for each step here before illustrating them using Chaoyang ideophone codas.

4.2.1 Step 1: Establish an inventory and create subgroups
The first step concerns the exclusive inventories of the lexical groups considered. In the case of this paper, the inventory is a database made up of ideophones. This is essentially a dataset from which the phonosemantic analysis will be drawn. This dataset should be uniform, i.e., all ideophones, and how the term ideophone is defined should follow a heuristic pattern. This heuristic is by and large language-specific and can be phonological or syntactic.
Ideophones are known to be formally different or marked, thus distinguishing them from non-ideophone words (Childs 1988; Newman 2000; Ameka 2001; Bodomo 2006; Beck 2008; Ofori 2009). The heuristic should correctly capture this formal difference. One example of a heuristic might be reduplication (see §5 for a complete heuristic to define ideophones in Chaoyang). If reduplication is required for ideophones of a given language, as it is for Chaoyang ideophones (Zhang 2016), we might use reduplication as our heuristic in addition to some other semantics-based (e.g., depictions only) or syntactic (e.g., adverbials only) factors.
Caution must be taken to ensure that all factors relevant to identifying a uniform set of ideophones are considered; otherwise prosaic adverbials may be inadvertently included in our dataset. In Bodomo's (2008) A Corpus of Cantonese Ideophones, for instance, several reduplicated adverbs were included owing to their superficial structural resemblance to actual ideophones. See example (1c) below. This means that, in Cantonese, reduplication alone is not enough to differentiate ideophones from adverbs. Instead, a more appropriate heuristic would be to limit the ideophone inventory to reduplicated expressions containing syllables which are seemingly meaningless in isolation. These meaningless syllables are often indicative of onomatopoeia or expressive language in Cantonese. Example (1a) contains reduplicated onomatopoeia (fu55 'sound of wind' or 'sound of blowing on hot food') which is otherwise meaningless when taken out of its trisyllabic context, while example (1b) is only meaningful in its four-syllable context. The syllables from example (1c), however, still retain their meaning both in isolation and in non-adverbial words (怪 kwa:i33 'strange' > 奇怪 kei31 kwa:i33 'odd', 古怪 ku35 kwa:i33 'eccentric').
Once a heuristic is set to qualify an ideophone inventory, the next step is to divide the ideophone inventory into subgroups. The subgroups should serve some overall purpose, taking on a well-grounded hypothesis, again motivated by theoretical grounds. We take the position that ideophones should be subgrouped according to the semantic categories of Dingemanse's (2012) implicational hierarchy, e.g., sound < motion < visual patterns < other sensory perceptions < cognitive states. For example, the English ideophone boom could be subsumed under multiple categories for the following sensory meanings it entails, e.g., SOUND for 'sound of explosion,' MOTION for 'bursting motion', VISUAL for 'sudden appearance or execution', and COGNITIVE for 'surprise.' We propose that boom be analyzed according to each of these categories individually. And, crucially, SOUND meanings should be analyzed first since they represent a direct mapping of linguistic sound to real world sound (Dingemanse 2013).
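The subgrouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name subgroup, the triple layout (form, category, gloss), and the label OTHER_SENSORY are assumptions introduced here; the remaining category labels and the glosses for boom are taken from the text.

```python
from collections import defaultdict

# Categories in the order of the implicational hierarchy cited above:
# sound < motion < visual patterns < other sensory perceptions < cognitive states.
# "OTHER_SENSORY" is an assumed label; the text does not name one.
HIERARCHY = ["SOUND", "MOTION", "VISUAL", "OTHER_SENSORY", "COGNITIVE"]


def subgroup(senses):
    """Group (form, category, gloss) triples by category, SOUND first."""
    groups = defaultdict(list)
    for form, category, gloss in senses:
        if category not in HIERARCHY:
            raise ValueError(f"unknown category: {category}")
        groups[category].append((form, gloss))
    # Keep only the categories that occur, in hierarchy order.
    return [(cat, groups[cat]) for cat in HIERARCHY if cat in groups]


# The senses of English 'boom' listed in the text, deliberately unordered.
boom = [
    ("boom", "COGNITIVE", "surprise"),
    ("boom", "SOUND", "sound of explosion"),
    ("boom", "VISUAL", "sudden appearance or execution"),
    ("boom", "MOTION", "bursting motion"),
]

for category, entries in subgroup(boom):
    print(category, entries)
```

Each sense of a form is stored as its own entry, so a single ideophone can be analyzed once per category, and the hierarchy order ensures that SOUND meanings are reached first.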

4.2.2 Step 2: Identify phonological components (contrast, phonotactics, alternations) and establish the roots of pure phonosemantic structures (i.e., the phonological roots of ideophones)
Although ideophones are imitative in nature, this does not mean that their phonological structure is purely determined by their referent. Firstly, ideophones are made up of sounds which are part of a language's phoneme inventory. Secondly, though their phonological structure may differ somewhat from their prosaic (non-ideophone) counterparts, general phonotactic principles are still adhered to (Childs 1988; Ofori 2009; Akita et al. 2013; Nasu 2015). And yet, there may be phonotactic regulations or phonological processes which only apply or must apply to the ideophones of a given language (Akita et al. 2013; Lai 2015; Nasu 2015; Tsou 2017). It is important to rule out these phonotactic factors when determining which phonemes are phonosemantically meaningful. If the appearance of a sound is purely phonotactic, for instance, we do not want to confuse that with other phonemes which might be potentially meaningful via perceptuo-motor analogies. Let us begin with how ideophones are faithful to their phoneme inventory and general phonotactic principles, yet differ somewhat from prosaic words in terms of phonotactic structure. For example, the Cantonese syllable /fiŋ/ is composed of legal phonemes while the sequence is traditionally considered illicit, to the extent that there is no historically corresponding orthographic form (Chinese character) for this syllable. However, as an ideophone, /fiŋ/ 'loose' (Bodomo 2008) is perfectly acceptable to native speakers. The same goes for Mandarin /pju/ 'manner of a small object such as a bullet or dart shooting through the air; whizzing sound,' /pja/ 'slapping sound,' /tuaŋ/ 'wobbliness; befuddlement' (Li 2007). These ideophones, though traditionally considered illicit syllable structures, are still made up of legal phonemes.
Moreover, these syllables still structurally resemble prosaic syllable structure in terms of what is allowed in the onset, nucleus, and coda position. It is merely the combination of these sounds into a single, coherent syllable that is illegal. 6 Other phonotactic principles are left unviolated. For example, consonant clusters are still unattested in Cantonese and Mandarin ideophones as well as prosaic words. Furthermore, we do not find vowels or consonants in Cantonese and Mandarin ideophones that are otherwise unattested in the phonological inventory of each language.
Identifying deviation from phonotactic norms is important because it speaks to potential imitative affordances which are perhaps necessary for depictive means. The motivation for violating a phonotactic norm is assumed to be depictive or imitative in nature, and indeed phonotactic violations are widely attested for expressive words (Childs 1988; Hinton et al. 1994; Ofori 2009; Nasu 2015; Nuckolls et al. 2016; Kwon 2018). We should not eliminate these irregularities from our dataset; they should be kept for analysis. What should be eliminated from our dataset are segments whose appearance is predictable, regular, and therefore presumably not motivated by imitative affordances or depictive means. For example, in Chaoyang, the third onset in trisyllabic ideophones exhibits assimilation triggered by nasality: by default, the third onset is lateral unless nasality spreads rightward across syllable boundaries from the preceding nasalized vowels. Compare examples (2) without nasal assimilation and (3) with nasal assimilation below.
If the nasal assimilation in (3) were unaccounted for, then the nasal might be included in the final phonosemantic mapping as a result of the analysis. This is problematic because the third onset is not nasal for imitative reasons but for phonotactic reasons. Without taking this language-specific, phonotactically-motivated reduplication into account, we would have to explain why /l/ and /n/ are associated with such a wide range of imitative meanings, an otherwise erroneous conclusion. Eliminating phonotactic interference should lead to the establishment of phonological roots, i.e., the phonological content which is not phonotactically predictable in terms of phonological processes. In addition to third syllable reduplication, there is a pattern of partial reduplication where the onset of the first syllable is preserved but the nucleus becomes /i/ (Yip 2012). This first syllable partial reduplication must be eliminated as well. This leaves us with the second syllable, which forms the phonological root of Chaoyang ideophones. Table 3 illustrates what we call "phonological roots", i.e., ideophones with predictable material removed. It is with phonological roots that phonosemantic mappings can be drawn up.

Step 3: Cross-check semantic relations using (near) minimal pairs
Minimal pairs reveal which phoneme(s) within the root create meaningful contrast. This harkens back to the originally stated phonosemantic hypothesis where "every phoneme is meaning-bearing". A contrast in meaning can tell us something about what the presence (or absence) of a phoneme contributes to an ideophone's overall meaning.⁷ However, not all minimal pairs are helpful in the initial stages of analysis. Take the following roots in example (4) from Chaoyang, which are contrasted by vowel nasality.
From this simple example, one might tentatively conclude that nasality is meaningfully contrastive for these two ideophones. But it is not clear what exactly is being contrasted. And, more importantly, it is not clear how such a contrast could even be meaningful. One might propose some acoustic explanation, e.g., that nasality sounds "jumbled" or perhaps more "speech-like" given the involvement of nasal passages in speech production. But at this stage there is no real basis for such proposals except for what can perhaps be gathered from native speaker intuition. Before we resort to asking native speakers how nasality 'sounds,' however, there is room for some semantic manoeuvring.
The problem with comparing examples (4a) and (4b) lies in the semantic categories to which each word belongs. According to Dingemanse's (2012) implicational hierarchy, these two ideophones belong to SOUND (jumbled speech) and MOVEMENT (in a great hurry). Within these different semantic categories, the contrastive element, nasality in this case, might operate (or be interpreted) according to different (category-specific) perceptuo-motor analogies. To see how nasality, or any other contrastive element, behaves phonosemantically, comparisons between minimal pairs should be limited to those within a semantic category, i.e., SOUND: (5a) vs. (5b) and (5c) vs. (5d).
From these examples, one can conclude that nasality is contrastive for ideophones within the semantic category SOUND. We could hypothesize that nasality is used to depict some kind of friction or turbulence which is perceived to be integral for meanings like (5a) 'crunching' or (5d) 'cymbals clanging' but not necessarily integral for (5b) 'smashing' or (5c) 'talking loudly.' Likewise, 'talking loudly' or 'smashing' do not necessarily resonate or denote a kind of vibration or turbulence. There are perhaps other perception-based reasons for this. For example, 'smashing' is much more punctual and audibly sharper, with little necessity of friction, than 'crunching crispy food'. Meanwhile, 'talking loudly,' though perhaps perceptually similar in amplitude to cymbals, does not exactly entail a turbulent resonance like that of the snare of cymbals. If other data points in the semantic category SOUND align with this mapping of 'nasality = friction or vibrating turbulence,' then we can expand our analysis to cover other semantic categories, like MOTION. It will expedite the progression of our analysis if we can first draw up a gestural or articulatory-based explanation for whatever contrastive phonosemantic mapping has been tentatively proposed.
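The restriction of minimal-pair comparison to a single semantic category can be sketched in code. The following is only an illustration under stated assumptions: the root forms are hypothetical placeholders (not Chaoyang data), vowel nasality is assumed to be written with a tilde, and the function names are our own.

```python
import unicodedata
from itertools import combinations

NASAL_MARK = "\u0303"  # combining tilde; assumed notation for vowel nasality


def strip_nasality(form):
    """Return the form with all combining tildes removed."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if ch != NASAL_MARK)


def nasality_pairs(roots):
    """Yield pairs of roots that differ only in vowel nasality
    and that belong to the same semantic category."""
    for (form1, cat1), (form2, cat2) in combinations(roots, 2):
        same_skeleton = strip_nasality(form1) == strip_nasality(form2)
        differ = (unicodedata.normalize("NFD", form1)
                  != unicodedata.normalize("NFD", form2))
        if same_skeleton and differ and cat1 == cat2:
            yield (form1, form2, cat1)


# Hypothetical placeholder roots with semantic categories (NOT Chaoyang data).
toy_roots = [("ta", "SOUND"), ("tã", "SOUND"),
             ("ka", "SOUND"), ("kã", "MOVEMENT")]
pairs = list(nasality_pairs(toy_roots))
```

In this toy set only the first two roots form a usable pair; the last two differ in nasality but sit in different semantic categories, so they are excluded from the initial comparison.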

4.2.4 Step 4: Identify articulatory gestures acting as perceptuo-motor affordances
Articulatory gestures should be physiologically accessible to all speakers. This follows our assumption that all spoken-language users have the ability to use or reach the anatomical components required of any articulation. Every language makes use of the lips and tongue. In addition, every language requires air to be constricted to varying degrees, from partial to complete constriction (Ladefoged & Maddieson 1996). Articulatory gestures should subsume language-specific phonological features and categories so that they only describe properties of articulation (tongue/lip movement, airflow) common across all languages. We therefore propose that, if properties of iconicity are truly universal, then the universally accessible properties (articulatory gestures) behind categorical linguistic units should bear the explanatory power for what perceptuo-motor affordances underpin iconicity and its notions of (analogical) depiction.
In Figure 1, the top layer of arrows and circles conveys the choreography of muscle movements inherent in producing speech which speakers may be aware of as a physical movement, i.e., raising the tongue, closing the lips, touching the teeth with the tongue, etc. The middle layer illustrates how these muscle movements are condensed into language-specific featural settings which thereby characterise the phonemes of the bottom layer. The goal here is to describe the muscle movements and choreography behind speech production (articulatory gestures) without being language-specific.
Complications may arise when attempting to qualify the articulatory gestures (and their perceptuo-motor affordances) that scaffold phonosemantic mappings. Below we give two examples of how an analysis might go astray when linguistic units are not properly broken down into universally accessible articulatory gestures and analysed accordingly. It is crucial that articulatory gestures be qualified only after all contrastive elements have been identified for the phonological roots of a given dataset. This way we can avoid complications like (1) creating overlapping phonosemantic mappings, or (2) making up mappings that, while convenient, in principle do not exist. Finally, there is also the important issue of many-to-many form-meaning mappings.
Complication (1) might occur if we fail to extrapolate meaning beyond the phoneme level shown in Figure 1. Take examples (6) and (7) below. If we overlook the phonotactic constraint in English that coda /ŋ/ never follows /u/, then /ŋ/ and /m/ appear superficially independent of one another (despite semantic overlap). We might therefore mistakenly assign /ŋ/ and /m/ each an independent phonosemantic mapping, or even assign redundant mappings without much justification. This can be avoided if we move beyond the phoneme level and look at broader articulatory factors, as Figure 1 asserts. In reality, /ŋ/ and /m/ are just two phonotactically-governed (and therefore English-specific) realizations of a single phonosemantic mapping, i.e., 'nasal coda depicts a resonant or vibratory ending'.
Complication (2) might arise if the appearance of some phonemes is in fact the result of a phonological alternation. This is demonstrated by Chaoyang nasals and plosives in coda position, whereby nasals cannot co-occur with checked tones and are thus realized as homorganic stops (/ŋ/ > [k]). In this case, our phonosemantic analysis might want to give more explanatory power to the articulatory gestures behind the phonological […]. We would have to ask whether these segments in complementary distribution actually correspond to a difference in semantics (which therefore can be explained with contrasting properties of articulatory gesture between these segments). That is to say, do Chaoyang nasals and stops in coda position need to be treated as separate phonosemantic mappings, or are they just the same, single phonosemantic mapping whose realization is governed by an arbitrary and language-specific phonological alternation (like the aforementioned /ŋ/ and /m/ for English)? It would be convenient to create two separate mappings. Whether two separate mappings can be justified depends on the meanings of the relevant roots.
Essentially, problems occur if the segment level is taken as a string of literal symbols with no regard for the articulatory properties which they embody. It is important to list all the contrastive elements of roots in a dataset in order to (1) move beyond the phonemes and phonological features, (2) qualify the articulatory gestures they embody, before finally proposing phonosemantic mappings.
There is also the issue of many-to-many form-meaning mappings (Dingemanse 2018), which partially explains apparent mismatches attested in mappings across languages: a given articulatory gesture may afford the iconic expression of multiple possible meanings and, vice versa, a given concept can be iconically expressed with multiple articulatory gestures. Our methodology allows many-to-many form-meaning mappings to be explained by identifying analogical extrapolation of one form-meaning mapping from one semantic category to another (cf. §5.2 for detailed discussion) and positing articulatory gestures which support them. A smaller phoneme inventory, as in Japanese, might also contribute to many-to-many form-to-meaning mappings. Shared articulatory gestures across meanings can help us to explain why a given phoneme might be overloaded with many phonosemantic mappings.
To demonstrate how articulatory gestures ought to be proposed as perceptuo-motor affordances, recall examples (5a-d) regarding nasality as a contrastive element in the vowels of some Chaoyang ideophones. We proposed that vowel nasality acts as a phonosemantic mapping which conveys some kind of friction or turbulence integral to the meaning of some Chaoyang ideophones. Indeed, the articulatory properties of nasal vowels seem to lend themselves to this analogical proposal: in addition to the velum being lowered so that air may pass through the nasal cavity, nasal vowels are characterized by an increased amount of airflow when compared to their corresponding oral vowels (Ladefoged & Maddieson 1996: 298). This increased amount of airflow, combined with the additional articulatory effort of lowering the velum, is a good candidate for such an affordance of turbulence since it is measurable and directly comparable to other (oral) vowels. It is worth mentioning that nasals can also be characterized by a degree of friction due to a narrowed velic opening (Ladefoged & Maddieson 1996: 103), an (optional) articulatory property which may also lend itself as a perceptuo-motor affordance for the phonosemantic mapping proposed above. Therefore, one might propose that the increased amount of airflow is seen as imitative of (or analogous to) the turbulence or friction inherent to the meaning of said Chaoyang ideophones. This articulatory gesture lends itself well to the scaffolding of perceptuo-motor affordances which non-native speakers could latch onto, given the universally accessible perception of increased airflow during the articulation of nasal vowels. This [±airflow] affordance can be tested further by (1) seeing whether increased airflow corresponds to similar meanings in ideophones cross-linguistically, and (2) seeing whether speakers create novel ideophones using this articulatory property.
One might ask about the significance of articulatory properties that span multiple phonemes. For an intra-linguistic comparison, this would imply that phonemes with articulatory properties in common are somehow related in terms of what perceptuo-motor analogies they map onto for that language (discussed in §5.2 using Chaoyang examples). If two phonemes are in complementary distribution, and these phonemes share articulatory properties, we might need native speaker input to explain the distribution of these phonemes within a semantic category (if it is not already explainable based on distributions across semantic categories of ideophones in the data). Recall, from examples (6) and (7), that phonotactics can also explain the distribution of phonemes with shared articulatory properties. For cross-linguistic comparison, the fact that articulatory gestures can span multiple phonemes is precisely how we explain phonosemantic relations between semantically-similar ideophones from unrelated languages. In other words, we can explain why the Japanese ideophone for a doorbell chime is pin-pon but the English equivalent is ding-dong (cf. §6 Figure 4).

4.2.5 Step 5: Calculate phonological probabilities, make rankings, and compare cross-linguistically
The appearance of phonemes in certain syllable positions can be statistically predictable even if it is not phonologically conditioned by rules. When proposing phonosemantic mappings, it is important to check that these mappings are not highly predictable, as speakers are known to have implicit knowledge of this predictability. To do this, we should calculate positional and transitional phonotactic probabilities, both of which have been empirically shown to be psychologically accessible (Bailey & Hahn 2001; Greenberg & Jenkins 1964; Ohala & Ohala 1986). Positional phonotactic probability refers to the statistical likelihood of phoneme X appearing in syllable position Y. Transitional phonotactic probability refers to the statistical likelihood of phoneme X preceding or following phoneme Y. Proposed phonosemantic mappings should be ranked according to these probabilities, and cross-linguistic comparison should then be used to determine the validity of the mappings assigned the lowest rankings.
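As a rough illustration of the two probabilities just defined, the sketch below estimates them by simple relative frequency over a list of roots. This is an assumption on our part: the studies cited may use more elaborate estimates, the toy roots are hypothetical placeholders rather than Chaoyang data, and only the "following" direction of transitional probability is computed.

```python
from collections import Counter

# Hypothetical toy roots as (onset, nucleus, coda) triples; NOT Chaoyang data.
TOY_ROOTS = [("t", "o", "m"), ("k", "o", "m"), ("t", "a", "k"), ("k", "a", "ŋ")]
SLOTS = {"onset": 0, "nucleus": 1, "coda": 2}


def positional_probability(roots, phoneme, position):
    """Relative frequency of `phoneme` in the given syllable position."""
    counts = Counter(root[SLOTS[position]] for root in roots)
    return counts[phoneme] / len(roots)


def transitional_probability(roots, first, second):
    """Relative frequency with which `second` immediately follows `first`."""
    pair_counts, first_counts = Counter(), Counter()
    for root in roots:
        segments = [seg for seg in root if seg]  # drop empty slots such as ""
        for a, b in zip(segments, segments[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    if first_counts[first] == 0:
        return 0.0
    return pair_counts[(first, second)] / first_counts[first]


p_coda_m = positional_probability(TOY_ROOTS, "m", "coda")
p_m_after_o = transitional_probability(TOY_ROOTS, "o", "m")
```

In this toy set, coda /m/ is fully predictable after /o/, which is the kind of environment in which a proposed mapping for /m/ would receive a low iconicity ranking.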
It is important to note that choice of phoneme is too language-specific to compare across languages. This is because we know that different languages use different sounds (or combinations of sounds) to phonologically encode an articulatory-based analogy. Instead, it is the articulatory features which ground phonosemantic mappings in their perceptuo-motor analogy which should be compared, like [±nasal] as opposed to [ã ĩ ũ ẽ õ]. Articulatory-based explanations should speak to the universality which underlies iconicity. The phonosemantic mappings which they derive should be tested against positional and transitional phonotactic probabilities to see if they are statistically predictable based on the environments in which they occur in a given language (e.g., whether /k/ is statistically predictable after /o/). In terms of iconic status, statistically predictable segments should be ranked lower than those which are not (as) predictable.
To contend with universality, iconically low-ranked segments should be cross-checked with articulatory-based explanations from other languages. For example, if nasal vowels are highly predictable after palatalized affricates in Chaoyang, nasality in this environment might be treated as a purely phonological element rather than an iconic one, owing to its low ranking for iconicity. In order to completely rule out nasality after palatalized affricates, we should see how the proposed phonosemantic mapping of nasality behaves in ideophones of other languages. If nasality is indeed attested in other unrelated languages with correspondences semantically related to those of Chaoyang, then we cannot rule out iconicity as a motivating factor despite its statistical predictability.
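The ranking and cross-checking logic described above can be summarised in a short sketch. The names, the example values, and in particular the cut-off of 0.8 are hypothetical choices made for illustration only; the methodology itself does not specify a numerical threshold.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    label: str                 # description of the proposed mapping
    predictability: float      # phonotactic probability in its environment (0 to 1)
    attested_elsewhere: bool   # similar form-meaning pairing in unrelated languages


def rank_by_iconicity(mappings):
    """Least predictable (highest-ranked for iconicity) mappings first."""
    return sorted(mappings, key=lambda m: m.predictability)


def can_rule_out_iconicity(mapping, threshold=0.8):
    """Rule out only if highly predictable AND lacking cross-linguistic support.

    The threshold is a hypothetical cut-off, not a value from the source.
    """
    return mapping.predictability >= threshold and not mapping.attested_elsewhere


# Hypothetical example values, for illustration only.
candidates = [
    Mapping("A: predictable, attested elsewhere", 0.90, True),
    Mapping("B: unpredictable", 0.20, False),
    Mapping("C: predictable, unattested elsewhere", 0.95, False),
]
ranking = [m.label[0] for m in rank_by_iconicity(candidates)]
```

Candidate A illustrates the case discussed above: it is highly predictable, yet it cannot be ruled out because comparable correspondences are attested in unrelated languages.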

Demonstrating the methodology: Analysis of Chaoyang ideophone codas
According to Zhang (2016), the phonological inventory of Chaoyang contains the following consonants: /p pʰ b m w t tʰ ts tsʰ n s z l j k kʰ g ŋ ʔ h/. Among these, /b g z/ are absent from Chaoyang ideophones. Only /p m k ŋ ʔ/ can appear in coda position.⁹ As in other Sinitic languages, consonant clusters are not allowed in Chaoyang syllable structure. Stop codas are only allowed in "entering" or "checked" toned syllables, which are characterized as shorter in duration than other syllables. Bilabial nasal and stop codas are in complementary distribution, as are velar nasal and stop codas: /tam/ with /tap/ and /taŋ/ with /tak/. Likewise, empty codas are in complementary distribution with glottal stops in checked toned syllables, e.g., /ta/ > /taʔ/. It should be noted that these are not realized as part of any productive morphophonological alternation in Chaoyang but simply reflect a phonotactic constraint on the compatibility of tones and segments in the syllable. Moreover, tone sandhi (tone change) does not occur for ideophones even though it is required for prosaic words. All syllables within a Chaoyang ideophone bear the same tone. Finally, Chaoyang ideophones are systematically realized with low tone, which allows for checked syllables, and thus plosive codas, to occur. This leads us to the six heuristics for identifying Chaoyang ideophones listed in example (1) below.
(1) a. No /b, z, g/ in ideophones.
    b. No tone sandhi.
    c. No dipping tone (a.k.a. fall-rise tone).
    d. Reduplication is required by all ideophones, disyllabic reduplication being the default.
    e. If trisyllabic reduplication occurs, then the onset of the third syllable is /l/ unless the previous consonant is nasal, in which case nasal assimilation occurs. If the preceding onsets are nasal, then the third onset must be identical. If the preceding onsets are not nasal but precede a nasal vowel, then the third onset must be /n/.
    f. Most ideophones are in low tone (checked or unchecked).
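Heuristic (1e) is predictable enough to be stated as a small function, which is precisely why the third onset is excluded from phonosemantic analysis. The sketch below is our own restatement of (1e); the function name and its two-argument interface are assumptions made for illustration.

```python
NASAL_ONSETS = {"m", "n", "ŋ"}  # nasal consonants in the Chaoyang inventory


def third_onset(base_onset, vowel_is_nasal):
    """Predict the onset of the third syllable in trisyllabic reduplication,
    following heuristic (1e)."""
    if base_onset in NASAL_ONSETS:
        return base_onset   # nasal onsets are copied unchanged
    if vowel_is_nasal:
        return "n"          # nasality spreads from the preceding nasal vowel
    return "l"              # default lateral onset
```

Because the output depends only on the nasality of the preceding material, neither /l/ nor /n/ in this position can be credited with an imitative function.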
Using the five steps already outlined in §4, we will demonstrate how the proposed methodology works with the case study of Chaoyang ideophone codas. Analysing all the ideophones of Chaoyang is beyond the scope of this section. We instead focus on codas of Chaoyang ideophones to narrow the scope of the analysis. Theoretically, this is a valid subgroup to focus on, under the assumption that syllable codas phonosemantically depict endings of events or percepts (ideophone referents). In this way, the first step of the methodology is already complete. Our initial inventory is the entire set of Chaoyang ideophones, totalling 248 words (Zhang 2016). Our subgrouping is based on the phonological heuristic of codas. This means that we narrow our dataset down to 151 ideophones by eliminating those with empty codas. (Here we assume that empty codas do not necessarily encode a perceptual ending.) Further subgrouping is done so that ideophones belonging to the SOUND semantic domain, that is, all ideophones which depict sound (i.e., onomatopoeia), are analysed first. Non-auditory ideophones should be analysed subsequently, once perceptuo-motor analogies can be established (via articulatory gestures) for the SOUND subgroup.
The second step calls for the elimination of any segments in the dataset which are derived by phonotactic restrictions and can be explained away with phonotactic analysis. First we can eliminate third syllables, as these always contain reduplicated (and therefore redundant) phonemes which have also undergone long-distance nasal assimilation in onset position (Yip 2010;Zhang 2016). This is also applied to any fully reduplicated disyllabic ideophones, so that only one syllable is analysed. The same holds for partially reduplicated ideophones whereby the nucleus and coda do not exhibit full reduplication (an element of the vowel is missing but not replaced by an entirely different segment, cf. third syllable reduplication vs. partial reduplication in Table 4 below). To identify which portions of the syllables are the results of phonological reduplication, and thus are redundant to phonosemantic mappings, it is fundamental to have proper phonological descriptions of the languages concerned, suggesting the urgent need for core phonological-grounding when it comes to iconicity.
After reduplicated forms have been eliminated from the original set of 151 ideophones with codas, we are left with 129 phonological roots. 93 phonological roots belong to the semantic category of SOUND. Multiple reduplicated forms were collapsed into a single phonological root. For example, we originally counted both reduplicated forms /ŋi.ŋãuʔ/ and /ŋãuʔ.ŋãuʔ/ for 'unwilling and muttering' as part of the total 151 ideophones with codas. After eliminating reduplicated forms, both /ŋi.ŋãuʔ/ and /ŋãuʔ.ŋãuʔ/ were collapsed into the single phonological root /ŋãuʔ/ counted as part of the total 129 phonological roots. However, if a single phonological root spanned multiple meanings, then it was counted according to the number of meanings it corresponded to. For example, /tom/ was counted four times because it spans two semantic categories and depicts four distinct meanings: /tom/ SOUND 'sound of wading through water,' /tom/ SOUND 'sound of beating a drum,' /tom/ SOUND/MOTION 'water sloshing in a container,' and /tom/ COGNITIVE 'nosiness.' Note that tables throughout the remaining sections do not reflect the counting of phonological roots according to multiple meanings.
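The counting procedure just described can be sketched as follows. We assume that syllables are separated by "." and that the root is the second syllable of a reduplicated form; the helper name is hypothetical. Since the surface reduplicated forms of /tom/ are not given here, the root itself is entered directly.

```python
def root_of(form):
    """Return the phonological root: the second syllable of a reduplicated
    form, or the form itself if it is a single syllable."""
    syllables = form.split(".")
    return syllables[1] if len(syllables) > 1 else syllables[0]


# (form, meaning) entries taken from the examples in the text.
entries = [
    ("ŋi.ŋãuʔ", "unwilling and muttering"),
    ("ŋãuʔ.ŋãuʔ", "unwilling and muttering"),
    ("tom", "sound of wading through water"),
    ("tom", "sound of beating a drum"),
    ("tom", "water sloshing in a container"),
    ("tom", "nosiness"),
]

# A root is counted once per distinct meaning, so duplicates collapse in a set.
counted_roots = {(root_of(form), meaning) for form, meaning in entries}
```

The two reduplicated forms for 'unwilling and muttering' collapse into one counted root, while /tom/ contributes four, giving five counted roots for this small sample.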
Candidates for the analysis are still subject to phonological regulations of the language, including regulations on contrast, phonotactics, and alternations. As mentioned in the previous section, Chaoyang nasals in coda position are in complementary distribution with homorganic stops in checked tone syllables, e.g., /taŋ/ is always [-checked tone] while /tak/ is always [+checked tone]. In this case, the analysis might want to give more explanatory power to the feature which is preserved, e.g., [±velar], as opposed to the feature(s) which is/are contrastive (e.g., [+velar] [+nasal] [+sonorant] > [+velar] [-nasal] [-sonorant]), or vice versa.
To ascertain this, we have to ask whether the difference in features does lead to a difference in phonosemantic mappings. This brings us to the third step where we determine the contrastive elements, i.e., whether some segmental elements are semantically contrastive. As Table 5 indicates, although there are gaps in syllable structure (indicated by asterisks), all possible codas /p k ʔ m ŋ/ can behave as contrastive elements. When a series of syllables are completely absent then near minimal pairs are given. Overall, it seems there is no pattern to capture all the gaps in the data. For example, we cannot say that if a labial coda or nasal coda is phonosemantically contrastive then its homorganic counterpart must also be contrastive. We can only conclude that if /k/ is not phonosemantically contrastive, then /ŋ/ is not contrastive either (n.b., the reverse is not true). But it is not clear how this observation for phonosemantically contrastive patterns is phonologically motivated given that /ŋ/ is in complementary distribution with /k/ in checked tone syllables. A similar observation for /p/ and /m/ is almost possible except for *khop and /khom/. Since no phonological alternation is evident for their distribution, what this means is that all codas can proceed as part of the phonological roots for phonosemantic analysis.
When the roots are grouped together by coda /p k ʔ m ŋ/, a general semantic pattern emerges. Please see Supplementary Materials for all phonological roots listed according to coda. Chaoyang ideophone roots ending in /k/ denote SOUND events which have relatively sharp, loud, and punctual endings (gunfire, metal hammering, clacking of an abacus) which are necessarily integral to their perceptual meaning and referent. Ideophone roots ending in /ŋ/ also denote events which have relatively sharp, loud endings but, instead of being punctual, these events arguably exhibit a somewhat longer offset of perceptual resonance than their homorganic coda counterparts (firecrackers, cannon fire, ringing in the ears). Ideophone roots ending in /p/ denote SOUND events which involve endings where two surfaces or objects come into contact (water drops hitting the ground, the jaws whilst chewing, bubbles forming then popping then forming again); however, the amplitude of that contact is not necessarily integral to their perceptual meaning and referent. Ideophone roots ending in /m/ also denote SOUND events which involve endings where two surfaces or objects come into contact (beating drums, water moving from side to side in a container). However, they exhibit a somewhat longer offset or perhaps louder perceptual resonance than their homorganic coda counterparts. Finally, we have the glottal stop, which makes up the highest token count (44 roots) in the Chaoyang ideophone inventory. Ideophone roots ending in /ʔ/ denote SOUND events which end inaudibly but are perhaps perceptually punctual in aspect. The glottal stop coda implies that the event is not continuous but temporally rather short, and additionally that the ending is not audibly executed. If a speaker wants to make an open syllable audibly shorter in duration without adding any articulatory-derived phonosemantic connotations from the labial or velar codas, then a glottal stop is inserted.
In effect, the glottal stop is the most suitable stop if an event is necessarily perceived as short or having a basic or unadorned ending, i.e., not an ending implying two surfaces or objects coming into contact = /p/, not an ending implying sharp or forceful perceptual nature = /k/. The glottal stop encodes the least specific, or least depictive, ending of the codas in Chaoyang. This is perhaps why it is so ubiquitous. Not all events can be categorised into those with endings as depicted by codas /p k m ŋ/.

Consolidating meanings into phonosemantic mappings
This section focuses purely on how to derive phonosemantic mappings. Table 6 shows all phonological roots for Chaoyang ideophones with /k/ in coda position. These datapoints are provided for the reader to work through the flow of Figure 2 below. All phonological roots with codas are available in the Supplementary Materials. The first through sixth steps in Figure 2 demonstrate the analysis required to propose a phonosemantic mapping. These include (1) choosing the semantic category of phonological roots to serve as the dataset, (2) selecting which syllable position to analyze, (3) selecting which segment to analyze from that syllable position, (4) determining semantic relations between the relevant phonological roots, (5) creating a falsifiable mapping, and (6) finally cross-checking this mapping against other segments. The seventh and eighth steps are relevant to justifying that phonosemantic mapping with articulatory gestures (cf. §5.2). Note that while Figure 2 takes the semantic category of SOUND as its example dataset, the same steps apply to all semantic categories. Table 7 below shows the phonosemantic mappings we propose for the codas of Chaoyang SOUND ideophones following the steps laid out in Figure 2. Please consult the Supplementary Materials for all phonological roots with codas /p m ŋ ʔ/ to see how their phonosemantic mappings in Table 7 are derived. The shared articulatory features between Chaoyang coda velars (/k/ and /ŋ/) and between coda labials (/p/ and /m/) support the notion of their relative semantic overlap. Velars /k/ and /ŋ/ denote similar types of endings but differ in how perceptually punctual they are. The same goes for /p/ and /m/. It is interesting to note, however, that the nasals /ŋ/ and /m/ have some intuitively synonymous ideophone roots, e.g., 'sound of cannon fire': /kom/ and /kuaŋ/, while the stops /k/ and /p/ have the perhaps less intuitive 'of squeezing a radish': /khiak/ and /ĩʔ.ãp/.
Now that the phonosemantic mappings have been established, we can move on to the fourth step to propose the articulatory gestures which act as the perceptuo-motor affordances for the mappings in Table 7.¹⁰
Figure 2:
(1) Group the phonological roots which fall under the semantic category of SOUND.
(2) Choose a syllable position to analyze, e.g., coda.
(3) Choose a coda segment, e.g., /k/.
(4) Group all SOUND phonological roots with /k/ in coda position and determine the shared semantic characteristics across this /k/ subgroup.
(5) Condense the shared characteristics into a falsifiable mapping, e.g., whether or not a percept is inherently loud and has a punctual ending can be disproven.
(6) Cross-check this mapping with other codas and determine whether it applies only to /k/ or to other phonemes as well. If it is only applicable to /k/, there is a good chance that this mapping is valid; see Step 7. If the mapping is indeed applicable to other phonemes, then the mapping should be captured by articulatory features shared across the phonemes, e.g., /p t k/ = [-airflow]. However, if these phonemes do not share articulatory features, then the mapping is invalid.
(7) Break /k/ down into articulatory features. At least one of these should analogically support the mapping from the fifth step. This is the crucial feature which ultimately supports the perceptuo-motor analogy behind the mapping.
(8) Cross-check the distribution of the crucial feature(s) in other codas. If the /k/ mapping is correct, the crucial feature(s) should not support perceptuo-motor analogies for other codas in this semantic domain.

Using perceptuo-motor analogies to justify phonosemantic mappings of Chaoyang codas
This section refers to the seventh and eighth steps illustrated in Figure 2. We start with the labial codas as they are perhaps the most straightforward. Both /p/ and /m/ in coda position have mappings which denote two surfaces or objects coming into contact. The lips, and their articulation of opening and closing, provide a simple perceptuo-motor analogy for this aspect of the phonosemantic mapping. For now, we can call the perceptuo-motor affordance behind this phonosemantic mapping [+oral contact], as it involves contact made by articulators (lips and tongue respectively).
In contrast with the labial codas /p m/, the velar codas /k ŋ/ denote referents with louder or sharper endings which involve some kind of sudden or explosive movement (e.g., bursting, cracking, gunfire). That is to say, there seems to be a difference in the perceptual intensity of an ideophone's referent distinguished by these codas, whereby velars are most intense: velars (explosions) > labials (beating drums) > glottal (knuckles popping, farting) > empty coda. From an articulatory standpoint, this contrast in referential intensity can be mapped through perceptuo-motor analogy: the degree of articulatory contact made by the coda is analogous to the degree of acoustic intensity denoted by the referent. Velars by nature require that the tongue leave the resting position, raise its body, make contact with the soft palate, and then return to resting position. In labials /p m/, by contrast, no tongue movement is required (the tongue is in resting position), and labial articulation in coda position only requires that the lips come to a close, which is another resting position. Though tongue resting positions are known to vary cross-linguistically (Gick et al. 2004), we assume that resting position is perceptually interpretable as inactivity of the tongue. For now, we can minimally label the perceptuo-motor affordances behind this phonosemantic mapping [+oral contact], [-tongue resting], as it involves contact with the articulators which requires them to be in a non-neutral position.
Nasals /m ŋ/ in coda position denote different meanings overall but are similar in that they both denote somewhat longer offset of perceptual resonance. This longer offset mapping is derived from the voiced nature of these nasal codas. That is to say, the vibration of the vocal folds differentiates them from their voiceless non-nasal counterparts /k p/. But it is not voicing alone which allows for the perceptuo-motor affordance behind /m ŋ/. The [+sonorant] nature of /ŋ m/ allows for continuous airflow from the nasal passage, meaning the codas of these ideophones can also be expressively extended, e.g., [tom:::] or [kuaŋ:::], a phenomenon known to occur cross-linguistically (Ofori 2009;Nuckolls et al. 2016), perhaps to mimic a lasting resonant nature of their referents, which stop codas cannot otherwise achieve. Since voicing and sonorancy are inherent to nasals, the perceptuo-motor affordance can simply be labelled [±nasal airflow].

Table 7 (partial):
…    (not as punctual as /-k/)
-p   two surfaces or objects come into contact punctually, but amplitude not necessarily integral
-m   two surfaces or objects come into contact but exhibit a longer offset (than /-p/)
-ʔ   events which end inaudibly but punctually

Glottal stop /ʔ/ in coda position denotes referents with a perceivable (and necessary) ending but an ending that in itself is not audible. Endings of such referents are equivalent to an abrupt cutting off of audible sound, i.e., preventing the vowel from continuing until air flow restrictions would permit otherwise. From a (naïve) articulatory standpoint, it seems that during a glottal closure no articulators are moving, but only that airflow has been blocked. That is to say, for the glottal stop, the tongue remains in resting position while the lips can remain open or as they were during the articulation of the nucleus of the syllable. No obvious visual cues (lips, jaws) are available. Tactile cues are also not quite as salient because the tongue remains in resting position.
All these points can be posited as contributing to the perceptuo-motor affordances behind the glottal stop and its phonosemantic mapping. Firstly, referents depicted by glottal stops in coda position are punctual in nature only: they cannot perceptually endure, last, or continue, as is true for the articulatory nature of plosives /p/ and /k/ (but unlike nasal codas, which can endure). However, unlike /p/ and /k/, referents denoted by glottal stops do not require any form of contact or collision (sharp, loud or otherwise); they simply end. This is encoded via perceptuo-motor affordances of (1) no oral contact made with the articulators, and (2) the tongue remaining in resting position. Therefore, in contrast to all the other stops, we can label the perceptuo-motor affordances of the glottal stop as [-oral contact], [+tongue resting], [-nasal airflow].
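The feature labels introduced so far can be collected into a small lookup table. The following Python sketch is illustrative only. The bundle for /ʔ/ is the one stated above, whereas the full bundles for /p m k ŋ/ are collated here from the preceding paragraphs (labials with the tongue at rest, velars with the tongue displaced, nasals with nasal airflow) and are therefore an assumption rather than a table from the paper. The names `Bundle`, `CODA_FEATURES`, `label`, and `codas_with` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Bundle(NamedTuple):
    """One coda's perceptuo-motor affordances (True = '+', False = '-')."""
    oral_contact: bool
    tongue_resting: bool
    nasal_airflow: bool


# Assumption: values collated from the prose, not copied from a published table.
CODA_FEATURES: Dict[str, Bundle] = {
    "p": Bundle(oral_contact=True, tongue_resting=True, nasal_airflow=False),
    "m": Bundle(oral_contact=True, tongue_resting=True, nasal_airflow=True),
    "k": Bundle(oral_contact=True, tongue_resting=False, nasal_airflow=False),
    "ŋ": Bundle(oral_contact=True, tongue_resting=False, nasal_airflow=True),
    "ʔ": Bundle(oral_contact=False, tongue_resting=True, nasal_airflow=False),
}


def label(bundle: Bundle) -> str:
    """Render a bundle in the bracket notation used in the text."""
    parts = []
    for name, value in bundle._asdict().items():
        sign = "+" if value else "-"
        parts.append(f"[{sign}{name.replace('_', ' ')}]")
    return ", ".join(parts)


def codas_with(**wanted: bool) -> List[str]:
    """Return the codas whose bundle matches every requested feature value."""
    return [
        coda
        for coda, bundle in CODA_FEATURES.items()
        if all(getattr(bundle, feat) == val for feat, val in wanted.items())
    ]


print(label(CODA_FEATURES["ʔ"]))
print(codas_with(nasal_airflow=True))
```

Under this coding each of the five codas receives a distinct bundle, which is what allows the three features to separate their phonosemantic mappings.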
It is important to note the assumption here that the general articulatory nature of a stop, plus its occurrence at the end of a syllable, denotes a referent with a perceptual endpoint or terminus. This is not a new phonosemantic idea (Hinton et al. 1994). What it means is that access to perceptuo-motor affordances is dependent on articulatory properties and their arrangement according to syllable structure. While the phonosemantic mappings of nasal codas were also discussed in terms of perceptual endpoints, articulatory properties were used to argue how such a perceptuo-motor affordance of "endpoint" could be accessed in relation to other (articulatorily similar) consonants in coda position. One might argue that the nasal codas discussed above are in fact not true endpoints because they do not cut off airflow completely and can be articulated to last for expressive purposes, e.g., [tom:::] or [kuaŋ:::]. While this is true, it is a narrow line of argument which takes the nasals out of their articulatory and semantic context for Chaoyang. The nasal airflow arguably adds a semantically-attested perceptuo-motor quality of resonance to an articulation otherwise found in phonemically-related stops. One could go a step further by comparing Chaoyang ideophones with codas to those without, but that is beyond the scope of this paper.

Phonosemantic mappings beyond auditory meanings
A phonosemantic analysis should start with the most fundamental semantic category of ideophones on Dingemanse's (2012) implicational hierarchy: SOUND. Ideophones belonging to SOUND, i.e., auditory ideophones, should be the most straightforward to analyze because they are unimodal: speech sounds imitating real-world sounds (Dingemanse 2013). The unimodal nature of SOUND provides a direct form of structure-mapping upon which other semantic categories can build whilst retaining nuances of perceived imitativeness (cf. Emmorey 2014 for different kinds of structure-mapping). For this reason, our methodology assumes that mappings in one semantic category inform mappings from other categories through analogical extrapolation. This kind of analogical extrapolation is easily observable at the word-level, e.g., boom 'sound of explosion' (SOUND) > boom 'sudden appearance out of nowhere (as an explosion visually conveys)' (MOTION, VISUAL). This suggests that the perceptuo-motor analogies encapsulated by the phonosemantic mappings for boom (SOUND) are likewise accessible to boom (MOTION, VISUAL). In short, SOUND ideophones are analyzed first because the phonosemantic mappings therein should, later, help to identify and explain phonosemantic mappings for other categories, e.g., MOTION, VISUAL, SENSORY. Throughout the rest of this section, analogical extrapolation at the phoneme-level is explained using several semantic categories from Chaoyang ideophones.
Since all ideophones are in some way structure-mapping, different semantic categories of ideophones should therefore denote structures of meaning, of events. In turn, these various structures belonging to various categories are mapped by phonemes which are distributed across all semantic categories, e.g., Chaoyang /p/ is attested in ideophones from all semantic categories. Seeing that semantic categories possess different meaning structures, it is then possible for phonosemantic mappings to be specific to semantic categories as well as syllable position within a category. For example, in the semantic category of SOUND in Chaoyang ideophones, /p/ in onset position could mean 'bursting' or 'release of pressure' while /p/ in coda position means 'two surfaces or objects come into contact punctually' (cf. Table 7). For categories other than SOUND, the phonosemantic mappings could be different but should at least be relatable through analogical extrapolation. 11 For example, in the semantic category of MOTION, /p/ in onset position could mean 'two surfaces coming apart' which is acceptable since this mapping is entailed by its SOUND counterpart 'bursting' as well as the result of its other meaning 'release of pressure.' In other words, our methodology does not allow for /p/ in onset position to mean 'bursting' or 'release of pressure' for SOUND whilst meaning 'light, feathery contact' for MOTION. Moreover, the validity of any analogical extrapolations can be tested through empirical research and native speaker input.
Our phonosemantic mappings in Table 7 were based on SOUND ideophones. But now we must explain how our phonosemantic mappings can be applied to ideophones from MOTION, SENSORY, COGNITIVE semantic categories. Of all the codas in Chaoyang, /ʔ/ is the most common across MOTION, SENSORY, and COGNITIVE categories. The question is then whether our phonosemantic mapping from Table 7 is still relevant. Given that our phonosemantic mapping for /ʔ/ is applicable to 'events which end inaudibly,' our articulatory explanation can still apply. What we need to reconcile is the "punctual" (or non-lasting) aspect of the phonosemantic mapping for /ʔ/. This is best done by looking at the non-auditory ideophones (Table 8) according to their semantic category. By looking at the MOTION ideophones first, we can observe a shared semantic characteristic of suddenness or abruptness and this becomes more apparent if we consider which sort of gestures might occur with each ideophone (e.g., a back and forth movement, a pell-mell or zigzagging manner of movement, a jerking movement of a squeeze or squish). Suddenness or abruptness is indeed punctual and need not be audible. For MOTION ideophones, this phonosemantic mapping seems to hold for now.
For SENSORY and COGNITIVE ideophones, 'suddenness' as a phonosemantic mapping for /ʔ/ in coda position is not as straightforward. The difficulty here is perhaps because the further along Dingemanse's (2012) continuum we move away from SOUND (> MOTION > SENSORY > COGNITIVE), the further away from unimodal depiction the nature of perceptuo-motor analogy becomes. Sound lends itself well to depicting SOUND due to an inherently unimodal semantic nature (Dingemanse 2013). MOTION, while multi-modal in nature, can still be understood as resulting in sound(s), thus only adding a thin layer of analogical complexity for depiction. For example, alternation of vowels can indicate an alternating or fluctuating motion, e.g., English zigzag, ding-dong, tick-tock, splish-splash. Likewise, English boom is the sound as a result of an explosion but is also used to depict something appearing or occurring suddenly, like an explosion would, albeit without any auditory implications. For SENSORY ideophones in Table 8, we can propose that 'intense pain,' 'manner of being startled,' 'manner of being stunned,' 'manner of something going rotten' 12 are all of a telic and therefore punctual nature. Their occurrence or the perception thereof is sudden and requires no internal process. The total blockage of airflow in /ʔ/ is perhaps the most important articulatory gesture which affords the perceptuo-motor analogy of 'suddenness.' We would thus expect other stop codas in Chaoyang to create similar perceptuo-motor analogies (due to their shared articulatory gesture of [-airflow]) for SENSORY ideophones. And they do: /ip/ 'a dull ache' and /pok/ 'throbbing manner of a headache' are the only SENSORY ideophones in Chaoyang which end with stops other than /ʔ/.
Again, as Figure 1 illustrates, this is a case where we need to move below the phoneme level, to articulatory gestures shared between phonemes, in order to propose an explanation for how a perceptuo-motor analogy enables a phonosemantic mapping to take hold.
COGNITIVE ideophones pose a much greater problem as there is less of a straightforward (or resultative) sound related to the depiction of a cognitive percept. 13 It is difficult to explain how 'punctuality' or 'suddenness' applies to ideophones like 'cheerfulness,' 'showing off,' 'aloofness,' or 'apathy.' To tackle this problem, we should first look at ideophones with similar syllable structure (i.e., homophones or minimal pairs). This brings us back to the analogical process behind the semantics of English onomatopoeia boom = sound of explosion > (explosions occur suddenly) > end result: a non-auditory ideophone depicting a sudden, instantaneous, or unexpected occurrence. As Table 9 shows, for 'cheerfulness' and 'apathy' there are homophone and minimal pair SOUND ideophones which bear analogical semantic resemblance. For 'showing off' and 'aloofness' we have no homophones or minimal pairs, therefore we must rely on SOUND ideophones which entail similar articulatory gestures, i.e., [+nasal airflow], to support any analogical relation. We might also want to ask native speakers directly about their intuitions regarding the depictive nature of these ideophones. Native speaker intuitions (including their accompanying hand gestures) might tell us something that the phonological structure cannot tell us on its own.

12 One might argue that rotting is more a process than a punctual event. While we agree with this statement, this ideophone seems depictive of the sudden sensation of something having gone rotten, as when one takes a whiff of milk to see whether it is still good to drink and finds that it is well past its expiry date.

13 We could look at what hand gestures speakers use to accompany these ideophones. It is possible that COGNITIVE ideophones are in fact MOTION ideophones related to hand gestures that are associated with COGNITIVE meanings.
In short, explaining the phonosemantic structure of COGNITIVE ideophones in Chaoyang relies on (1) analogical comparisons at the word-level using homophones from lower-level semantic categories, i.e., SOUND, MOTION or (2) seeing how the relevant phonemes pattern in lower-level semantic categories if no direct homophones are available. Figure 3 illustrates how different semantic categories of ideophones can utilize different articulatory gestures to scaffold their perceptuo-motor analogy. SOUND ideophones require all articulatory features listed in the dashed bundle embodied by /ʔ/ to retain its phonosemantic mapping in coda position: inaudible, punctual ending. SENSORY ideophones need only one feature ([-air flow]) in the bundle to retain their phonosemantic mapping: suddenness. This means that semantically similar SENSORY ideophones can also map to other phonemes which embody the same articulatory feature [-air flow] in coda position, as discussed earlier. Finally, COGNITIVE ideophones with /ʔ/ in coda position present no direct phonosemantic relationship to the articulatory features embodied by /ʔ/. Instead, COGNITIVE ideophones retain their depictive nature through analogical relations to the structure of other ideophones (entire syllables, partial syllables, or shared articulatory features). It is thus possible that COGNITIVE ideophones obtain their depictive nature through lexical association to other imitative words which in turn derive their own depictive nature from perceptuo-motor analogies via articulatory gesture. However we can still explain COGNITIVE ideophones using articulatory features. In order to do this, we must first look at other ideophones structurally resembling said COGNITIVE ideophones as we have shown in Table 9. It is therefore necessary that COGNITIVE ideophones be phonosemantically analyzed last of all semantic categories. 
At the very least, we cannot begin to explain the phonosemantic structure of COGNITIVE ideophones for a given language until all SOUND ideophones have been analysed according to their articulatory features first.
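The layered requirements described for Figure 3 can be read as a subset test: a coda is a candidate for a mapping in a given semantic category if its bundle contains every feature that the category requires. The Python sketch below is a hedged illustration of that reading, not the authors' procedure. The coda bundles are collated from the discussion of codas (with "-airflow" standing for total blockage of airflow in stops and "+airflow" for nasals), and `CODA_BUNDLES`, `REQUIRED`, and `eligible_codas` are hypothetical names.

```python
from typing import Dict, FrozenSet, List

# Assumption: bundles collated from the prose; "-airflow" marks total blockage
# of airflow (stop codas), "+airflow" marks continuing nasal airflow.
CODA_BUNDLES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"+oral contact", "+tongue resting", "-airflow"}),
    "m": frozenset({"+oral contact", "+tongue resting", "+airflow"}),
    "k": frozenset({"+oral contact", "-tongue resting", "-airflow"}),
    "ŋ": frozenset({"+oral contact", "-tongue resting", "+airflow"}),
    "ʔ": frozenset({"-oral contact", "+tongue resting", "-airflow"}),
}

# Features each category needs for the /ʔ/-type mapping to be retained.
REQUIRED: Dict[str, FrozenSet[str]] = {
    # SOUND needs the whole bundle: inaudible, punctual ending.
    "SOUND": CODA_BUNDLES["ʔ"],
    # SENSORY needs a single feature: suddenness.
    "SENSORY": frozenset({"-airflow"}),
}


def eligible_codas(category: str) -> List[str]:
    """Codas whose bundle contains every feature the category requires."""
    needed = REQUIRED[category]
    return [coda for coda, feats in CODA_BUNDLES.items() if needed <= feats]


print(eligible_codas("SOUND"))
print(eligible_codas("SENSORY"))
```

COGNITIVE is deliberately left out of `REQUIRED`: as argued above, those ideophones show no direct relationship to the features of /ʔ/ and are instead related to other ideophones by analogy, which a feature-subset test of this kind does not capture.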

Using phonological probability measures to test the validity of phonosemantic mappings and their perceptuo-motor analogies
The purpose of this section is to see whether the phonological structure of ideophones is (partially) predictable based on the probabilistic phonological patterns of the lexicon. We calculated the statistical profiles of Chaoyang syllables in two ways: (1) positional probability of phonemes, and (2) transitional probabilities of the phonemes assuming bigram probability. When measuring these two probabilities we further assumed a phonological hierarchy of words, i.e., words are organized according to onset, nucleus, and coda. Probabilities were calculated using all syllables from the Xin Chaoshan Zidian (2015) counted according to type frequency (for Chaoyang readings), which totalled 18,892 monosyllabic words. 15 These syllables were coded according to the syllabic structure of words: onset, nucleus, coda, and tone. For our purposes here, tone was ignored because of its highly predictable nature in Chaoyang ideophones (most ideophones being systematically low tone). The monosyllabic words in our dataset and the complete results of positional and transitional probabilities based on them are provided in Supplementary Materials. The positional probability of phonemes was calculated according to onset, nucleus, and coda. Since we are demonstrating our methodology focusing on codas of Chaoyang ideophones, only the positional probability of coda is relevant here. Table 10 compares the positional probability of the coda at phoneme level among prosaic words versus ideophones. Table 10 excludes phonemes and features with zero probability in the coda position.
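As a rough illustration of the positional measure, the sketch below computes the share of syllable types in which each segment fills a given slot. It is a minimal reading of "positional probability" under stated assumptions: syllables are (onset, nucleus, coda) triples with tone already dropped, an empty string marks an empty slot, and duplicates are collapsed so that counts are type frequencies. The toy lexicon is invented for illustration and is not the dictionary data; `positional_probability` and `toy_lexicon` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Syllable = Tuple[str, str, str]  # (onset, nucleus, coda); "" marks an empty slot

POSITIONS = {"onset": 0, "nucleus": 1, "coda": 2}


def positional_probability(syllables: List[Syllable], position: str) -> Dict[str, float]:
    """Share of syllable types in which each segment fills the given slot."""
    idx = POSITIONS[position]
    types = set(syllables)  # type frequency: each distinct syllable counts once
    counts = Counter(syl[idx] for syl in types)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


# Toy data for illustration only; not the counts reported in the paper.
toy_lexicon: List[Syllable] = [
    ("t", "o", "m"),
    ("k", "ua", "ŋ"),
    ("p", "o", "k"),
    ("", "i", "p"),
    ("t", "i", "ŋ"),
    ("k", "a", ""),
]

coda_probs = positional_probability(toy_lexicon, "coda")
for coda, prob in sorted(coda_probs.items()):
    print(repr(coda), round(prob, 3))
```

Running the same function separately over a prosaic word list and an ideophone list would give the two columns that a comparison like Table 10 needs; whether empty codas belong in the denominator is a choice this sketch makes for simplicity.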
The transitional probability of phonemes was calculated according to bigram probability of a nucleus followed by all of the attested codas, /ŋ/, /k/, /m/, /ʔ/, and /p/. Table 11 shows the results. What is most notable about these findings is that none of the non-zero bigram probabilities are attested in nucleus-coda sequences in Chaoyang ideophones. This would indicate that codas of Chaoyang ideophones are not statistically predictable based on the preceding vowel. That is to say, the realization of a coda in Chaoyang ideophones is not readily determined by what precedes it.

[+high] etc., were involved in the most frequent codas found among ideophones. For transitional probabilities, our results showed that the likelihood of having consonants with [+back] or [+high] features is very high regardless of preceding vowels. However, this is not the case among ideophones, since the most frequent codas such as /ʔ/ and /p/ do not involve such features. This means that while Chaoyang ideophones are marked by certain systematic traits (e.g., predominantly low tone, exempt from tone sandhi, lacking /b, z, g/), their syllable structure is no different from the rest of the language. In other words, Chaoyang ideophones do not adhere to a marked syllable structure, a factor which might otherwise have explained why codas are not predictable based on positional or transitional probabilities.

Taken together, our results show that the codas of Chaoyang ideophones are arranged for reasons other than positional or transitional probability.
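The transitional measure can be sketched in the same spirit as a conditional bigram probability, P(coda | nucleus) = count(nucleus, coda) / count(nucleus), estimated over syllable types. This is one common way to define a bigram transitional probability and is assumed here rather than taken from the paper's supplementary materials. The toy lexicon is invented, and `coda_given_nucleus` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

Syllable = Tuple[str, str, str]  # (onset, nucleus, coda); "" marks an empty slot


def coda_given_nucleus(syllables: List[Syllable]) -> Dict[Tuple[str, str], float]:
    """Estimate P(coda | nucleus) over distinct syllable types."""
    types = set(syllables)  # type frequency: each distinct syllable counts once
    pair_counts = Counter((nucleus, coda) for _, nucleus, coda in types)
    nucleus_counts = Counter(nucleus for _, nucleus, _ in types)
    return {
        (nucleus, coda): n / nucleus_counts[nucleus]
        for (nucleus, coda), n in pair_counts.items()
    }


# Toy data for illustration only; not the counts reported in the paper.
toy_lexicon: List[Syllable] = [
    ("t", "o", "m"),
    ("p", "o", "k"),
    ("s", "o", "ŋ"),
    ("l", "o", "ŋ"),
    ("t", "i", "ŋ"),
    ("", "i", "p"),
]

bigram = coda_given_nucleus(toy_lexicon)
for (nucleus, coda), prob in sorted(bigram.items()):
    print(f"P({coda} | {nucleus}) = {prob:.2f}")
```

With probabilities estimated from the prosaic lexicon, one could then look up each nucleus-coda pair found in an ideophone and check whether high-probability sequences are the ones ideophones actually use.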

Implications
By proposing physiologically accessible articulatory movements as features, e.g. [±oral contact], [±tongue resting], [±airflow], [±nasal], it is possible to make concise comparisons of ideophone phonosemantic structure across languages and methodologies. We can see (1) whether ideophones in other languages also make use of these features in the same imitative ways as Chaoyang, and (2) whether participants rely on these perceptuo-motor features when creating novel iconic forms during psycholinguistic experiments (Imai et al. 2008;Assaneo et al. 2011;Perlman & Lupyan 2018;Taitz et al. 2018). In addition, we can now formulate a tentative, yet clear, and testable hypothesis for why ideophones are easier to learn despite a speaker's language background.
Phonosemantic mappings make up the structure of ideophones. Phonosemantic mappings derive their meaning through perceptuo-motor affordances. Perceptuo-motor affordances are not phonology- or language-specific, but are accessible to all speakers because they are derived from bundles of physiologically accessible articulatory properties (e.g., tongue in resting position, oral contact, airflow through the nose, glottal closure etc.), albeit arranged according to a language-specific syllable structure. These bundles provide perceptual or sensory information which is in turn seen as analogous to their referent. And, in turn, making these sensory-referent connections is possible because creating analogical relations is an inherent cognitive ability not exclusive to language.
For example, even though Dutch and Japanese sound systems and phonological inventories differ considerably, Dutch speakers were able to learn Japanese ideophones better when presented with their congruent Dutch meaning than their incongruent Dutch meaning (Lockwood et al. 2016), e.g., Japanese kibi-kibi was better learnt as its true meaning of 'energetic' than the opposite meaning of 'lifeless'. According to our hypothesis, this is because Dutch speakers were able to first sidestep phonological differences between Japanese and Dutch, and then access and relate the fundamental articulatory properties of the Japanese ideophone to aspects of its congruent meaning. That is to say, the perceptual information provided by the articulatory properties of Japanese ideophones was not sufficient for making analogical connections to the incongruent Dutch meanings. Lastly, it is important to note that Japanese ideophones used in Lockwood et al.'s (2016) experimental paradigm spanned across a range of semantic categories.
The findings of Lockwood et al. (2016) seem to go against the observations made by Güldemann (2008) 16 whereby "the larger the class of ideophonic items in a language, the lower the possibility of iconicity." Our methodology offers a way to mitigate these opposing views because we assume that phonosemantic mappings from one semantic category interact with and inform other semantic categories. We rephrase Güldemann's (2008) statement with special attention to "the lower the possibility of iconicity" below.
The larger the class of ideophonic items in a language, the higher the likelihood that this class of ideophonic items spans more than just two or three semantic categories from the Dingemanse (2012) hierarchy. According to our methodology, this would mean that the more semantic categories of ideophones in a given language, the more layers of analogical extrapolation of a form-meaning mapping from one semantic category to another. This would not result in a "lower possibility of iconicity" as Güldemann (2008) states, but, instead, a higher amount of obscured or indirect iconicity, what Sonesson (1997: 2) terms "secondary iconicity". That is to say, more semantic categories in the ideophone inventory of X-language result in multiple layers of (inter-related) secondary iconicity which is not entirely obvious to those who do not speak X-language. Non-speakers of X-language are thus denied access to the means with which to extrapolate X-language phonosemantic mappings from one (more fundamental) semantic category of ideophones (i.e., SOUND) to the next. However, Lockwood et al. (2016) demonstrated that non-speakers can access these phonosemantic mappings at an above chance level under the right semantic context afforded by experimental conditions, i.e., a forced-choice task.
Testing the implications of our methodology as well as our revision of Güldemann's (2008) claim requires further experimental work, like that of Lockwood et al. (2016), across speakers of various language backgrounds as well as ideophones from different languages. Additional in-depth analysis on the inventories of ideophones from other languages, using the methodology proposed here, is important for understanding structural differences in terms of how articulatory properties are bundled together to form a perceptuo-motor affordance behind a phonosemantic mapping. For example, we would expect ideophones from languages with restricted syllable structure (e.g., CV-only languages) or with limited sound inventories to differ in phonological structure from ideophones of languages without such restrictions. The question is, are the same articulatory properties accessible despite such typological differences? And does semantic category (ideophones other than onomatopoeia, e.g., MOTION, SENSORY, COGNITIVE) affect how perceptuo-motor affordances form phonosemantic mappings? That is to say, if /k/ in coda position denotes "sharp" and "punctual" endpoints, does it also share perceptuo-motor affordances with non-auditory ideophones like those which depict the sharp and punctual pain of a pinprick? With respect to answering such a question, we have stipulated an order to follow based on Dingemanse's (2012) semantic hierarchy. At the end of §5.2 we showed how it is necessary to first analyse the most fundamental (unimodal) semantic categories of iconicity (SOUND) before moving on to multimodal categories (MOTION, SENSORY, COGNITIVE).
Though not overtly mentioned in this paper, we do not exclude additional structural (i.e., syntactic) or extra-linguistic (i.e., cultural) factors. We also do not consider how acoustic factors might contribute to the formation of perceptuo-motor analogies underpinning iconic sound-to-meaning (phonosemantic) mappings. We left out acoustic factors mainly for concerns of space and also because we exemplified our methodology using consonants, which lend themselves better to articulation as opposed to acoustics. However, acoustic and other factors should be considered for future comparisons of ideophone structure with aims of identifying universal properties of human perception expressed through linguistically imitative means (cf. Perlman & Lupyan 2018). Finally, in our mappings there were no exceptions, and this is perhaps because we were able to succinctly group our ideophones into semantic categories and make phonotactic relations relevant to these groups. How much freedom can be allowed for other languages (and other syllable positions) requires further investigation. While it may not always be the case that, once phonologically-predictable and phonotactic elements have been dealt with, what segmentally remains can be explained through iconic or imitative means, our methodology should aim for maximal coverage of the remaining segments.
Using this methodology, we can now provide a gestural explanation for why onomatopoeia and ideophones are iconic. We now have grounds to explain why systematic patterns like those attested in some phonaesthemes (e.g., gl- in glitter, glisten, glimmer) are not iconic. At the start of this paper we cited the gl- phonaesthemes as an example of systematicity often misconstrued as iconicity. With our methodology in place, we are now better equipped to argue for or against the presence of iconic affordances in phonaesthemes and other prosaic words. For example, the tenets of our methodology seem to support Blust's (2003) imitative explanation for the occurrence of /ŋ/ in nose-related prosaic words throughout Austronesian languages.

Figure 4 below illustrates how articulatory features of our methodology can subsume language-specific phonemes and thus reveal similarities in the structure-mappings of cross-linguistic ideophones. In this case, we use the ideophones 'sound of a doorbell' from Chaoyang, English, and Japanese to illustrate our point. The dashed lines show the articulatory features in common for onset and coda position respectively. First, we can account for the segmental differences between Chaoyang, English, and Japanese ideophones using phonotactics. For the codas, /ŋ/ and /n/ are the only permissible nasal codas in Chaoyang and Japanese respectively. The English use of /ŋ/ perhaps avoids homophony with dean and dawn. However, the fact that [+tongue root] is used in both Chaoyang and English /ŋ/ might speak to a [+nasal airflow] [+tongue root] perceptuo-motor affordance which is otherwise not phonotactically permissible in the coda position of Japanese. With regard to onsets, in Japanese, coronals /d/ or /t/ are not permissible because they must be realized as affricates before /i/ (Labrune 2012). Seeing as voicing systematically depicts intensity in Japanese ideophones (Kakehi et al. 1996), e.g., korokoro 'rolling' > gorogoro 'heavy rolling,' /b/ is arguably inapplicable here. Comparing the Japanese example to the Chaoyang and English suggests that coronal stops, rather than labial stops, are preferred when depicting the onset of a bell chime. The choice of coronal stop can be articulatorily explained as a tapping contact against a passive surface, analogical to the contact required of a chime. In Chaoyang and English, coronal stops are permitted before /i/ and so they are able to make form-meaning mappings in this way. Japanese, on the other hand, must make do with what articulatory contact is phonotactically permissible before /i/. With this observation in mind, we can explain the onsets for English and Chaoyang. In English, the only stops which can occur in onset position without aspiration are /d/ and /g/. Thus, /p/ is not possible for English because it must be realized as [pʰ] in onsets. The [+voice] nature of /d/ seems to be a phonological compromise in order to avoid the obligatory aspiration of [tʰ] whilst maintaining [-tongue root] by avoiding /g/. In Chaoyang, /p t k/ are all permissible in onset position of ideophones. The use of /t/ here is faithful to our explanation that an active articulator making contact with a passive articulator is analogically related to the physical contact involved in ringing a chime. Note that /d/ is not a phoneme in Chaoyang. The avoidance of [+aspiration] and [+tongue root] for all onsets in Figure 4 is beyond the scope of this illustrative example. What our methodology allows us to hypothesize from this example is that these articulatory gestures (or a combination thereof) should be attested in semantically similar ideophones across languages. Moreover, we can hypothesize that Chaoyang speakers might rely on the articulatory cues shown in Figure 4 to map, learn, or guess these Japanese and English meanings.
Essentially, we now have a means to argue for whether prosaic words or phonaesthemes contain articulatory features which allow speakers to make perceptuo-motor analogies connected with their meanings. For example, if speakers rate the prosaic word balloon as highly iconic, our methodology allows for comparison between the articulatory features of balloon and the articulatory features of ideophones to see if there is any crossover which could result in prosaic balloon being perceived by native speakers as highly iconic. Following on from this, take the phonaestheme -ump (i.e., clump, hump, bump, lump, slump, stump, mumps, plump): there is an etymological explanation for the appearance of this phonaestheme in certain words. 17 But future studies can now use our methodology to see whether the bundle of articulatory features embodied by said phonaestheme, [+closed lips] [+resting] [+nasal air flow], is found in similar semantic realms intralinguistically (e.g., How do English onomatopoeia make use of these features? How many semantically unrelated prosaic words exist with the same -ump sequence?) and cross-linguistically in either prosaic words (e.g., hump, bump, slump in Malay, Estonian, Cantonese, Zulu, Quechua etc.) or ideophones. One might argue that the articulatory choreography required by the features of -ump [+closed lips] [+resting] [+nasal air flow] allows for perceptuo-motor analogy of 'roundness' simulated by closing the lips + tongue in a neutral position + bilabial stop creating pressure in front of the oral cavity behind the lips from unreleased air. More data and detailed phonotactic analysis, as required by our methodology, are of course needed to support such a wishful explanation for the supposed perceptuo-motor analogies underpinning the structure of these phonaesthemes.
Finally, in a similar vein to phonaesthemes, sound-to-meaning correspondences have been noted across many languages for prosaic words which are not intuitively imitative in nature (i.e., not ideophones), e.g., ash, bite, bone, breasts, knee, leaf (Blasi et al. 2016; cf. Aryani 2018 for German affective words). Though a concrete explanation for such findings remains to be seen, the methodology proposed here has great potential to determine how iconically grounded these robust cross-linguistic mappings actually are.

Conclusion
Prior to the methodology laid out in this paper, there has been no clear way to differentiate and qualify forms of iconicity versus forms of systematicity in spoken language. Because of this, systematicity in language has sometimes been misappropriated as a form of iconicity. Regular systematic patterns, such as phonaesthemes in English, have been explained as systematically occurring in language because of iconic underpinnings. But if systematic patterns are only found in related languages, what does this mean for the universal nature supposedly inherent to iconicity? Despite the fact that systematicity has been defined as usually arbitrary and rooted in statistical relationships (Dingemanse et al. 2015), those who still claim that instances of systematicity are in fact underlyingly iconic are yet somehow unable to explain what makes these supposedly iconic underpinnings different from or similar to those attested in iconic forms like onomatopoeia. Our methodology answers this question by showing how articulatory gestures (lip movement, tongue movement, air flow etc.) can be used to identify the iconic underpinnings for imitative words and explain them using perceptuo-motor analogy. Conversely, our methodology asserts that if a linguistic form cannot be explained through articulatory gesture, and thus a perceptuo-motor analogy, it cannot or should not be considered iconic. Following on from this, our methodology also holds that the universally accessible nature of articulatory gestures and their sensory properties coupled with speakers' universal ability to create analogy to scaffold meaning are what allows for words like ideophones to be perceived as iconic and therefore imitative. This would mean that purely systematic forms, as seen in some phonaesthemes, should not be explainable through articulatory gesture and as a result should be semantically opaque or difficult to learn for non-native speakers who have zero exposure to their systematic relationship in English or another related language. Such a claim about non-iconic words is in line with current ideophone research which has shown that Dutch speakers have an easier time learning iconic words from Japanese so long as the meanings of the Japanese words are their true meaning and not an incongruent foil (Lockwood et al. 2016).

17 Clump from Old English clympre 'lump, mass' ultimately from Proto-Indo European *kluƀ which derived Old High German chlobo 'knot, club'; Hump from the Indo-European root *kemb- 'bend' (Watkins 2000); Bump from similar origins as lump but also related to (obsolete) bub 'pimple' from post-classical Latin bubon- 'nodular swelling or abscess'; Lump from 16th century Danish lumpe 'lump, stump, block, log'; Slump from Low German slump 'heap, mass'; Stump from Middle Low German stump and Middle Dutch stomp 'mutilated, blunt, dull'; Mumps is a plural derived from mump (obsolete) 'inarticulate, speechless' (cf. mumble), compare to Icelandic mump 'murmur'; Plump borrowing from Dutch plomp 'plump, squat, rude, clumsy' cognate with Middle Low German plump 'clumsy, uneducated, rude'; all etymologies from OED (2018) unless otherwise stated.
In terms of explaining the structure of imitative words like ideophones, our methodology shifts the focus of ideophone structure (and its relationship with sound-to-meaning mappings) away from language-specific features (i.e., phonemes or even phonological features) and towards broader, articulation-based categories with greater potential for linguistic universality, such as tongue position, tongue movement, airflow, nasality, and open/closed lips. Because they are anatomical in nature, these broader categories should be accessible to all speakers regardless of the phonological and phonotactic differences attested cross-linguistically. This is our attempt to solve the problem of iconicity being apparently universal without actually appearing so at the segment level. In other words, our methodology uses broad categories of articulatory gesture to answer the question "Why does Language A use sound X to imitate referent Y while Language B uses sound Z?" Put another way, our methodology uses articulatory gestures to answer why dogs bark differently in different languages. With this articulation-based methodology, we can now differentiate universal properties of iconicity from language-specific properties of systematicity.
Overall, our methodology unifies the visual and spoken modalities by adapting iconicity of the visual modality to the spoken modality using articulatory gestures. Gaps in how to identify and explain iconicity in the spoken modality have been filled here through cross-pollination of knowledge from research in the visual modality. This application across modalities speaks yet again to the universal nature of iconicity.

Additional Files
The additional files for this article can be found as follows: