Accounting for the stochastic nature of sound symbolism using Maximum Entropy model

Sound symbolism refers to stochastic and systematic associations between sounds and meanings. Sound symbolism has not received much attention in the generative phonology literature, perhaps because most if not all sound symbolic patterns are probabilistic. Building on the recent proposal by Alderete and Kochetov (2017), which attempts to integrate sound symbolic patterns with core phonological grammar, this paper shows that MaxEnt grammars allow us to model stochastic sound symbolic patterns in a very natural way. The analyses presented in the paper show that sound symbolic relationships can be modeled in the same way that we model phonological patterns. We suggest that there is nothing fundamental that prohibits formal phonologists from analyzing sound symbolic patterns, and that studying sound symbolism using a formal framework may open up a new, interesting research domain.


Introduction
In recent linguistic theories, it is almost standard to assume that the relationships between sounds and meanings are arbitrary. This thesis dates back to Hermogenes's view expressed in Plato's Cratylus, and it was very clearly articulated by Saussure (1916) as the first organizing principle of natural languages. The thesis of arbitrariness was reiterated as one of the design features of human languages by Hockett (1959). Few modern linguists would disagree with the thesis that language is a system that is capable of associating sounds and meanings in arbitrary ways.
On the other hand, many studies have revealed certain systematic relationships between sounds and meanings, an observation which is often referred to as "sound symbolism" (see Akita 2015; Dingemanse et al. 2015; Hinton et al. 2006; Lockwood and Dingemanse 2015; Nuckolls 1999; Sidhu and Pexman 2017; Svantesson 2017 for recent reviews). For example, Sapir (1929) and many subsequent studies have shown that low vowels tend to be judged to be larger than high vowels, and front vowels tend to be judged to be smaller than back vowels (Berlin, 2006; Coulter and Coulter, 2010; Jakobson, 1978; Jespersen, 1922; Newman, 1933; Ohala, 1994; Shinohara and Kawahara, 2016; Ultan, 1978). In both English and Japanese, there are stochastic tendencies for sonorants to be associated with female names and for obstruents to be associated with male names (Shinohara and Kawahara, 2013; Uemura, 1965; Wright and Hay, 2002; Wright et al., 2005). It is probably safe to conclude based on these studies that there are tendencies in natural languages that certain sounds are associated with certain meanings.
These sound symbolic patterns are, crucially, stochastic or probabilistic, and almost never deterministic (Dingemanse, 2018). On the one hand, speakers of many languages feel that [a] is larger than [i] or [ɪ] (Shinohara and Kawahara, 2016), and this association is reported to hold in the lexicon of many languages (Blasi et al., 2016; Ultan, 1978). For instance, diminutive affixes, including English "-y" (as in "blank-y"), often contain a high front vowel in a number of languages (Ultan, 1978). However, in no way is this association exceptionless. For example, the English word big contains the "small vowel", [ɪ] (see also Diffloth 1994). There is a sense in which [a] is felt to be bigger than [i], but few would argue that languages cannot use [a] to represent something small or cannot use [i] to represent something big. Similarly, although sonorants are often associated with female names, there are male names that contain sonorants and female names that contain obstruents. If sound symbolic relationships were truly deterministic, then as Locke (1689) and Saussure (1916) noted, all languages should use the same sound sequences to represent the same denotations.¹
While studies of sound symbolism are flourishing in phonetics, psychology and cognitive science (Akita, 2015; Dingemanse et al., 2015; Hinton et al., 2006; Lockwood and Dingemanse, 2015; Nuckolls, 1999; Sidhu and Pexman, 2017; Svantesson, 2017), sound symbolism has not received serious attention in the generative phonology literature. One clear exception is Alderete and Kochetov (2017), who developed a formal theory of sound symbolism, using a set of Optimality Theoretic constraints (Prince and Smolensky, 1993/2004), EXPRESS(X). This set of constraints requires a certain phonological feature X to be realized to signal certain semantic features (see also Kochetov and Alderete 2011).
EXPRESS(X) is suited to account for non-probabilistic patterns driven by sound-symbolic principles such as palatalization in baby talk in Japanese and other languages. However, most if not all sound symbolic relationships are probabilistic, as discussed above. To this end, this paper expands on Alderete and Kochetov's proposal and shows that a Maximum Entropy model (MaxEnt: Goldwater and Johnson 2003) successfully accounts for stochastic aspects of sound symbolic patterns. The current proposal is illustrated with four case studies.² In addition to this theoretical contribution, two of our case studies report new empirical data, thereby expanding our knowledge of sound symbolism.

Names of Takarazuka Revue actresses
The first case study is based on a new set of empirical data, which comes from the names of Japanese Takarazuka Revue actresses. All Takarazuka actresses are female, but some actresses play a male role, while others play a female role. Importantly, the actresses' gender in Takarazuka context is fixed; i.e. one actress cannot act both as male and female during her Takarazuka career. In this study, we first explored the effects of sound symbolic relationships between female names and sonorants on the one hand, and male names and obstruents on the other, sound symbolic relationships which have been shown to hold in Japanese (Kawahara, 2017;Shinohara and Kawahara, 2013;Uemura, 1965) as well as in English (Wright and Hay, 2002;Wright et al., 2005).
¹ However, we also need to take into consideration the fact that different languages use different sets of sounds, and they have different phonotactic restrictions (see Shih et al. 2018 and Styles and Gawne 2017 for the implication of these cross-linguistic phonological differences on sound symbolism). The set of real world attributes being referred to (or more simply, the set of semantic denotations) may differ across languages as well (Shih et al., 2018).

Method
All of the 361 names of Takarazuka actresses who were active as of July 2017 were analyzed. For each name, we coded the number of sonorants and obstruents, as well as whether that name is used for a male role or a female role. Each name contained at most three consonants.

Results
The results appear in Figure 1, in which the y-axes represent the probability of the names being used for female roles. The left panel shows that the more sonorants a name contains, the more likely it is used as a female name; the right panel shows that although the tendency is less clear, the more obstruents a name contains, the less likely it is used as a female name. These patterns accord well with the sound symbolic effects previously noted for Japanese and English.
A multiple logistic regression analysis was run with gender as the dependent variable and the number of sonorants and the number of obstruents as independent variables. The result shows that sonorants significantly increase the likelihood of the name being used as a female name (β1 = 0.822, z = 4.19, p < .001), while obstruents do not have a significant impact on the gender choice (β2 = 0.05, z = 0.36, n.s.).
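As a reading aid, the reported coefficients can be converted to odds ratios. The sketch below assumes that the coefficients are on the log-odds scale, as is standard for logistic regression, and that the female role was coded as the positive outcome. The intercept is not reported in the text, so absolute probabilities cannot be recovered from these numbers alone.

```python
import math

# Coefficients as reported in the text (assumed to be on the log-odds scale).
BETA_SONORANT = 0.822
BETA_OBSTRUENT = 0.05


def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds of a female role per extra consonant."""
    return math.exp(beta)


print("odds ratio per sonorant: ", round(odds_ratio(BETA_SONORANT), 2))
print("odds ratio per obstruent:", round(odds_ratio(BETA_OBSTRUENT), 2))
```

Under these assumptions, each additional sonorant roughly doubles the odds of a name being used for a female role, while the odds ratio for obstruents stays close to 1.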
This analysis thus suggests that there is a stochastic tendency in such a way that sonorants are associated with female roles. Crucially, though, the effects are not deterministic; it is not the case that the presence of a single sonorant makes the whole name a female name 100% of the time; instead, the presence of a sonorant increases the probability of that name being used as a female name. This sort of stochastic pattern is commonplace (in fact, usually the norm) in sound-symbolic patterns in natural languages (Dingemanse, 2018).³ While the proposal by Alderete and Kochetov (2017) successfully accounts for deterministic sound symbolic patterns, it does not account for this sort of stochastic nature of sound symbolic patterns; we thus expand on their proposal to account for the gradient nature of sound symbolism, using the MaxEnt grammar model.
³ This stochasticity may be one of the reasons why sound symbolism did not receive serious attention in the generative phonology literature. In generative phonology (Chomsky and Halle, 1968), it was standard to assume that elements and structures that are completely predictable are derived in the phonological component of grammar, although in practice some exceptions are usually tolerated; on the other hand, those that are not completely predictable (whose predictability was lower than 1.00) were assumed to be stored in the lexicon (see Shaw and Kawahara 2018 for a historical overview). This means that phonological patterns should be exceptionless, at least at some level of representation or analysis (see Shattuck-Hufnagel 1986 for relevant discussion). This assumption may have precluded generative phonologists from analyzing sound symbolic patterns, although there are likely to be other reasons, like the influence of Saussure (1916).

A MaxEnt analysis
Generative phonology is a function that maps one representation (e.g. underlying representation) to another representation (e.g. surface representation) (McCarthy, 2010); this is true, regardless of whether it is rule-based or constraint-based, or whether it involves serial derivation or not.⁴ The model proposed in this paper is the same: for the case analyzed in this section, our grammar is a function that takes names as inputs and maps them to probability distributions of two candidates, male names or female names. For example, it takes a name like Yurino, and calculates the probability of that name being used for a female name and the probability of the name being used for a male name.
The current model uses MaxEnt grammar, as it is suited to account for gradient mapping from one dimension to another (e.g. Goldwater and Johnson 2003; Hayes 2017; Hayes and Wilson 2008; Moore-Cantwell and Pater 2016; Tanaka 2017; Wilson 2006; Zuraw and Hayes 2017).⁵ One particular attraction of this model in the current context is the fact that MaxEnt is able to predict probability distributions of output candidates, which can be compared with the observed probabilities. To capture the sound symbolic effect of sonorants in Figure 1, we posit the following constraint in (1):
(1) *S→M: For each sonorant contained in name x, assign a violation mark if x is mapped to male names.⁶
This constraint reflects the tendency that is observed in the left panel of Figure 1: sonorants are preferentially associated with female names. This constraint alone, however, cannot account for the fact that even if all the consonants in the names are sonorants, they can be male names 25% of the time (the rightmost bar of Figure 1, left). In order to account for this fact, we posit a *F N constraint, a constraint that is akin to *STRUCT constraints (see Daland 2015 for the role of *STRUCT in MaxEnt grammars). This constraint simply restates the fact that there can be male names.
MaxEnt grammar is similar to Optimality Theory (Prince and Smolensky, 1993/2004) in that a set of candidates is evaluated against a set of constraints. Unlike Optimality Theory, however, constraints are weighted rather than ranked, as in Harmonic Grammar (Legendre et al., 1990a,b; Pater, 2009, 2016). Based on the constraint violation profiles, for each candidate x, its Harmony Score (H-Score(x)) is calculated as follows:
$$\text{H-Score}(x) = \sum_{i} w_i C_i(x)$$
where $w_i$ is the weight of the i-th constraint, and $C_i(x)$ is the number of times candidate x violates the i-th constraint.
³ (continued) Zuraw (2000) has shown, however, that those patterns that are not completely predictable can be still systematic and productive ("patterned exceptions"), and much subsequent work (e.g. Ernestus and Baayen 2003; Hall 2009; Hayes and Londe 2006; Hayes and Wilson 2008; Hayes et al. 2009; Kumagai and Kawahara 2018; Moore-Cantwell and Pater 2016; Pierrehumbert 2001; Tanaka 2017) has demonstrated that phonological knowledge can be deeply stochastic. There are phonological patterns that are only probabilistically predictable but yet productive.
The H-scores are negatively exponentiated (eHarmony, $e^{-H}$: Wilson 2014), and the resulting value is proportional to the probability of each candidate. Intuitively, the more constraint violations a candidate incurs, the higher its H-score, and hence the lower its eHarmony ($e^{-H}$). The eHarmony values are relativized against the sum of the eHarmony values of all candidates, which is sometimes referred to as Z:
$$Z = \sum_{j} e^{-\text{H-Score}(x_j)}$$
The probability of each candidate $x_j$, $p(x_j)$, is
$$p(x_j) = \frac{e^{-\text{H-Score}(x_j)}}{Z}$$
To implement the analysis of Takarazuka actress names, we used the MaxEnt Grammar tool made available by Bruce Hayes,⁷ which calculates the best weights for each constraint given the observed frequencies of each candidate, as well as the predicted probabilities based on these weights. The MaxEnt analysis tableau is shown in (2). The model takes each name as its input and calculates the probability of the name being used for a male name and the probability of the name being used for a female name. The analysis focuses on the number of sonorants that each name contains, as the number of sonorants was the significant factor in the analysis presented in section 2.2.
(2) The MaxEnt analysis of Takarazuka names
As shown, the MaxEnt tool yielded the weights of the two constraints (0.74 and 1.11), which produce expected percentages that are very close to the observed percentages (the right two columns).
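The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions and not the MaxEnt Grammar tool itself: it assumes that 0.74 is the weight of the sonorant constraint in (1), that 1.11 is the weight of the second constraint, and that the second constraint assigns one violation to the female candidate. The function and variable names are hypothetical.

```python
import math


def maxent_probs(violations, weights):
    """Map each candidate to its MaxEnt probability.

    violations: dict from candidate to a list of violation counts,
                one count per constraint, in the same order as weights.
    """
    # H-score: weighted sum of violations.
    harmony = {c: sum(w * v for w, v in zip(weights, vs))
               for c, vs in violations.items()}
    # eHarmony: negative exponentiation of the H-score.
    eharmony = {c: math.exp(-h) for c, h in harmony.items()}
    # Z: sum of eHarmony over all candidates.
    z = sum(eharmony.values())
    return {c: e / z for c, e in eharmony.items()}


# Assumed assignment of the two reported weights (see lead-in).
W_SONORANT_TO_MALE = 0.74
W_AGAINST_FEMALE = 1.11
WEIGHTS = [W_SONORANT_TO_MALE, W_AGAINST_FEMALE]


def takarazuka_probs(n_sonorants):
    violations = {
        "male": [n_sonorants, 0],   # one mark per sonorant if mapped to male
        "female": [0, 1],           # assumed single mark for the female candidate
    }
    return maxent_probs(violations, WEIGHTS)


for n in range(4):
    print(n, "sonorants -> P(female) =", round(takarazuka_probs(n)["female"], 2))
```

Under this assignment, a name with three sonorants receives a female probability of about 0.75, which is in line with the 25% share of male names mentioned for the rightmost bar of Figure 1 (left).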

Nicknames of Japanese AKB idol members
The sound symbolic association between sonorants and femaleness which we observed in the previous section manifests itself in another pattern in Japanese. Japanese idols often use nicknames that involve reduplication. Those instances usually show reduplication of syllables with sonorant onsets (e.g. mayu-yu, kayo-yo-n, miru-ru-n, and miyu-miyu, in which "y" represents a palatal glide). On the other hand, reduplication of syllables with obstruent onsets is rare. This asymmetry can be understood as an instance of the sound symbolic association between sonorants and femaleness.

Method
To quantitatively assess this hypothesis, the nicknames of all of the AKB idol group members were coded. The AKB idol group consists of sub-groups (AKB48, SKE48, NMB48, HKT48, NGT48, STU48, and SDN48), and all of these groups were included in the current analysis. The coding was based on the wiki website specifically devoted to these idol groups (https://48pedia.org/, accessed September 2017). The numbers of obstruents and sonorants that were targeted by reduplication were coded. As a baseline, all obstruents and sonorants that appear in their names were also coded. The current study focused on onset consonants and did not count coda consonants (Japanese coda consonants are limited to so-called "coda nasals"). There were no geminates in the names analyzed in the current study.
Table 1 shows the results. In total, there were 900 sonorants in the corpus of AKB idol names, and 45 of them were targeted by reduplication (5%). On the other hand, only 7 out of 643 obstruents (1%) were targeted by reduplication. This difference between sonorants and obstruents is statistically significant (χ²(1) = 16.44, p < .001). Viewed from a different perspective, only 58% of the consonants in all of the AKB names are sonorants (= 900/1,543), but 87% of the reduplicated consonants are sonorants (= 45/52). It thus seems safe to conclude that there is a bias toward targeting sonorants in reduplication.
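The reported test statistic can be checked from the counts given above. The sketch below assumes a 2x2 chi-square test with Yates' continuity correction; the correction is not mentioned in the text, but it is the variant that reproduces the reported value. The function names are hypothetical.

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square statistic with continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts reported in the text.
SON_REDUP, SON_TOTAL = 45, 900
OBS_REDUP, OBS_TOTAL = 7, 643

chi2 = yates_chi_square_2x2(SON_REDUP, SON_TOTAL - SON_REDUP,
                            OBS_REDUP, OBS_TOTAL - OBS_REDUP)
print("chi-square:", round(chi2, 2))
print("p < .001:", p_value_df1(chi2) < 0.001)
print("share of sonorants overall:", round(SON_TOTAL / (SON_TOTAL + OBS_TOTAL), 2))
print("share of sonorants among reduplicated:", round(SON_REDUP / (SON_REDUP + OBS_REDUP), 2))
```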

Result and analysis
Accounting for these reduplication patterns requires only two constraints, one of which is very similar to (1):
(3) a. *O→F: For each obstruent contained in name x, assign a violation mark if x is a female name.
    b. *R: Assign a violation mark for each syllable that is reduplicated.
The first constraint militates against reduplicating syllables with obstruent onsets in female names, which accounts for the very low probability of obstruents being reduplicated in the current dataset. The second constraint is necessary, because not all sonorants are reduplicated (this constraint is formalized as INTEGRITY, "no breaking", by McCarthy and Prince 1995). The MaxEnt analysis of AKB nicknames using these constraints is provided in (4).
(4) The MaxEnt analysis of the reduplication pattern in AKB idol names
As observed, the sound-symbolic constraint *O→F, together with a constraint that militates against reduplication, accounts for the difference between sonorants and obstruents in a very straightforward manner. One notable aspect of this analysis is that even when a candidate violates no constraints (i.e. [mayu]), it is not the case that it is assigned the probability of 1.00. This is because MaxEnt calculates the probability distribution over all candidates considered; less than optimum candidates are assigned some probabilities, and hence even the perfect candidate does not get "all the share."
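The point about the violation-free candidate can be illustrated with a small sketch. The weights below are derived here from the counts reported in Table 1 and are not the values of tableau (4), which is not reproduced in this text. The sketch assumes two candidates per input (plain and reduplicated), that a reduplicated sonorant-onset syllable violates only the anti-reduplication constraint, and that a reduplicated obstruent-onset syllable violates both constraints in (3). All names are hypothetical.

```python
import math

# Counts reported in the text.
SON_REDUP, SON_TOTAL = 45, 900
OBS_REDUP, OBS_TOTAL = 7, 643


def total_penalty(p_penalized):
    """H-score that gives a candidate probability p against a violation-free rival."""
    return math.log((1.0 - p_penalized) / p_penalized)


# Illustrative weights obtained in closed form from the observed proportions.
w_redup = total_penalty(SON_REDUP / SON_TOTAL)
w_obs_to_female = total_penalty(OBS_REDUP / OBS_TOTAL) - w_redup


def p_reduplicated(onset_is_obstruent):
    h_plain = 0.0  # the plain candidate violates neither constraint
    h_redup = w_redup + (w_obs_to_female if onset_is_obstruent else 0.0)
    e_plain, e_redup = math.exp(-h_plain), math.exp(-h_redup)
    return e_redup / (e_plain + e_redup)


p_son = p_reduplicated(onset_is_obstruent=False)
p_obs = p_reduplicated(onset_is_obstruent=True)
print("P(reduplicated | sonorant onset): ", round(p_son, 3))
print("P(reduplicated | obstruent onset):", round(p_obs, 3))
print("P(plain | sonorant onset):        ", round(1.0 - p_son, 3))
```

Even though the plain candidate incurs no violations, its probability stays below 1.00, because the reduplicated candidate retains a non-zero share of the distribution.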

Voiced obstruents and evolution levels in Pokémon names
As another case study, this section examines the role of voiced obstruents in determining the evolution levels of Pokémon characters. In the Pokémon game series, Pokémon characters undergo evolution, at most twice, and when they do so, they are called by a different name. Kawahara et al. (2018), based on all the Japanese Pokémon character names available as of October 2016, show that there is a positive correlation between the numbers of voiced obstruents contained in the names and the evolution levels of the Pokémon characters.⁸ The crucial aspect of their data is reproduced in Table 2. The probabilities of the names being used for non-evolved characters (Evol level 0) decrease as the names contain more voiced obstruents (the leftmost column). The probabilities of names being used for the evolved characters (Evol levels 1 & 2) increase as the names contain one or two voiced obstruents (the right two columns).
To account for these distributional skews, we posit two constraints in (5):
(5) a. *V O→N E: For each voiced obstruent contained in name x, assign a violation mark if x is mapped to Evol level 1 names; assign two violation marks if x is mapped to Evol level 0 names.
    b. *E N: Assign two violation marks to Evol level 2 names; assign one violation mark to Evol level 1 names.
The first constraint is a formalization of the sound symbolic correspondence between voiced obstruents and evolved characters: ideally, names with voiced obstruents are mapped onto names for the most evolved characters. In other words, it penalizes names with voiced obstruents that are used for non-evolved characters. *E N is akin to *STRUCT constraints (Daland, 2015; Prince and Smolensky, 1993/2004), and it is suited to capture the fact that evolved characters are less frequent than non-evolved characters. The MaxEnt analysis of Pokémon names using these two constraints appears in (6). As was the case for the two cases analyzed above, the model takes names as input and yields probability distributions of each candidate; for this analysis, each candidate corresponds to each evolution level.
(6) The MaxEnt analysis of Pokémon names
We observe that there is a close correlation between the observed probabilities of each name type and their predicted values.
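The violation profiles defined in (5) can be written out explicitly. The following sketch uses hypothetical weights chosen only for illustration; they are not the weights fitted in tableau (6), and the resulting probabilities are not the values in Table 2. It shows only the qualitative behavior that the two constraints produce.

```python
import math

# Hypothetical weights, for illustration only.
W_VOICED_OBSTRUENT = 0.3
W_EVOLVED = 0.5


def evolution_probs(n_voiced, w_voiced=W_VOICED_OBSTRUENT, w_evolved=W_EVOLVED):
    """Probability of each evolution level for a name with n_voiced voiced obstruents."""
    # (violations of (5a), violations of (5b)) for each evolution level.
    violations = {
        0: (2 * n_voiced, 0),  # two marks per voiced obstruent at level 0
        1: (n_voiced, 1),      # one mark per voiced obstruent, one mark for level 1
        2: (0, 2),             # two marks for level 2
    }
    eharmony = {level: math.exp(-(w_voiced * a + w_evolved * b))
                for level, (a, b) in violations.items()}
    z = sum(eharmony.values())
    return {level: e / z for level, e in eharmony.items()}


for n in range(3):
    probs = evolution_probs(n)
    print(n, {level: round(p, 2) for level, p in probs.items()})
```

With any positive weights, the probability of level 0 falls and the probability of level 2 rises as the number of voiced obstruents increases, which is the direction of the skew described in the text.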

Associations between sounds and shapes
The three case studies reported above were based on corpus data. However, sound symbolic patterns are not only observed in existing names, but also in experimental settings. This section models the results of a naming experiment reported in Kawahara and Shinohara (2012). This experiment examined the association between sonorants and round figures on the one hand, and the association between obstruents (in this experiment, oral stops) and angular shapes on the other. These associations were first noted by the influential work of Köhler (1947), who showed that a nonce word like takete is more likely to be associated with an angular object, whereas a nonce word like maluma is more likely to be associated with a round object.⁹
In the experiment, 20 disyllabic stimuli containing stop onsets (e.g. [bakə]) and 20 disyllabic stimuli containing sonorant onsets (e.g. [wejə]) were presented together with various pairs of an angular shape and a round shape. Within each trial, the participants were presented with an auditory stimulus together with a pair of an angular and a round object, and were asked to choose which object would be a better match for the auditory prompt. The participants were 17 native speakers of English. The results show that nonce names with sonorant onsets were associated with round shapes about 70% of the time, whereas nonce names with obstruents were associated with angular shapes 68% of the time (the data were retrieved from Figure 6 of Kawahara and Shinohara 2012).
To account for these results, we posit two constraints, each of which directly corresponds to the observed sound-shape association:
(7) a. *S→R: For each oral stop contained in name x, assign a violation mark if x is used for a name of a round shape.
    b. *S→A: For each sonorant contained in name x, assign a violation mark if x is used for a name of an angular shape.
The MaxEnt analysis appears in (8). In fact, this analysis achieves a perfect match between observed and predicted values. This is because the structure of violation profiles is simple: each constraint takes care of one mapping (either from oral stops to angular shapes, or from sonorants to round shapes), and the number of candidates within each mapping is limited to two.

(8) The MaxEnt analysis of sound-shape mapping
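The reason for the perfect match can be made concrete: with two candidates and one constraint per mapping, a single weight can always be chosen so that the predicted proportion equals the observed one. The sketch below assumes that each stimulus contains two target consonants (both onsets of a disyllabic form), so that the dispreferred candidate incurs two violations. That count, the names, and the resulting weights are illustrative assumptions and are not taken from tableau (8).

```python
import math

# Proportions reported in the text (the first is given as "about 70%").
OBSERVED_ROUND_GIVEN_SONORANT = 0.70
OBSERVED_ANGULAR_GIVEN_STOP = 0.68
N_TARGET_CONSONANTS = 2  # assumption: two onsets per disyllabic stimulus


def exact_weight(p_preferred, n_violations):
    """Weight that reproduces p_preferred when only the rival candidate is penalized."""
    return math.log(p_preferred / (1.0 - p_preferred)) / n_violations


def predicted_preferred(weight, n_violations):
    """Probability of the violation-free candidate in a two-candidate competition."""
    e_dispreferred = math.exp(-weight * n_violations)
    return 1.0 / (1.0 + e_dispreferred)


w_sonorant_angular = exact_weight(OBSERVED_ROUND_GIVEN_SONORANT, N_TARGET_CONSONANTS)
w_stop_round = exact_weight(OBSERVED_ANGULAR_GIVEN_STOP, N_TARGET_CONSONANTS)

print("P(round | sonorants):", round(predicted_preferred(w_sonorant_angular, N_TARGET_CONSONANTS), 2))
print("P(angular | stops):  ", round(predicted_preferred(w_stop_round, N_TARGET_CONSONANTS), 2))
```

Because each mapping has its own free weight and only two candidates, the fit has as many parameters as observed proportions, so an exact match is expected rather than surprising.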

Conclusion
This paper has shown that MaxEnt grammars can model sound symbolic patterns, especially with regard to their gradient aspects. The way we analyzed the sound symbolic patterns is not different from how phonologists analyze phonological patterns using MaxEnt. This means that formal grammatical tools which theoretical phonologists have been using for decades can be extended to analyses of sound symbolic patterns at no additional cost. We believe that this is an important result, because sound symbolism has not been extensively studied in the theoretical literature (modulo Alderete and Kochetov 2017). One of the reasons why sound symbolism was not analyzed in the theoretical literature may be that sound symbolic patterns are almost always gradient. A formal analytical tool like MaxEnt can naturally account for the gradient nature of sound symbolic association patterns. To the extent that MaxEnt offers a natural way to account for probabilistic generalizations in phonology (Goldwater and Johnson, 2003; Hayes, 2017; Hayes and Wilson, 2008; Moore-Cantwell and Pater, 2016; Tanaka, 2017; Wilson, 2006; Zuraw and Hayes, 2017), formal phonologists can analyze sound symbolic patterns in a way that may provide a new, interesting general domain for future phonological research.
To this end, one way in which we can branch out is to analyze other languages, because most of our case studies are based on data from Japanese. Since MaxEnt (and OT) are analytical tools that are very useful for cross-linguistic comparisons (McCarthy, 2008; Prince and Smolensky, 1993/2004), analyzing sound symbolic patterns in other languages using MaxEnt may shed light on how sound symbolic principles operate in natural languages in general.
Another promising domain for future research is to examine cases in which some sound alternations motivated by sound symbolic principles (like palatalization) interact with other phonologically-motivated constraints within a single system (Alderete and Kochetov, 2017). We thus would like to close this paper by discussing a possible case study of this kind. In Japanese nickname formation, [h] is sometimes rendered as [p], and this alternation arguably occurs in order to express "cuteness" (Kumagai, 2019). We independently know that Japanese shows evidence for the phonotactic constraint that prohibits the configuration in which [p] occurs in a word with a voiced obstruent. Modeling the probabilistic [h]-[p] alternation, taking into consideration how this alternation interacts with the phonotactic constraint, would help us reveal how sound symbolic principles and phonological constraints can probabilistically interact within a single system. MaxEnt grammar would be suited to model this interaction, since the [h]-[p] alternation and the phonotactic constraint are both stochastic.