Epenthetic vowel production of unfamiliar medial consonant clusters by Japanese speakers

Existing nativized loanword studies have traditionally suggested that there are three epenthetic vowels in Japanese, which reflect both phonotactic restrictions and articulatory properties of certain consonant-vowel sequences in the language. Recent findings, however, call this tri-partite epenthesis pattern into question: First, several studies suggest that this epenthesis pattern is not true in the realm of perception and is not completely regular in production, and second, the relevant phonotactic restrictions seem to be weakening even outside of epenthesis contexts. This paper therefore investigates the extent to which the spontaneous choice of epenthetic vowels in the production of Japanese conforms to the traditional tri-partite pattern. Epenthesis was induced by presenting pseudo-word stimuli of the form of [aCCa] (C = a voiced consonant) to subjects orthographically. The findings suggest that indeed, the production pattern does not fully conform to what is generally reported for nativized loanwords; in particular, the traditionally “default” vowel [ɯ] is used by our participants frequently in all contexts, including the two where [o] or [i] is usually reported. That said, we also show that there is considerable variability across speakers as to which vowel is epenthesized, especially in the palatal context, and this variability includes tokens of vowels similar to all possible lexical vowels of Japanese.


Introduction
Vowel epenthesis is a common repair strategy used by speakers to adapt loanwords that contain unfamiliar phonological structure in the borrowing language (e.g., Davidson, 2006; Fleischhacker, 2001; Hall, 2011; Kabak & Idsardi, 2007; Kang, 2011; Uffmann, 2006). Such is the case in Japanese, where vowel epenthesis serves to make non-native structures more native-like (e.g., Hirayama, 2003; Itô, 1989; Smith, 2006; Kubozono, 2015). For example, the English word 'pipe' [paɪp] is commonly pronounced as [paɪpɯ] with [ɯ] occurring in word-final position, as consonants other than [ɴ] do not occur word-finally in Japanese (Kubozono, 2015). The adaptation of unfamiliar consonant sequences by epenthesis in Japanese has served as a test case for studying the influence of native speech experience on speech perception and production (e.g., Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; Dupoux, Pallier, Kakehi, & Mehler, 2001; Dupoux, Parlato, Frota, Hirose, & Peperkamp, 2011; Monahan, Takahashi, Nakao, & Idsardi, 2009; Peperkamp & Dupoux, 2003; Shoji & Shoji, 2014; Sperbeck, 2012; Yazawa, Konishi, Hanzawa, Short, & Kondo, 2015). For example, Dupoux et al. (1999) found that native Japanese listeners perceive an illusory vowel, [ɯ], between sequences of consonants that are illicit in Japanese. This finding […] Japanese-speaking listeners to identify what vowel, if any, occurred between the two Cs in VCCV forms, where the place of articulation of the first C varied. In the labial, alveolar stop, and velar contexts, they perceived an epenthetic vowel between the two Cs 60-70% of the time, and in the palatal context, they perceived an epenthetic vowel 98% of the time. The choice of epenthetic vowel, however, did not always match the distribution described above.
In the palatal context and the velar context, the perceptual epenthetic vowels were largely as expected; of the tokens where an epenthetic vowel was perceived at all, 94% were the expected [i] in the palatal context and 94% were the expected [ɯ] in the velar context. In the labial context, however, only 84% of perceived vowels were the expected [ɯ], with the rest being fairly equally distributed among [a], [e], and [i]. In the alveolar stop context, the discrepancy was even more extreme, with only 16% of perceived vowels being the expected [o], and 71% being [ɯ] instead, even though *[dɯ] is an illicit phonotactic sequence in native Japanese. It should be noted that there was no control group for language in this study; thus it is possible that these perceptual effects could be driven not by Japanese-specific patterns but rather by more general acoustic characteristics of the stimuli. Interestingly, Monahan et al. (2009) found that while Japanese-speaking listeners do perceive an illusory [ɯ] in velar VCCV contexts, they were able to discriminate alveolar VCCV sequences from similar sequences with either a medial [ɯ] or [o], suggesting that they perceived these as VCCV sequences without an epenthetic vowel (they did not test labial or palatal contexts).
There is also evidence from a different type of production study than that used in Yazawa et al. (2015) that suggests that the productive choice of epenthetic vowel may be different from that in lexicalized loanwords. Shoji and Shoji (2014) used a writing production experiment to examine patterns of vowel epenthesis in hypothetical loanwords from nonce words spelled in orthographic Latin script; native Japanese speakers transcribed the nonce words in Japanese characters, which forced them to either delete or epenthesize in every case. They found that in palatal contexts, where [i] might be expected, [i] [ɯ] did in fact occur 71% of the time in word-initial clusters, while [i] was used 15.6% of the time, and [ɯ] was used 95.6% of the time in word-final clusters. They do not report the results of labial contexts separately, but do find the expected [ɯ] used in their 'other' contexts, which included labials, >90% of the time in both word-initial and word-final clusters. Thus, while the traditionally expected vowels were the most frequent in any given context, in several contexts, they accounted for fewer than half of the actual tokens.
Third, there is separate evidence that the phonotactic constraints governing loanwords are changing in non-epenthetic contexts. Pintér (2015, pp. 121-122) points out that while older loanwords with /ti/ sequences were typically adapted as [tɕi] (e.g., 'team' adapted as [tɕi:mɯ]), more recent loanwords are adapted more faithfully with [ti] (e.g., 'party' adapted as [pa:ti:]). Kubozono (2015, p. 325) notes that a single loanword can also have multiple, age-related adaptations (e.g., 'tissue' being adapted as either [tʃitʃʃɯ] or [tetʃʃɯ] by older speakers but as [tiʃʃɯ] by younger speakers). That said, [t] and [d] have been observed to occur before [i] in at least some loanwords since at least 1950 (Bloch, 1950 cites examples like vanity case [vaniti] and caddy [kjadi:]). A similar change is reported for /tu/ sequences; despite the traditional constraint against [tɯ] in Japanese, Pintér (2015) reports that this sequence is in fact possible in more recent loanwords, e.g., 'Bantu' adapted as [bantɯ:]. As Pintér (2008, p. 112) also points out, even the official stance of the Japanese National Language Committee changed between 1954 and 1991; in 1954, [tɯ] was not acknowledged as a possible written syllable of Japanese, but in 1991, it was acknowledged though not officially supported. Thus, the phonotactic motivations for epenthesizing only [o] after alveolars and [i] after only palatals may be eroding. Indeed, an examination of the Balanced Corpus of Contemporary Written Japanese (National Institute for Japanese Language and Linguistics, 2011) indicates that all five lexical vowels can occur after both [d] and [dʑ] in loanwords (defined as words of foreign origin other than Chinese); see also discussion in Hall (2009, 2013) showing that loanwords are eroding the predictability of several pairs of Japanese phonemes.
These results combine to raise questions about the current state of epenthesis in Japanese. Specifically, it would appear that the use of [o] after alveolar stops and [i] after palatals, i.e., the typical pattern in loanword epenthesis, is not currently a clear-cut pattern in perception, production, and other areas of loanword adaptation. This suggests that either perception and production factors are not the explanatory causes of the lexicalized loanword epenthesis patterns or that there may similarly now be a change in the epenthesis adaptation patterns themselves. The current study is designed to be a first step in understanding the current state of loanword epenthesis patterns from a production perspective. It involves a production task that tests the full range of epenthetic vowels used to break up VCCV sequences in Japanese; in order to maintain maximal control over the stimuli, however, these sequences are simply presented as nonce words, rather than being loanwords in any real sense (i.e., they are not associated with meaning or claimed to come from a particular foreign source language); we note, however, that it has generally been claimed that loanwords and nonce words in Japanese tend to follow similar grammatical patterns. As Kawahara (2012, p. 1194) points out, "both loanwords and nonce words show default accentuation patterns in Japanese (e.g., Katayama, 1998; Kawahara & Kao, 2012; Kubozono, 1996, 2006, 2008; Labrune, 2012; McCawley, 1968; Shinohara, 2000), [and] neither native words nor Sino-Japanese words allow voiced geminates, while both loanwords and nonce words allow them (Itô & Mester, 1995)." Kawahara goes on to demonstrate that Japanese speakers judge both loanwords and nonce words as being more natural when they follow Lyman's Law, even though Lyman's Law does not hold categorically outside of the native vocabulary. Thus, we have reason to believe that nonce words might be a reasonable proxy for loanwords.
To anticipate the results, some patterns were consistent with earlier studies, while there was considerable individual speaker variability in other cases. In particular, [ɯ] was consistently used by all 14 participants after both labials and velars, which is in line with reported patterns of loanword adaptation. However, after alveolar stops, only two participants used the expected [o], with the other 12 speakers either using [ɯ] or using some combination of both [o] and [ɯ], suggesting that the loosening of the phonotactic constraint against [dɯ] is possibly extending to epenthetic contexts. Perhaps most surprisingly, the palatal context induced a large range of epenthetic vowels; again, only two speakers consistently used the expected vowel, [i], in this context, while four used [ɯ], and the rest used some combination of almost every lexical vowel, including [e] and [a], which are not generally reported as being used epenthetically at all.

Background on Japanese
Before describing the details of the production experiment, we note the following relevant information regarding the Japanese phonological system. Modern Japanese has five phonemic vowel qualities, as shown in Table 1 (e.g., Akamatsu, 2000; Shibatani, 1990; Tsujimura, 1996; Vance, 1987, 2008). All vowels have a phonemic length contrast. In terms of vowel duration, [ɯ] is the shortest vowel in Japanese while [a] is the longest (Campbell, 1992; Han, 1962, cited in Shoji & Shoji, 2014; Yoshida, 2006). Further, among the five vowels, the high back vowel [ɯ] is the least likely vowel to be accented (Yoshida, 2006). It should be noted that this conservative analysis for Japanese allophonic status fails for Sino-Japanese and, especially crucial for the current study, loanwords. Pintér (2015, p. 125), for example, claims that the "innovative variety [of Japanese] … accommodates (almost) all logically possible CV combinations," including sequences like [ti], [di], [tɯ], and [dɯ], as mentioned above, and while he does not take a firm stance on the appropriate phonological representations of these sounds, he does treat the innovative forms as emergent contrasts, suggesting that they are not simply contextually predictable allophones.

Methodology
In order to more thoroughly investigate the nature of epenthetic vowels in Japanese, a production study was carried out. Native speakers of Japanese were asked to produce nonsense words that were likely to trigger epenthesis across a variety of consonantal environments, and then acoustic analyses were conducted to determine the identity of the epenthetic vowel in each context for each speaker.

Speakers
Fourteen native speakers of Japanese 3 (10 female, 4 male) participated in the production experiment, conducted at University of Canterbury, in New Zealand. Participants were recruited from local English language schools via posted fliers at the schools, and were compensated with a $20 voucher. Participant age ranged from 21 to 46 (mean = 27.3). All participants had lived in an English-speaking country for less than one year, and were on a working holiday or studying English. No participants reported any speech or hearing disorders. They had all received English language education for six years in junior high and high school in Japan, since English is compulsory from age 12. Their total years living in foreign countries including non-English-speaking countries was less than three years.

Materials
There were two types of stimuli created, those for a control condition and those for an experimental condition. The control condition was intended to elicit natural examples of each speaker's regular production of each of the five Japanese lexical vowels, in an inter-consonantal context. […] were intended to increase the variety of produced items in order to minimize participants' recognition of patterns in the stimuli. The 60 control items were repeated twice, while the 12 experimental items and 24 filler items were each repeated three times, to create a total of 228 trials. These trials were then divided evenly across two sessions, as shown in Table 3. A full list of production stimuli is given in Appendix A. The total possible number of epenthetic vowels for each speaker was 36 (12 experimental stimuli * 3 repetitions). For the vowels in the phonotactically licit control stimuli, {a, e, i, o, ɯ}, there were 24 instances of each (12 voiced consonantal environments * 2 repetitions) during the two sessions. Thus, there were 156 tokens of interest for each speaker. It should be noted, however, that not all items were always produced as intended. Thus, these numbers reflect the maximum possible number of tokens per speaker.
3 Five additional participants were excluded either because they were highly bilingual or because they misread the stimuli.
4 Following Mattingley et al. (2015), the flanking consonants were chosen to be voiced obstruents in order to avoid potential challenges that could arise in analyzing vowels between voiceless consonants since they tend to become devoiced in Japanese (Vance, 2008; Shaw & Kawahara, 2018).
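The per-speaker counts given in the Materials description can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in the text; the variable names are illustrative and are not taken from the study's materials.

```python
# Per-speaker trial and token counts, using only figures stated in the text.
CONTROL_ITEMS, CONTROL_REPS = 60, 2
EXPERIMENTAL_ITEMS, FILLER_ITEMS, OTHER_REPS = 12, 24, 3
LEXICAL_VOWELS = 5  # {a, e, i, o, ɯ}

control_trials = CONTROL_ITEMS * CONTROL_REPS            # lexical-vowel tokens
experimental_trials = EXPERIMENTAL_ITEMS * OTHER_REPS    # epenthetic-vowel tokens
filler_trials = FILLER_ITEMS * OTHER_REPS                # not analyzed

total_trials = control_trials + experimental_trials + filler_trials
tokens_of_interest = control_trials + experimental_trials
tokens_per_lexical_vowel = control_trials // LEXICAL_VOWELS

print("total trials:", total_trials)
print("tokens of interest per speaker:", tokens_of_interest)
print("tokens per lexical vowel:", tokens_per_lexical_vowel)
```

The totals agree with the 228 trials, 156 tokens of interest, and 24 instances per lexical vowel reported above.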

Procedure
Stimuli were randomized and presented to each participant using E-prime software (Schneider, Eschman, & Zuccolotto, 2012). Each pseudo-word (e.g., aguba) was represented in Roman orthography (Hepburn system) and appeared in the following carrier sentence, presented in Japanese characters, e.g., Kore mo aguba desu "This is aguba, too" (Figure 1). The carrier sentence was in Japanese to help encourage participants to produce the stimuli using their Japanese phonology. A Tascam HD-P2 audio recorder (44,100 samples/s, 16-bit) and a Beyerdynamic head-mounted microphone were used for recording, with speakers recorded individually in a sound-attenuated room at the University of Canterbury. Note that we chose to present the stimuli orthographically rather than auditorily to avoid any additional interference from perception and to reduce bias on the part of the participants toward any particular epenthetic vowel (or lack thereof) based on clues from the stimulus. Smith (2006), for example, shows that Japanese loanwords may in fact have 'doublet' adaptations, one with epenthesis and one with deletion (e.g., 'Hepburn' as either [hep.pɯ.baːɴ] or [he.boɴ]), and argues that the deletion cases likely arise from perceptual factors while the epenthesis cases are more likely influenced by orthographic factors. Given our interest in epenthesis here, orthographic stimuli seemed preferable, though we acknowledge that this choice can have consequences for the results of loanword studies (e.g., Vendelin & Peperkamp, 2006). We discuss this matter further in Section 4.
The procedure was described to participants in Japanese. After seeing a stimulus on the computer screen, they were asked to say the whole sentence, including pronouncing the stimulus item as if it were a Japanese word. If participants thought that they had misread an item, they were able to pronounce it one more time. The participant then pressed any key on the keyboard to display the next stimulus. Each participant produced a randomized list of 228 items during each session.

Acoustic measurements
As mentioned above, not all of the produced items matched the intended targets. When participants did not produce a phonologically expected vowel or consonant, the token was excluded (decided upon both auditorily and acoustically) (e.g., […]). Table 4 summarizes where the discrepancies occurred, broken down by vowel and consonantal context. Recall that for each combination of lexical vowel and C1, there could have been six tokens per speaker, for a maximum of 84 tokens across the 14 speakers. For the epenthetic vowels (symbolized with a plain V in Table 4 and subsequently), there could have been nine tokens per speaker, for a maximum of 126 tokens across the 14 speakers.
As can be seen in Table 4, of the total possible 2184 tokens, the production task resulted in 1971 recorded lexical and epenthetic vowel tokens that could be analyzed. Note that participants in fact inserted epenthetic vowels between two consonants in 100% of the experimental tokens where they were expected. However, only 463 of the 504 epenthetic stimuli were included, because the accompanying consonants were not always produced as expected (e.g., [abda] misread as [adVda]). Figure 2 illustrates examples of both a lexical (2a) and an epenthetic (2b) vowel in the context [ad_ba] as produced by participant M2; in both cases, the vowel quality was [ɯ]. The total word duration for the word with lexical [ɯ] is 0.33 s, while that for the word with epenthetic [ɯ] is 0.31 s.
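The cross-speaker maxima and retention rates implied by these counts can be reproduced as follows. This is a minimal sketch based only on the numbers reported in the text; the count of analyzed lexical tokens is derived here by subtraction, on the assumption that the 1971 analyzed tokens include the 463 epenthetic ones, and is not a figure stated by the authors.

```python
# Maximum and analyzed token counts across all speakers (figures from the text).
SPEAKERS = 14
LEXICAL_VOWELS = 5          # {a, e, i, o, ɯ}
C1_CONTEXTS = 4             # labial, alveolar stop, velar, palatal
LEXICAL_PER_CELL = 6        # tokens per speaker for each vowel-by-C1 combination
EPENTHETIC_PER_CONTEXT = 9  # tokens per speaker for each C1 context

max_per_lexical_cell = LEXICAL_PER_CELL * SPEAKERS
max_per_epenthetic_context = EPENTHETIC_PER_CONTEXT * SPEAKERS
max_lexical = LEXICAL_VOWELS * C1_CONTEXTS * max_per_lexical_cell
max_epenthetic = C1_CONTEXTS * max_per_epenthetic_context
max_total = max_lexical + max_epenthetic

analyzed_total = 1971
analyzed_epenthetic = 463
# Derived value (assumption: the 1971 total includes the 463 epenthetic tokens).
analyzed_lexical = analyzed_total - analyzed_epenthetic

print("maximum tokens:", max_total)
print(f"overall retention: {analyzed_total / max_total:.1%}")
print(f"epenthetic retention: {analyzed_epenthetic / max_epenthetic:.1%}")
print(f"lexical retention: {analyzed_lexical / max_lexical:.1%}")
```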
The duration of all words and vowel tokens was measured, and the values for F1, F2, and F3 were extracted using Praat (Boersma & Weenink, 2014). Formant measurements were taken at the midpoint of the relevant vowel. While the focus of the analysis is on the quality of epenthetic vowels in [aCCa] sequences, vowels from the control pseudowords were used for comparison, e.g., [aCVCa]. Specifically, the quality of a given epenthetic vowel (V) was determined by comparing it acoustically to the baseline vowels produced by each speaker. All vowel plots given below show normalized mean F1 and F2 values, with data ellipses enclosing 95% of the data for each lexical vowel [i, e, ɯ, o, a] and the epenthetic vowel (V), using the stat_ellipse() function in the ggplot2 package (Wickham, 2016) of R (R Core Team, 2017), which in turn is based on the dataEllipse() function in the car package (Fox & Weisberg, 2011). The formant values were z-score normalized using the Lobanov normalization procedure in NORM (Thomas & Kendall, 2007) to remove overall effects of speaker sex. All original formant values were measured in Hz.
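Lobanov normalization converts each speaker's formant values to z-scores computed over that speaker's own tokens, separately for each formant. The following is a minimal sketch of that idea and is not the NORM implementation: the function name, the dictionary layout, and the example Hz values are all hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def lobanov_normalize(tokens, formants=("F1", "F2")):
    """Return copies of the tokens with per-speaker z-scored formants added.

    tokens: list of dicts with a 'speaker' key and one Hz value per formant.
    Each speaker needs at least two tokens with non-identical values,
    otherwise the standard deviation is undefined or zero.
    """
    by_speaker = {}
    for tok in tokens:
        by_speaker.setdefault(tok["speaker"], []).append(tok)

    normalized = []
    for speaker_tokens in by_speaker.values():
        # Mean and (sample) standard deviation per formant for this speaker.
        stats = {}
        for f in formants:
            values = [t[f] for t in speaker_tokens]
            stats[f] = (mean(values), stdev(values))
        for t in speaker_tokens:
            out = dict(t)
            for f in formants:
                m, s = stats[f]
                out[f + "_z"] = (t[f] - m) / s
            normalized.append(out)
    return normalized

# Hypothetical values in Hz, for illustration only (not data from the study).
example = [
    {"speaker": "A", "vowel": "i", "F1": 300.0, "F2": 2300.0},
    {"speaker": "A", "vowel": "o", "F1": 500.0, "F2": 1900.0},
    {"speaker": "A", "vowel": "a", "F1": 750.0, "F2": 1300.0},
    {"speaker": "B", "vowel": "i", "F1": 350.0, "F2": 2600.0},
    {"speaker": "B", "vowel": "o", "F1": 600.0, "F2": 2100.0},
    {"speaker": "B", "vowel": "a", "F1": 900.0, "F2": 1500.0},
]
result = lobanov_normalize(example)
```

After normalization, each speaker's values for a given formant have a mean of zero and unit standard deviation, which is what allows tokens from speakers with different vocal tract sizes to be plotted in a shared space.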

Statistical analyses
All data were analyzed using R (R Core Team, 2017). Linear mixed-effects models were created using the lme4 (Bates, Maechler, Bolker, & Walker, 2015) and lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017) packages, in which the normalized F1 value, the normalized F2 value, or the vowel duration was predicted from vowel quality for each of the preceding consonantal contexts, across separate analyses. The random intercepts in the analyses were Speaker and Word, and the fixed effect was Vowel. In all cases, the question of interest is whether the epenthetic vowel is similar to any particular lexical vowel; hence, the epenthetic vowel was always set to be the baseline value.
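One way to write out the model structure just described is sketched below. The notation is introduced here for illustration only and is an assumption, not the authors' own formulation: y_t is the dependent measure for token t (normalized F1, normalized F2, or vowel duration), L is the set of five lexical vowels, x_{v,t} equals 1 if token t contains lexical vowel v and 0 otherwise, and s(t) and k(t) index the speaker and word of token t.

```latex
y_{t} = \beta_0 + \sum_{v \in \mathcal{L}} \beta_v \, x_{v,t} + u_{s(t)} + w_{k(t)} + \varepsilon_{t},
\qquad
u_{s} \sim \mathcal{N}(0, \sigma_u^2), \quad
w_{k} \sim \mathcal{N}(0, \sigma_w^2), \quad
\varepsilon_{t} \sim \mathcal{N}(0, \sigma^2)
```

Because the epenthetic vowel is the baseline, all x_{v,t} are 0 for epenthetic tokens, so \beta_0 estimates the epenthetic vowel's mean and each \beta_v estimates how far lexical vowel v lies from it. Under this reading, a coefficient \beta_v that does not differ significantly from zero is what the analyses treat as the epenthetic vowel being similar to lexical vowel v on that measure.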

Characteristics of the control (lexical) vowels
We start by considering the acoustic characteristics of the vowels in the control condition, to establish a baseline of the vowel characteristics against which the epenthetic vowels can be compared. First, we consider vowel duration. […] (Campbell, 1992; Han, 1962, cited in Shoji & Shoji, 2014 […] Next, we consider the quality of each vowel. Figure 3 shows the overall normalized F1/F2 spaces for each lexical vowel across all 14 speakers and all four consonantal contexts, which on the whole are fairly well separated. It can be seen that […] [a] are similar in backness. Note that both of these latter vowels are actually more central than back. An articulatory study (Nogita, Yamane, & Bird, 2013) reported that the vowel conventionally described as the high back vowel in Japanese is, in fact, a rounded high central vowel [ʉ] in younger speakers. Participants in the current study were mostly under 35 years old, except for two participants in their 40s. Although the vowel [ɯ] is phonetically central, for readers' convenience, we use [ɯ] for this vowel, following the usual convention. […] higher than that of the epenthetic vowel (i.e., to be significantly lower vowels), while [i] is predicted to have a statistically significant F1 value that is lower than that of the epenthetic vowel (i.e., to be significantly higher). For F2 […]

F1 and F2 Analyses
As is the case in the labial context, we expect from prior descriptions that the quality of the epenthetic vowel in the velar context [ɡ] will be similar to that of the lexical vowel [ɯ], and once again, the current results largely corroborate that expectation. Figure 6 shows the overall vowel space for the lexical vowels from [ɡVC]-forms and epenthetic vowels from […] As can be seen in Figure 8a, the vowel space of the epenthetic vowel overlaps substantially with both the high vowel [ɯ] and the mid vowel [o], indicating that there may be variability as to which vowel is epenthesized, a result that is strikingly different from that seen in the labial and velar contexts.
A closer look at the results reveals that much of this variability can be attributed to variability across speakers (see Figure 8b). Only two speakers (M1 and M3) had epenthetic vowels that clearly matched the expected pattern of being [o] (see Figure 9a). Another seven (F1, F4, F9, F10, F13, F15, and M2) had a clear [ɯ] in this context instead (see Figure 9b). Finally, five speakers (F5, F7, F12, F16, and M4) had epenthetic vowels that either alternated between the two categories or spanned both categories. […] All tokens were checked manually by the first author, who is a native speaker of Japanese, and any that were questionable were verified by the third author, who is a native speaker of English. […] suggesting that the following context may be influencing the quality of the epenthetic vowel as well. This division of the speakers into subsets was based on both visual inspection of their individual vowel plots and statistical analyses of each individual speaker (the complete results of which are shown in Appendix B). That said, we acknowledge that the amount of intra-speaker replication is quite small, and the experiment was not designed to examine this kind of individual variability (see also discussion in Senn, 2014); we are not trying to make strong claims about the relative strength or frequency of any of these sub-patterns. Instead, the key point here is that the alveolar-stop context resulted in a considerable degree of both intra- and inter-speaker variability in a way that is quite different from that of the labial and velar contexts discussed above.

Duration analyses
We now turn to an examination of the durations of the epenthetic vowel in the alveolar-stop context. For the two sub-groups of participants that seemed to consistently produce an epenthetic vowel quality similar to one of the lexical vowels, the duration results are also consistent with their producing that same lexical vowel. In particular, it is notable that the epenthetic vowel is not simply similar to the shortest vowel (regardless of quality), but rather seems to match the duration of the lexical vowel that it is similar in quality to. That is, for the two speakers for whom the epenthetic vowel is similar in quality to [o] […] and Table 12a). For the seven speakers for whom the epenthetic vowel is similar in quality to [ɯ], the duration is also short and consistent with both [i] and [ɯ], although also not quite statistically significantly different from [o] (see Figure 10b and Table 12b).

F1 and F2 analyses
Finally, we turn to the palatal context, [dʑ]. Recall that the expected epenthetic vowel based on earlier studies would be similar to the lexical vowel [i]. However, this was not found to be the case for the majority of speakers in this study. In fact, only two speakers produced a vowel similar to [i], while four produced epenthetic vowels similar to [ɯ], and for the others there was a great deal of variability. As above, we describe the results in terms of two factors, formant frequency and vowel duration. Vowel quality results are described in terms of observable trends in the data; they were not analyzed statistically since the dataset for each pattern is small. Statistical results for each individual speaker in this context are given in Appendix C. Figure 11a shows the individual epenthetic and lexical vowels across all speakers in the palatal context. As can be seen, the epenthetic vowels are broadly distributed across the entire vowel space, overlapping with each of the other five vowel categories. Based on a combination of visual inspection of the individual plots in Figure 11b and the statistical analyses in Appendix C, speakers generally fall into one of three groups, as discussed just below: (a) epenthetic-[i]; (b) epenthetic-[ɯ]; or (c) variable.
Of the 14 speakers, two (F7, F10) appeared to consistently produce an epenthetic [i] in this context, the vowel observed after palatals in prior studies. As shown in Figure 12, the acoustic space of the epenthetic vowel for these two speakers completely overlaps with that of lexical vowel [i]. Four speakers (F1, F13, F16, M2) consistently produced an epenthetic vowel that is most consistent with their lexical [ɯ] in this context, as shown in Figure 13. Although the epenthetic vowel ellipse extends slightly into the region of [i] and [e], it mostly overlaps with the acoustic space of [ɯ]. Notably, the lexical [ɯ] vowel in this context appears to be fronted as compared to its production in the other contexts, such that it also overlaps with lexical [i] and [e].
Finally, the remaining eight speakers (F4, F5, F9, F12, F15, M1, M3, M4) were not sorted into either of the two groups above, since each of these speakers produced epenthetic vowels of multiple different qualities. Statistical analysis did not support that the speakers produced any specific vowel in the palatal context. Vowel quality results are described in terms of observable trends in the data in Figure 11 and are summarized in Table 13. Again, these results are not meant to be definitive descriptions of what these speakers might always do, but instead are intended to highlight the wide range of variability in this context, including the use of [a] and [e].

Duration analyses
Turning to duration, the vowel durations for the epenthetic vowel in the palatal context overall were also quite variable: epenthetic vowels were as short and as long as the shortest and longest lexical vowels, which is not surprising given that there were tokens of the epenthetic vowel in this context that matched each possible lexical vowel quality. Figure 14 provides boxplots and Table 14 provides a summary of vowel durations in the palatal context for the two groups of speakers who were relatively consistent in their productions in this context, i.e., the two speakers who produced [i] and the four speakers who produced [ɯ]. These groups of participants were also more consistent with their vowel durations; both groups produced consistently short vowels, similar in duration to the lexical vowel they seemed to be producing.
For speakers who appear to use the vowel [i] as epenthetic after the palatal [dʑ], the plot in Figure 14(a) and […] For speakers who appear to use the vowel [ɯ] as epenthetic after the palatal [dʑ], the plot in Figure 14(b) and Table 14(b) show that the shortest vowel of this group is the […] In summary, for the two groups that produced a single epenthetic vowel quality in the palatal context, their epenthetic vowel was consistently short, and not significantly different in either case from the duration of the lexical vowel they seemed to use. In both cases, though, this meant that the epenthetic vowel was not significantly different from either [i] or [ɯ].

Discussion
The current study was designed to directly test the nature of epenthesis as a strategy to break up unfamiliar consonant clusters in Japanese using orthographic input. The experiment revealed two primary results of interest. First, the production of consonant clusters consistently yielded epenthetic vowels when [aCCa] pseudo-word stimuli were presented (all 504 target tokens included epenthetic vowels, although some tokens were not included in the analysis because the accompanying consonants were misread). This suggests that the general Japanese phonotactic restriction against consonant clusters influences the production of such clusters, as expected. Second, the results are only partially consistent with previous studies in terms of which epenthetic vowel is used. The baseline hypothesis would be that the quality of epenthetic vowels would be similar to patterns found in lexicalized loanword phonology, i.e., [ɯ] after labials and velars, [o] after alveolar stops, and [i] after palatals. After labial and velar consonants, the present study did indeed find that the quality of the epenthetic vowel was quite consistently [ɯ], as has been found in studies of Japanese loanword adaptation (Hirayama, 2003;Kubozono, 2015;Yazawa et al., 2015). However, the current results diverge from those predicted from loanword studies and those found in Yazawa et al. (2015) in the alveolar and palatal contexts. We observed that while some speakers produced [o] in the alveolar context and [i] in the palatal context, such speakers were actually in the minority. Instead, the epenthetic vowel [ɯ] was commonly used in each of these contexts, although there was also a great degree of variability among individuals in the type of vowel that was inserted, as we discuss further below. The epenthetic vowels produced by each speaker in each context are summarized in Table 15. 
Note that only one speaker, F7, was close to producing all and only the expected vowel qualities in each context, and even she produced unexpected tokens of [ɯ] in the alveolar context. Interestingly, F7 is also the oldest speaker among the participants (age = 46), which might be suggestive that older speakers tend to have a more conservative pattern. However, the next oldest speaker, M2 (age = 40), consistently produced [ɯ] in all contexts. In fact, three speakers in the current study (F1, F13, and M2) consistently used [ɯ] as the epenthetic vowel in all contexts; F1 and F13 were among the younger speakers overall (ages 23 and 22, respectively). The remaining eleven speakers had varying patterns. As above, we present this summary not to put too much weight on the 'pattern' produced by any given speaker, but rather to showcase the striking contrasts between (1) the regularity of the expected epenthetic [ɯ] in the labial and velar contexts, (2) the tendency for both expected [o] and unexpected [ɯ] to be used in the alveolar context, and (3) the striking irregularity in the palatal context (though note that even in this highly variable context, 10 of the 14 speakers did use at least some instances of [ɯ]).
Recall from Section 1 that other recent studies have also investigated these epenthesis patterns, and the conflicting results across these prior studies were in part what motivated the current study. Interestingly, the current results do not match any of those other results, although there are some similarities. Yazawa et al. (2015) conducted a production study where the expected tri-partite pattern was in fact largely found. One difference between that study and the current one is that Yazawa et al. used a text-reading task in which participants read the Aesop fable "The North Wind and the Sun" in English, whereas pseudo-words were used in the current study. Many of the contexts in this passage where epenthesis would be expected were word final rather than word medial, as in the current study, and it is possible that the epenthesis patterns are simply not the same across these two kinds of contexts.
Additionally, the Yazawa et al. (2015) participants (who were all English learners) were likely in 'English mode,' given the task, whereas the current participants were specifically instructed to produce the words as if they were Japanese. A priori, this difference might be expected to have biased the results in the opposite direction; that is, one might think that Japanese nonce words would be more likely to follow the traditional tri-partite pattern than English words being produced in English. That said, there are at least two possible reasons for the actual results. One would be that the specific English words that were produced with epenthetic vowels might have lexicalized loanword counterparts that happen to follow the more traditional pattern. Alternatively, the epenthesis in the Yazawa et al. study might in some sense have been more 'naturalistic,' in that it happened while participants were reading out a full passage, with their attention presumably directed toward fluent English production more generally. In the current study, on the other hand, attention was focused on illicit consonant clusters in individual nonce words. It is possible that in the more naturalistic setting, epenthesis patterns similar to those in lexicalized Japanese loanwords were more likely to arise spontaneously, while in the more targeted setting, participants were more likely to apply meta-awareness of some sort (e.g., trying to be consistent across their productions of multiple nonce words with different consonantal contexts). It is also potentially important to note that, while most of the Yazawa et al. (2015) participants "produced at least one epenthetic vowel," there were only 518 total epenthetic tokens in their data, despite having 71 participants each reading a passage with more than 60 likely opportunities for epenthesis to occur; that is, epenthesis occurred in only around 12% of the places where it might have, suggesting that their participants were very much English-like in their productions. The current study had approximately the same number of actual tokens of epenthetic vowels (463), but these were concentrated in the productions of only 14 speakers, and our participants did in fact epenthesize in 100% of the expected contexts. Thus, the apparent conformity to the tri-partite pattern in Yazawa et al. is diluted across a wide set of speakers and contexts. Of the four contexts tested here, all fourteen speakers produced the expected vowel at least some of the time in the labial and velar contexts, and more than half of them produced the expected vowel at least some of the time in the alveolar and palatal contexts. If the rate of epenthesis had been lower in the current study, it is possible that the 'aberrant' instances would have been more sparsely represented, and the epenthesis patterns would have looked more as expected.
Shoji and Shoji (2014) also conducted a production study whose results do not match the Yazawa et al. (2015) ones, but their results only partially line up with the current ones. Recall that the production task in Shoji and Shoji was orthographically based: Their participants were provided with nonsense words written in Latin script (e.g., consuch, zod, bkautu) and had to re-write them in Japanese characters. Interestingly, they did find that the tri-partite pattern held for the vast majority of their word-final epenthetic contexts, but that it broke down in the word-initial contexts (again suggesting that epenthesis patterns depend at least partially on word position).
Of course, in the current study, the consonant clusters targeted by epenthesis were word-medial, making it harder to directly compare the results. That said, in both the palatal and the alveolar stop word-initial contexts, Shoji and Shoji did find, as do we here, that [ɯ] is an extremely common epenthetic vowel, being used around 25% of the time in each context (more than any other vowel except the traditionally expected ones, which occurred 35-45% of the time). Thus, there is at least some converging evidence that Japanese speakers treat [ɯ] as a good candidate epenthetic vowel in the alveolar-stop context in nonce word production.
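The comparison of epenthesis rates with Yazawa et al. (2015) made above can be checked with a short calculation. The following is only a sketch using the figures quoted in the text; the variable names are ours, and because the passage is described as offering "more than 60" opportunities, treating 60 as the count gives an upper bound on their rate rather than an exact value.

```python
# Figures quoted in the text for Yazawa et al. (2015).
yazawa_tokens = 518
yazawa_participants = 71
opportunities_per_reader = 60  # assumption: the text says "more than 60", so this is a floor

# Figures quoted in the text for the current study.
current_tokens = 463
current_participants = 14

# Total places where epenthesis could have occurred (a lower bound).
total_opportunities = yazawa_participants * opportunities_per_reader

# Because the denominator is a lower bound, this rate is an upper bound.
yazawa_rate_upper_bound = yazawa_tokens / total_opportunities

# Average number of epenthetic tokens contributed by each speaker.
yazawa_tokens_per_speaker = yazawa_tokens / yazawa_participants
current_tokens_per_speaker = current_tokens / current_participants

print(f"Yazawa et al.: at most {yazawa_rate_upper_bound:.1%} of {total_opportunities} opportunities")
print(f"Tokens per speaker: {yazawa_tokens_per_speaker:.1f} vs. {current_tokens_per_speaker:.1f}")
```

Under these assumptions, the per-speaker token counts make the dilution point concrete: each speaker in the current study contributes several times as many epenthetic tokens as the average reader in the earlier study.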
In terms of perception, both Monahan et al. (2009) and Mattingley et al. (2015) used VCCV stimuli, analogous to those used in the current study. It should be noted that the VCCV stimuli in both studies contained stop release bursts. The presence of a stop release is known to influence speech perception and to make non-native listeners more likely to perceive illusory vowels (e.g., Daland, Oh, & Davidson, 2019). Monahan et al. tested only alveolar-stop and velar contexts, using an AX discrimination task. For the velar contexts, they did indeed find that Japanese listeners seemed to perceive an illusory epenthetic vowel, specifically, [ɯ], as would be expected. But they found that in the alveolar-stop contexts, neither [o] (as would be expected if perception mirrored traditional production) nor [ɯ] (as would be expected if the illusory vowel were always the 'default' vowel) was perceived as an illusory vowel. Instead, Japanese listeners seem to have perceived these sequences faithfully as VCCV sequences, on par with English listeners. Mattingley et al., on the other hand, did test the same full set of contexts as the current study, using an identification task. They found that in the labial, velar, and palatal contexts, an illusory vowel was heard the majority of the time (though that varied from 70% for the labial and velar contexts to 98% for the palatal context), and that the illusory vowel usually matched the traditional prediction (i.e., [ɯ] for labial and velar and [i] for palatal). In the alveolar-stop context, they found that 60% of tokens were identified with an illusory epenthetic vowel (echoing Monahan et al.'s finding that illusory vowels are somewhat less likely in this context), but that within that 60%, the majority (70%) were [ɯ] and only 16% were [o]. Thus, again, at least for the alveolar-stop context, there seems to be evidence from perception as well that [ɯ] is a viable epenthetic vowel.
The current study was intended to describe what the production patterns are, rather than to try to explain why the patterns might differ from those traditionally reported. Especially given the varying results across the various recent production and perception studies, it would be premature to try to come up with an explanation for any given set of results. That said, the independent observations that the typical phonotactic constraints in loanword adaptation are loosening (e.g., Pintér, 2008, 2015; Hall, 2013; Kubozono, 2015) suggest at least one pathway of change. Specifically, if there is no longer a phonotactic constraint against the sequence [dɯ] (at least in loanwords), then there is no reason not to use the default epenthetic vowel [ɯ] in this context just as in the labial and velar contexts, exactly as seen in the current results.
Interestingly, even in the older and more conservative stages of Modern Japanese, there has not been a phonotactic restriction against [dʑɯ] sequences; the only historical phonotactic restriction after the palatals was that [e] could not occur. The ostensible reason for the use of epenthetic [i] after palatals simply comes from an articulatory or perceptual closeness between the palatals and [i] (Kubozono, 2015). But, as Kubozono (2015, p. 330) points out, "[t]his raises the interesting question of why the palatoalveolar fricative [ɕ] usually takes /ɯ/ rather than /i/." Perhaps, then, the use of [ɯ] in the other three contexts has enabled it to be used after palatal affricates as well. This possibility would not, of course, explain the huge amount of intra- and inter-speaker variability in terms of other vowels used epenthetically in this context (such as [e] and [a]), but we are somewhat reluctant to theorize too much about their use given that other studies have not found similar trends.
That said, one potential explanation for the variety in the palatal context comes from the influence of English orthography; see, e.g., Vendelin and Peperkamp (2006) for general discussion. In English orthography, the letter <j> is most often followed by <u> (247 words in the IPHOD corpus; Vaden, Halpin, & Hickok, 2009), next by <a>, <o>, or <e> (around 180 words each), and least often by <i> (only 46 words). Given that (1) the current participants were in an English-speaking country at the time of participation, (2) the target words were written using Romanization instead of Japanese characters, and (3) most of the participants showed at least some influence of English orthography by misproducing at least some of the <ge> or <gi> sequences as starting with [dʑ], it is not outside the realm of possibility that they could have chosen an epenthetic vowel in the <j> context based on what they thought was likely given English orthography. Interestingly, however, in terms of English phonology, [dʒ] is most often followed by [i] or [ɪ] (613 words in the IPHOD corpus), then by [e] or [ɛ] (304 words), and then by other vowels (fewer than 100 words each). Moreover, the current participants certainly did produce the orthographic <j> as [dʑ] (indeed, it was the most accurately produced of the consonantal contexts, with 94% of possible tokens being produced with the correct consonant; the next most accurate was the labial context, with 90%). Furthermore, there is no particular evidence that the current participants relied on English orthographic frequency patterns for any of the other contexts; in the IPHOD corpus, <b> is most frequently followed by <a>, while both <d> and <g> are most frequently followed by <e> (and in fact all three are least often followed by <u>), so if English orthography is playing a role here, it is unclear why it would do so only for <j>.
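The orthographic counts discussed above are counts of words in which a given letter is immediately followed by another. As a rough illustration of how such counts can be obtained, the sketch below tallies following letters over a word list. The function name and the toy word list are hypothetical; the list is not drawn from the IPHOD corpus, and this is not the procedure by which the figures reported above were computed.

```python
from collections import Counter


def following_letter_counts(words, letter):
    """Count how many words contain `letter` immediately followed by each other letter."""
    counts = Counter()
    for word in words:
        w = word.lower()
        # Use a set so that each word contributes at most once per following letter,
        # which gives word counts rather than counts of individual occurrences.
        followers = {w[i + 1] for i in range(len(w) - 1) if w[i] == letter}
        counts.update(followers)
    return counts


# Hypothetical toy word list for illustration only (NOT the IPHOD corpus).
toy_words = ["jump", "judge", "just", "jam", "job", "jet", "jig", "major", "enjoy"]

j_counts = following_letter_counts(toy_words, "j")
for next_letter, n_words in sorted(j_counts.items()):
    print(f"<j> + <{next_letter}>: {n_words} word(s)")
```

Running the same function with "b", "d", or "g" as the target letter would give the analogous counts for the other consonantal contexts, which is the comparison the argument above relies on.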
Other potential explanations for the unusual behaviour in the palatal context (the following consonant, priming from adjacent stimuli, the age or gender of speakers, etc.) would similarly be limited in their ability to uniquely predict variability in this context, because such factors were consistent across all contexts.
Another possibility to consider is that the apparent epenthetic vowels that are seen in the current study might in fact not be truly epenthetic but rather the result of gestural mistiming between the production of the first and second consonant in the sequence. If this were the case, then it would make sense that the quality of many of the vowels across the contexts was similar, as it would not be phonologically governed, but rather based on articulatory facts. Davidson (2010) considers the difference between full epenthetic vowels and transitional vocoids in cases where a schwa was inserted between two consonants word-initially by English and Catalan speakers. To diagnose the difference between these scenarios, she examines duration, F1, and F2. A transitional vocoid would likely be shorter than a lexical vowel, have a lower F1 value (because it would be produced not with a tongue height target but rather as a temporary lowering between the surrounding stop closures), and have a lower F2 value in this context (because it would be likely to anticipate the upcoming [a] "even more than for normal vowel-to-vowel coarticulation," as the [a] would be the next upcoming vowel target; Davidson, 2010, p. 283). As noted in the above results sections, however, none of these characteristics were found. That is, the epenthetic vowels were never significantly different from the lexical vowel they were most similar to in any of these three dimensions. Thus, we believe that the epenthetic vowels found in our tokens are indeed inserted vowels, with accompanying vowel targets, and not simply insufficient attempts to produce CC clusters.
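The three-way diagnostic described above (duration, F1, and F2 relative to a lexical vowel) can be made concrete with a small sketch. Everything in it is an illustrative assumption: the class and function names are ours, the measurements are invented, and the raw comparisons stand in for the statistical tests of significance on which the conclusion above actually rests.

```python
from dataclasses import dataclass


@dataclass
class VowelMeasure:
    duration_ms: float
    f1_hz: float
    f2_hz: float


def vocoid_indicators(token: VowelMeasure, lexical: VowelMeasure) -> dict:
    """Report which of the three transitional-vocoid indicators a token shows."""
    return {
        "shorter": token.duration_ms < lexical.duration_ms,
        "lower_f1": token.f1_hz < lexical.f1_hz,
        "lower_f2": token.f2_hz < lexical.f2_hz,
    }


def looks_transitional(token: VowelMeasure, lexical: VowelMeasure) -> bool:
    """True only if the token is shorter AND has lower F1 AND lower F2."""
    return all(vocoid_indicators(token, lexical).values())


# Hypothetical measurements, invented for illustration (not data from the study).
lexical_reference = VowelMeasure(duration_ms=70.0, f1_hz=350.0, f2_hz=1400.0)
vowel_like_token = VowelMeasure(duration_ms=68.0, f1_hz=355.0, f2_hz=1390.0)
vocoid_like_token = VowelMeasure(duration_ms=25.0, f1_hz=300.0, f2_hz=1100.0)

print(vocoid_indicators(vowel_like_token, lexical_reference))
print(looks_transitional(vowel_like_token, lexical_reference))
print(looks_transitional(vocoid_like_token, lexical_reference))
```

Requiring all three indicators at once is a deliberately strict choice made for this sketch; a real analysis would compare distributions of tokens rather than single measurements.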
In terms of the larger implications of this work, while the task in the current paper was a non-word production task intended to provide insight into how illicit consonant clusters might be adapted by native Japanese speakers in a loanword situation, the stimuli in the experiment were not themselves loanwords, and so we cannot claim that loanword adaptation patterns are changing. That is, our stimuli were not associated with meaning and did not originate from some foreign source that might be known to the participants. Thus, some factors that likely influence the way that loanwords are adapted were absent from the current study; in particular, participants did not have acoustic models of the words, nor did they have any knowledge of ways in which they might be related to other loanwords, and there was no larger social context that might influence the adaptation. This is why we cannot claim that loanwords would necessarily follow the same trends as those seen here, but we can speculate that loanword adaptation may be undergoing changes such that loanwords with medial CC clusters in which the first member is a (voiced) alveolar stop or a palatal may no longer follow the clear tri-partite pattern traditionally reported in the literature. Supporting this speculation is the fact that, as was noted in Section 1, Japanese loanwords and nonce words have been shown to follow similar phonological patterns (Kawahara, 2012

Conclusion
The current paper reports on a production experiment that directly tests the nature of epenthesis as a strategy to break up unfamiliar consonant clusters in Japanese. All fourteen of the speakers in the experiment consistently produced an epenthetic vowel in VCCV sequences, following the expected phonotactic patterns of syllable structure in Japanese. The choice of epenthetic vowel, however, was not always as expected. The expected vowel [ɯ] was used in the labial and velar contexts, but also by many speakers in many tokens in both the alveolar-stop and the palatal contexts, where [o] and [i] would have been expected, respectively. Although a full explanation for the variability of the results is beyond the scope of the paper, the results do suggest that the independent loosening of the phonotactic constraint against [dɯ] sequences may be affecting epenthesis strategies typically assumed to be governed by this constraint. We thus predict that in new loanwords with consonant clusters (at least, word-medial clusters with voiced obstruents), we may increasingly see [ɯ] being used as an across-the-board epenthetic vowel, rather than the quality continuing to be governed by preceding context.