Nasal place assimilation and the perceptibility of place contrasts

Abstract
Typological studies of place assimilation show that nasal consonants are more likely to assimilate in place than oral stops (Cho, 1990; Jun, 1995, 2004; Mohanan, 1993). Jun (1995, 2004) argues that this typological asymmetry derives from a difference in the perceptibility of the place contrasts in nasal consonants and in oral stops. Since the place contrasts in nasals are perceptually weaker than the place contrasts in oral stops, speakers are more willing to neutralize the former. However, previous phonetic and psycholinguistic experiments do not provide unambiguous evidence for the weaker perceptibility of the place contrasts in nasal consonants (Hura et al., 1992; Mohr & Wang, 1968; Pols, 1983; Winters, 2002). To offer additional experimental findings bearing on this debate, this paper reports two similarity judgment experiments and two identification experiments in noise, all of which show lower perceptibility of the place contrasts in nasal consonants in coda position. The results are compatible with, and thus lend support to, Jun's (1995, 2004) idea that the asymmetry in place assimilation may result from a difference in the perceptibility of place contrasts.

1 Introduction

1.1 The issue: why do nasals assimilate in place?
The word-final nasals in (1) assimilate to the following consonant in place; oral stops in (2) do not.
1 See Ohala (1990) (and also Blevins 2004, 2006; Yu 2004, among others) for a related view. Although he emphasizes the role of perceptibility in shaping phonological patterns, in his model perception affects phonological patterns through misperception by listeners rather than through deliberate control by speakers. This paper does not address this general alternative. See Hayes & Steriade (2004), Hura et al. (1992), Martin & Peperkamp (2011) and Steriade (2001) for relevant discussion. See also Boersma (2008) for a proposal that derives the neutralization of less perceptible contrasts as an emergent property of a learning algorithm. This paper focuses on investigating the perceptibility difference of the place contrasts between oral stops and nasals; we do not commit ourselves to any particular theoretical implementation of how to incorporate this perceptibility difference into a phonological grammar. Our choice of a speaker-oriented description (e.g. "speakers possessing knowledge of perceptibility effects") should thus be taken as tentative.
Likewise in Hindi, all nasals within a morpheme must be homorganic to the following stop, as in (3), whereas oral stops do not obey this restriction, as in (4) (Jun, 1995; Ohala, 1975, 1983b).
[gupta] 'last name'
Jun (1995, 2004) argues that the asymmetry between nasals and oral stops comes from the perceptibility of the place contrasts in nasals and oral stops. He argues that the place contrasts in nasals are less perceptible than those in oral stops, and that speakers are thus more willing to neutralize a place contrast in nasals than in oral stops. In other words, nasal place assimilation is "perceptually more tolerable" than oral consonant place assimilation, because the former involves less of a perceptual change. This claim has been echoed by several researchers. Boersma (1998) suggests that "[m]easurements of the spectra...agree with confusion experiments (for Dutch: Pols 1983), and with everyday experience, on the fact that [m] and [n] are acoustically very similar, and [p] and [t] are farther apart. Thus, place information is less distinctive for nasals than it is for plosives" (p. 206) (see also Boersma 2008). Ohala & Ohala (1993) likewise maintain that "[nasal consonants'] place cues are less salient than those for comparable obstruents" (pp. 241-242). Beddor & Evans-Romaine (1995) suggest that "[a]n acoustic-perceptual account of nasal place assimilation might argue that place distinctions are perceptually less salient for nasal consonants than for oral stops" (p. 147) and conclude that "place of articulation in syllable-final nasals is not perceptually robust" (p. 164). See also Martin & Peperkamp (2011) for general discussion of this view; for studies on acoustic and perceptual characteristics of nasal place contrasts, see Beddor & Evans-Romaine (1995), Fujimura (1962), Kurowski & Blumstein (1984), Kurowski & Blumstein (1993), Malécot (1956), Narayan (2008), Repp (1986), and references cited therein.

Disagreements in previous studies
A question then is whether nasal place contrasts are indeed less perceptible than oral place contrasts. However, the evidence for the lower perceptibility of the place contrasts in nasal consonants in previous phonetic and psycholinguistic studies is mixed.
A similarity judgment task by Mohr & Wang (1968) showed that English speakers judge nasal minimal pairs as more similar to each other than oral consonant minimal pairs. However, in their stimuli, nasal pairs were placed in coda, whereas oral consonant pairs were placed in onset. Since we know independently that place contrasts are generally more perceptible in prevocalic position than in postvocalic position (Benkí, 2003; Fujimura et al., 1978; Jun, 1995, 2004; Ohala, 1990; Steriade, 2001), this result should be taken with caution. Kawahara (2009) [...] and asked them in a forced-choice format which pair involved more similar sounds. The result shows that the nasal minimal pair was judged to be more similar than oral consonant minimal pairs. However, this study is based on orthography, and the perceptibility of the place contrasts was tested in onset position, while consonants that undergo assimilation are usually placed in coda position (Beckman, 1998; Jun, 1995, 2004; McCarthy, 2011; Ohala, 1990). Pols (1983) showed that Dutch speakers perceive the place contrasts more accurately in oral stops than in nasal consonants under different noise conditions, while controlling for the position within words (including word-final position). Hura et al. (1992) performed an identification experiment on various word-final consonants (nasals, voiceless oral stops, and fricatives) in preconsonantal position. They found that nasals showed the highest confusion rate in terms of place, stops next, and fricatives least. Statistically speaking, the difference between nasals and obstruents was significant, but the difference between nasals and oral stops did not reach significance.
Indirect evidence for the lower perceptibility of the place contrasts in nasals has also been presented from the analyses of verbal art, such as rhyming and imperfect puns. It has been known that speakers can pair two non-identical sounds in rhyming (a pattern known as half rhymes) and imperfect puns. When they do so, they prefer to pair two similar sounds (Holtman, 1996; Steriade, 2003; Zwicky, 1976; Zwicky & Zwicky, 1986). Studies of Japanese hip hop rhymes (Kawahara, 2007) and imperfect puns (Kawahara & Shinohara, 2009) show that Japanese speakers are more willing to match nasal consonant pairs than oral consonant pairs. These comparisons in the Japanese data, however, are based on onset position, not coda position. Nasal pairs are also commonly found in English rock lyrics (Zwicky, 1976) and English imperfect puns (Zwicky & Zwicky, 1986), where they appear in coda position (e.g. mine vs. tryin'). However, no statistical comparisons are made between the frequencies of nasal pairs and those of oral consonant pairs.
To summarize, the studies reviewed so far provide (indirect) evidence that place contrasts are less perceptible in nasals than in oral stops.
On the other hand, there are also a few studies that fail to support the claim that nasal place contrasts are less perceptible than oral stop place contrasts. The second similarity judgment experiment reported in Kawahara (2009), which used auditory stimuli, did not show that nasal place contrasts are less perceptible. However, this study presented the stimulus pairs only once, and therefore conclusions based on these results remain speculative. Winters (2002) points out that the results of Hura et al. (1992) do point in the right direction, but emphasizes that the difference between nasals and oral stops did not reach statistical significance. He furthermore cites other studies (Singh & Black, 1966; Wang & Fillmore, 1961) that failed to support the weaker perceptibility of the place contrasts in nasal consonants. Winters's (2002) own experiments, which were identification experiments in four listening environments (a comfortable listening level, noise at 6dB and -6dB S/N-ratios, and the speech reception threshold at about 40dB), did not reveal a difference between nasals and oral stops in terms of the saliency of the place contrasts. The results in fact showed evidence for higher saliency of oral stops' place contrasts in the speech reception threshold condition, but evidence for the opposite pattern in the other three conditions.
To summarize, it is not clear from the previous experiments that nasal place contrasts are indeed less perceptible than oral consonant place contrasts, especially in coda. This study offers new pieces of information bearing on the disagreement among the previous studies reviewed above.
The research questions are (i) do we find a perceptibility difference in place contrasts between nasals and oral stops at all? and (ii) if so, in what environments, and under what conditions? To address these questions, this paper reports two similarity judgment experiments and two identification experiments in noise. The first two similarity judgment experiments test the perceptibility of place contrasts in clear listening environments; Experiment I uses tokens with released stops, and Experiment II uses tokens with significantly weakened releases. The next two experiments are identification experiments in a noisy condition; Experiment III tests the perceptibility of place contrasts in word-final position, and Experiment IV tests it in pre-consonantal position. All experiments support the hypothesis that the place contrasts are less perceptible in nasals than in oral stops. Although the general debate about the perceptibility of place contrasts in nasals and oral stops needs to be further studied, our results offer a substantial piece of information bearing on this debate.

Experiment I: Similarity judgment experiment
The first experiment was a similarity judgment study, most directly building on an experimental paradigm used by Mohr & Wang (1968). This study builds to a lesser extent on Greenberg & Jenkins (1964), who compared only voiced stops and voiceless stops (see also Babel & Johnson 2010, Fleischhacker 2001, Huang 2004, Huang & Johnson 2010, Kato et al. 1997, among others, for studies using this paradigm to investigate knowledge of perceived similarity). In this experiment, native English listeners were presented with pairs of sounds minimally different in place, and were asked to judge the perceived similarity between the two sounds. The experiment used naturally produced (but acoustically edited) stimuli. The experiment built upon the previous studies reviewed in section 1.2, but controlled factors that may affect similarity ratings: (i) all the stimuli were placed in post-vocalic position, and (ii) amplitude and pitch were made uniform across the stimuli.

Stimuli
The three conditions were nasals, voiced stops, and voiceless stops. For each condition, all three combinatorial possibilities of different places were included (i.e. labial vs. coronal, labial vs. dorsal, coronal vs. dorsal). All the stimuli were monosyllabic and had the initial vowel [A]. The target consonants were all placed in coda because place assimilation usually occurs in coda position (Beckman, 1998; Jun, 1995, 2004; McCarthy, 2011; Ohala, 1990). Thus our stimuli consisted of

Recording and acoustic editing
Two female native speakers of English (both from New Jersey) each produced all the stimuli in a sound-attenuated booth. One speaker was the second author of this paper. Their speech was recorded through an AT4040 Cardioid Capacitor microphone with a pop filter and amplified through an ART TubeMP microphone pre-amplifier (JVC RX 554V), digitized at 44 kHz with 16-bit quantization. The stimuli were placed in a frame sentence: "Please say the word X three times." To avoid flapping and reduction of word-final consonants, both speakers released all the word-final consonants. The speakers repeated each token 10 times. Some illustrative spectrograms are shown in Figure 1. The target stimuli were extracted from the frame sentence at zero crossings using Praat (Boersma & Weenink, 1999-2014). To avoid similarity ratings being affected by non-relevant phonetic factors such as differences in amplitude or pitch, the stimuli were re-synthesized with a flat pitch contour at 250Hz and with a peak amplitude of 0.7. Out of 10 repetitions, those that had phonetic distortions (e.g. clipping, heavy creakiness, unintended vowel qualities, nasal bursts) were excluded. After that, four tokens from each speaker were chosen as the stimuli for the listening experiment. Pairs of sounds were created by concatenating two sounds with 500ms silence inter-

Procedure
In this experiment, one pair of sounds was presented to our listeners per trial without any orthographic representations of the stimuli. The participants were asked to judge the similarity of each pair using a 5-point-scale: 1. "almost identical", 2. "very similar", 3. "similar", 4. "not so similar", 5. "completely different". Superlab (ver 4.0, Cedrus) on Macintosh computers was used to present the visual and sound stimuli and to record responses. All the participants wore high quality headphones (Sennheiser HD 280 Pro), and registered their responses using an RB-730 response box (Cedrus). The experiment took place in a sound attenuated room.
The experiment started with a practice block with 20 pairs in order for the participants to establish their subjective scale of similarity. These stimuli were unique to the practice block.
An experimenter stayed in the listening room during the practice session so that the participants could ask questions after the practice session was over. The main session was organized into two blocks, with a break in between, each block presenting tokens from one speaker. We blocked the experiment by speaker so that the listeners would not be distracted by individual speech style differences. All pairs of sounds were repeated seven times. Hence for each phonological pair, the listeners judged their similarity 56 times (7 repetitions * 4 tokens * 2 speakers). Superlab randomized the order of the stimuli within each block.

Participants
Twenty-one undergraduate students completed this experiment, but the data from two participants were not analyzed because they were not native speakers of English. All the participants received extra credit for linguistics courses.
One may argue that English listeners may not be appropriate for this experiment, as English has a prefix that exhibits nasal place assimilation (i.e. in-). This alternation in English may make the place contrasts in nasals less distinct, because alternation between two sounds may arguably shrink the perceptual distance between the two (e.g. Hume & Johnson 2003, Huang & Johnson 2010; though see also Steriade 2003). However, using English listeners may not be problematic for three reasons. First, prefixal nasal place assimilation is not without exceptions: un- does not undergo place assimilation. Second, the target consonants in the first three experiments are placed in word-final position, and the place contrasts are contrastive in this position for both nasals and oral stops in English. Third, English exhibits assimilation of oral stops across word boundaries as well, as in ba[g] girl 'bad girl' (Ellis & Hardcastle, 2002; Nolan, 1992).

Results
Table 1 illustrates the average similarity ratings in Experiment I. First of all, the comparison between the three manners of articulation shows that nasal pairs were judged to be most similar to each other; voiced stop pairs were judged to be more similar than voiceless pairs. A general linear mixed model shows that MANNER had a significant impact on similarity ratings (t = 51.06, p < .001), but PLACE did not (t = −1.42, n.s.). A contrast analysis comparing nasals and voiced stops shows that MANNER significantly impacted similarity ratings (t = 36.10, p < .001), and so did PLACE (t = −2.15, p < .05). PLACE was perhaps significant because the labial-coronal pair in the nasal condition has a slightly higher rating than the other two place pairs. More importantly, the significant effect of MANNER shows that nasal pairs were rated more similar than voiced stop pairs. Another contrast analysis compared voiced and voiceless stops, and revealed a difference in MANNER (t = 14.68, p < .001), but not in PLACE (t = −.03, n.s.). Voiced stop pairs were rated more similar than voiceless stop pairs.

Bearing on the place assimilation asymmetry
The results support the hypothesis that the place contrasts are less salient in nasal pairs than in oral stop pairs. This difference in the perceptibility of the place contrasts may be the reason for the place assimilation asymmetry, as suggested by a number of previous researchers (Beddor & Evans-Romaine, 1995; Boersma, 1998, 2008; Jun, 1995, 2004; Ohala & Ohala, 1993; Steriade, 2001). More generally speaking, this result supports the general principle that speakers are more willing to neutralize less perceptible contrasts (Boersma, 1998; Huang, 2001; Hura et al., 1992; Kawahara, 2006; Kohler, 1990; Lindblom et al., 1995; Steriade, 1997, 2001, 2008).
Winters (2002) raises the hypothesis that "any perceptual differences which exist between nasals and plosives might only emerge under noisy conditions" (p. 12), by comparing previous studies on the perceptibility differences in nasals and oral stops (Hura et al., 1992; Pols, 1983). However, the results above show that the difference between nasals and oral stops does emerge under clear listening environments as well, at least if we use a similarity rating paradigm.
In addition to the difference between nasals and oral stops, we also obtained a difference in similarity ratings between voiced and voiceless consonants. This observation replicates previous similarity judgment studies (Greenberg & Jenkins, 1964; Mohr & Wang, 1968). This difference is also observed in the combinability of consonants in Japanese rap rhymes (Kawahara, 2007). Japanese speakers are more willing to pair voiced stops with mismatched place than voiceless stops with mismatched place in creating rap rhymes.
However, phonologically speaking, we do not know of a language in which only voiced consonants assimilate but voiceless consonants do not; e.g. /dg/ → [gg], but /tk/ → [tk]. It is possible that further typological research on place assimilation may find a language that instantiates this pattern. To the extent that this pattern is a true gap, it remains a puzzle why the perceptibility difference between voiceless stops and voiced stops is not reflected in phonological patterns.²
Diane Bradley (p.c.) raised the possibility that assimilation of voiced consonants is blocked for an independent reason: since voiced geminates face an aerodynamic problem, many languages avoid them (Hayes & Steriade, 2004; Ohala, 1983a; Westbury, 1979; Westbury & Keating, 1986). It is challenging to maintain a sufficient transglottal air pressure drop during a long obstruent closure while maintaining glottal airflow to sustain voicing. However, while voiced geminates do suffer from this aerodynamic problem, so should unassimilated voiced obstruent clusters, because speakers would need to maintain voicing during a long obstruent closure. Note also that place assimilation does not necessarily result in geminates when the targets and triggers differ in manner (e.g. when triggered by fricatives).
2 Schane (1972) proposes an idea which assumes that the perceptual difference between voiced stops and voiceless stops does shape a phonological pattern. He proposes that coda devoicing occurs to enhance place contrasts in coda: voiceless consonants are favored over voiced consonants because the place difference is more salient for voiceless consonants than for voiced consonants. However, coda devoicing can be construed as a case of neutralization of a phonological contrast that is not well perceptible (Steriade, 1997, 2008).

Place effects
Next, some remarks on the patterns of different place pairs are in order. Phonologically speaking, coronals are more likely to undergo place assimilation than labials and dorsals (Cho, 1990; Jun, 1995, 2004; Kochetov & So, 2007; Paradis & Prunet, 1991). If this asymmetry is due to a difference in perceptibility, then this hypothesis predicts that pairs that involve coronals should be judged to be more similar than the labial-dorsal pair: coronals tend to assimilate because their cues are not highly perceptible (Boersma, 1998, 2008; Byrd, 1992; Jun, 1995, 2004; Kochetov & So, 2007).
However, this prediction is not borne out in our experiments: the labial-dorsal pairs were not judged to be particularly dissimilar, compared to pairs involving coronal consonants.
We should also bear in mind, however, that in asymmetries in place assimilation, the directionality matters; e.g. it is more likely for coronals to become dorsals than for dorsals to become coronals. On the other hand, the similarity judgment task in the current experiment is symmetric.
Since the focus of this paper is the differences in the perceptibility of the place contrasts between different manners of consonants, we will set aside the discussion on differences between place of articulation within each manner.

Experiment II: Similarity judgment Experiment II
The next experiment tested whether the similarity judgment patterns observed in Experiment I would hold without clear release bursts. As observed in Figure 1, the tokens in Experiment I were clearly released. The role of release bursts in the perception of place contrasts is well known (Kochetov & So, 2007; Malécot, 1956; Smits et al., 1996; Stevens & Blumstein, 1978; Tekieli & Cullinan, 1979; Winitz et al., 1972). Some authors argue that released consonants resist assimilation (Jun, 2003; Kohler, 1990; McCarthy, 2011; Padgett, 1995), because release bursts provide such a strong cue to the perception of place distinctions. Hura et al. (1992) as well as Winters (2002) used non-released voiceless stops in testing the perceptibility difference between nasals and oral stops. This experiment was thus designed to investigate whether the similarity differences we observed in Experiment I could be due to the clearly released tokens.

Stimuli
To test whether the perceptual asymmetry between the nasal place contrasts and the oral place contrasts would be observed without release bursts, we spliced off the original releases of voiced and voiceless stops at zero-crossings from the tokens used in Experiment I. Without any bursts, however, the stimuli sounded as if there were no consonants at all. Therefore, we recorded weak releases of one speaker (the second author) for [p, t, k, b, d, g] in the context of [A ]. (The other speaker had left the lab by the time we ran this experiment, so only the tokens from the second author were used.) We adjusted the average amplitude of the original tokens to 70dB and that of the releases to 40dB and concatenated them. To be conservative (i.e. to be biased against the conclusion that the place contrast is less perceptible for nasals), we retained the original, clear nasal releases. We also readjusted the average amplitude of nasal tokens to 70dB. Waveforms and spectrograms of edited tokens are shown in Figure 2. As shown in Figure 2, the new releases of the stops are extremely weak; they were there only to signal the presence of word-final consonants.
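The amplitude adjustment and concatenation described above can be illustrated with a minimal sketch. It is not the authors' procedure (the editing was done in Praat). The reference value of 2e-5, the helper names, and the use of RMS as the "average amplitude" are all assumptions made for illustration.

```python
import math

# Assumption: dB levels are taken relative to 2e-5, the conventional
# reference for sound pressure level; the paper does not state one.
REFERENCE = 2e-5


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_db(samples, reference=REFERENCE):
    """Average level of the samples in dB relative to the reference."""
    return 20 * math.log10(rms(samples) / reference)


def scale_to_level(samples, target_db, reference=REFERENCE):
    """Rescale the samples so that their average level equals target_db."""
    target_rms = reference * 10 ** (target_db / 20)
    gain = target_rms / rms(samples)
    return [gain * x for x in samples]


def attach_weak_release(body, release, body_db=70.0, release_db=40.0):
    """Scale the burst-less token and the weak release, then concatenate."""
    return scale_to_level(body, body_db) + scale_to_level(release, release_db)
```

Under these assumptions, the 30 dB gap between the token and its release corresponds to an amplitude ratio of 10^(30/20), or roughly 31.6, which is why the re-attached releases are barely visible in a waveform.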

Other aspects
The procedure of Experiment II was identical to that of Experiment I, except for two aspects. First, we used speech from only one of the speakers for the reason mentioned above. Second, we included both orders of the two elements in a pair (e.g. [...]).
Table 2 shows the average similarity ratings in Experiment II. A general linear mixed model shows that MANNER had a significant impact (t = 30.87, p < .001), but PLACE did not (t = −0.14, n.s.). A contrast analysis comparing nasals and voiced stops shows that nasal pairs were judged to be more similar than voiced stop pairs (t = 13.33, p < .001). PLACE did not turn out to be significant in this analysis (t = −1.63, n.s.). Another contrast analysis compared voiced stops and voiceless stops, and revealed a difference in terms of MANNER (t = 14.31, p < .001), but not in terms of PLACE (t = −0.34, n.s.).
[...] i.e. they had an advantage in conveying place contrasts, but they were nevertheless judged to be most similar. As with Experiment I, the results support the hypothesis that place of articulation is less perceptually salient in nasals than in oral stops (Jun, 1995, 2004). This perceptual difference holds even when nasals retain their clear releases and oral stops have only very weak releases.

Experiment III: Identification experiment in noise
The third experiment aimed to verify the perceptibility differences observed in the previous two experiments with an identification task in noise. Hura et al. (1992) ran their identification experiment in a clear listening environment and obtained a misidentification rate of only 5.2%. This low rate may be the reason why they did not obtain a significant difference between nasals and oral stops. As reviewed in the introduction, a number of other identification experiments in noise have been run in the past, and they showed conflicting results. Pols (1983) found the expected difference between nasals and oral stops, whereas Winters (2002) did not. To add more experimental results bearing on this issue, we ran an identification experiment in noise. What is new in our Experiment III is that it emulates a real communicative situation closely, by using cocktail party noise to cover the stimuli.
There is another motivation for this experiment. The two previous similarity judgment experiments involve an off-line task which involves conscious judgments by listeners. While the results support Jun's (1995, 2004) idea that perceptibility differences underlie the differences in the likelihood to undergo assimilation, it would be ideal to further support this idea with a task that does not involve conscious judgments.

Noise and S/N-ratios
The noise used in this experiment was cocktail party noise, taken from Kawahara (2006). The reason for using this particular type of noise was to emulate real communicative situations closely. To obtain the cocktail party noise, Kawahara (2006) recorded a party using a SONY TCD-D8 portable DAT recorder. The recorded sound was divided into three-second noise stretches. Six such stretches were superimposed on top of one another.
Building on Binnie et al. (1974), the current experiment used three S/N-ratios: -6dB, -12dB, and -15dB, with the signal level kept at an average of 60dB. Praat (Boersma & Weenink, 1999-2014) automatically adjusted the duration of the noise file to the duration of each stimulus by the overlap-and-add method, and superimposed the adjusted noise file on each stimulus file.
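As a rough illustration of what a given S/N-ratio means for the mixed stimulus, the following sketch scales a noise sample so that the signal-to-noise level difference matches a target value and then adds it to the signal. It is not the Praat procedure used in the experiment: the noise is simply looped or truncated to the stimulus length rather than time-adjusted by overlap-and-add, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal at the requested S/N-ratio (in dB).

    Returns the mixture and the scaled noise. The signal is left untouched;
    only the noise gain changes, mirroring a design in which the signal
    level is held constant across S/N conditions.
    """
    # Assumption: loop or truncate the noise to the signal length
    # (a stand-in for the duration adjustment done in Praat).
    repeats = len(signal) // len(noise) + 1
    fitted = (list(noise) * repeats)[:len(signal)]

    # snr_db = 20 * log10(rms(signal) / rms(noise)), solved for the noise RMS.
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(fitted)
    scaled = [gain * n for n in fitted]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled
```

At -12dB, for example, the noise RMS is about four times the signal RMS (10^(12/20) ≈ 3.98), which gives a sense of how heavily the stimuli were masked.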

Procedure
Superlab (ver. 4.0, Cedrus) was used to present the stimuli. For each stimulus, possible responses given were binary. For example, for a sound stimulus [Am], in one trial, the two visual responses were "am" or "an"; in the other trial, the two visual responses were "am" or "aN". This format allowed us to calculate the perceptual distance between any two minimal pairs differing in place.
For each pair of visual cues, both possible orders were included in the test (e.g. "am" and "an"; "an" and "am"). The visual cue for [N] was "ng".
The experiment started with a practice run in which the participants practiced the identification experiment, using a pair that differed in voicing, not in place. The practice session presented 10 items, and an experimenter stayed in the listening room so that the participants could ask questions after the practice run. The main session consisted of three blocks separated by a break sign.
Each block contained all the stimuli for each S/N-ratio (9 target stimuli * 5 tokens * 2 visual cue combinations * 2 visual cue orders = 180 tokens). All participants wore Sennheiser HD 280 Pro Headphones and used an RB-730 response button box (Cedrus) to register their responses. The order of the stimuli within each block was randomized by Superlab.
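The block structure described above can be checked by enumerating the trials. The sketch below is illustrative only: the labels "am", "an", "at", and "ak" and the "ng" spelling come from the text, while the remaining labels and the dictionary layout are assumptions.

```python
from itertools import product

# Response labels per manner. "ang" follows the stated "ng" spelling for the
# velar nasal; "ab", "ad", "ag", and "ap" are assumed by analogy.
LABELS = {
    "nasal": ["am", "an", "ang"],
    "voiced": ["ab", "ad", "ag"],
    "voiceless": ["ap", "at", "ak"],
}
N_TOKENS = 5  # tokens per target stimulus


def build_block():
    """Enumerate every trial in one S/N-ratio block (before randomization)."""
    trials = []
    for manner, labels in LABELS.items():
        for target in labels:
            # Each target is paired with both other places of the same manner.
            competitors = [label for label in labels if label != target]
            for token, competitor in product(range(N_TOKENS), competitors):
                # Both left/right orders of the two visual cues are included.
                for left, right in ((target, competitor), (competitor, target)):
                    trials.append({
                        "manner": manner,
                        "target": target,
                        "token": token,
                        "left": left,
                        "right": right,
                    })
    return trials
```

Enumerating the design this way yields 9 * 5 * 2 * 2 = 180 trials per block, matching the count given in the text; shuffling the list would correspond to the within-block randomization.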

Participants
Twenty-four native speakers of English participated in this study for course credit in linguistics or psychology classes. None of them had participated in the previous two experiments. One participant failed to respond to more than half of the trials, and hence this person's data was excluded.

Analysis
We used a signal detection analysis to calculate the perceptual distance between each sound pair (Macmillan & Creelman, 2005). For each binary comparison, we calculated its d′-value, using z(Hit) − z(False Alarm). This signal detection analysis has the advantage of teasing apart sensitivity, which reflects a perceptual distance, from bias, a listener's strategic bias to choose one option over the other (Macmillan & Creelman, 2005).³ To analyze the d′-values statistically, a linear mixed model was run in which S/N-RATIO, MANNER, and PLACE were fixed factors.
Table 3 illustrates the average d′-values of each pair in Experiment III. The higher the d′-value, the more perceptible the pair was. [...] t = −2.14, p < .001). As with the two previous similarity judgment experiments, the place contrasts are less salient in voiced stops than in voiceless stops.
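The d′ computation named in the analysis, z(Hit) − z(False Alarm), can be sketched as follows. This is a minimal illustration rather than the authors' analysis script. The clamping of extreme rates and the definition of hits and false alarms for a given pair (e.g. answering "am" to [Am] versus answering "am" to [An]) are assumptions.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z(proportion):
    """Inverse of the standard normal cumulative distribution function."""
    return _STANDARD_NORMAL.inv_cdf(proportion)


def rate(count, n_trials):
    """Proportion of responses, kept away from 0 and 1.

    Assumption: rates of exactly 0 or 1 are replaced by 1/(2N) and
    1 - 1/(2N), one common correction, since z is undefined at the extremes.
    """
    lower = 1 / (2 * n_trials)
    return min(max(count / n_trials, lower), 1 - lower)


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: the perceptual distance between the two categories."""
    return z(hit_rate) - z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    """Response bias c: zero when neither response option is favored."""
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))
```

A listener who answers "am" equally often to both stimuli has equal hit and false alarm rates, so d′ is zero regardless of how strongly that answer is favored; the preference shows up in the criterion instead, which is the separation of sensitivity from bias that the analysis relies on.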

Discussion
To summarize, the identification experiment in noise shows the perceptibility hierarchy expected from the previous two experiments: voiceless stops > voiced stops > nasals, supporting the idea that nasals' place contrasts are weaker than oral stops' place contrasts. In fact, nasal place contrasts seem almost non-perceptible (i.e. d′-values are close to zero) under the -12dB and -15dB S/N-ratio conditions. Indeed, the lower bounds of the 95% confidence intervals (the average values minus the margins of error) overlap with zero in these conditions.
The current identification experiment thus yet again revealed a perceptibility difference of the place contrasts between nasal consonants and oral stops, supporting Jun's hypothesis (Jun, 1995, 2004). This result accords well with that of Pols (1983), but not with that of Winters (2002). The difference between the current experiment and that of Winters may have come from two sources.
First, we used naturalistic sounds, both for the targets and for the noise, to replicate real communicative situations. In particular, the noise was similar to that which speakers and listeners face in real communicative situations.
Second, the target consonants in the current experiment were placed in word-final position rather than in pre-consonantal position. The next experiment tested if the perceptibility differences observed in this experiment still hold in pre-consonantal position in which place assimilation occurs in phonology.
5 Experiment IV: Identification experiment in pre-consonantal position

Introduction
The previous identification experiment shows that nasal place contrasts are less perceptible than oral consonant place contrasts. The final question that we address is whether the same asymmetry holds in preconsonantal position, in which place assimilation actually occurs in phonology.

Stimuli
To create a preconsonantal environment, we first recorded the same speaker pronouncing [Ap@, At@, Ak@, Ab@, Ad@, Ag@] with stress on the initial vowels. We then spliced off the initial stressed [A] vowels, and adjusted the amplitudes of the remaining portions (the unstressed second syllables) to 60 dB.
We then concatenated each stimulus from Experiment III with a syllable whose initial consonant is non-homorganic to both of the consonants in the two visual cues. For example, for the sound [Am], whose two visual cues were "am" and "an", the concatenated CV syllable was [g@]; for the sound [Ak], whose two visual cues were "at" and "ak", the concatenated CV syllable was [p@]. We chose non-homorganic consonants in order to prevent our listeners from defaulting to an assimilated percept in the listening experiment (Beddor & Evans-Romaine, 1995; Kochetov & So, 2007; Malécot, 1956; Ohala, 1990).
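The selection rule for the following syllable can be sketched as below: given the final consonants of the two visual cues, choose the one place of articulation that matches neither. This is an illustrative sketch only. The names `PLACE_OF` and `non_homorganic_place` are hypothetical, and the sketch returns a place of articulation rather than a specific recorded syllable, since the text does not spell out how the voicing of the following consonant was chosen for every target.

```python
# Three-way place contrast used in the experiments.
PLACES = ("labial", "coronal", "dorsal")

# Place of articulation of each coda consonant.
# "ng" is an ASCII stand-in for the velar nasal (a labeling choice made here).
PLACE_OF = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "n": "coronal",
    "k": "dorsal", "g": "dorsal", "ng": "dorsal",
}


def non_homorganic_place(cue_a: str, cue_b: str) -> str:
    """Return the place that is homorganic to neither visual-cue consonant."""
    used = {PLACE_OF[cue_a], PLACE_OF[cue_b]}
    remaining = [place for place in PLACES if place not in used]
    if len(remaining) != 1:
        raise ValueError("the two cues must differ in place of articulation")
    return remaining[0]


if __name__ == "__main__":
    # Examples from the text: cues "am"/"an" take a dorsal-initial syllable
    # ([g@]); cues "at"/"ak" take a labial-initial syllable ([p@]).
    print(non_homorganic_place("m", "n"))
    print(non_homorganic_place("t", "k"))
```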
Our pilot experiment showed that the task is harder with a following CV syllable, and that in the -15 dB S/N-ratio condition listeners would perform near chance in all three conditions. Therefore, we tested only the -6 dB and -12 dB S/N-ratio conditions. In this experiment, we repeated each token twice.

Procedure
The procedure for this experiment was almost identical to that of Experiment III, except that the listeners were asked to identify the quality of the initial syllables. The stimulus structure was as follows: for each S/N-ratio condition, we had 9 target stimuli × 5 tokens × 2 visual cues × 2 orders × 2 repetitions = 360 tokens. The order of the stimuli within each block was randomized by Superlab.
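The trial count per S/N-ratio condition can be verified by enumerating the design. The sketch below is an illustration under stated assumptions: the factor labels are placeholders, and the seeded shuffle merely stands in for the randomization that Superlab performed in the actual experiment.

```python
import random
from itertools import product

# Design factors per S/N-ratio condition (labels are placeholders).
targets = [f"target_{i}" for i in range(1, 10)]   # 9 target stimuli
tokens = range(1, 6)                              # 5 tokens of each target
visual_cues = ("cue_1", "cue_2")                  # 2 visual cues
orders = ("order_1", "order_2")                   # 2 presentation orders
repetitions = (1, 2)                              # 2 repetitions


def build_block(seed: int = 0):
    """Enumerate every factor combination and return it in shuffled order."""
    trials = list(product(targets, tokens, visual_cues, orders, repetitions))
    random.Random(seed).shuffle(trials)  # stand-in for Superlab's randomization
    return trials


if __name__ == "__main__":
    block = build_block()
    print(len(block))  # 9 * 5 * 2 * 2 * 2 = 360
```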

Participants
Twenty-two students participated in this study for class credit in either linguistics or psychology classes. None of them had taken part in the previous three experiments. Table 4 shows the average d′-values of each consonant pair in Experiment IV.

Discussion
We observe that the d′-values are generally lower in this experiment than in the previous experiment, in which the target consonants were placed word-finally. This difference shows that the presence of a following consonant can mask the perception of coda consonants, even when the coda consonants' releases are not masked acoustically (see Beddor & Evans-Romaine 1995 for a similar result).
Most importantly, we again observe the perceptibility hierarchy voiceless stops > voiced stops > nasals, except for one reversal between nasals and voiced stops in the labial-coronal pairs in the -6 dB S/N-ratio condition.
6 General discussion

Summary
To summarize, all four experiments show the following perceptibility hierarchy of place contrasts: voiceless stops > voiced stops > nasals. The perceptibility differences were observed regardless of whether stops were clearly released (Experiments I, III, and IV) or not (Experiment II).
The differences were also observed both in clear listening environments (Experiments I and II) and in noisy environments (Experiments III and IV). They hold both in similarity rating experiments (Experiments I and II) and in identification experiments under noise (Experiments III and IV). Finally, the differences were observed both in word-final position (Experiments I-III) and in pre-consonantal position (Experiment IV).
The comparison between the two tasks, similarity judgment and identification in noise, also shows that they yield comparable results in terms of the perceptibility of contrasts (though see Babel & Johnson 2010) and, moreover, that speakers can make conscious judgments about the perceptibility of contrasts (Steriade, 2008).
Overall, the current results are compatible with what is predicted by Jun's (1995, 2004) hypothesis that nasal place contrasts are perceptually weaker than oral stop place contrasts. More generally, the results are also compatible with the hypothesis that speakers are more willing to neutralize contrasts that are less perceptible (Boersma, 1998; Huang, 2001; Hura et al., 1992; Kawahara, 2006; Kohler, 1990; Lindblom et al., 1995; Steriade, 1997, 2001, 2008).

Remaining questions
One remaining question is where the disagreement about the perceptibility of place contrasts in the previous literature comes from, in particular the difference between the current results and those of Winters (2002). As discussed above, it could come from the difference in the kinds of noise that were used.
The current identification experiments used naturalistic sounds, both for the targets and for the noise, to replicate real communicative situations. In particular, the noise was similar to what speakers and listeners face in real communicative situations. Therefore, we can conclude that Jun's hypothesis may be on the right track, to the extent that speakers perceive nasal place contrasts less well in a realistic speech setting. However, fully investigating the source of the differences in the previous literature is beyond the scope of our paper.
Another question is why the nasal place contrasts are judged to be less distinct than the oral consonant place contrasts, and why the place contrasts were judged to be less distinct in voiced stops than in voiceless stops. For the first difference, Jun (1995, 2004) hypothesizes, following Malécot (1956), that coarticulatory nasalization in adjacent vowels blurs the formant transition information, making the place contrasts in nasals less distinct. See also Fujimura (1962) for related observations about the acoustics of nasals, and Beddor & Evans-Romaine (1995) for more general discussion. Our experiments were not designed to test this hypothesis directly, and a future experiment is necessary.
For the second difference, it may be that since the pressure build-up behind the closure is stronger for voiceless consonants than for voiced consonants, bursts are also stronger for voiceless consonants. Since bursts play an important role in cueing place distinctions (Kochetov & So, 2007; Malécot, 1956; Smits et al., 1996; Stevens & Blumstein, 1978; Tekieli & Cullinan, 1979; Winitz et al., 1972), the stronger bursts of voiceless consonants may result in more distinct percepts. However, recall that in Experiment II, the difference in perceptual similarity still held when we controlled for the amplitudes of releases. Alternatively, Chen (1970) suggests that voiceless stops' closure is made with greater articulatory force and higher acceleration than voiced stops' closure, which may result in stronger formant transition cues. Admittedly, this hypothesis is speculative, and pursuing it further is beyond the scope of this paper.
Yet another limitation of this study is that the participants in the current experiments were all native speakers of English. This naturally raises the question of whether the current results hold for speakers of other languages. We hope that our experimental results will be replicated with speakers of other languages.

Phonetic perceptibility and phonological patterns
While the overall results support Jun's hypothesis, at least under noise that mimics a realistic speech setting, we also find a perceptual asymmetry that is not necessarily reflected in phonological patterns: we consistently found that voiceless stop place contrasts are more salient than voiced stop place contrasts, but as far as we know, this difference is not reflected in phonology. It is possible that further investigation of place assimilation typology will reveal a language in which only nasals and voiced consonants assimilate. To the extent that such a pattern is a true gap, however, our results show that not all perceptibility differences are reflected in phonology, i.e. that the perceptibility scales that underlie phonological patterns involve a certain degree of abstraction (Gordon, 2002; Kochetov, 2006; Kochetov & So, 2007). An important question to be addressed in future research is what distinguishes perceptibility differences that are reflected in phonology from those that are not.  (Mester & Itô, 1989; Rice, 1993). In this sense, there is no natural phonological class that groups voiced stops and nasals to the exclusion of voiceless stops. This hypothesis is merely a speculation and needs to be tested in future research.
We admit that our experiments were not designed to address all of these questions, and the current paper indeed opens up many more research questions than it answers. It is not realistic to address all of these questions in one paper, and we hope that more perception experiments will be conducted to address them. Nevertheless, we hope to have offered one substantial step bearing on the issue of the perceptibility differences of place contrasts in nasals and oral stops, and their possible implications for phonological patterns of place assimilation. At the very least, the current experiments have shown that the prediction made by Jun (1995, 2004) can be confirmed in some experimental settings.