The link between syllabic nasals and glottal stops in American English

Examples of syllabic nasals in English abound in phonological research (e.g., Hammond, 1999; Harris, 1994; Wells, 1995), but there is little explicit discussion of the consonantal environments that condition them. This study examines the production of potential word-final syllabic nasals in American English after oral stops, glottal stops, fricatives, flap, and laterals. The data come from a laboratory study of read speech with speakers from New York and other regions, a corpus of read speech with speakers from the Pacific Northwest and Northern Cities, and a corpus of spontaneous telephone speech. Acoustic analysis indicates that [n̩] is prevalent only after [ʔ], with some extension to [d] or [ɾ]. Variation in rates of [n̩] versus [ən] is found across the speakers in a group, not within individual speakers. An articulatory sketch is laid out to account for the prevalence of [n̩] after coronal and glottal stops. To link this realization to the presence of the [ʔ] allophone before syllabic nasals, previous analyses of acoustic enhancement proposed for glottally reinforced [tʔ] in coda position (e.g., Keyser & Stevens, 2006) are extended to the syllabic nasal case.


Introduction
Accounts of the appearance of syllabic nasals in English, for both British (especially Received Pronunciation) and American varieties, have tended to be based on impressionistic or introspective data. In (1), potential contexts for syllabic nasals in word-final position are illustrated using examples from these accounts. As these examples show, almost all of the consulted sources mention the post-/t/ or glottal stop environment as a possible place for a final syllabic nasal, but many other environments have also been proposed. For example, Simpson (2005) suggests that there might be lexical idiosyncrasy for some words when [k] precedes a potential syllabic nasal, so that taken could surface either as [teɪkən] or [teɪkn̩] (see also Szigetvári, 2002, for potential assimilation of [n̩] after a non-coronal stop). Wells (1995), on the other hand, explicitly states that the probability of a word-final syllabic nasal following a sonorant consonant, in words like sullen or common, approaches zero. Several of the sources cited in (1) also mention other potential environments for a syllabic nasal: before a word-final [t] or [s], as in absent, accident, residence; stem-finally before another morpheme, as in threatening, reasonable; or even word-internally in a monomorpheme, as in lavender, ordinary (Cruttenden, 2008; Hammond, 1999; Heselwood, 2007; Mora Bonilla, 2003). Such cases are not included in the laboratory and corpus studies presented in this paper and will not be considered further.
(1) a. Following [ʔ] (most American English) or [t] (some British English): button, frighten (Carley, Mees, & Collins, 2017; Cruttenden, 2008; Hammond, 1999; Harris, 1994; Heselwood, 2007; Keyser & Stevens, 2006; Mora Bonilla, 2003; Shockey, 2003; Simpson, 2005; Trager & Bloch, 1941; Wells, 1995)
b. Following [ɾ] or [d]: sudden, hidden (Carley et al., 2017; Heselwood, 2007; Keyser & Stevens, 2006; Mora Bonilla, 2003; Wells, 1995)
c. Following some fricatives: seven, brazen (Carley et al., 2017; Hammond, 1999; Keyser & Stevens, 2006; Mora Bonilla, 2003; Wells, 1995)
d. Following non-coronal stops: happen, reckon (Polgárdi, 2014; Rubach, 1996; Szigetvári, 2002)

The references in (1), many of which discuss British English, suggest that [n̩] should be most likely after a coronal consonant. Yet there is very little explicit discussion of the relationship between the preceding consonant and the likelihood of a syllabic nasal. One exception is Roach's (2009) textbook, in which he (anecdotally) observes that in British English, [n̩] is most likely after alveolar stops and fricatives (but not postalveolar affricates), rare after velar stops, and variable after bilabial stops. He also remarks that [n̩] is more common than [ən] after [f] and [v]. Moreover, in American English, what is produced as [t] in many varieties of British English is said to be produced either as a true glottal stop or as a glottally reinforced [tʔ] (e.g., Byrd, 1994; Eddington & Channer, 2010; Huffman, 2005; Kahn, 1980; Keyser & Stevens, 2006; Pierrehumbert, 1995; Roberts, 2006; Seyfarth & Garellek, 2015). Since the accounts referenced in (1) are mostly introspective, the study in this paper aims to clarify the distribution of word-final syllabic nasals in American English, in order to determine whether they are widespread or instead limited to occurring after specific sounds.
The main focus of many previous accounts is instead whether potential syllabic nasals are derived from /ən/ or whether the syllabic nasal itself is underlyingly represented. Many authors assume that /ən/ is the underlying sequence (Gussman, 1991; Harris, 1994; Polgárdi, 2014; Shockey, 2003; Toft, 2002). Trager and Bloch (1941, p. 232) state that "there is often free (stylistically determined) variation" between the syllabic and vowel variants in American English, and Rubach (1996) notes that in British RP the presence of schwa is optional, whereas syllabic nasals are mostly obligatory in American English. Stevens and Keyser (2010) state that syllabic nasals result from overlap between the vocalic and nasal consonant articulations, and assume that the resulting syllabic nasal should contain acoustic cues to the existence of the vowel, such as a longer duration and low-frequency amplitude more attributable to a vowel than to a nasal consonant alone. Wells (1995) stipulates that the schwa is present in the underlying representation (UR), in part because he notes that, impressionistically, the presence of schwa is variable for some speakers or dialects, and he proposes a rule that converts a schwa + sonorant consonant sequence into a syllabic consonant. This rule applies with variable frequency, being most likely after a stressed syllable ending in an alveolar plosive (e.g., button, sudden) and least likely after a sonorant consonant (e.g., sullen, common).
Mora Bonilla (2003) rejects accounts with an underlying vowel. Instead, he posits that when a nasal consonant follows a lower-sonority sound like a stop or a fricative, it is forced to become syllabic to improve the sonority profile of the word (e.g., mitten: /mɪtn/ → [mɪtn̩]). Mora Bonilla observes that a limitation of his account is that words like foreign, which some phonologists have claimed to have a syllabic nasal in Southern British English ([fɒɹn̩]), should not occur, because they do not contain the sonority environment required for syllabic consonant formation.
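A minimal Python sketch can make the sonority condition concrete. The five-way sonority scale and the symbol-to-class mapping below are assumptions for illustration (a common textbook ranking), not necessarily the exact scale Mora Bonilla adopts, and `final_nasal_is_syllabic` is a hypothetical helper name.

```python
# Hypothetical sonority ranks: a common textbook scale, assumed for illustration.
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5, "vowel": 6}

# Hypothetical mapping from a few IPA symbols to sonority classes.
SEGMENT_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "ʔ": "stop",
    "f": "fricative", "v": "fricative", "s": "fricative", "z": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "ɹ": "liquid",
    "j": "glide", "w": "glide",
    "ɪ": "vowel", "ɒ": "vowel", "ʌ": "vowel", "ə": "vowel",
}


def final_nasal_is_syllabic(segments):
    """Predict syllabicity of a word-final nasal from the preceding segment.

    A final nasal after a less sonorous segment (stop or fricative) is
    predicted to become syllabic; after an equally or more sonorous
    segment, no syllabic nasal is predicted.
    """
    if len(segments) < 2 or SEGMENT_CLASS.get(segments[-1]) != "nasal":
        return False
    preceding = SONORITY[SEGMENT_CLASS[segments[-2]]]
    return preceding < SONORITY["nasal"]


print(final_nasal_is_syllabic(["m", "ɪ", "t", "n"]))  # mitten  -> True
print(final_nasal_is_syllabic(["f", "ɒ", "ɹ", "n"]))  # foreign -> False
```

The foreign case comes out False, which is the limitation Mora Bonilla himself notes: under this scale [ɹ] is more sonorous than [n], so the sonority-driven account predicts no syllabic nasal there.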
Whether examples like foreign really pose a problem for Mora Bonilla's account is an empirical question, given the dearth of instrumental studies. An early study to examine the proportion of potential syllabic nasals in a speech corpus is Roach et al. (1992), which examines both American English and British English using the syllabic nasal and [ən] notations provided by the transcribers of the TIMIT corpus and the British SCRIBE corpus. For both corpora, they compare the number of words transcribed with [n̩] to the number that have a potential syllabic nasal environment but are transcribed with [ən]. Although Roach et al. do not specify whether their counts include both word-internal (e.g., lightening [laɪtn̩ɪŋ]) and word-final (e.g., button) examples, we assume that they do, since both cases are discussed earlier in their paper. Results show that in American English, 9.3% of potential [n̩] (total potential N = 7135) were transcribed as syllabic, while in British English, 8.9% of potential [n̩] (potential N = 2216) were transcribed as syllabic. While the phonetic criteria used by the transcribers are not provided, it is nevertheless noteworthy that these proportions are quite small. Toft (2002) reports on a small production study in British English in which the preceding consonantal environments before potential [n̩#] … Eddington and Savage (2012) examine the [ʔ(ə)n] environment specifically and show that the presence of the schwa may be dialectally conditioned. In particular, young female Utahans are most likely to produce [ʔən], followed by the youngest male Utahans in their sample, whereas non-Utahans almost exclusively produce [ʔn̩].
Taken together, the few available phonetic studies suggest that [n̩] is most likely after a coronal consonant, and in particular a coronal stop, but that syllabic nasals may otherwise be rarer than is assumed in some of the older impressionistic literature. Roach (2009) attributes the likelihood of [n̩] after coronal stops to 'nasal release': the tongue tip stays raised through a coronal closure that is oral for [t] or [d] and becomes nasal for the following [n]. Instead of a schwa being produced, the [n] is then produced syllabically (see similar descriptions in Hall, 2006; Ladefoged & Johnson, 2014; Zue & Laferriere, 1979). However, as shown in (1), many sources do assume that syllabic nasals can and do occur after non-coronal and non-stop consonants as well.

Research questions
The goal of this paper is twofold. First, we present data from several sources to examine how dialectal background and, to a limited degree, speech style affect the distribution of syllabic nasals in word-final position in American English. The first study is conducted in a lab setting, with speakers from the New York area and from other regions reading sentences containing words with potential word-final syllabic nasals. The second study is a similar analysis of the relevant words in the University of Washington/Northwestern University (UW/NU) Corpus (Panfili, Haywood, McCloy, Souza, & Wright, 2017), to investigate whether speakers from two other dialect regions of the United States, the Pacific Northwest and Northern Cities, show a similar or different pattern of results. These studies are supplemented with spontaneous speech data from the Fisher Conversational Telephone Speech (CTS) corpus (Cieri, Graff, Kimball, Miller, & Walker, 2004), to examine whether the rates of syllabic nasals in spontaneous speech are similar to those in read speech. This corpus spans the North, Midland, South, and West of the United States.
The second goal of this research is to examine whether certain phonetic contexts condition the realization of word-final [ən]/[n̩] in American English, by varying the preceding consonant context among non-coronal and coronal oral stops, glottal stops, fricatives, and laterals. Based on previous research, we hypothesize that the presence of a schwa is conditioned by the preceding phoneme, in particular by coronal stops. This would suggest an articulatory motivation for the distribution of syllabic nasals along the lines sketched by Roach (2009): the tongue tip remains raised from the coronal oral stop through the nasal consonant, blocking the acoustic realization of the vowel, so that the sequence surfaces as a syllabic nasal instead. We also examine whether the nasal consonant is lengthened, which would be consistent with Stevens and Keyser's (2010) suggestion that the schwa and nasal consonant articulations may overlap.
Interpreting an outcome consistent with an effect of a preceding coronal stop is complicated by the question of how the surface variants of coronal stops in American English interact with the realization of either [ən] or [n̩]. To be more specific, the symbol [ʔ] is used in this paper as a stand-in for multiple phonetic realizations, including a glottally reinforced [tʔ], a full glottal stop, and a period of glottalization (creaky voice) with no closure (Huffman, 2005; Pierrehumbert, 1995; Seyfarth & Garellek, 2020). The specific implementations are discussed in more detail throughout the paper.
If the coronal stop that might articulatorily condition [n̩] is actually a glottal stop or a period of glottalization, can an articulatory pressure for [n̩] still be identified? To preview the results, syllabic nasals are indeed especially likely after /t/, which is always produced as one of these glottalized variants. Therefore, in the discussion, we develop a possible explanation for why American English might have developed [ʔ] in precisely this environment, and for what the articulatory relationship between [ʔ], [d/ɾ], and an [n̩] realization could be.

Participants
The participants were 25 monolingual speakers of American English (ages 19-32; 3 male, 22 female). All of the speakers had been living in the New York metropolitan area for at least a year, and 15 of them had always lived in New York City or suburban New Jersey, Connecticut, or Long Island. The remaining speakers' home states were California (3), Georgia (1), Illinois (2), Missouri (1), Oregon (1), Vermont (1), and Virginia (1). Participants were paid $5 for their time. This research was approved by the Institutional Review Board at New York University.

… Frequency could not be easily controlled in developing this wordlist; however, frequency is included as a linear factor where relevant in the analysis. It was also difficult to balance the number of morphologically simple versus complex words, since many words with word-final [ən]/[n̩] are morphologically complex, so we have not tried to control that factor in the stimuli.
The words in Table 1 were placed in phrase-medial position in 26 sentences, which were designed to avoid intonational boundaries or other pauses after the potential syllabic nasal. Example sentences are given in (2), and the whole list is in the Appendix. A total of 989 words were analyzed; 11 were removed because of mispronunciations, hesitations, or pauses right after the potential syllabic nasal.
(2) a. We had to loosen the wheel on the wagon to remove it.
b. Miguel's symptoms may worsen if the doctors weaken his dosage.
c. Gail has never eaten at Burger Heaven in Brooklyn.
For the recording, participants were seated in a sound-treated room and given a randomized list of the 26 sentences. They were told to read each sentence at a rate that felt natural to them, and they read each sentence aloud once into a Shure SM10A headworn microphone attached to a Tascam DR-40 digital recorder. The sentences were recorded as uncompressed WAV files at 44 kHz. The recording session took about seven minutes to complete.

Data Analysis
TextGrids in Praat (Boersma & Weenink, 2020) were created for each file. For each target word, the following information was marked: whether or not a schwa was present, the intervals corresponding to the nasal and the schwa … closure (with no formants or acoustic information other than voicing during the closure) followed by an observable burst. For [ɾ], there could be a short period of closure but no visible burst, or there could be weakening to an approximant produced with lower-intensity formants (consistent with the phonetic implementations found in Fukaya & Byrd, 2005; Herd, Jongman, & Sereno, 2010; Warner, Fountain, & Tucker, 2009; Zue & Laferriere, 1979). All instances of [ʔ] had at least two pulses of glottalization on the preceding vowel, and sometimes on the following nasal, but to be marked as a true glottal stop, a token had to have at least 20 ms of closure preceding the nasal (or the schwa, if one was present). Otherwise, the glottal stop was realized as a period of glottalization with no oral closure (see Figure 6 in Section 3.3). Schwas were identified by looking for changes in intensity relative to adjacent sounds and for differences in formant structure between the vowel and the following nasal. There was typically a very obvious intensity difference between the schwa and the following nasal, with the amplitude of the higher frequencies (above F3 of the vowel) dropping dramatically in the nasal. Likewise, the nasal consonant typically had a nasal formant concentrated at a different frequency than F2 of the preceding vowel. The boundary between vowel and nasal was very clear and easy to demarcate in nearly all cases. The only potentially difficult case was preceding [l]. In this environment, a schwa was marked present when F2 rose and F3 lowered after the characteristic lowering of F2 and raising of F3 that characterizes intervocalic /l/ in American English (e.g., Recasens, 2012).
These formant changes were usually also accompanied by an increase in intensity from the [l] to the schwa. In the very few cases that were difficult to determine, the first two authors decided together whether a vowel was present. Several examples of the presence and absence of a schwa, as well as variation in [d/ɾ] and [ʔ], are shown in Figure 1.
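The [ʔ] labeling criteria can be restated as a small decision procedure. This Python sketch assumes that the pulse count and closure duration have already been measured for each token (for example, read off the TextGrid); the `GlottalToken` class, its field names, and the example tokens are hypothetical, and only the two thresholds (at least two glottal pulses, at least 20 ms of closure) come from the text.

```python
from dataclasses import dataclass

MIN_PULSES = 2         # at least two pulses of glottalization (from the text)
MIN_CLOSURE_MS = 20.0  # at least 20 ms of closure for a true glottal stop (from the text)


@dataclass
class GlottalToken:
    """Hypothetical per-token measurements for a word with a [ʔ] context."""
    word: str
    glottal_pulses: int  # glottal pulses on the preceding vowel and/or following nasal
    closure_ms: float    # closure before the nasal (or the schwa, if present)


def classify_glottal(token: GlottalToken) -> str:
    """Label a token as a glottal stop, glottalization, or unclassified."""
    if token.glottal_pulses < MIN_PULSES:
        return "unclassified"  # does not meet the minimum for any [ʔ] label
    if token.closure_ms >= MIN_CLOSURE_MS:
        return "glottal stop"
    return "glottalization"  # glottalized, but no qualifying closure


# Hypothetical measurements for two tokens.
for tok in [GlottalToken("button", 4, 35.0), GlottalToken("eaten", 3, 0.0)]:
    print(tok.word, classify_glottal(tok))
```

Treating the 20 ms threshold as inclusive is an assumption; the text says "at least 20 ms", which is what the `>=` comparison encodes.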
The labeling and segmentation were carried out initially by the second author, who had received training in acoustic phonetics and speech science classes, as well as further training specific to the segments examined in this study. The first author then reviewed all of the Praat TextGrids, and in cases of disagreement (about 15% of the data), the first and second authors met to resolve what the proper labels and segment boundaries should be. After this consensus procedure, no further data were removed.

Results
For the analysis of the presence versus absence of schwa, a mixed-effects logistic regression was fitted using lme4 in R (Bates et al., 2018). The binomial dependent variable was the production of either a syllabic nasal or a schwa, with preceding consonant category ([-cor] stop, fricative, [l], [d/ɾ], [ʔ]) as a fixed effect. The preceding consonant category was sum coded, with the level 'fricative' held out. In addition, to investigate whether the New York speakers differ as a group from the remaining speakers, who come from more diverse places, region (NY, other) was also included as a sum-coded fixed effect, along with the interaction of preceding consonant category and region. Speaker and word were included as random intercepts.
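To show what "sum coded, with the level 'fricative' held out" means in practice, here is a minimal Python sketch of the contrast matrix. In R, `contr.sum` codes the last factor level as -1, so the same matrix results if 'fricative' is ordered last; the helper names here are hypothetical. The second function illustrates, for a single factor with no random effects, why each coefficient is a deviation from the unweighted mean of the level log-odds rather than a comparison with a reference level, which is why a significant coefficient in Table 2 only indicates a difference from the mean across contexts.

```python
def sum_contrasts(levels, held_out):
    """Return (coded_levels, rows) for a sum-coding (deviation) contrast matrix.

    Each non-held-out level gets a column coded 1 for that level and 0
    otherwise; the held-out level is coded -1 in every column, so every
    column sums to zero.
    """
    coded = [lv for lv in levels if lv != held_out]
    rows = {}
    for lv in levels:
        if lv == held_out:
            rows[lv] = [-1] * len(coded)
        else:
            rows[lv] = [1 if lv == c else 0 for c in coded]
    return coded, rows


def deviation_effects(level_logodds, held_out):
    """Intercept and coefficients implied by sum coding for one factor.

    With one sum-coded factor and no random effects, the intercept is the
    unweighted mean of the per-level log-odds and each coefficient is that
    level's deviation from it.
    """
    grand_mean = sum(level_logodds.values()) / len(level_logodds)
    betas = {lv: v - grand_mean for lv, v in level_logodds.items() if lv != held_out}
    return grand_mean, betas


LEVELS = ["[-cor] stop", "fricative", "[l]", "[d/ɾ]", "[ʔ]"]
columns, matrix = sum_contrasts(LEVELS, held_out="fricative")
print(columns)
for level, row in matrix.items():
    print(f"{level:12s} {row}")
```

The held-out level has no coefficient of its own; its deviation from the mean is minus the sum of the other coefficients.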
For the analysis of the duration of the nasal in syllabic versus [ə] contexts, a linear mixed-effects regression was carried out using lme4 and lmerTest (Kuznetsova, Brockhoff, & Christensen, 2013) …

The proportions of syllabic nasals versus schwas are shown in Figure 2. The results in Table 2 indicate that preceding consonant context is significant for all of the contexts, but under sum coding this means only that the rate of schwa presence in each context is significantly above or below the mean rate over all of the contexts. More critically, the results in Table 3 from Tukey tests using the multcomp package in R (Hothorn, Bretz, & Westfall, 2008) …

As for speaker region, there is no significant main effect, but there is a significant interaction between region and [ʔ]. This interaction reflects the especially large difference in the rate of syllabic nasals after [ʔ] between New Yorkers (63%) and the other speakers (91%).

… tokens with a syllabic nasal. Rather, visual inspection of individual speakers shows that, except for a small number of people, the majority of speakers produce either <25% or >75% syllabic variants, as shown in Figure 3. The speakers from New York are marked in the axis labels; while the non-New York speakers are concentrated in the >75% syllabic nasal category for [ʔ], the New Yorkers are spread throughout the range. For [d/ɾ], the speakers are more evenly divided. For both [ʔ] and [d/ɾ], New Yorkers make up the majority of the speakers in the <25% syllabic nasal category.
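The per-speaker classification into <25%, intermediate, and >75% syllabic variants can be sketched as follows. The token format (a speaker ID paired with a flag for whether the token was syllabic) and the example speakers are hypothetical; only the two cut-offs come from the text.

```python
from collections import defaultdict


def speaker_bins(tokens, low=0.25, high=0.75):
    """Bin each speaker by their proportion of syllabic-nasal tokens.

    tokens: iterable of (speaker_id, is_syllabic) pairs (hypothetical format).
    Returns {speaker_id: (proportion, bin_label)}.
    """
    counts = defaultdict(lambda: [0, 0])  # speaker -> [syllabic count, total count]
    for speaker, is_syllabic in tokens:
        counts[speaker][0] += int(is_syllabic)
        counts[speaker][1] += 1
    result = {}
    for speaker, (n_syllabic, n_total) in counts.items():
        prop = n_syllabic / n_total
        if prop < low:
            label = f"<{low:.0%}"
        elif prop > high:
            label = f">{high:.0%}"
        else:
            label = f"{low:.0%}-{high:.0%}"
        result[speaker] = (prop, label)
    return result


# Hypothetical tokens for three speakers.
example = [("S1", True)] * 4 + [("S2", False)] * 4 + [("S3", True), ("S3", False)]
for speaker, (prop, label) in speaker_bins(example).items():
    print(speaker, f"{prop:.2f}", label)
```

Running this separately for the [ʔ] and [d/ɾ] words would give one bin per speaker per context, which is the kind of breakdown Figure 3 summarizes.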
As for the pronunciation of the coronal allophones, a closer look shows no relationship between whether speakers produce [ɾ] or [d] and whether they produce a schwa or a syllabic nasal. For the [ʔ] words, on the other hand, speakers are more likely to produce a schwa when [ʔ] is realized as glottalization than when it is produced as a glottal stop with a period of complete glottal closure. This is shown in Table 4.

… (Brysbaert & New, 2009), and their interaction was carried out, with random intercepts for words and speakers. As expected, there is a main effect of preceding consonant (β = -4.84, z = -5.03, p < 0.001), but no main effect of frequency (β = -0.055, z = -1.70, p = 0.09) and no interaction between preceding consonant and frequency (β = 0.051, z = 1.17, p = 0.24). Although frequency is not significant, its negative coefficient may indicate a trend, which a larger dataset might confirm, toward a greater proportion of schwas in lower-frequency words. If the schwa variant is considered more hyperarticulated or less reduced, this trend is consistent with studies showing that more frequent words tend to be realized as more articulatorily reduced variants (e.g., Aylett & Turk, 2006; Gahl, 2008; Pluymaekers, Ernestus, & Baayen, 2005).
Lastly, we examine the duration of the nasal depending on whether it is syllabic or preceded by schwa. Results for this analysis are shown in Tables 5 and 6.

… These results also show that the variability in the overall results in Figure 2 primarily reflects across-speaker variability, not within-speaker variability, as illustrated in Figure 3. Likewise, the minimal frequency effect on how [ən]/[n̩] words are produced is consistent with across-speaker variability being the most important factor in explaining any gradience in the aggregate data in Figure 2, though there may be a tendency toward greater rates of syllabic nasals as frequency increases. Finally, syllabic nasals are significantly longer than the nasal in [ən] after [d/ɾ] and [ʔ]. Since these are the main environments with enough syllabic nasals for the effect to be robust, further discussion of nasal length will focus on these preceding consonants.
In the following section, we carry out a similar analysis of the rates of [ən]/[n̩ ] to investigate whether the patterns in the lab study, with its emphasis on the New York region, also extend to the Pacific Northwest and Northern Cities.

Participants
The UW/NU Corpus consists of 33 talkers from the Pacific Northwest (11 male, 9 female) and Northern Cities (7 male, 6 female) dialect regions reading the IEEE Harvard sentences. Of these, 20 talkers were used (8 Northern Cities, 12 Pacific Northwest), since only these talkers produced all of the sentences containing the words in Table 7.

Stimuli
From the transcripts of the sentences used in this corpus, we identified 24 unique words (nine of which appeared more than once, in two or three different sentences) that contained the same type of potential word-final syllabic nasals that were used in the laboratory study. These are shown in Table 7, and the sentences containing these words are in the Appendix. In addition to the preceding consonant categories from the laboratory study, three words from this corpus had a preceding nasal. In total, 1153 words were analyzed. Note that since we used all possible words with final potential syllabic nasals, there are some instances here where the target word is the first or last in the sentence. Given the overall low rates of syllabic nasal production in the lab study, we decided to maximize the number of words rather than exclude these initial and final words. An examination of the results indicates that words in these positions had the same schwa/syllabic nasal profiles as the other words in their preceding consonant category.

Results
Results were analyzed using a mixed-effects logistic regression. The dependent variable was the production of a syllabic nasal or a schwa, with preceding consonant category ([-cor] stop, fricative, nasal, [l], [d/ɾ], and [ʔ]), speaker dialect, and their interaction as fixed effects, and speaker and word as random intercepts. Both factors were sum coded.
The proportions of syllabic nasals versus schwas for the UW/NU data are shown in Figure 4, and statistical results are given in Tables … for speakers from other regions (see Figure 2), which are both lower than the Pacific Northwest speakers in this study.
Since these results are based on read speech, the next section reports on data from a smaller corpus study of spontaneous speech as a preliminary attempt to investigate whether potential syllabic nasals are implemented similarly in read and spontaneous speech.

Spontaneous speech data
The stimuli for the spontaneous speech analysis were taken from the Fisher CTS corpus (Cieri et al., 2004), which was created in 2003. This corpus contains conversational telephone speech between pairs of previously unacquainted participants who are given a topic to talk about. The corpus was created to support the development of automatic speech recognition. Because the goal of this portion of the study is to get a general overview of spontaneous speech and whether it mirrors the read speech patterns, the speakers are not controlled for age, sex, or regional background. In the corpus overall, 38% of speakers are 16-29, 45% are 30-49, and 17% are over 50, and 53% of the talkers are female. Speakers in the corpus are roughly evenly split over four broad geographic regions (North, Midland, South, West), and words were taken from speakers across all of these regions.
Of the 40 words used in the laboratory study, those produced at least 8 times in the Fisher CTS corpus were chosen. If a word had 10 or fewer tokens in the corpus, all of them were used; if it had more than 10, a random sample of 10 tokens was extracted. This resulted in 230 utterances representing 25 of the 40 words from the lab study (Ns = [-cor] …

These results are very similar to those for the lab study, except that the proportion of syllabic nasals produced after [d/ɾ] now matches the proportion after [ʔ], as shown in Figure 5. The same logistic regression and Tukey tests described in Section 2.1.4 indicate that [d/ɾ] and [ʔ] each differ significantly from all other preceding consonants (both p < 0.001), but there is no significant difference between [d/ɾ] and [ʔ] or among any of the other preceding consonants.
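The token-selection rule for the Fisher data can be written as a short sampling function. This is a sketch under stated assumptions: tokens are represented as opaque IDs grouped by word, the example counts are invented, and the fixed random seed is added only so that the selection is reproducible; none of these details come from the original procedure.

```python
import random

MIN_TOKENS = 8   # words with fewer corpus tokens were not used
MAX_TOKENS = 10  # at most 10 tokens per word were kept


def sample_tokens(tokens_by_word, seed=0):
    """Select corpus tokens per word following the rule described in the text.

    tokens_by_word: {word: [token_id, ...]} (hypothetical representation).
    The fixed seed is an added assumption, used only for reproducibility.
    """
    rng = random.Random(seed)
    selected = {}
    for word, tokens in tokens_by_word.items():
        if len(tokens) < MIN_TOKENS:
            continue  # too few tokens: word not included
        if len(tokens) <= MAX_TOKENS:
            selected[word] = list(tokens)  # keep every token
        else:
            selected[word] = rng.sample(tokens, MAX_TOKENS)  # random 10, no repeats
    return selected


# Hypothetical token counts per word.
corpus = {
    "eaten": [f"eaten_{i}" for i in range(30)],
    "hidden": [f"hidden_{i}" for i in range(9)],
    "sullen": [f"sullen_{i}" for i in range(5)],
}
for word, toks in sample_tokens(corpus).items():
    print(word, len(toks))
```

With 25 qualifying words, this rule yields between 200 and 250 tokens, which brackets the 230 utterances reported.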
The data from spontaneous speech suggest that largely the same pattern is found as for read speech, though the proportion of syllabic nasals for the [d/ɾ] words is now in line with the very high proportion for [ʔ]. It may be that, if there is more reduction or overlap in spontaneous conversational speech than in read speech, [n̩] is even more likely to surface, though notably only where it is already permitted: rates of schwa do not go down in the other environments.

General Discussion
The outcomes of the laboratory study, the UW/NU data, and the Fisher CTS spontaneous speech corpus converge on very similar results. True syllabic nasals are most common after the [ʔ] allophone of /t/, which is almost obligatory in this position in many American English varieties, including those of the speakers in these studies.2 In the read speech data, the rates of syllabic nasals after [ʔ] ranged from 63% to 85% across the groups examined in this study. A breakdown of the read data in the laboratory study, which was the best controlled for number of items in each preceding consonant category, showed that these proportions in fact reflect across-speaker variability, not within-speaker variability. That is, some speakers produced almost all schwas after [ʔ], but most of the rest produced almost all syllabic nasals.3 In spontaneous speech, the rate of syllabic nasals after [ʔ] is even higher, at 95%.
For most of the remaining preceding sounds in read speech, syllabic nasals are much less common, reaching 25% after laterals for the Northern Cities speakers but staying below 20% for every other combination of category and speaker group. The exception is [d/ɾ], where speakers show intermediate rates of syllabic nasals (28-61%, depending on speaker group), but even this aggregate is misleading: the speaker breakdown for the laboratory data shows that, as with the [ʔ] words, speakers generally produce either mostly syllabic nasals or mostly schwas after [d/ɾ]. In this environment, more speakers produce [ən] than [n̩].
These results make it clear that syllabic nasals in word-final position in American English have a limited distribution. They do not occur in most of the preceding consonant environments given as examples in many previous studies (see Section 1). On the other hand, the findings are consistent with the few papers that explicitly suggest that syllabic nasals should be most likely after coronal stops (Roach, 2009; Wells, 1995), and with the one empirical study of British English that showed this for a small number of speakers (Toft, 2002). As some have described (Carley et al., 2017; Roach, 2009; Zue & Laferriere, 1979), a syllabic nasal following a coronal stop seems at first glance articulatorily sensible, since the tongue can simply stay at the alveolar ridge during the transition from the oral to the nasal stop; this is what some authors have termed 'nasal release.' However, an account of where American English speakers produce syllabic nasals does not end here, because it is intertwined with the question of why [ʔ] might have arisen in American English in precisely this environment.

Variation and implementation in syllabic nasals
The general assumption that the default variant of [ən]/[n̩] is [ən], and that it can productively combine with many different sounds, is credible for at least two interrelated reasons. First, this study demonstrates that, except in the [ʔ] and sometimes the [d/ɾ] environments, potential word-final syllabic nasals are actually produced as [ən]. Second, evidence from morphological composition shows that [ən] is a productive word-final morpheme that is clearly realized with a schwa in many environments (e.g., the deadjectival morpheme, as in ripe → ripen, damp → dampen; past participles like broke → broken, choose → chosen; and the demonymic morpheme, as in Mexico → Mexican, Chile → Chilean). Given that syllabic nasals are not realized in the large majority of environments in which they could occur, it is parsimonious to hypothesize that the default variant of these relatively productive morphemes is [ən], and that [ʔ] and [d/ɾ] are the special environments where something happens.

2 We are aware that there are dialects of North American English that have flap as a variant of /t/ in the potential syllabic nasal words (e.g., mitten [mɪɾən]), though to our knowledge these have not been described in a paper. No speakers in any part of this study produced this variant, so the discussion in this paper applies to those dialects that have glottalization before syllabic nasals.

3 Although the sample size for [ʔ] in the UW/NU corpus is much smaller, the same pattern of across-speaker variability holds. Only one Northern Cities speaker produced 40% of utterances with a schwa, with the remaining seven speakers at 20% or below. Among the Pacific Northwest speakers, three produced 80-100% of [ʔ] words with a schwa, three were between 40% and 60%, and the remaining six were at 20% or below.
As a preliminary step, we argue that the syllabic nasal originally arose in this environment as a result of a type of articulatory merging, along the lines of the descriptions of nasal release of a coronal stop followed by a coronal nasal (Carley et al., 2017; Roach, 2009; Zue & Laferriere, 1979). When /ən/ follows /t/, if the speaker leaves the tongue tip raised after producing the coronal stop in anticipation of the coronal nasal, then even if the tongue body produces a reduced vowel gesture, its realization would be masked by the overlapping raised tongue tip. Such a configuration could occur for either /t/ or /d/, the two environments where syllabic nasals most often occur; we return to the difference in rates between these two stops below. Moreover, a configuration in which the tongue tip stays raised for both a coronal stop and /n/ is potentially compatible with a longer syllabic nasal, if speakers lower the velum anticipatorily as they would if the schwa were realized before a nasal (Bell-Berti, 1993; Cohn, 1993; Solé, 1995). That is, a portion of the tongue tip closure that would otherwise correspond to an oral stop might be realized as nasal if the velum is lowered before the /t/ or /d/ itself is completed, which would acoustically turn the oral stop into a nasal one. Note that the duration results are consistent with Stevens and Keyser's (2010) speculation that syllabic nasals arise from the coproduction of a tongue raising gesture for a coronal consonant and a tongue body gesture for a schwa. They also claim that the formant structure of syllabic nasals might be similarly affected, which we leave for future research.
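The duration argument can be illustrated with a toy timing model of a single tongue-tip closure shared by the coronal stop and /n/: the closure counts as oral until the velum lowers and as nasal afterwards. The millisecond values and the `split_closure` helper are hypothetical, chosen only to show the direction of the effect, and the model ignores any schwa gesture and all acoustic detail.

```python
def split_closure(closure_onset_ms, release_ms, velum_lowering_ms):
    """Split one tongue-tip closure into (oral_ms, nasal_ms) portions (toy model).

    The closure is treated as oral until the velum lowers and as nasal
    afterwards; velum lowering outside the closure is clamped to its edges.
    """
    if release_ms < closure_onset_ms:
        raise ValueError("release must not precede closure onset")
    v = min(max(velum_lowering_ms, closure_onset_ms), release_ms)
    return v - closure_onset_ms, release_ms - v


# Hypothetical timings (ms) for a single 150 ms tongue-tip closure.
print(split_closure(0, 150, 70))  # later velum lowering: (70, 80)
print(split_closure(0, 150, 40))  # earlier, anticipatory lowering: (40, 110)
```

Moving velum lowering 30 ms earlier within the same closure shortens the oral portion and lengthens the nasal portion by the same 30 ms, which is the direction of the duration difference reported for [d/ɾ] and [ʔ].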
In contrast, preceding sounds that do not have a coronal closure do not mask the schwa, since the articulatory advantage of not having to break the tongue tip closure disappears if the preceding consonant and the nasal are produced with different articulators. It appears that even the coronal fricatives /s/ and /z/ either do not involve a tongue tip constriction close enough to justify leaving the tongue tip in place or, if speakers use different constriction locations for coronal fricatives and stops, as has been found for English speakers (Dart, 1998), do not put the tongue tip in the position required for /n/, so the advantage of holding it there is lost.
Since individual speakers tend to be more or less categorical in implementing either the schwa or the syllabic nasal after [ʔ] and [d/ɾ], this pattern suggests that the articulatory scenario described above was the phonetic precursor to what gave rise to the syllabic nasal variant after coronal stops, but not after other sounds. That is, [n̩] may today be an allomorph of [ən] after coronal stops for those speakers who produce it consistently. It is unlikely that [n̩]-speakers intend to produce [ən] but, because of common gestural overlap and reduction patterns in connected speech, end up with [n̩] every time; if this were the case, we would expect more within-speaker variability in whether [ən] or [n̩] is produced. Instead, the use of the [n̩] variant may be dialectally conditioned, with New York and Pacific Northwest speakers having slightly lower rates than Northern Cities speakers and the other, geographically varied speakers in the laboratory study (though the data in this paper cannot speak to what might be conditioning these rates of allomorph choice). Note that we do not intend to rule out varying rates of gestural overlap entirely; we are not arguing that it is impossible for the [ə] to be obscured as a result of gradient overlap. To the extent that we do see any within-speaker variability, gestural overlap could be playing a role. Nevertheless, if [n̩] is the variant that follows coronal stops, it may be easier to explain the [ʔ] allophone in American English. This is taken up in the next section.
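The contrast between across-speaker and within-speaker variability can be expressed with the law of total variance. The Python sketch below is one way to quantify it and is not the analysis reported in this paper. It assumes each token has been coded as a hypothetical (speaker, has_schwa) pair, with has_schwa True for [ən] and False for [n̩], and splits the variance of that binary outcome into a within-speaker part (the token-weighted mean of p(1 − p), where p is a speaker's schwa rate) and a between-speaker part (the token-weighted variance of the speaker rates). If speakers are near-categorical, as argued above, the within-speaker term should be small relative to the between-speaker term.

```python
from collections import defaultdict


def decompose_schwa_variance(tokens):
    """Split Var(has_schwa) into within- and between-speaker components.

    tokens: iterable of (speaker_id, has_schwa) pairs, where has_schwa is True
    for [ən] and False for [n̩]. Components are token-weighted, so
    within + between equals the total variance exactly.
    """
    counts = defaultdict(lambda: [0, 0])  # speaker -> [n_schwa, n_tokens]
    for speaker, has_schwa in tokens:
        counts[speaker][0] += int(bool(has_schwa))
        counts[speaker][1] += 1
    n_total = sum(n for _, n in counts.values())
    if n_total == 0:
        raise ValueError("no tokens supplied")

    grand_rate = sum(k for k, _ in counts.values()) / n_total
    rates = {s: k / n for s, (k, n) in counts.items()}
    within = between = 0.0
    for speaker, (_, n) in counts.items():
        weight = n / n_total
        p = rates[speaker]
        within += weight * p * (1 - p)              # variability inside a speaker
        between += weight * (p - grand_rate) ** 2   # spread of speaker rates
    return {
        "rates": rates,
        "within": within,
        "between": between,
        "total": grand_rate * (1 - grand_rate),
    }


# Hypothetical toy tokens (not data from this study): two categorical speakers.
toy = [("A", True)] * 5 + [("B", False)] * 5
result = decompose_schwa_variance(toy)
print(result["within"], result["between"])  # 0.0 0.25
```

Token weighting is a design choice: it makes the two components sum exactly to the overall variance, but it lets speakers with many tokens dominate. Weighting speakers equally is an equally defensible alternative if token counts differ widely.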

Distinguishing between /t/ and /d/: [ʔ] before [n̩] in American English
We turn here to the realization of /t/ as [ʔ] before [ən]/[n̩] (Harris, 1994; Kahn, 1980). If [ən] is a morpheme that attaches to a lexical base ending in /t/ (e.g., eaten /it + ən/), one might expect the form to surface as [iɾən]. Indeed, the ˈV __ V stress pattern of such a word is precisely an environment where flapping typically occurs in American English (e.g., Borowsky, 1986; Kahn, 1980; Patterson & Connine, 2001; Turk, 1992; Warner & Tucker, 2011), and with other unstressed suffixes, such as /-ɪŋ/ or /-ɚ/, flapping does occur (eating [iɾɪŋ], eater [iɾɚ]). Why, then, does attaching /ən/ to a stem that ends in /t/ not usually result in flapping? Where does the glottal variant of /t/ come from in this environment?
A possible answer to these questions may be found in accounts of glottal reinforcement of /t/ in coda position more generally that attribute glottalization to acoustic enhancement (Keyser & Stevens, 2006; Pierrehumbert, 1995; Seyfarth & Garellek, 2015; Stevens & Keyser, 2010; but see counterarguments in Huffman, 2005; Seyfarth & Garellek, 2020). In this line of research, coda /t/ glottalization is argued to be an acoustic modification implemented to enhance a phonological contrast that is otherwise perceptually endangered in a particular environment. For example, Keyser and Stevens (2006) hypothesize that in English there is pressure to cut off vocal fold vibration in voiceless coda stops preceded by a vowel, so that excessive carryover phonation is not produced during the stop closure. They argue that in labial and velar stops, the tongue surface posterior to the constriction can be stiffened to help inhibit the expansion of the vocal tract volume (Svirsky et al., 1997), but that this is more difficult in coronal stops, which require a more flexible tongue body to make a tongue tip closure. Instead, in this environment, another way to suppress vocal fold vibration is to adduct the vocal folds, producing glottalization or, with full adduction, a glottal stop. Keyser and Stevens propose that this is an example of featural enhancement, and that it explains why glottalization is more common for /t/ than for other voiceless stops, and why it occurs in coda position.
Since American English speakers may already use this method to distinguish /t/ from /d/ in codas, the articulation could have been extended to word-medial position to enhance the distinction between coda /t/ and /d/ before [ən]/[n̩] as well. If words with /t/ are produced with a glottally reinforced [tʔ], it would be clear to listeners that the speaker intends /t/, even though the speaker is not lowering the tongue tip between the closure for the stop and the closure for the following nasal. Since a neighboring nasal consonant may be a particularly good aerodynamic environment for voicing to bleed into an adjacent stop (Davidson, 2016), glottalization would also be a way to prevent this and to signal to the listener that the speaker intended /t/ rather than /d/. Moreover, glottal reinforcement of /t/ is particularly prevalent before sonorant consonants in American English (either word-finally at word boundaries, or in words where it is unambiguously a medial coda, as in oatmeal or catnip) (Huffman, 2005; Pierrehumbert, 1994; Seyfarth & Garellek, 2015), so the potential syllabic nasal environment is a natural extension of where [tʔ] would be expected.
To make an acoustic contrast with [tʔ], underlying /d/ can be produced either as a short ballistic [ɾ], which requires the tongue tip to move away from the alveolar ridge, or as a [d] with less overlap between the tongue tip gestures of [d] and [n]; both differ from a /t/ realized with glottalization. Either way, [ə] is more likely to be realized after /d/ than after /t/, as reported in Table 3. The presence of the vowel is potentially another cue that could help distinguish /t/ from /d/, which may explain the difference in rates of [n̩] after these two coronal stops. At the same time, however, there may be pressure for the [n̩] variant to spread from /t/ to the other coronal stop, since rates of [n̩] are higher after [d/ɾ] than after any of the remaining preceding consonants in these studies. Although the spontaneous speech data from the Fisher corpus contain only a small number of tokens, in these data [n̩] follows [ʔ] and [d/ɾ] at similarly high rates. Whether this is because articulatory processes like overlap and reduction increase in spontaneous speech or because these speakers simply have higher rates of [n̩] is unclear, but the question could be pursued further with a larger data set.
To briefly return to the comparison between environments for flapping (e.g., ea[ɾ]er 'eater,' wi[ɾ]er 'wider') and those for glottalization (e.g., ea[ʔ]en 'eaten,' but wi[ɾ]en 'widen') in American English, it is notable that an acoustic contrast between /t/ and /d/ is maintained on the consonant before the syllabic nasal, whereas both sounds neutralize to [ɾ] in other environments. Some research has indicated that where flapping occurs, speakers produce slightly longer vowels before /d/ flaps than before /t/ flaps (Braver, 2014; Herd et al., 2010; Zue & Laferriere, 1979), and there are differences in F1 and F2 for the specific writer/rider pair (Kwong & Stevens, 1999), suggesting that the acoustic contrast is shifted onto the vowel where flapping occurs. However, the same studies also show that listeners cannot reliably use this length difference to distinguish /d/ flaps from /t/ flaps (Braver, 2014; Herd et al., 2010), so it remains an open question why American English maintains such a salient acoustic contrast between /t/ and /d/ in pre-nasal position but a much weaker one in the flapping environment. Anecdotally, some speakers of American English may have begun to produce [ʔn̩] as [ɾən] (e.g., kitten as [kɪɾən]), which could eliminate [n̩] as an environment where intervocalic /t/ is produced as something other than a flap. However, the extent of this change is currently unknown and will need to be revisited.

Acoustic and individual variability in the implementation of [ʔn̩]/[ʔən]
At this point, the phonetic account for syllabic nasals contains the following elements: voiceless coronal stops before [ən]/[n̩] are realized with glottal reinforcement for acoustic enhancement, and the [n̩] variant is more prevalent after these stops, possibly because the oral tongue tip articulations for /t/ and /n/ merged and masked the /ə/. This account, however, is adequate only for cases where speakers raise the tongue tip for [tʔ] (which is probably usually realized with glottalization on the preceding vowel), but it is likely that some speakers actually produce a full glottal stop that is not accompanied by any oral closure gesture at all. For the cases classified as full glottal stops in the laboratory study, it is not possible to know for certain whether a period of closure contains both an oral and a glottal closure, or only a glottal closure. At the same time, 47% of all of the [ʔ] utterances in the laboratory study were produced as a period of glottalization with no evident closure (see Table 3). The difference between a /t/ produced only as glottalization and one produced with a period of closure is shown in Figure 6. Interestingly, speakers who produce a period of glottalization with no closure are significantly more likely to also produce a schwa (see Table 3), which is what might be expected when the tongue tip is not raised, since there is then no incentive to keep the tongue tip raised from /t/ to /n/. Ultimately, if speakers variably implement a full glottal stop (with no simultaneous tongue tip closure), a period of glottalization, or a glottally reinforced [tʔ], this may indicate speakers' knowledge that these realizations are not contrastive in English (e.g., Dilley, Shattuck-Hufnagel, & Ostendorf, 1996; Docherty & Foulkes, 1995; Pierrehumbert, 1995; Pierrehumbert & Talkin, 1992; Sumner & Samuel, 2005).
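The association between glottalization without closure and schwa realization amounts to a claim about a 2 × 2 contingency table (no closure vs. closure, crossed with [ən] vs. [n̩]). The passage does not name the test behind "significantly," so the following Python sketch shows only one standard option, a two-sided Fisher's exact test written with the standard library; the row and column layout in the docstring is an assumption for illustration. Note that a token-level test treats tokens as independent and therefore ignores the speaker clustering discussed above; for real data, a mixed-effects logistic regression with speaker as a random effect would be the more appropriate choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = glottalization without closure / with closure;
    columns = [ən] / [n̩] token counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))


print(fisher_exact_two_sided(3, 0, 0, 3))  # 0.1
```

The exact test is shown only because it is short and self-contained; it does not reproduce the statistical analysis behind Table 3.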
If there are speakers who implement a syllabic nasal following either glottalization or a full glottal stop, or both, this is consistent with the point that the phonetic sketch delineated at the top of this section may have been the articulatory impetus for the rise of the glottal stop allophone of /t/ preceding a syllabic nasal allomorph in American English, which has become conventionalized and now may be realized with a broader set of articulatory options. The frequency with which speakers actually raise their tongue tips could be investigated with an imaging technique like electromagnetic midsagittal articulography (EMA), ultrasound, or real time MRI (rtMRI).
While [ʔn̩] is by far the most common implementation for words with underlying /t/ as the preceding consonant, the results from the laboratory study show that six New York speakers produce [ʔən] more than 75% of the time. Three of these speakers produce a period of glottalization in 62% of cases, two produce only a period of glottalization, and one produces only a period of closure (i.e., a glottal stop or [tʔ]). If productions with a period of closure do reflect [tʔ] with a tongue tip raising component, then presumably these speakers' coordination pattern has the tongue tip releasing after the [tʔ] and before the subsequent [n] gesture on the tongue tip tier, so that the [ə] is acoustically realized. When speakers produce a period of glottalization, it would be timed to end before the vowel is completed. This may be a dialectal variant, as for the New York speakers in this study, and perhaps an ongoing sound change in some areas, given Eddington and Savage's (2012) finding that [ʔən] is more common among young female speakers in Utah than among the non-Western U.S. speakers in their study.

Conclusion
Despite the relatively expansive assumptions in the literature about the distribution of the syllabic nasal in word-final position in English, empirical evidence from a laboratory study, a corpus of read sentences spanning multiple dialect regions, and samples of spontaneous speech indicates that syllabic nasals in American English occur reliably only after a glottal stop, a period of glottalization, or glottally reinforced [tʔ], and to a lesser extent after /d/, which can be realized as either [d] or [ɾ]. The results also indicate that there is dialectal variation in the rates of [ən] versus [n̩], with New Yorkers being the most likely to retain [ən], especially after [ʔ], as compared to Northern Cities and Pacific Northwest speakers. Data for individual speakers show that most speakers prefer either [ən] or [n̩], with some variation across speakers but little variation within speakers.
Given the widespread presence of [ən] and a feasible articulatory account of why a tongue tip held raised from a preceding coronal stop or flap would acoustically mask the [ə], giving rise to [n̩], it is parsimonious to assume that [ən] is the default variant of the [ən]/[n̩] morpheme (when it is morphemic). It is also likely that [n̩] is an allomorph that speakers select, rather than a form they generate every time through articulatory processes like connected speech overlap or reduction. We then considered why [ʔ] is the surface realization in American English instead of either [t] or [ɾ], since the combination of a stem and suffix like /ɹɑt+ən/ for rotten is an environment in American English where flapping is expected. This was explained by extending to pre-syllabic nasal position an acoustic enhancement account that has been proposed for why /t/ is glottally reinforced in coda position. Glottalization, whether in conjunction with [t], as a full glottal stop, or as a period of creakiness, helps to distinguish /t/ from /d/ in the pre-nasal environment by implementing an articulatory variant associated with /t/ that already exists in other positions.

Additional File
The additional file for this article can be found as follows:
• Appendix: The sentences read by the participants in the laboratory study. DOI: https://doi.org/10.5334/labphon.224.s1