Prevocalic t-glottaling across word boundaries in Midland American English

Rates of t-glottaling across word boundaries in both preconsonantal and prevocalic contexts have recently been claimed to be positively correlated with the frequency of occurrence of a given word in preconsonantal contexts (Eddington & Channer, 2010). Words typically followed by consonants have been argued to have their final /t/s glottaled more often than words less frequently followed by consonants. This paper includes a number of ‘internal’ and ‘external’ predictors in a mixed-effects logistic regression model and has two goals: (1) to replicate the positive correlation of the frequency of occurrence of a word in preconsonantal contexts (its ‘contextual frequency’) with its rates of t-glottaling in both preconsonantal and prevocalic contexts postulated by Eddington and Channer (2010), and (2) to quantify the factors influencing the likelihood of t-glottaling across word boundaries in Midland American English. The effect of contextual frequency has been confirmed. This result is argued to support a hybrid view of phonological storage and processing, one including both abstract and exemplar representations. T-glottaling has also been found to be negatively correlated with bigram frequency and speech rate deviation, while positively correlated with young age in female speakers.


Introduction
T-glottaling is the realization of the voiceless alveolar plosive /t/ as a glottal stop [ʔ]. It is also sometimes referred to as 'glottal replacement' (Milroy, Milroy, Hartley, & Walshaw, 1994), to distinguish it from 'glottal reinforcement' (Higginbottom, 1964), that is, from the addition of [ʔ] to an oral stop. T-glottaling has been attested in numerous accents of English spoken both in Britain (Fabricius, 2002; Milroy et al., 1994; Trudgill, 1974) and in the United States (Eddington & Channer, 2010; Eddington & Taylor, 2009; Huffman, 2005; Levon, 2006; Roberts, 2006; Seyfarth & Garellek, 2015). Recently, it has been shown that in American English, [ʔ] is one of the possible realizations of /t/ across word boundaries before vowels, that is, in contexts such as righ[ʔ] in. Such realizations are attested by Levon (2006) in the speech of New York Reform Jews, by Roberts (2006) for rural Vermonters, and by Eddington and Taylor (2009), who compared Utahns to speakers from elsewhere in the U.S. Eddington and Channer (2010) argued that the likelihood of t-glottaling is linked to the frequency with which a given /t/-final word occurs before consonant-initial words. This is an intriguing proposition. The evidence that the original paper provides for it, however, can be further strengthened. Using different data, an improved statistical analysis, and a more precise method of calculating the proportion of occurrence of a word in preconsonantal contexts, the present paper confirms the finding of Eddington and Channer (2010): Higher rates of occurrence in preconsonantal contexts are associated with higher rates of t-glottaling.
T-glottaling has long been observed to take place in American English dialects in several phonological environments (see Table 1 for a brief overview of reports).
As is the case with other consonantal features, t-glottaling has been described more extensively for British English than for American English accents (though the number of studies on variation in consonants in American English accents is growing, e.g., Zhao, 2010; Gylfadottir, 2015; Yuan & Liberman, 2011). T-glottaling has been reported to occur word-internally (e.g., Trager, 1942; Zue & Laferriere, 1979), across word boundaries before sonorant consonants (Pierrehumbert, 1994) and before plosives (Huffman, 2005). It has only relatively recently been reported to occur across word boundaries before vowels, as in righ[ʔ] ankle (Eddington & Taylor, 2009; Levon, 2006; Roberts, 2006).
As Eddington and Channer (2010) observe, prevocalic glottaling across words may be seen as surprising in American English, when one considers the rarity of prevocalic glottaling word-internally. Within words, prevocalic glottaling is generally less commonly reported for North American than for British varieties: In contexts such as ci[ɾ]y, flapping typically takes place, precluding glottaling. Although word-internal prevocalic t-glottaling is attested in American dialects (it has been observed before /ən/ in dialects which 'unpack' syllabic /n̩/, as in moun[ʔə]n (Eddington & Savage, 2012), and also, to a limited extent, in morphologically complex words such as pu[ʔ]ing, wai[ʔ]ing (Patterson & Connine, 2001)), word-internal prevocalic flapping is more common than word-internal prevocalic glottaling. In general, glottaling and flapping tend to occur in mutually exclusive contexts, as elaborated in the following. Some environments allow only glottaling. This is the case word-internally before obstruents and nasals, as in ou[ʔ]put or po[ʔ]ent, as well as in absolute final position, e.g., in tha[ʔ]. Before vowels and syllabic liquids within words, on the other hand, flapping is typical, as in ci[ɾ]y, be[ɾ]er, and li[ɾ]le. 1 Across word boundaries, only glottaling is allowed before obstruents, as in se[ʔ] back, and before sonorant consonants, as in abou[ʔ] you. But across word boundaries before vowels, the environment investigated in the present paper, both flapping and glottaling are common: righ[ɾ] around ~ righ[ʔ] around (Eddington & Channer, 2010).
Eddington and Channer (2010) present a large-scale study of the competition of glottaling and flapping in unscripted speech. They analyzed a sub-part of the Santa Barbara Corpus of Spoken American English (DuBois et al., 2005), looking at all cases of word-final /t/ (preceded by a vowel, nasal, or liquid) followed by vowel-initial words (N = 1,101).
In that study, all instances of /t/ were impressionistically coded as [t], [ɾ], [ʔ], or as deleted.
1 Trager (1942) reports having [ʔ] in prattle and glottal, but not in little and bottle.

Table 1: Overview of reports of t-glottaling in American English.
Context | Examples | Sources
before nasals | portent | Trager (1942), Zue and Laferriere (1979)
before sonorants | hat rack, about you | Pierrehumbert (1994), Kaźmierski et al. (2016)
before plosives | beet counter | Huffman (2005), Dilley and Pitt (2007)
syllable-finally | cat, sent, belt | Selkirk (1972), Kahn (1976), Cohn (1993)
IP-finally | hat, daypack | Huffman (2005)
across words, before vowels | right around | Levon (2006), Eddington and Taylor (2009)

The middle column of Table 2 shows the results. In a non-negligible number of cases (262 out of 1,101, that is just under 24%), prevocalic /t/ was realized as a glottal stop. This might be seen as surprising, given the prevalence of prevocalic flapping word-internally in this segmental context.
[…] information should be accessible; a particular string of phonemes should be passed on to the phonetic component, and so no difference between time and thyme is expected. While word-form frequency effects could be accommodated by the feed-forward modular architecture, lexeme frequency effects cannot. Second, there is evidence of morphological information influencing phonetics: e.g., Strycharczuk and Scobbie (2016) found different degrees of goose-fronting in apparent homophones such as ruler and rul+er. Again, the phonological string is identical in each case, and the influence on phonetics of a module preceding phonology challenges modularity and exclusively abstract storage. Finally, there are the cumulative context effects, where the typical environment in which a lexeme occurs influences its phonetic realization. Seyfarth (2014) found that durations of words which are typically predictable from their immediate context (e.g., current) are reduced more than durations of words which are typically less predictable from their immediate context (e.g., nowadays), even when a particular instance of the typically predictable word (e.g., current) is not, in fact, predictable.
Baumann and Ritt (2017) show that the development of the link between morphosyntactic category and word stress in English of the ˈresearch ~ reˈsearch type can be modeled by assuming that lexical stress is an accumulation of repeated adaptations to phrase-level rhythm.
Eddington and Channer's (2010) finding poses another serious challenge to 'abstract only' models. If the likelihood of occurrence of a specific allophonic variant is influenced by the occurrence of a particular lexeme in a particular environment, then the storage of a phonetically rich representation is required. However, the statistical analysis employed in the original study leaves it open to criticism. In the first part of their statistical analysis, the authors fitted a logistic regression model to assess the influence of several variables on the likelihood of t-glottaling: The realization of a final /t/ in a given word was a binary response variable (glottaling versus any other realization). This model, however, did not include a measure of the typical environment of a word among the predictors. The hypothesis that prevocalic t-glottaling across word boundaries is driven by a contextual frequency effect was then tested separately, outside of the regression model, in the second part of the statistical analysis. In it, the authors measured the proportion with which each of the test words was followed by consonant-initial words. To get reliable estimates, they used a corpus much larger than the Santa Barbara Corpus: the Corpus of Contemporary American English (COCA) (Davies, 2010). Words realized with [ʔ] by the speakers were found to be followed by a consonant on average 64% of the time in the large corpus, whereas words realized with other allophones by the speakers were found to be followed by a consonant on average 60% of the time.
Using ANOVA, the authors determined this difference to be statistically significant, and drew the conclusion that there is a relationship between the frequency of occurrence before consonants and the likelihood to undergo glottaling. A serious limitation of this approach is that the second part of the analysis did not control for potential confounds. The authors themselves demonstrate the influence of a number of factors on the likelihood of glottaling with their regression model in the first part of their analysis. These possible confounds, however, are not taken into account in the second part, the ANOVA test. Furthermore, the observations in the data set were not independent as they included both multiple tokens of the same words, and multiple words uttered by the same speakers. These two problems increased the risk that the significant result was a false positive. The present study addresses both these issues. The issue of confounding factors is addressed by including the proportion of consonant-initial words following a given test word in a large corpus (henceforth consonantal proportion) as one of several predictors in a mixed-effects logistic regression model featuring a number of other predictors, known from prior research to be of relevance for t-glottaling. The issue of non-independence of observations is addressed by incorporating random terms for participants and words into the model architecture.

Data
All bigrams, that is, sequences of one word (w₁) immediately followed by another word (w₂), in which w₁ ends in /-Vt/ and w₂ begins with a vowel were retrieved from the Buckeye corpus with the help of LaBB-CAT (Fromont & Hay, 2012), a publicly available corpus annotation and management suite. The Buckeye corpus is very well suited to this study as it contains unscripted speech of a homogeneous group of speakers of a variety for which t-glottaling has not been the center of attention. It also comes with hand-corrected phone-level annotations, including allophones of /t/ (an evaluation of the accuracy of the transcriptions is described below). Cases where w₂ was not a lexical item, that is when it was represented as one of the following orthographic forms: um, uh, oh, o, uh-huh, were excluded from analyses. This resulted in 7,317 hits, occurring in 5,375 speaker turns. The Buckeye Speech Corpus 2 contains recordings of 40 speakers from Columbus, Ohio. The participants were recorded conversing with an interviewer on everyday topics. The recordings are altogether around 40 hours long and contain approximately 300,000 word tokens. The city of Columbus, according to the dialectal division of North America of Labov, Ash, and Boberg (2006), belongs to the Midland dialect region. Speech patterns in this region can be expected to be typical of North American English, to the extent that any particular area can. "Many features of the Midland are the default features-that is, the linguistic landscape remaining when marked local dialect features are eroded" (Labov et al., 2006). While there are vocalic features that nonetheless separate the Midland dialect from the neighboring Northern and Southern dialect areas (such as the 'close approximation' of the vowels in the lot and thought lexical sets, Labov et al., 2006, p. 264), consonantal features in general, and glottaling specifically, are not known to show any particular pattern in this dialect.
The speakers in the corpus are stratified for age (two categories: under 30 and over 40, actual ages are not provided to the corpus user) and gender.
The corpus comes with, among others, a phone-level annotation layer produced by "a group of trained phonetic annotators […] paid for corpus preparation" (p. 2341), based on spectrograms, waveforms, and auditory cues. The phone-level annotation layer includes /t/ allophony. A segment was labeled by the corpus annotators as [ʔ] if it "had perceptually creaky voicing accompanied by irregularity in pitch period timing in the waveform" (p. 2342). This study relies on these transcriptions, as there are good reasons to believe that the accuracy of coding of /t/ allophony in Buckeye's segment annotations is high. First, in a published study (Pitt, Johnson, Hume, Kiesling, & Raymond, 2005), the intertranscriber reliability for stops was found to be 93%. Second, in a study investigating sequences of word-final /t/ followed by /j/-initial words (/V_#j/), e.g., about you (Kaźmierski, Wojtkowiak, & Baumann, 2016), 65% of /t/s (528/808) were hand-coded as glottaled, based on a drop in amplitude and lack of release burst. A look at the transcriptions provided with the corpus reveals that 54% (446/833) are labeled as [ʔ] (the slight difference in N seems to stem from differences in querying with the bundled software versus with LaBB-CAT). Finally, in a study even more directly related to the present one, Seyfarth and Garellek (2015) inspected a subset of coda /t/s in the Buckeye corpus and found that the vast majority of [ʔ] labels (1,762/1,824, or 97%) correspond to glottal replacement, with no formant transitions or release bursts indicative of [t]. The remaining small minority of cases (62/1,824, or 3%) correspond to glottal reinforcement [ʔ͜t], as they have a detectable release burst. 3 Taken together, these results suggest that the accuracy of transcriptions of [ʔ] is high, and that, if anything, [ʔ] could be under- rather than over-reported in Buckeye's annotations.
All cases where no consonantal sound was present as a transcription of the final sound of w₁ were treated as cases of deletion. Cases where the final consonant of w₁ was transcribed as a voiced plosive /d/ were included together with those transcribed as a flap. The remaining cases, where the final sound of w₁ was transcribed with a plain 't' symbol were coded as [t], denoting a voiceless alveolar plosive. The rightmost column of Table 2 shows a general overview of the initial data set (N = 7,317). A comparison of proportions of the realizations of /t/ in Eddington and Channer's (2010) results and in the present data set shows a lower glottaling rate, and higher flapping and deletion rates for Buckeye.
For further analysis, including regression modeling, only cases where /t/ was realized either as a flap or glottal stop were kept (N = 5,803). This is motivated by the observation that it is these two allophones of /t/ that compete directly in word-final prevocalic position (cf. Eddington & Channer, 2010, p. 344). The realization of /t/ as [ɾ] or [ʔ] was therefore included as a binary response variable glottaled, with [ɾ] as a reference level, so that effectively the likelihood of glottaling over flapping is modeled. Figure 2 shows the final data set broken down by gender and age. A glance at the raw data shows that there are no speakers categorically preferring only one realization: everyone shows variability between the two. Further, younger female speakers seem to have higher rates of glottaling than the three other groups, with eight out of ten younger female speakers having a glottaling rate above the mean. Going against this general trend, Speaker 19, an older male speaker, has the highest glottaling rate in the data set.
3 The difference between glottal reinforcement and glottal replacement is not always unambiguous based on acoustics alone, however. The adduction of the vocal folds completely overlapping with the alveolar closure [ʔ͜t̚] would mask the latter. The prevalence of such masked articulations is unknown, though Huffman (1998) […]
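The coding of realizations and the derivation of the binary response described above can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually used in the study; the segment labels ("tq", "dx", "d", "t") and the function names are assumptions introduced for the example.

```python
# Hypothetical segment labels; the symbols in the actual annotation layer may differ.
GLOTTAL_LABELS = {"tq"}       # glottal stop
FLAP_LABELS = {"dx", "d"}     # flap, plus voiced plosive grouped with the flap
PLOSIVE_LABELS = {"t"}        # plain voiceless alveolar plosive


def code_realization(final_label):
    """Map the transcribed final segment of w1 to a realization category."""
    if final_label in GLOTTAL_LABELS:
        return "glottal"
    if final_label in FLAP_LABELS:
        return "flap"
    if final_label in PLOSIVE_LABELS:
        return "t"
    # No consonantal segment transcribed: treated as deletion.
    return "deleted"


def binary_response(final_labels):
    """Keep only flap and glottal tokens; 1 = glottaled, 0 = flapped (reference)."""
    coded = [code_realization(label) for label in final_labels]
    return [1 if c == "glottal" else 0 for c in coded if c in ("flap", "glottal")]


labels = ["tq", "dx", "d", "t", None, "dx", "tq"]
print(binary_response(labels))  # [1, 0, 0, 0, 1]
```

Tokens coded as [t] or as deleted drop out at the second step, which mirrors the reduction from the initial to the final data set.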

Analysis
In any corpus of unscripted speech, the data is, by definition, not controlled at the collection stage, allowing confounding variables to be present. Thanks to multiple regression modeling, where several predictors are included in the same model, confounds can be dealt with at the stage of statistical analysis (Baayen, 2008). In other words, the influence of a variable of theoretical interest can be estimated while the influence of the other predictors is kept constant. In the present analysis, control variables were added mostly through automatic annotation functionality of LaBB-CAT (Fromont & Hay, 2012), as well as data transformation functions of the R package dplyr (Wickham, Francois, Henry, & Müller, 2017), as discussed separately for each variable below. A mixed-effects logistic regression model was then fitted to the data with the glmer() function from the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (R Core Team, 2018). The inclusion of random effects, that is the use of mixed-effects modeling, is appropriate where several data points are grouped, as is the case here, and is known to prevent the inflation of the rate of Type I errors (Baayen, Davidson, & Bates, 2008). Accordingly, to account for within-speaker variation that is not due to any of the variables listed in Sections 3.1-3.2 below, and to account for the lack of independence of multiple data points coming from the same speaker, speaker was included as a by-subject random intercept term. This takes care of differences in rates of glottaling across speakers. Furthermore, as the influence of the test variable consonantal proportion (see Section 3.1 below for details on the test variable) might be different on different speakers, a by-subject random slope for consonantal proportion was included (while including by-subject random slopes for all fixed terms would be desirable, it resulted in singular fit and was therefore not feasible). 
Similarly, word-level idiosyncrasies could not be ruled out, and were accounted for by including word, that is each /t/-final test word, as a by-item random intercept term. Bringing the influence of confounds under statistical control by including them as fixed effects, as well as accounting for interdependencies between data points by including random intercepts, is seen as a considerable improvement compared to the ANOVA test of the study by Eddington and Channer. Finally, as a remedy to initial model convergence issues, the BOBYQA optimizer was used.
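The model architecture described above can be summarized in a single equation. The following is a sketch in notation introduced here for illustration (it is not a formula given in the study): y_i is 1 for a glottaled token and 0 for a flapped one, s(i) and w(i) are the speaker and the /t/-final word of token i, CP is the standardized consonantal proportion, and the x_ik are the remaining fixed-effect predictors. Whether the by-speaker intercept and slope were allowed to correlate is an assumption.

```latex
% Requires amsmath for \operatorname.
\operatorname{logit}\,\Pr(y_i = 1)
  = \beta_0
  + \beta_1\,\mathrm{CP}_{w(i)}
  + \sum_{k} \gamma_k\,x_{ik}
  + u_{0,s(i)}
  + u_{1,s(i)}\,\mathrm{CP}_{w(i)}
  + v_{0,w(i)},
\qquad
(u_{0,s}, u_{1,s}) \sim \mathcal{N}(\mathbf{0}, \Sigma),
\quad
v_{0,w} \sim \mathcal{N}(0, \sigma_w^2)
```

Here u_0 is the by-speaker random intercept, u_1 the by-speaker random slope for consonantal proportion, and v_0 the by-word random intercept.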

Test variable: Consonantal proportion
The main variable of interest for this study is the proportion of consonant-initial words following a given word in a corpus. The motivation for investigating its influence on t-glottaling comes from Eddington and Channer (2010), yet the details of how it was computed have been slightly improved compared to that study. Here, it was computed based on the SUBTLEX-US (Brysbaert & New, 2009) corpus, a large collection of movie and TV series subtitles, shown to yield frequency measures that give accurate predictions in reaction time experiments. With the help of AntConc, a freeware corpus analysis toolkit (Anthony, 2014), all bigrams where the test word appeared as the first word were retrieved. Next, all words were supplied with a phonological transcription from the CMU pronouncing dictionary with the LOGIOS Lexicon Tool. 4 Finally, for each test word the number of consonant-initial following words was divided by the sum of consonant-initial following words and vowel-initial following words. By using phonological transcriptions instead of orthography, which was the case in the original study, higher accuracy has been achieved, as words with a consonant letter in spelling where none is present in pronunciation (e.g., hour) and words with no consonant letter in spelling where there actually is one in pronunciation (e.g., use) were classified correctly.
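The computation of consonantal proportion described above can be illustrated with a short Python sketch. It is not the pipeline used in the study (which relied on AntConc and the LOGIOS Lexicon Tool); the mini-lexicon, the counts, and the function names are hypothetical and serve only to show why classifying by transcription rather than spelling matters for words such as hour and use.

```python
# ARPABET vowel symbols as found in CMU-style transcriptions (stress digits stripped).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Hypothetical mini-lexicon; a real run would load a full pronouncing dictionary.
PRON = {
    "hour":   ["AW1", "ER0"],                # consonant letter, vowel sound
    "use":    ["Y", "UW1", "Z"],             # vowel letter, consonant sound
    "around": ["AH0", "R", "AW1", "N", "D"],
    "back":   ["B", "AE1", "K"],
    "you":    ["Y", "UW1"],
}


def is_vowel_initial(word):
    """Classify a word by its first phone, not by its first letter."""
    first_phone = PRON[word][0]
    return first_phone.rstrip("012") in VOWELS


def consonantal_proportion(following_counts):
    """Share of consonant-initial tokens among all transcribable following tokens."""
    consonant = vowel = 0
    for word, n in following_counts.items():
        if word not in PRON:
            continue  # words without a transcription are skipped
        if is_vowel_initial(word):
            vowel += n
        else:
            consonant += n
    return consonant / (consonant + vowel)


# Hypothetical counts of words following a single test word.
counts = {"hour": 2, "use": 3, "around": 5, "back": 6, "you": 4}
print(consonantal_proportion(counts))  # 13 / 20 = 0.65
```

With spelling-based classification, hour would count as consonant-initial and use as vowel-initial, giving 12/20 instead of 13/20 for the same counts.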
As is illustrated in Figure 3, the /t/-final words analyzed in the present paper tend to be followed by consonant-initial words more often than by vowel-initial words, based on SUBTLEX-US: The mass of the density in both panels is above 0.5. This is the case both in terms of token counts (mean = 0.672, median = 0.669, SD = 0.142, n = 5,803) and in terms of type counts (mean = 0.685, median = 0.696, SD = 0.111, n = 209). Incidentally, the assumption that glottaling is favored before consonants is bolstered by the statistics of glottaling of post-vocalic word-final /t/ across word boundaries in the Buckeye corpus. When the following word starts with a consonant, 5,112 out of 14,124 cases (36%) are glottaled, and when the following word starts with a vowel, only 870 out of 7,323 cases (12%) are glottaled.
This continuous variable was centered (by subtracting the mean) and standardized (by dividing by one standard deviation), giving consonantal proportion present in the tables and charts to follow. The hypothesis with regard to this variable, and so the main hypothesis of the study, is that it will be positively associated with the likelihood of a test word undergoing t-glottaling.

Control variables
Based on a review of existing relevant research on speech variation generally and t-glottaling specifically, the following control predictors were included in the model.

The frontness of the initial vowel of w₂
There are conflicting findings in the literature with regard to the influence of the frontness of the initial vowel of the following word on the likelihood of glottaling. On the one hand, Eddington and Taylor (2009) suggest that glottaling is favored by following front vowels, while Ostalski (2013) seems to have found the opposite. Whichever way the direction goes, these findings at the very least suggest that the frontness of the initial vowel of w₂ may be of some relevance to glottaling, and so it was included in the model as a binary predictor variable w₂ frontness. It was retrieved automatically from CMU transcriptions. It was entered into the model as a binary, treatment-coded variable (levels: not front, front, reference: not front).

The presence or lack of stress on the initial syllable of w₂
Previous research suggests that if the word-final /t/ is followed by a stressed syllable, it is more likely to be glottaled (Eddington & Channer, 2010). Therefore, a w₂ stress variable was included. It was derived automatically from the CMU transcriptions of each w₂ in the following manner. For monosyllabic words, if it was a function word (either a pronoun, preposition, conjunction, article, or interjection), it was coded as unstressed (unless it was one of the following words: all, each, own). If the monosyllable was a content word, it was coded as stressed. For polysyllables, the initial syllable was coded as stressed if it is transcribed either with primary or secondary stress in CMU. w₂ stress was entered into the model as a binary, treatment-coded variable (levels: stressed, unstressed, reference: stressed).
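The coding rules for w₂ stress laid out above amount to a small decision procedure, sketched here in Python. The part-of-speech labels, the function names, and the example transcriptions are assumptions made for illustration; only the rules themselves come from the description above.

```python
FUNCTION_POS = {"pronoun", "preposition", "conjunction", "article", "interjection"}
STRESSED_EXCEPTIONS = {"all", "each", "own"}


def vowel_phones(pron):
    """In CMU-style transcriptions, vowels carry a stress digit (0, 1, or 2)."""
    return [p for p in pron if p[-1].isdigit()]


def w2_stress(word, pron, pos):
    """Return 'stressed' or 'unstressed' for the initial syllable of w2."""
    vowels = vowel_phones(pron)
    if len(vowels) == 1:
        # Monosyllables: function words are unstressed, barring the listed exceptions.
        if pos in FUNCTION_POS and word not in STRESSED_EXCEPTIONS:
            return "unstressed"
        return "stressed"
    # Polysyllables: primary (1) or secondary (2) stress on the first syllable.
    return "stressed" if vowels[0][-1] in "12" else "unstressed"


print(w2_stress("in", ["IH0", "N"], "preposition"))                       # unstressed
print(w2_stress("all", ["AO1", "L"], "pronoun"))                          # stressed
print(w2_stress("ankle", ["AE1", "NG", "K", "AH0", "L"], "noun"))         # stressed
print(w2_stress("around", ["AH0", "R", "AW1", "N", "D"], "preposition"))  # unstressed
```

Note that under these rules a monosyllabic function word is coded from its word class alone, regardless of the stress digit in its dictionary entry.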

Bigram frequency
Words that frequently occur together are more likely to be planned together as a unit compared to word sequences that do not occur frequently next to one another (Bush, 2001; Tanner, Sonderegger, & Wagner, 2017). Being planned as a unit can be seen as making such sequences more word-like, and the phonological behavior at word edges can therefore be expected to approximate word-level phonology. In the present case, if the final /t/ of w₁ in frequent bigrams is subjected to the pressures of within-word phonology, it can be expected to flap more often. To account for this, log-transformed frequencies computed from the results were included as a continuous bigram frequency fixed effect in the model.

Speech rate
Research on motor planning in speech production suggests that fast speech rate might increase the influence of the neighboring phonological environment on the phonetic shape of a given form (Tanner et al., 2017). The hypothesis behind it is that at higher speech rates, larger chunks of speech are planned together. In the present case, the more likely a given bigram is to be treated as a single unit, the more precedence word-internal restrictions should take. For word-final prevocalic /t/, this suggests that higher speech rates should favor flapping. Speech rate was computed as a number of syllables (taken from CELEX2, Baayen, Piepenbrock, & Gulikers, 1995) in a given speaker turn divided by the length of that turn in seconds, yielding a syllables-per-second speech rate measure.
The resulting values were log-transformed and centered separately for each speaker by subtracting from each value a given speaker's mean. As a result, a speech rate deviation variable was produced, reflecting the hypothesis that it is speeding up or slowing down relative to one's habitual speech rate that might influence rates of t-glottaling, rather than that habitual speech rate itself (cf. Tanner et al., 2017).
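The derivation of speech rate deviation described above can be sketched in Python as follows. The data layout (one tuple per speaker turn), the function name, and the use of the natural logarithm are assumptions; the study states only that values were log-transformed and centered within speaker.

```python
import math
from collections import defaultdict


def speech_rate_deviation(turns):
    """turns: iterable of (speaker, n_syllables, duration_in_seconds).

    Returns one value per turn, in input order: the log syllables-per-second
    rate minus the mean log rate of that speaker.
    """
    log_rates = [(speaker, math.log(n_syll / dur)) for speaker, n_syll, dur in turns]

    by_speaker = defaultdict(list)
    for speaker, log_rate in log_rates:
        by_speaker[speaker].append(log_rate)
    speaker_means = {s: sum(v) / len(v) for s, v in by_speaker.items()}

    return [log_rate - speaker_means[speaker] for speaker, log_rate in log_rates]


# Hypothetical turns: speaker s01 at 5 and 10 syllables/s, speaker s02 at 4 syllables/s.
turns = [("s01", 10, 2.0), ("s01", 20, 2.0), ("s02", 12, 3.0)]
print(speech_rate_deviation(turns))
```

Because centering is done within speaker, a habitually fast and a habitually slow talker both receive values near zero for their typical turns; only departures from one's own norm move the predictor.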

Age
There is some indication in the literature that rates of t-glottaling are sensitive to the age of the speaker. In Eddington and Channer (2010), the rates were lower for speakers aged 30 and older. Such age-stratification might indicate change in progress. However, as is always the case with apparent-time data, age grading, namely "a regular change of linguistic behavior with age that repeats in each generation" (Labov, 1994, p. 46), cannot be excluded. Roberts (2006), in her study of a more homogeneous group of speakers (47 Vermonters), reports a more complex pattern. In her study, adolescents and older speakers have higher rates of glottaling compared to the middle group, namely parents. This could suggest age grading of glottaling in this community, as the group most likely to conform with societal pressures seems to avoid the (locally) stigmatized variable (Roberts, 2006, p. 231). Although there is no hypothesis as to how the age of the speakers from Columbus might influence their glottaling rate, age was still included as a binary treatment-coded variable (levels: younger [<30] and older [>40], reference: older, according to the speaker metadata present in the corpus) to account for the possibility that it does play a role.

Gender
In an early study of glottaling in the United States, Byrd (1994) found that the overall use of glottal stops was greater for women than for men, regardless of position in a word. On the other hand, glottals were more frequent among male than female speakers in the studies of Levon (2006) and Roberts (2006). This difference could reflect the social salience of glottaling in the communities studied there, as socially salient variables often show gender stratification (Trudgill, 1974; Labov, 1990). For the present data set, as no research is available as to the influence of gender on glottaling in Midland American English, there is no specific hypothesis about this relationship. However, given its role in language variation in general, and its influence on t-glottaling in other communities specifically, gender was included as a binary treatment-coded variable (levels: female, male, reference: female).

Interaction of age and gender
On top of having age and gender as fixed effects, there is also a good reason to include an age:gender interaction term in the model. Eddington and Taylor (2009) found that younger female speakers were the gender/age combination that glottaled the most of the four possible combinations. As Figure 2 suggests, the present data set might point in the same direction. Therefore, the inclusion of this interaction term is seen as theoretically justified. This finding, incidentally, might be seen as an indication of an ongoing sound change, as young women tend to be the leaders of change (Eckert, 1988;Labov, 1994).

Results
The results of regression modeling are summarized in Table 3. As is standard practice, p-values for this model were calculated based on asymptotic Wald tests. Each coefficient represents the estimated change in log-odds of glottaling over flapping as the value of the predictor increases (for continuous predictors) or when the level changes from reference to the one indicated in brackets (for categorical predictors). For age [younger], the coefficient obtains when gender is held at female, and for gender [male] the coefficient obtains when age is held at older. As the table shows, the contribution of a number of predictors has turned out to be statistically significant. Partial-effect plots of all statistically significant terms are presented in Figure 4.

The test variable
Crucially, the variable of prime interest for this study, that is consonantal proportion, has a positive coefficient (β = 0.43), with an associated p-value of 0.002. As such, the test variable has been shown to be positively correlated with the likelihood of glottaling over flapping, at a statistically significant level. This effect is visualized in Figure 4, Panel A. The influence of contextual frequency, specifically of the frequency of occurrence in preconsonantal environment has therefore been confirmed, replicating the effect found by Eddington and Channer (2010). In the present study, this effect transpired even when other factors known to influence rates of t-glottaling have been accounted for by including them as covariates in the model, and when random effects were part of the model architecture, thus making a stronger case for the contextual frequency effect than the original study did.
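Since the coefficient is on the log-odds scale, a short worked conversion may help in gauging its size. The Python sketch below uses the reported β = 0.43; the baseline probability of 0.20 is a hypothetical value chosen for illustration, not an estimate from the model, and the function names are likewise introduced here.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in odds for a one-unit increase in the predictor."""
    return math.exp(beta)


def shift_probability(p, beta):
    """Probability after adding beta on the log-odds scale to a baseline p."""
    odds = p / (1 - p)
    new_odds = odds * math.exp(beta)
    return new_odds / (1 + new_odds)


BETA_CP = 0.43    # reported coefficient for consonantal proportion
BASELINE = 0.20   # hypothetical baseline probability of glottaling

print(round(odds_ratio(BETA_CP), 2))                   # roughly 1.54
print(round(shift_probability(BASELINE, BETA_CP), 3))  # roughly 0.278
```

Because the predictor was standardized, one unit corresponds to one standard deviation of consonantal proportion, so under these assumptions a one-SD increase multiplies the odds of glottaling over flapping by about 1.5.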

Other linguistic variables
The frontness of the initial vowel of the following word has not turned out to be statistically significant (β = -0.01, p = 0.866), thus supporting neither Eddington and Taylor (2009) nor Ostalski (2013). There are two possible interpretations of this negative result: Either the influence of the frontness of the following vowel is indeed negligible, with previous findings perhaps showing experimental artifacts, or the effect is too small to be detected with the present data set. As vowel quality is not of primary concern here, it will not be pursued further.
Figure 4: Effect plots for statistically significant model terms. For each term, the effect of the remaining terms was averaged using the effects package (Fox & Hong, 2009). The y axis in each plot represents the probability of glottaling. Continuous predictors (Panels A, B, and C) are shown with jittered raw data points, placed at the bottom (no glottaling) and top (glottaling). To allow the meaningful placement of the raw data points, the full range of probabilities is used in these panels. For categorical predictors (Panels D, E, and F), a selected portion of the probability scale (0-0.4) is plotted for greater readability.
With regard to the influence of lexical stress (Panel D in Figure 4), t-glottaling has turned out to be more likely if the following syllable is unstressed (β = 0.4, p < 0.001). This result is opposite to that of Eddington and Channer (2010). One possible reason for this differing result, suggested by an anonymous reviewer, is the difference in how stress was coded. Eddington and Channer (2010) took phrasal prominence into account when coding stress: For example, function words which "were given an emphatic rendering by the participant" (Eddington & Channer, 2010, p. 342) were coded as stressed. In contrast, in the present study only lexical stress was considered, and all function words were coded as unstressed (except all, each, and own). Perhaps more importantly, however, a post hoc investigation revealed that the effect of stress in the present study might be confounded by cases where the initial syllable of the following word was realized as a syllabic nasal: 328 such realizations were discovered in the data set. For instance, while the /t/ at the end of not in a bigram in the data set, not intentional, is intervocalic as far as the CMU dictionary transcription used for retrieving tokens is concerned, its actual realization in the corpus was [nɑʔn̩ tɛnʃnʌl]. Syllabic /n/ is known to favor glottaling (Zue & Laferriere, 1979).
Indeed, in the present data set, the glottaling rate in cases where the initial segment of w₂ is transcribed as a nasal is 94% (307/328). By comparison, the glottaling rate in the remaining subset of the data is only 15% (827/5,475). In a model fit to the subset of the data set after removing the 328 cases where [ʔ] was not actually intervocalic, the effect of w₂ stress went in the same direction as in the original model, but was not significant (β = 0.17, p = 0.14). In the full data set, the glottaling rate is slightly higher before unstressed syllables (936/4,702; 20%) than before stressed syllables (198/1,101; 18%). In the data set with the following nasals removed, however, the glottaling rate is slightly lower before unstressed syllables (646/4,394; 15%) than before stressed syllables (181/1,081; 17%). Note that the remaining effects are not substantially different from those in the original model: Crucially, the effect of consonantal proportion remains almost unchanged at β = 0.41 (p = 0.003). The extent to which syllabic nasals figure into the analysis of the Santa Barbara Corpus in Eddington and Channer (2010) is unknown. At any rate, the effect of following stressed syllables favoring glottaling found by Eddington and Channer, who took phrasal prominence into account when coding for stress, might support the influence of the presence of prosodic boundaries on glottaling. The effect of stress found in the present paper, due to the unfortunate inclusion of a confound, must remain inconclusive.
The corpus frequency of the bigram (bigram frequency) and speech rate deviation had effects agreeing with those suggested by the previous literature. Both are negatively correlated with the likelihood of glottaling: bigram frequency (β = -0.13, p < 0.001) and speech rate deviation (β = -0.12, p = 0.005).
Seemingly, then, both an increase in speech rate and high bigram frequency are conducive to w₁ and w₂ being chunked together during motor planning, in which case a more word-like behavior of the two-word sequence transpires. In the present case, this means a higher chance of flapping, the process expected word-internally, to occur (cf. Kilbourn-Ceron, Clayards, & Wagner, 2020).
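The reversal in raw glottaling rates reported above for the stress predictor, once the tokens with a following syllabic nasal are set aside, can be retraced from the published counts alone. The following is a minimal sketch of that arithmetic, not the analysis script used in the study; the variable and function names are illustrative, and only the counts themselves come from the text.

```python
# Counts as reported in the text: (glottaled tokens, all tokens).
full = {"unstressed": (936, 4702), "stressed": (198, 1101)}
no_nasal = {"unstressed": (646, 4394), "stressed": (181, 1081)}


def rate(glottaled, total):
    """Proportion of tokens realized with a glottal stop."""
    return glottaled / total


def difference(all_counts, subset_counts):
    """Counts for the tokens that are in all_counts but not in subset_counts."""
    return {
        key: (
            all_counts[key][0] - subset_counts[key][0],
            all_counts[key][1] - subset_counts[key][1],
        )
        for key in all_counts
    }


# Tokens followed by a syllabic nasal, recovered by subtraction.
nasal = difference(full, no_nasal)

for label, table in (("full", full), ("nasals removed", no_nasal), ("nasal only", nasal)):
    for stress, (glottaled, total) in table.items():
        print(f"{label:15s} {stress:10s} {glottaled:4d}/{total:<5d} {rate(glottaled, total):.1%}")
```

Subtracting the two tables shows that the removed nasal tokens fall overwhelmingly before unstressed syllables, which is what one would expect if they are responsible for the apparent stress effect in the full data set.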
Given the effect of speech rate, one may wonder, as suggested by an anonymous reviewer, whether the lower incidence of glottaling in the Buckeye corpus (15.5%) compared to the Santa Barbara Corpus (23.8%) may be due to different overall speech rates in the two corpora. To investigate this possibility, I calculated articulation rate, that is, the number of syllables per second of phonation, for all speakers in the two corpora. The extraction was automated with the syllable_nuclei script (de Jong & Wempe, 2009), which detects syllable nuclei, discards pauses, and calculates speech rate metrics. The speakers in the Buckeye corpus tend to speak faster (mean = 4.21, median = 4.16, SD = 0.425, unit: syllables per second) than speakers in the Santa Barbara Corpus (mean = 3.53, median = 3.81, SD = 1.02). As a faster speech rate disfavors glottaling, the faster speech rate in the Buckeye corpus may be linked to lower glottaling rates. However, the difference in articulation rates as measured by the syllable_nuclei script might be exaggerated. While the exact same methodology was applied to both corpora, the script reported suspiciously low articulation rates for some speakers in the Santa Barbara Corpus (all speaker means are presented in Figure 5). When extreme observations (absolute difference from the mean greater than two standard deviations) are removed, however, the difference in articulation rate between the two corpora persists (Buckeye: mean = 4.19, SD = 0.4, Santa Barbara: mean = 2.72, SD = 0.78).
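The trimming criterion described above (discarding speakers whose mean articulation rate lies more than two standard deviations from the corpus mean) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run on the corpora: the speaker values are invented for the example, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def trim_extremes(values, k=2.0):
    """Keep values whose absolute distance from the mean is at most k SDs.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) <= k * spread]


# Hypothetical per-speaker articulation rates in syllables per second.
# These are NOT values from the Buckeye or Santa Barbara corpora.
hypothetical_rates = [3.9, 4.1, 4.3, 3.8, 4.0, 4.2, 1.1, 4.4, 3.7, 4.1]

trimmed = trim_extremes(hypothetical_rates)
print("before:", round(mean(hypothetical_rates), 2), "n =", len(hypothetical_rates))
print("after: ", round(mean(trimmed), 2), "n =", len(trimmed))
```

In this toy data the single implausibly low value is the only one excluded, so trimming raises the mean of the remaining speakers.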

Social variables
Predictors relating to social variables were age, gender, and the interaction term gender:age. The effect of age for the female speakers is statistically significant (β = 0.89, p = 0.002), with the female speakers in the younger age group (<30) showing higher predicted rates of intervocalic t-glottaling than female speakers in the older group (>40) (Panel E in Figure 4). The effect of gender for the older age group is not statistically significant (β = 0.12, p = 0.695). However, as the significance of the interaction term between gender and age shows (β = -0.85, p = 0.034), gender does play a role in influencing t-glottaling in that there is an appreciable gender divide among younger speakers (this effect is illustrated in Panel F of Figure 4): Younger male speakers glottalize significantly less than younger female speakers. Of the four gender by age group combinations coded in the data set, younger women have the greatest predicted probabilities of t-glottaling. This effect is similar to the results of Eddington and Taylor (2009), but not to those of Eddington and Channer (2010), who found no effect of gender at all. The leading role of young women in t-glottaling in the present study stands in contrast to both Roberts (2006) and Levon (2006), suggesting that the social patterning of t-glottaling for Vermonters and for New York's Reform Jews investigated in the two respective studies differs from that of the speakers from Columbus.
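To see how the three reported coefficients combine into the four gender by age groups, the following sketch adds them up on the log-odds scale. It assumes treatment (dummy) coding with older women as the reference level, which is consistent with how the effects are described above but is not stated explicitly; the intercept and all other predictors are left out, so the values are differences relative to older women, not predicted probabilities. The constant and function names are illustrative.

```python
import math

# Coefficients reported in the text (log-odds scale).
B_AGE_YOUNGER = 0.89   # younger vs. older, for female speakers
B_GENDER_MALE = 0.12   # male vs. female, for older speakers
B_INTERACTION = -0.85  # additional adjustment for younger male speakers


def relative_log_odds(younger, male):
    """Log-odds of glottaling relative to older women (assumed reference group).

    younger and male are 0/1 indicators; the intercept and all other
    model terms are deliberately omitted.
    """
    return (
        B_AGE_YOUNGER * younger
        + B_GENDER_MALE * male
        + B_INTERACTION * younger * male
    )


groups = {
    "older women": relative_log_odds(0, 0),
    "older men": relative_log_odds(0, 1),
    "younger women": relative_log_odds(1, 0),
    "younger men": relative_log_odds(1, 1),
}

for name, value in groups.items():
    print(f"{name:14s} log-odds {value:+.2f}  odds ratio {math.exp(value):.2f}")
```

Under these assumptions, younger women come out highest of the four groups, and the difference between younger men and younger women is 0.12 - 0.85 = -0.73 on the log-odds scale, that is, roughly half the odds of glottaling.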

Discussion and conclusion
With regard to the first goal of the paper, the replication of the contextual frequency effect, it has been confirmed that /t/-final words that typically occur before consonant-initial words undergo glottaling at higher rates than words that occur before consonant-initial words less often. As to the second goal of the paper, the quantification of the factors influencing rates of prevocalic t-glottaling across word boundaries in Midland American English, bigram frequency and speech rate deviation have been found to be negatively correlated with t-glottaling, while the results concerning the frontness of the initial vowel of the following word and the presence or lack of stress on the initial syllable of the following word remain inconclusive. Finally, an effect of an age by gender interaction was discovered. The implications of the effect of social variables will be discussed first, before turning to the wider implications of the replication of the contextual frequency effect for models of phonological representations.
In the present study, younger age is positively correlated with t-glottaling for female speakers. While the effect of gender for older speakers is not significant, gender is by no means an irrelevant variable, given its interaction with age. While for older speakers there is hardly any difference in rates of intervocalic word-boundary t-glottaling between women and men, for younger speakers there is an appreciable difference between the genders. In Columbus, young women are the group glottaling the most. Variants favored by young women have been repeatedly shown to be the variants on the rise in cases of linguistic change in progress (Eckert, 1988; Labov, 1994).
To the extent that this pattern is attested in the present data set, it provides an indication that t-glottaling might be undergoing a change in progress in Columbus. This apparent-time indication, however, would of course have to be supplemented with real-time data to ascertain whether or not t-glottaling is on the rise. The case for a change in progress is strengthened by recurring indications of higher rates of t-glottaling among younger speakers in other parts of the country. Eddington and Taylor's (2009) participants came from Western states (N = 42, the majority of these from Utah) and from non-Western states (N = 16, including the North, the South, the Midland, and the Mid-Atlantic), and showed the same kind of gender by age interaction as found here, with younger women having the highest rates of t-glottaling. The Santa Barbara Corpus speakers selected by Eddington and Channer (2010) also came from both Western (N = 19) and non-Western (N = 21) states, and also showed a facilitative effect of age, though not interacting with gender. Studies focusing on particular speech communities provide an interesting comparison. For New York Reform Jews, t-glottaling seems to be tied to their highly specific 'mosaic identity and style' (Levon, 2006), where being affiliated with multiple social groups may exert conflicting pressures: The secular setting of an interview, as well as secular topics, seem to favor glottaling over audible alveolar release for the two young speakers, in contrast to the religious setting of the Youth Group, and religious topics, which disfavor glottaling. Another particularly interesting case is Vermont (Roberts, 2006), where t-glottaling used to be a stigmatized feature, and yet is now spreading among young speakers who otherwise move away from local speech patterns and accommodate to supralocal norms, leading Roberts to posit that the 'new' glottaling might be a new and different feature altogether.
All these findings taken together potentially point to a 'nationwide change in progress,' which Midland American English is participating in. Several such supralocal changes have been posited in the vocalic domain. One of them is the low-back (or lot-thought) merger, which, at least in the Midland and the South, "is not spreading from any particular point(s) of origin but is appearing roughly simultaneously in several states" (Johnson, 2010, p. 10). Similarly, the Elsewhere Shift (sometimes seen as related to the lot-thought merger, cf. Becker, 2019), with the retraction/lowering of the vowels of kit and dress, and with the nasal system of trap, recently documented in Lansing, MI (Nesbitt, 2018), was previously found in places as distant from one another as California (Hagiwara, 1997) and Canada (Boberg, 2005).
The hypothesis put forward by Eddington and Channer (2010) has stood its ground when confronted with a different data set and with an improved statistical analysis. The finding that the frequency with which a /t/-final word is followed by consonant-initial words increases the likelihood that the word undergoes glottaling even before vowel-initial words has therefore gained further support. This result contributes to the growing body of evidence that detailed phonetic information is stored in the mental lexicon. The frequency of occurrence in preconsonantal environments can raise the likelihood of glottaling in prevocalic position only if a representation with a glottal stop is stored, as it is not derived online prevocalically. One solution proposed to account for such effects is offered by exemplar models of phonology (e.g., Johnson, 1997; Bybee, 2001). These assume that a phonetically detailed trace of each perceived token is stored.
However, doing away with abstract representations altogether would be a problematic solution: There is after all a vast amount of linguistic data that has led to the development of abstractionist theories to begin with. The pioneering findings of historical linguistics would not have been possible without the stability of lexical classes (as noted for example by Bermúdez-Otero, 2015). Sound changes such as Grimm's Law operated on sound categories shared by multiple lexical items, rather than, or at least in addition to, individual words. Pronunciation of neologisms and loan word assimilations, where sounds of the native language are employed, also attest to the existence of a level of abstract representation (Pierrehumbert, 2001). Consequently, recognizing the need for both abstract and phonetically rich representations, several descriptions of a viable compromise, a hybrid view of representation, have been proposed (Ernestus, 2014; Goldinger, 2007; Nguyen, 2012; Pierrehumbert, 2002, 2006, 2016). Indeed, research on the perception of word-final /t/ allophony points to the role of both abstract (Sumner & Samuel, 2005) and phonetically rich (Garellek, 2011) representations. Investigating the variation in word-final /t/, Sumner and Samuel (2005) compared the canonical form [t] with two other variants, [ʔt̚] and [ʔ], in a semantic priming paradigm. If [ʔt̚] and [ʔ] were not stored, one would expect an advantage of the canonical form [t], as processing the other two variants would involve additional computation or relying on non-canonical cues. If they were stored, one would expect [ʔt̚], the most frequent variant of /t/ prepausally, to show an advantage. Neither effect was found: All three variants did better than a mismatched form (e.g., [fluːs]). This null effect is not conclusive with regard to storage. The absence of evidence of a difference cannot be taken as evidence of absence of such a difference: It might be too small to detect with the semantic priming paradigm used by Sumner and Samuel (2005).
Indeed, using a phoneme monitoring task, Garellek (2011) found that [ʔt] tokens were recognized faster than canonical [t], suggesting that a more frequent variant of final /t/ is recognized more easily. Sumner and Samuel (2005) further found that only the canonical form, that is, [t], shows a long-term priming effect (e.g., [fluːt], but not [fluːʔt̚] or [fluːʔ], primes [fluːt] in a lexical-decision and in a new-old recognition task). This effect, present across two different tasks, supports the primary role of the canonical form, that is, of a form with all and only the contrastive features of English phonology, in long-term priming. While this finding of Sumner and Samuel points to the role of abstract representations of final /t/ in perception, the effect found by Eddington and Channer (2010) and confirmed here bolsters the support for phonetically rich representations of the same final /t/ in speech production.
Taken together, these two effects support a hybrid model of phonological storage, with both abstract and phonetically rich levels of representation, relevant for different types of processing. To take a brief look at the literature on speech recognition, the role of two levels was demonstrated by several studies (McLennan, Luce, & Charles-Luce, 2003; Vitevitch & Luce, 1998). Tasks involving strong lexical competition (using words with high neighborhood densities, or hard lexical-decision tasks), phonologically ambiguous input, and ample time allowed for recognition seem to be tapping the abstract level.
Other tasks, such as auditory naming, easy-discrimination lexical decision, and tasks with little time allowed for recognition, seem to be tapping the phonetically rich level. Several hybrid models of phonological storage, promising to accommodate such findings, have been proposed (cf. Ernestus, 2014; Goldinger, 2007; Nguyen, 2012; Pierrehumbert, 2002, 2006, 2016). For the effect confirmed in the present study, detailed representations with a glottal stop and a flap need to be postulated. Interestingly, since in connected speech both of these phonetically quite different variants can occur in the same context, it is clear that their abstract representation with /t/ cannot explain their usage patterns. Instead, phonetically detailed representations become more entrenched with use. This interpretation is consistent with the finding of Dilley and Pitt (2007) that high-frequency /t/-final words show more variability in the realization of /t/ than low-frequency words. Higher lexical frequency entails frequent occurrence in a variety of phonological contexts, and different contexts favor different allophones. The more often a particular word occurs in environments favoring one of the representations, the more entrenched that phonetic form becomes, and the more likely it is to be used. Hence, a word typically occurring before consonants shows higher glottaling rates, even in prevocalic position. But for the entrenchment in preconsonantal position to take place, the production of [ʔ] must be linked to an activation of an abstract category /t/. Repeatedly, the phoneme finds itself in preconsonantal context, which favors glottaling. Such occurrences strengthen the links between the phoneme in this word and its allophone [ʔ]. The allophone is then activated even in environments which do not favor it. The stronger the link, the higher the likelihood of glottaling.
A functionally oriented take on t-glottaling, namely that the glottal constriction enhances the recognition of /t/-final words by strengthening the cues to the voicelessness of the coda, was not substantiated (Chong & Garellek, 2018; cf. Seyfarth & Garellek, 2015). Instead, Chong and Garellek (2018) found that glottaling simply does not inhibit the recognition of /t/-final words, while it does inhibit the recognition of /d/-final words. Glottaling, it seems, can spread through the /t/-final words as it does not incur any cost here. These and other factors influencing t-glottaling, contextual frequency among them, can be seen as forming part of the nexus of selection forces on the spread of t-glottaling. And, to conclude, while prevocalic t-glottaling presents another case where the 'abstract only' view of phonological representation is insufficient (cf. Baayen, 2007), a hybrid model, rather than an exemplar model of phonological storage, is best equipped to accommodate the finding that the rate of glottaling of prevocalic /t/s at word boundaries increases with the frequency of occurrence of a given word before consonant-initial words.