Temporal variation in African American English: The Distinctive Use of Vowel Duration

Introduction: African American English (AAE) is a unique dialect of American English that differs systematically from the variety spoken by the White population. Acoustic-phonetic explorations of segmental structure of AAE including vowel and consonant productions are still rare and the current state and developmental direction of AAE in the United States relative to dialects of White American English (WAE) are largely undetermined. Particularly little is known about timing patterns in AAE such as segmental durations, speech rate and rhythm. Objective: The purpose of this study was to better understand temporal variation in AAE by analyzing vowel


Introduction
African American English (AAE) is a unique ethnicity-based dialect of American English that differs systematically from the variety spoken by the White population, often termed White American English (WAE) [1,2]. AAE has its own system of rules. For example, certain morphosyntactic constructions are allowed, such as marking of the remote past tense using emphatically spoken "been" as in He been left, meaning he left a long time ago. Another unique feature of AAE includes the use of "steady" to describe a regularly occurring action as in She steady lying, meaning she has lied in the past, and will continue to do so for the foreseeable future. Notable colloquialisms include variable marking of possession and plural such as My mama house, meaning the house belonging to my mama, or fifty cent, meaning fifty cents.
Systematic variation in the semantic, morphosyntactic, prosodic, and phonological aspects of AAE has been well documented [3,4], albeit more on the basis of impressionistic observations than of rigorous instrumental measures. Given the general shortage of experimental evidence, there are many unanswered questions with respect to the current development of AAE in the United States. These questions pertain primarily to how uniform AAE is across the US and whether AAE and WAE are converging, diverging, or maintaining a relatively constant distance from each other. Phonetic details supply a rich source of information for illuminating the developmental direction AAE is taking and how it is changing. However, acoustic-phonetic analysis of segmental structure has been marginalized in the study of this dialect, and acoustic explorations of vowel and consonant production in AAE are rare. Research over the last decade has begun to uncover some acoustic-phonetic details in AAE vowel systems in different geographic regions in the US, providing mixed evidence [5-7]. AAE speakers have variously been found to participate in regional sound change, to innovate sound change, or to resist sound change.

The current status of AAE
Linguistic studies of AAE across the United States can be generally classified as analyses of transplanted versions of the dialect brought by southern African Americans moving north and west during the period of the Great Migration at the beginning of the 20th century. This generalization is not intended to erase the history or linguistic contributions of African Americans who have lived in the north since the 1660s [8]. We need to recognize that analysis of dialect variation in AAE outside of the south is likely to include southern AAE transplants who migrated north and west in the mass exodus of African Americans between 1910 and 1970, and their descendants. Conversely, analyses of AAE in the large urban centers in the south are likely to include AAE participants whose families returned to the southern cities, from the north, in a reverse migration beginning in the mid 1970s [9,10]. The mass movements of African Americans throughout the US complicate the analyses of AAE. Nevertheless, common morphosyntactic and phonological features are consistently observed in AAE varieties spoken in disparate geographic regions, including vocalization of /r/ and /l/, monophthongization of /aɪ/, the merger of /i/ and /e/ before nasals, final consonant cluster reduction, -ing dropping, and lack of plural and possessive /s/ markers [1]. The similarities in AAE across geographic regions led to a hypothesis of a supra-regional or pan dialect of AAE, although Wolfram [11] cautions against such overgeneralization, pointing out that variation in the dialect has been plainly apparent since the earliest data collected in Detroit and New York in the late 1960s. Thomas [4] defined an AAE vowel system that was quite similar to the southern system affected by a set of changes called the Southern Vowel Shift. A collection of studies in [12] provided evidence for both convergence with and divergence from the local dialect features across the US.
Each community presented a unique picture of the relationship between AAE and the local regional dialect so that AAE speakers were converging along some parameters in some communities and failing to converge in others.
Especially little is known about the temporal organization of AAE, including segmental durations, speech rate and rhythm. Timing patterns are an important dimension of speech prosody and prosodic variation provides salient cues differentiating dialects. Recently, it has been reported that AAE speakers produce significantly longer vowels than WAE speakers [7] and that the prosodic rhythm of contemporary AAE speakers evidences more stress timing than syllable timing, a pattern consistent with WAE [13]. The current study contributes to a better understanding of temporal variation in AAE by examining the durations of vowels preceding voiced and voiceless word final stop consonants.

Word final consonant voicing variations in AAE
AAE allows for word final consonant variations including devoicing of stop consonants (the word bad sounds more like bat) and even a complete deletion of the final consonant (the word cat is produced as ca') [14,15]. Devoicing and deletion of the voiced stops /b/, /d/ and /g/ were reported in African American speech in Detroit in the 1970s [16]. More recently, it was found that AAE speakers in Michigan typically deleted the word final voiced consonants, while the WAE speakers devoiced the segment [17]. Importantly, the AAE speakers compensated for this deletion by producing the preceding vowels with significantly longer duration than the WAE speakers. The use of increased duration to compensate for a decreased phonetic contrast has also been found in WAE [18,19], but these spectro-temporal trade-offs pertained to vowel contrasts rather than to relations between vowels and consonants.
There is some evidence that the cue of increased duration in the identification of word-final voiced stops may be a feature more salient to AAE listeners than to WAE listeners. One study found that AAE-speaking children could accurately discriminate minimal word pairs such as cub and cup when the final consonant was devoiced, but the WAE-speaking children could not [20]. Measurements of the vowels produced by the children themselves were aligned with their perceptual responses, showing that the AAE children produced longer vowels preceding the word final intended voiced stop even though they devoiced the stop. The conclusion was that the AAE-speaking children's superior performance on the discrimination task may have been due to their use of vowel length as a phonemic cue to the identity of the final consonant, a strategy that was not employed by the WAE-speaking children.
Research indicates that production of word final stops /t/ and /d/ may be socially conditioned in many dialects of English [14,21-24]. In particular, deletion and glottalization of /t/ and deletion, glottalization or devoicing of /d/ have been observed so frequently that the variation is considered a ubiquitous feature of vernacular dialects [25]. It needs to be pointed out that this variation is generally regarded to be a function of morphophonology, reflecting higher levels of linguistic organization. Consequently, the durations of the preceding vowels, which belong to the realm of acoustic phonetics, were not considered in the literature on this topic. Thus, while variation in word final /d/ and /t/ including devoicing and deletion may occur in many dialects of American English, the enhanced lengthening of vowels before word final voiced stops (whether produced as voiced, devoiced or deleted) appears to be restricted to AAE.
Given the general lack of acoustic evidence, this enhanced lengthening has not been firmly established as a feature of AAE. The two studies that found the increased vowel durations before voiced stops [17,20] were conducted with AAE speakers in cities in the northern US, which were burgeoning centers of AAE culture in the early and mid-twentieth century as a result of the Great Migration and the development of the large African American communities in major northern cities such as New York City and Detroit [26]. However, no systematic instrumental analysis of vowel duration preceding voiced and voiceless alveolar stops has been carried out with AAE-speaking adults in the southern US, where the majority of African Americans lived prior to and during this massive migration to northern metropolitan areas. The current study addresses this gap by examining the speech of AAE participants in a small and relatively homogeneous southern speech community in the eastern part of North Carolina, a region with a particularly high concentration of multi-generation local AAE families who use many traditional core features of AAE.

The tense-lax distinction in AAE
In addition to addressing the question of increased vowel length before word final voiced stops, the current study aims to provide further insight into the temporal manifestation of the phonological distinction between the tense and lax vowels in AAE. In WAE, a systematic inherent temporal difference between the longer tense vowels such as /i, e, u/ and their shorter lax counterparts /ɪ, ɛ, ʊ/ has been well established [27-29]. However, more recent work has shown that this systematic temporal difference is influenced by regional dialect variation. In particular, the temporal contrast between tense and lax vowels seems to be reduced in Southern American English [30,31]. An even greater reduction of the temporal contrast was reported for AAE speakers in the South [7], suggesting that the phonological tense vs. lax distinction may exhibit considerable temporal variability across dialects of American English including AAE.
Citation: Holt YF, Jacewicz E, Fox RA (2016) Temporal variation in African American English: The Distinctive Use of Vowel Duration. J Phonet
It is possible that the extensive temporal reduction of the tense vs. lax distinction in AAE relative to WAE is also a distinctive feature of AAE. However, before reaching this conclusion, further acoustic evidence is needed to determine the size and consistency of the difference. The findings in Holt [7] are inconclusive as the speech samples from AAE speakers were obtained in only one Southern AAE speech community located in western North Carolina. Furthermore, vowel duration was measured in only one consonant environment, before a voiced stop. In English, a voiced consonant has a lengthening influence on the duration of the preceding stressed vowel. This phenomenon has been observed since early acoustic research on the duration of syllabic nuclei in WAE [29]. Although it cannot be ruled out that the tense vs. lax contrast is partially neutralized for vowels preceding voiced consonants, due to the lengthening effect of the following consonant on the lax vowel, it is unclear whether this feature is expressed similarly in WAE and AAE. It is important to establish if the reduced tense vs. lax distinction in AAE is also maintained before a voiceless stop, where vowel duration is unaffected by any lengthening effects associated with a voiced consonant. If the reduced temporal contrast between phonologically tense and lax vowels is a distinctive feature of AAE, it is expected to occur before both voiced and voiceless word final consonants.

Participants
Sixteen male speakers, eight AAE and eight WAE, ranging in age from 20 to 31 years (M=23.9, SD=3.7) participated. They were lifelong residents of Pitt County in North Carolina (NC). The Pitt County community is located in the Coastal Plain, which is one of five Southern dialect regions in NC as shown in Figure 1. The Coastal Plain dialect was of particular interest to the current study. Historically, the state of NC was settled from east to west. From the mid-1860s to the present day, the Coastal Plain region has had the highest proportion of African Americans in the state. Currently, 34.6% of the 168,286 Pitt County residents (6/17) are African American [32]. These population demographics create the right conditions for a high concentration of AAE features due to dense local networks. As research has shown, speakers in high-density networks tend to maintain local dialect norms more often than speakers from less dense networks, in which interpersonal contacts are comparatively sparse [33,34]. We thus expect that AAE speakers in Pitt County will have maintained the core features of AAE, including distinctive temporal vowel characteristics. The WAE speakers were included in this study as a control group representing the local non-AAE features. We also predict that durations of AAE vowels will differ from durations of vowels in WAE speakers from the same geographic area. This prediction is based on findings in Holt [7], who reported statistical differences between AAE and WAE speakers in a different speech community in NC (Iredell County). All participants in the current study were literate and had at least a high school education at the time of recording. The AAE participants demonstrated common features of AAE, including -ing dropping, final consonant cluster reduction, and medial syllable reduction or deletion, both during the initial pre-screening and during the informal recorded interview. All participants were paid a nominal fee for their efforts.

Stimuli
The words used as stimuli were of the CVC form and contained one of the following 11 target vowels /i, ɪ, e, ɛ, æ, ɑ, u, ʊ, o, aɪ, ɝ/ embedded between two consonants. The final coda consonant was always an alveolar stop, either a voiced /d/ or a voiceless /t/. Each vowel was followed by an equal number of voiced and voiceless stops. The stimulus set used is listed in Table 1.

Procedure
Participants were recruited using flyers, electronic bulletin boards and word of mouth. Each participant was interviewed by the first author and completed a questionnaire to self-identify racial-ethnic group membership. All participants were recorded during the 2014-2015 academic year. All recordings took place in a sound-treated booth. For the current experiment, participants read a randomized set of words, one at a time, from a computer monitor, using a head-mounted SM10A dynamic microphone (Shure, Niles, IL). Ten practice words were recorded from each participant prior to the experiment in order to familiarize them with the task. The practice words were different from those used in the experiment. All participants were judged to be fluent readers and were able to complete the word reading task without noticeable difficulties. The task was completed under the direct control of the experimenter using a custom MATLAB program. The tokens were recorded directly to disk at a sampling rate of 44.1 kHz.

Acoustic measurements
The recorded words were digitally filtered and down-sampled to 11.025 kHz prior to acoustic measurements. Fifty-six words were analyzed from each participant for a total of 896 words from all participants. Vowel durations were measured by hand using standard criteria [29]. The vowel onset was defined as the onset of periodicity (i.e., start of voicing) using the waveform display in the Praat software for speech analysis [35] as the primary guide. The vowel offset was defined as the beginning of the stop closure for the final alveolar stop /t/ or /d/. The onset and offset locations were hand labelled using the tier marking in Praat. The labels were re-checked and verified by the first author. A reliability check was performed on all 896 tokens by another experimenter. Praat scripts were then run to measure the duration of each vowel.
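As a minimal sketch of the bookkeeping behind these measurements (not the Praat scripts used in the study, which are not reproduced here), the following Python snippet derives a vowel duration in milliseconds from hand-labelled onset and offset times and checks the token count and down-sampling factor stated above. The function name and the example label times are hypothetical.

```python
# Hypothetical helper; the label times used below are illustrative, not measured data.

RECORDING_RATE_HZ = 44_100   # sampling rate at recording (from the text)
ANALYSIS_RATE_HZ = 11_025    # rate after down-sampling (from the text)
WORDS_PER_SPEAKER = 56
N_SPEAKERS = 16


def vowel_duration_ms(onset_s: float, offset_s: float) -> float:
    """Time from vowel onset (start of periodicity) to vowel offset
    (start of the final stop closure), converted from seconds to ms."""
    if offset_s <= onset_s:
        raise ValueError("offset must follow onset")
    return (offset_s - onset_s) * 1000.0


downsampling_factor = RECORDING_RATE_HZ // ANALYSIS_RATE_HZ
total_tokens = WORDS_PER_SPEAKER * N_SPEAKERS

print(downsampling_factor)                        # 4
print(total_tokens)                               # 896
print(round(vowel_duration_ms(0.120, 0.345), 1))  # 225.0
```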

Vowel duration before voiced vs. voiceless coda consonant
Average absolute durations in milliseconds for separate vowel categories are shown in Figure 2. As expected, vowels preceding voiced consonants were uniformly longer than vowels preceding voiceless consonants. However, based on visual inspection, AAE vowels before voiced consonants were considerably longer than WAE vowels, whereas vowels preceding voiceless consonants appeared to be of comparable durations. A repeated-measures analysis of variance (ANOVA) with the within-subject factors voicing (voiced, voiceless) and vowel (11 levels) and the between-subject factor ethnicity (AAE, WAE) was used to analyze these durations. A significant main effect of voicing indicated that vowels preceding voiced consonants were significantly longer than vowels preceding voiceless consonants, F(1,14)=383.67, p<.001, ηp2=.965. The main effect of vowel was also significant, F(10,140)=21.38, p<.001, ηp2=.604, which was an expected result because vowels differ in their inherent durations. Although significant, F(10,140)=5.6, p<.001, ηp2=.286, the interaction between vowel and voicing was not of immediate interest in this study because voicing-related variations were expected to interact with differential durations of individual vowels.
The effects of ethnicity were of primary interest in this analysis. While the main effect of ethnicity was not significant, F(1,14)=1.5, p=.241, ηp2=.097, an important significant interaction arose between ethnicity and voicing, F(1,14)=23.06, p<.001, ηp2=.622. This interaction, displayed in Figure 3, was further explored using independent t-tests. The durations of vowels preceding voiced consonants differed significantly as a function of ethnicity, t(23)=3.33, p=.003, with AAE speakers producing significantly longer vowels than WAE speakers. There was no significant ethnicity-related difference for vowels preceding voiceless consonants, t(21)=.64, p=.527, where both AAE and WAE vowels were of comparable durations. These results demonstrated that, compared with WAE speakers, AAE speakers significantly prolonged vowels preceding voiced consonants but did not differ in their productions of vowels preceding voiceless consonants.
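The effect sizes reported in this section can be cross-checked against the F statistics. The sketch below assumes the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical, and the inputs are the values reported above.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the text
reported = [
    ("voicing", 383.67, 1, 14),
    ("vowel", 21.38, 10, 140),
    ("vowel x voicing", 5.6, 10, 140),
    ("ethnicity", 1.5, 1, 14),
    ("ethnicity x voicing", 23.06, 1, 14),
]

for label, f_value, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

Under this identity, the recomputed values agree with the reported ηp2 values (.965, .604, .286, .097 and .622) to three decimal places.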

Temporal vowel contrast as a function of consonant voicing
To determine the significance of the enhanced temporal contrast in AAE vowels relative to WAE, ratios (proportional differences) between vowel durations preceding voiceless consonants and durations preceding voiced consonants were calculated for each ethnicity group. The proportional differences (rather than absolute durations in milliseconds) correct for indexical variables such as ethnicity-related inherently longer vowels in AAE and inherently shorter vowels in WAE. The proportional results, displayed in Figure 4 for each vowel category, show that the ratios were smaller for AAE than for WAE. A smaller ratio indicates a greater temporal difference (a ratio of 1.0 indicates no difference). In particular, the mean voiceless to voiced ratio for AAE was .548 (the duration of vowels before voiceless consonants was 54.8% of that before voiced consonants) and for WAE was .681 (the duration of vowels before voiceless consonants was 68.1% of that before voiced consonants). A separate ANOVA for the voiceless to voiced ratios with the within-subject factor vowel and the between-subject factor ethnicity returned a significant main effect of ethnicity, F(1,14)=15.69, p=.001, ηp2=.529, indicating that the difference between AAE and WAE speakers was significant. The main effect of vowel was also significant, F(4.36,61.02)=3.55, p=.010, ηp2=.202, although this result is not of immediate interest because voiceless to voiced ratios are expected to differ across variable durations of individual vowels. Overall, this analysis established that the temporal contrast between vowels preceding voiced versus voiceless consonants is significantly enhanced in AAE relative to WAE.
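The voiceless to voiced ratio described in this section can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the function names and the per-vowel durations are hypothetical, and only the two group means (.548 and .681) come from the text.

```python
from statistics import mean


def voiceless_to_voiced_ratio(voiceless_ms: float, voiced_ms: float) -> float:
    """Ratio below 1.0 means the vowel is shorter before the voiceless stop."""
    if voiced_ms <= 0:
        raise ValueError("voiced duration must be positive")
    return voiceless_ms / voiced_ms


def percent_shortening(ratio: float) -> float:
    """Express a ratio as the percentage by which the pre-voiceless vowel is shorter."""
    return (1.0 - ratio) * 100.0


# Hypothetical per-vowel mean durations (ms) for one speaker: (voiceless, voiced)
example_durations = {
    "i": (130.0, 250.0),
    "ɪ": (110.0, 200.0),
    "æ": (171.0, 300.0),
}

ratios = {v: voiceless_to_voiced_ratio(*d) for v, d in example_durations.items()}
speaker_mean_ratio = mean(ratios.values())

# Group means reported in the text, restated as percent shortening
print(round(percent_shortening(0.548), 1))  # 45.2 (AAE)
print(round(percent_shortening(0.681), 1))  # 31.9 (WAE)
```

Read this way, the reported means correspond to pre-voiceless vowels that are about 45% shorter than pre-voiced vowels in AAE and about 32% shorter in WAE.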

The tense-lax distinction
The final set of analyses examined how consonant voicing affected the temporal contrast between tense and lax vowels in AAE and WAE speakers. As reported elsewhere, the tense-lax difference tends to be minimized in AAE when compared to WAE and the size of the difference varies with vowel pair [7,17]. Three tense-lax vowel pairs were selected for the current analysis: /i-ɪ/, /e-ɛ/ and /u-ʊ/. The overall pattern of average durations of these vowels as a function of voicing and ethnicity is shown in Figure 5. In this analysis, we were interested in how voicing and ethnicity affected the lax/tense vowel ratio. Again, the duration ratio corrects for the ethnicity-related inherent temporal differences among vowels. The lax/tense ratio was obtained by dividing the duration of the shorter lax vowel in a pair by the duration of the longer tense vowel. A smaller lax/tense ratio represents a greater temporal contrast between the two vowels, and no temporal difference will yield a ratio of 1.0. Based on previous results, the lax/tense ratio was expected to be smaller in WAE than in AAE. The ratios for the three tense-lax vowel pairs are displayed in Figure 6. A repeated-measures ANOVA with the within-subject factors vowel pair and voicing, and the between-subject factor ethnicity was used to analyze these results. All three main effects were significant. The main effect of vowel pair, F(2,28)=14.32, p<.001, ηp2=.506, revealed that the tense-lax difference was the smallest for the /u-ʊ/ pair. The main effect of voicing, F(1,14)=6.56, p=.023, ηp2=.319, indicated that the tense-lax contrast was smaller before voiceless consonants than before voiced consonants. The main effect of ethnicity, F(1,14)=5.18, p=.039, ηp2=.270, showed that the tense-lax contrast was reduced in AAE relative to WAE. There was also a significant vowel pair by voicing interaction, F(2,28)=7.53, p=.002, ηp2=.350.
To explore this interaction, separate paired t-tests were used for each vowel pair (collapsed over the two ethnicity groups since there was no significant two- or three-way interaction with ethnicity). We explored whether the temporal relationship between the two members of each vowel pair was significantly different as a function of consonant voicing. The temporal difference was significant for two pairs: /i-ɪ/, t(31)=2.28, p=.029, and /u-ʊ/, t(31)=5.16, p<.001, indicating that the temporal contrast between lax and tense was significantly reduced before voiceless consonants and enhanced before voiced consonants. Interestingly, the tense-lax distinction was completely neutralized for the /u-ʊ/ pair in pre-voiceless contexts in the WAE group and the lax vowels tended to be slightly longer than the tense vowels in the AAE group (compare Figure 6, right panel). We can only speculate at present as more data need to be collected to provide firm evidence, but it appears that the neutralization of the /u-ʊ/ contrast could be a local pronunciation feature in Eastern North Carolina. The temporal difference was not significant for the /e-ɛ/ pair, t(31)=.55, p=.588.
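The lax/tense ratio used in this section, computed separately for each voicing context, can be sketched as below. The durations are hypothetical values chosen for illustration and the function name is an assumption; the snippet only shows how the ratio behaves, not the study's data.

```python
def lax_tense_ratio(lax_ms: float, tense_ms: float) -> float:
    """Lax duration divided by tense duration; 1.0 means no temporal contrast,
    and a value above 1.0 means the lax vowel is the longer one."""
    if tense_ms <= 0:
        raise ValueError("tense duration must be positive")
    return lax_ms / tense_ms


# Hypothetical mean durations (ms), keyed by (vowel, following stop)
durations = {
    ("i", "d"): 260.0, ("ɪ", "d"): 208.0,
    ("i", "t"): 140.0, ("ɪ", "t"): 126.0,
}

ratio_voiced = lax_tense_ratio(durations[("ɪ", "d")], durations[("i", "d")])
ratio_voiceless = lax_tense_ratio(durations[("ɪ", "t")], durations[("i", "t")])

print(round(ratio_voiced, 2))     # 0.8
print(round(ratio_voiceless, 2))  # 0.9
```

In these made-up numbers the ratio lies closer to 1.0 before /t/ than before /d/, which is the direction of the voicing effect reported above. A ratio above 1.0 would correspond to the case noted for the AAE /u-ʊ/ pair in pre-voiceless contexts, where the lax vowel tended to be slightly longer than the tense vowel.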

Discussion
The primary aim of this study was to contribute to a better understanding of temporal variation in AAE by examining the durations of vowels preceding voiced and voiceless word final stop consonants. The second aim was to determine if the significant reduction of the tense vs. lax distinction in AAE previously reported before voiced stops [7] also occurs in vowels preceding voiceless stops.
The experiment was conducted in a historically well-established southern speech community of African Americans in Pitt County, NC, whose speech patterns have not thus far been explored by means of acoustic analysis. Before we turn to the discussion of the results, it is important to situate the obtained production patterns in a sociohistorical context. Historically, from 1790 until 1910, almost 90% of African Americans in the US lived in the southern states, sometimes referred to as the old Confederate States [36]. In the 1970s, following the mass exodus of African Americans during the Great Migration, a majority (53%) of African Americans still remained in the South. To date, that percentage has been the lowest. In a move termed reverse migration, the population of African Americans living in the southern states has increased as families that had once moved to northern cities to escape poverty and racism are now returning to the South in search of better economic opportunity and quality of life [10]. Currently, it is estimated that about 57% of all African Americans in the U.S. live in the southern states [32]. The features of the AAE dialect examined in this study have not been influenced by prior contact with the varieties of AAE spoken in northern metropolitan areas as the majority of African Americans in Pitt County are multi-generation life-long residents. Therefore, the current results are particularly informative with respect to the hypothesized supra-regional AAE dialect.
The robust finding of the current study was that, compared with WAE speakers living in the same geographic area, AAE speakers significantly prolonged vowels preceding voiced consonants but the shorter durations of vowels preceding voiceless consonants did not differ from those of WAE speakers. This increased duration result supports previous reports about extensive vowel lengthening before voiced stops in the Northern cities [17,20]. These findings suggest that the extensive vowel lengthening may be a distinctive feature of AAE, consistently manifested across different AAE dialect regions in the US, supporting the supra-regional aspect of AAE. It is plausible that this temporal aspect of AAE provides a salient perceptual cue to ethnic identification. It has been shown that listeners can distinguish African American and European American voices quite well, even when linguistic segmental information is experimentally eliminated (or filtered out) [37]. Segmental timing and temporal variation in speech may also provide a set of important perceptual cues and the prolonged vowel durations before voiced consonants in the coda may play a key role in ethnic identification.
The relative duration analysis using proportional (ratio) measures further established that the temporal contrast between vowels preceding voiced versus voiceless consonants was significantly enhanced in AAE relative to WAE. This finding provides a strong argument for the distinctive use of vowel duration in AAE. In particular, the observed vowel lengthening before voiced stops may serve as the most salient cue with respect to the voicing status of the consonant that follows the vowel. That is, AAE speakers may systematically lengthen the vowel to convey that the intended consonant is voiced and not voiceless, even if the word final consonant itself is devoiced or deleted altogether. The greater vowel lengthening may enhance the temporal contrast between vowels preceding voiced versus voiceless consonants, and this contrast may serve as the primary marker of the consonant voicing distinction in rapid casual speech, where segmental deletions and assimilations are common.
By including WAE speakers as a local control group, we predicted that AAE vowels would be comparatively longer. This prediction was based on the findings in Holt [7], who examined vowel duration in only one context, before voiced consonants. A significant ethnicity by voicing interaction clarified that this prediction holds only for vowels preceding voiced stop consonants and does not apply to vowels preceding voiceless stops. This important new finding increases our knowledge of vowel duration patterns in AAE relative to WAE. Consequently, the relatively longer vowel durations before voiced stops should be expected and not considered deviations from the norm, whereas vowels before voiceless stops should be expected to be as short as vowels produced by WAE speakers.
The current results also provide new insights with respect to the effect of word final consonant voicing on the temporal contrast between the tense and lax vowels in AAE. Importantly, the tense-lax distinction was reduced in AAE relative to WAE in both voicing contexts, indicating that the smaller temporal contrast between tense and lax vowels is a distinctive feature of AAE. However, a significant interaction between voicing and vowel pair indicated that, for either speaker group, the tense-lax contrast tends to be reduced before voiceless consonants when vowels are shortened, and is more enhanced for longer vowels before voiced consonants. This pattern was only true for two vowel pairs, however, and the tense-lax contrast was lost for the /u-ʊ/ pair. Presumably, the reduced /u-ʊ/ contrast is a regional feature of English in the eastern part of North Carolina because it occurred in both speaker groups, irrespective of their ethnic background.
It is also important to note that the current study has a significant methodological implication. In the growing field of sociophonetics, there continues to be a debate about adequacy of speech materials used to characterize socio-indexical patterns of phonetic variation including temporal organization of speech [38]. In particular, there is a concern that the high level of control under laboratory conditions may obscure dialect-specific patterns which are typically found in spontaneous conversational speech. Using isolated words, the current study shows that significant and systematic differences in vowel duration related to speaker characteristics such as ethnicity and dialect can be successfully documented in the phonetic laboratory. The current results provide further evidence for the recent position that sociophonetic variation in vowel duration can be elicited on the basis of citation-form words providing fine control over phonetic context, and not necessarily in passages of unconstrained spontaneous speech, which are particularly beneficial in studying temporal aspects of segmental reductions [39].