Producing and perceiving socially structured coarticulation: Coarticulatory nasalization in Afrikaans

Speakers who produce more coarticulatory nasalization also rely more on this information in perception, and they do so regardless of the talker's (predictably structured) pattern of nasalization. The persistence of the production-perception link, even in a context of socially structured variation, provides evidence for the robustness of this link. At the same time, although the relative perceptual usefulness of coarticulatory information is informed by listeners' own productions, our results also show that even language users who themselves produce little to no anticipatory nasalization are nonetheless adept at using that information in perception. The evidence provided in this study further shows, though, that listeners' perceptual adjustments for speaker-specific, real-time information occur only under certain circumstances. No clear evidence was found for the social mediation of the link between production and perception based on pre-existing knowledge of different coarticulatory patterns in different socio-ethnic varieties of Afrikaans. The continuing challenge for phonetic theory is to determine how individual language users balance, from moment to moment, their reliance on the acoustic patterns in the speech of their interlocutors, and their reliance on their own production patterns.

Most theories of phonetics assume a tight relation between production and perception, and recent years have also seen increasing evidence for such a relation at the level of the individual. For the most part, however, this evidence comes from socially homogeneous speech communities where the targeted pattern of variation is mostly socially neutral. What implications might socially structured phonetic variation in the speech community have for the perception-production link? If listeners can predict the phonetic patterns of a talker based on the talker's actual or assumed identity, would they adjust their perceptual strategies accordingly, possibly weakening the link between their own production and perception patterns? This study reports the results of a pair of experiments that investigate the production and perception of coarticulatory vowel nasalization in Afrikaans, a language for which variation in coarticulatory nasalization is socially structured. Relying on nasal airflow measures, the production experiment showed that speakers of White Afrikaans produce more extensive coarticulatory nasalization than speakers of Kleurling Afrikaans. The perception experiment used an eye-tracking paradigm to assess listeners' perceptual reliance on coarticulatory nasalization, and found (i) that Afrikaans speakers' use of coarticulatory nasalization in production predicts their perceptual reliance on this information, (ii) that they rapidly adjust to the coarticulatory timing patterns in the speech of other speakers, but also (iii) that they do not adjust their perceptual reliance on coarticulation in response to the assumed identity of the speaker. The link between perception and production therefore persists, even in this situation of socially structured variation in coarticulatory timing.

Introduction
This study investigates the perception and production of coarticulated speech in an Afrikaans speech community in which a selected pattern of coarticulation, anticipatory vowel nasalization, is socially structured. The study is situated at the intersection of, on the one hand, the longstanding tradition of phonetic research on the relationship between speech perception and production and, on the other hand, newer lines of research on listeners' adjustment of their perceptual strategies in response to the social identity of their interlocutors. A broad hypothesis underlying this work is that how a language user produces speech is complexly related to how that individual perceives speech. Our main goals are to determine, for coarticulated speech, whether perception is guided by knowledge of social structuring and thus whether the complex production-perception relation is socially mediated.

The relation between the perception and production of coarticulation
Although speech production and perception cannot be assumed to be isomorphic (e.g., Pardo, 2012), there is nonetheless strong theoretical motivation for assuming a tight relation between speaking and listening. For some theoretical approaches, this tight relation is formalized in the nature of produced and perceived phonetic units. Gesturalist theories of speech perception, for example, postulate that the forms of speaking (vocal tract actions) are, correspondingly, the forms of perception (Fowler, 1986; Liberman & Mattingly, 1985; Liberman & Whalen, 2000).
Alternatively, the similarity between produced and perceived forms of speech might reside in the acoustic-auditory domain, as in the assumption of the DIVA (Directions into Velocity of Articulators) production model that the targets of production are auditory/perceptual (Guenther, Hampson, & Johnson, 1998). Some theoretical perspectives further postulate that such parity holds not only for the nature of phonetic forms but also for the specific forms produced and perceived by individual speakers. This assumption emerges in exemplar-based models, which are sometimes agnostic concerning the nature of stored experiences but typically involve a perception-production loop in which productions are drawn from the larger perceptual space (Pierrehumbert, 2001). From a different perspective but along similar lines, many theoretical approaches to sound change rest on the assumption that a listener's innovative percepts (i.e., percepts that differ from the community norm) are manifested in the subsequent production patterns of that individual (Beddor, 2009; Harrington, Kleber, & Reubold, 2008; Lindblom, Guion, Hura, Moon, & Willerman, 1995; Ohala, 1981; Yu, 2013).
For the production and perception of coarticulated speech, this theoretically postulated tight link is well supported by empirical findings for various communities of speaker-listeners.
For example, gestural overlap patterns that are language-specific (e.g., Beddor, Harnsberger, & Lindemann, 2002; Beddor & Krakow, 1999) or age group-specific (Harrington et al., 2008; Kleber, Harrington, & Reubold, 2012) have been shown to correspond to language- or age-specific perception in that the more extensive a speech community's production of coarticulatory overlap (e.g., coarticulatory vowel fronting or nasalization), the greater those language users' perceptual adjustments for the acoustic effects of that overlap. Other studies have shown language variety- and age-specific production and perception of coarticulation to be linked in that the group for which one type of information is especially informative perceptually is also the group that produces that information to a greater extent (Coetzee, Beddor, Shedden, Styler, & Wissing, 2018; Kuang & Cui, 2018; Schertz, Kang, & Han, 2019).
Similar to speech communities, individuals also differ systematically from each other in their production and perception of coarticulated speech (see Zellou, 2019 for reviews), yet findings are mixed regarding whether an individual's produced coarticulation predicts their perception. Some studies of individuals' perceptual adjustments for, or attention to, the acoustic effects of coarticulation report a positive correlation with produced coarticulation (Yu, 2019; Zellou, 2017), while others have failed to establish a link (Grosvald, 2009; Kataoka, 2011). Individuals' perceptual weightings of coarticulatory effects relative to the source of those effects are also not predicted by their production patterns in some studies (Schertz, Cho, Lotto, & Warner, 2015; Shultz, Francis, & Llanos, 2012), and findings that they are (e.g., Coetzee et al., 2018) may be driven in part by group-based differences (see Schertz & Clare, 2020 for a review). Thus, the strength of the production-perception link for individual speaker-listeners appears to be variable, and the factors that mediate the strength of the link are not yet well understood.
One scenario in which this link may be weak or even absent is in speech communities where production patterns are socially structured. In these situations, listeners may interact regularly with speakers whose production patterns are predictably different from their own. Beddor et al. (2018, p. 935), for instance, speculated that a weaker perception-production link may be observed in a situation of an ongoing sound change, where younger and older community members may differ in their adoption of new phonetic norms. Harrington et al.'s review of the relevant literature (Harrington, Kleber, Reubold, Schiel, & Stevens, 2019) suggests that perception can lead production in ongoing changes. Two recent studies of sound changes in progress, Kuang and Cui's study of the tense/lax register contrast in Southern Yi (Kuang & Cui, 2018) and Pinget et al.'s study of obstruent devoicing in Dutch (Pinget, Kager, & van de Velde, 2020), document this pattern at the level of individual speaker-listeners of the relevant speech varieties. In addition, though, Pinget et al. (2020) find that, when nearing completion, the change has progressed further in production than perception.
We propose that a similar misalignment or relaxation of the perception-production link may be observed in speech communities where differences in production patterns are linked not to an ongoing sound change, but to the social structure of the speech community. If different subgroups of the speech community have different characteristic production patterns, then listeners will encounter these different patterns and successful communication might be aided by listeners being especially flexible in their perceptual strategies.
In this study, we use an approach, and investigate a scenario, that to some extent is similar to that studied by Beddor et al. (2018). They investigated variability in the production and perception of coarticulatory nasalization in Midwestern American English, finding considerable variation in the extent of produced coarticulatory nasalization. They also found that speakers who produce particularly heavy anticipatory nasalization attend particularly closely to that information in perception. For example, they are faster to identify [sɛ̃nt] as scent rather than set, indicating that they rely more on the coarticulatory information during the vowel for disambiguation. In that speech community, however, there is no evidence of social structuring of the extent of nasalization, and differences in nasalization do not clearly index a speaker in any meaningful manner. It is hence not possible for a listener to predict, based on the identity of a specific speaker, whether the speaker will nasalize more or less. In the current study, we investigate the same phenomenon in an Afrikaans speech community where a similar degree of variation in the amount and extent of produced coarticulatory nasalization is observed. However, the variation in the extent of coarticulatory nasalization is clearly socially structured in this speech community: producing more or less nasalization marks an individual as being a speaker of a specific socio-ethnic variety of the language. We investigate whether this difference between Midwestern American English and Afrikaans in terms of the social structure of the variation impacts the nature of the perception-production link.

Talker-sensitive perceptual strategies
In general, listeners use the lawfully structured acoustic variation afforded by coarticulation to facilitate perceptual processing (e.g., Whalen, 1984, among many others). However, as discussed above, individual listeners differ from each other in the extent to which they attend to and use that information in making lexical decisions, and they appear to do so in ways that depend in part on their own production of coarticulation. A question that naturally arises, then, is whether listeners will nonetheless adapt their perceptual strategies to the idiosyncratic coarticulatory patterns of their interlocutors, idiosyncratic patterns that are well-documented in the literature, including for coarticulatory nasalization (Beddor et al., 2018, among others). That is, when attending to the speech of a talker¹ with a coarticulatory pattern different from their own, will listeners adjust their perceptual strategy for the processing of coarticulatory information accordingly?
¹ In order to differentiate between individuals whose speech is used as stimuli in speech perception experiments (as in Section 3 below) and general members of the speech community, we will use "talker" to refer to the former and "speaker" to the latter throughout this paper.
Based on findings that listeners adapt their perceptual strategies to talker-specific idiosyncrasies, we hypothesize that the same will hold for talker-specific coarticulatory nasalization. Trude and Brown-Schmidt (2012; Trude, Duff, & Brown-Schmidt, 2014), for instance, exposed listeners to two talkers who differed in whether they produced a raised diphthong in words ending in /æɡ/ compared to /æk/. That is, although both talkers realized words like back as [bæk], one realized bag as [bæɡ] and the other as [beɪɡ]. Using a visual world paradigm, they conducted an identification experiment in which participants saw images for minimal pair words like bag and back while hearing auditory [bæk]. Listeners fixated more quickly on the back image for the talker who realized bag as [beɪɡ], showing that listeners relied on different perceptual strategies depending on talker-specific production patterns. These and other similar findings (e.g., Dahan, Drucker, & Scarborough, 2008; Kraljic, Brennan, & Samuel, 2008; see Samuel & Kraljic, 2009, for a review) show that very limited exposure to a novel pattern of a talker is sufficient for listeners to perceptually adapt to that pattern.
Because varieties of the same language can differ systematically in terms of their timing of coarticulatory nasalization, this study also asks whether listeners bring existing knowledge about timing patterns in different language varieties to the perceptual task, and consequently rely differentially on coarticulatory information based on their (possibly unconscious) knowledge about these patterned differences. For coarticulatory nasalization, there is clear evidence, despite variation at the level of individual speakers, of broader community-level patterns. Studies have documented systematic variation in anticipatory vowel nasalization for different regional varieties (e.g., Bongiovanni, 2018, for Caribbean and non-Caribbean Spanish; Delvaux, Huet, Piccaluga, & Harmegnies, 2012, for European French; Stroop, 1994, for Belgian French; Tamminga & Zellou, 2015, for American English) and for age groups within a regional or ethnic variety (Wissing, 2018, for so-called White Afrikaans; Zellou & Tamminga, 2014, for Philadelphia English). This study documents socio-ethnically based nasalization patterns in Afrikaans.
There is a growing body of research showing that, when listeners are led to believe that a talker has a particular (typically regional) identity, they actively adjust their perceptual strategies based on their prior knowledge of, or stereotypes about, that speech variety. Niedzielski (1999), for instance, showed that listeners were more likely to accurately identify a word like house as having a raised version of the diphthong /aʊ/ when they were led to believe that the talker was Canadian (that is, a talker of an English variety associated with this form of raising) rather than American. Hay et al. (2006a) replicated this finding for a difference between New Zealand and Australian English, showing that listeners are more likely to perceive a word like fit as being produced with a raised vowel when they were led to believe that the talker was from Australia, in agreement with the more raised realization of the high front lax vowel in that dialect. Staum-Casasanto (2009a; 2009b) found that American English-speaking listeners were more likely to identify [mæs] in a phrase like The [mæs] probably lasted … as the word /mæst/ when they were led to believe that the talker was Black rather than White, showing that listeners use their knowledge that word-final deletion of /t/ is more common in the speech of Black than White speakers of American English.
Especially relevant to our question of whether listeners bring existing knowledge about variety-specific timing patterns to the perceptual task is Schertz et al.'s study of vocalic f0 in relation to preceding stops' voice onset time for speakers of two dialects of Chinese Korean (Schertz et al., 2019). Speakers of these dialects differ from each other in the contributions of f0 and VOT to the differentiation of lenis and aspirated stops, but both dialects differ from Seoul Korean, in which the f0 information is primary. Schertz et al. found that younger Chinese Korean listeners weighted f0 more heavily in their lenis-aspirated judgments when they were led to expect that the talker was from Seoul than when they thought the talker was from their own city.
In the current study's investigation of whether Afrikaans-speaking listeners adjust to the coarticulatory pattern of a talker, the socio-ethnic varieties of the talkers differ in prestige. Several studies have documented differences in processing advantages for 'prestige/standardized' versus 'non-prestige/non-standardized' varieties. In an early study, Weener (1969), for instance, found that children from a predominantly White, middle class Detroit neighborhood recalled more words produced by a speaker from their own neighborhood than by a speaker from a predominantly Black, lower class neighborhood. On the other hand, children from the predominantly Black, lower class neighborhood showed no recall difference between the two different speakers. That is, children who spoke the 'prestige' variety of English showed a processing disadvantage for the other variety, while children who spoke the 'non-prestige' variety of English did not show a comparable processing disadvantage for the prestige variety. Sumner and Kataoka (2013) similarly showed that the prestige of an accent appears to influence listeners' responses to talker-specific variation. They found that American English listeners are faster at identifying a word like thin after being primed with an auditory presentation of a semantically related word such as slender, but only under certain conditions. Specifically, slender primed thin identification when realized with a final /ɹ/ (the typical American rhotic pronunciation) or with a non-rhotic British English pronunciation, but not with a non-rhotic, 'non-standard' New York City pronunciation.
Under the reasonable assumption that the average American listener would have only limited exposure to (non-rhotic) British English, they argued that listener sensitivity to talker identity does not require extensive exposure to the specific speech variety and hypothesized that the higher prestige associated with British compared to New York City English may result in more robust encoding of British exemplars (see also Sumner, Kim, King, & McGowan, 2014).

Coarticulatory nasalization in two socio-ethnic varieties of Afrikaans
In this study, we focus on differences in coarticulatory nasalization between two socio-ethnic varieties of Afrikaans. Although there are regional differences observed in Afrikaans, the main dialect groups of the language are differentiated along socio-ethnic rather than regional lines (Stell, 2011, pp. 57-64).² So-called 'White Afrikaans' is spoken predominantly by speakers of European descent, and is also the variety of the language that is more likely to be encountered in the media and taught as either first or second language in school settings. So-called 'Kleurling Afrikaans' is spoken predominantly by members of the Kleurling community, comprised of descendants of 17th century Dutch settlers, various communities indigenous to South Africa (including both Khoisan and Bantu speakers), and Malaysian and Indonesian slave laborers brought by the Dutch to South Africa in the late 17th and early 18th centuries. Although there are today more speakers of Kleurling than White Afrikaans as a first language (Stell, 2011, p. 57), Kleurling Afrikaans is often considered the non-standard variety of the language.³ There is a long tradition of impressionistic phonetic descriptions of Afrikaans, including on the dialectal distribution of coarticulatory nasalization. The general observation is that White Afrikaans is characterized by more extensive nasalization, while nasalization is claimed to be limited or even absent in Kleurling Afrikaans (Coetzee, 1981; Coetzee, 1989, pp. 233-234; Coetzee & van Reenen, 1995; Coetzee, 1985; van Rensburg, 1989, p. 440). Although none of these earlier studies relied on acoustic or aerodynamic measures of nasalization, there is no reason to doubt the accuracy of these descriptions. Even so, one of the goals of the current study is to confirm this claimed difference based on nasal airflow measures collected from speakers of the two varieties.
² There is also a geographic component to the dialect distribution, with most speakers of White Afrikaans concentrated in the eastern and northern provinces of South Africa, and most speakers of Kleurling Afrikaans in the western provinces (Stell, 2011, pp. 57-59). However, primarily due to segregation enforced on South African society by the apartheid system, even Kleurling communities in the eastern and northern regions of the country speak a variety of Afrikaans that is most closely affiliated with Kleurling Afrikaans, and similarly White communities in the western regions speak predominantly White Afrikaans (modulo smaller regional differences within each of these two socio-ethnic varieties).
³ We acknowledge the problematic nature of the terms 'White Afrikaans' and 'Kleurling Afrikaans.' The socio-ethnic groupings indicated by the terms 'White' and 'Kleurling' are problematic constructs that oversimplify the lived realities of Afrikaans-speaking individuals, so that not all speakers will associate with one of these two terms. Similarly, not everyone who may self-identify as belonging to one of these two socio-ethnic groups necessarily speaks (only) the variety of Afrikaans traditionally associated with that particular group. The terms are used here as convenient labels only to refer to two parts along what is more likely a dialect (and perhaps also style) continuum, rather than two distinct varieties of the language. Participants in the study completed a survey at the end of their participation in which they were asked to self-identify in terms of their affiliation with different parts of the Afrikaans speech community. Participants considered as speakers of White Afrikaans for the purposes of this study all self-identified as 'White,' while those considered as speakers of Kleurling Afrikaans typically self-identified as 'Kleurling,' 'Coloured,' or 'Brown' (terms that are used mostly interchangeably in South Africa). In the South African context, the term 'Coloured' does not carry the same negative connotations as the term 'Colored' in the United States, and it is in fact often the term preferred by members of the community itself, sometimes spelled phonetically as 'Kallit.'
In addition to noting the difference in the prevalence and extent of nasalization between the two varieties of Afrikaans, earlier research also commented on the association between nasalization and socioeconomic or educational factors. For White Afrikaans as spoken in Johannesburg, for instance, A.E. Coetzee (1989) reports more extensive nasalization for individuals in the upper than lower middle class. She also notes that, although there does not appear to be an age- or gender-related difference in the upper middle class, in the lower middle class, younger speakers and women show more extensive nasalization than older speakers or men. This indicates a possible association of nasalization with prestige and upward socioeconomic mobility. I.A. Coetzee (1985) reports similar results for the Kleurling Afrikaans community of Eersterust, not far from Johannesburg. Although he reports an overall low prevalence of nasalization (in accordance with other descriptions of Kleurling Afrikaans), he notes that nasalization rates are higher for individuals from the more affluent neighborhoods of Eersterust than those residing in the poorer neighborhoods (1985, p. 76). This difference again hints at an association of nasalization with prestige, and may also reflect the fact that individuals from the more affluent neighborhoods of Eersterust have more contact with White Afrikaans in both educational and professional settings.
The historical origin of the differences in nasalization patterns between the two varieties of Afrikaans is difficult to determine, especially given that the social valuation of nasalization in modern Dutch is opposite to that in Afrikaans (Coetzee & van Reenen, 1995; van Reenen & Coetzee, 1996). Unlike in Afrikaans, extensive nasalization is associated with non-standard and stigmatized varieties of Dutch, while lesser degrees of nasalization are found in the standard variety of the language. Coetzee and Van Reenen (1995, pp. 63-64) provide a possible explanation for the opposite valuation of nasalization in these two speech communities in terms of the historical settlement patterns of Dutch speakers who provided the input for the development of these varieties of Afrikaans. They note that the Dutch settlers who came to South Africa in the late 17th century originated from regions in the Netherlands where extensive nasalization is common today: the border regions between North and South Holland (excluding Amsterdam) and the southwestern parts of South Holland. These settlers provided the primary input for the variety that later developed into White Afrikaans. The descendants of these early Dutch settlers moved away from the Cape Town region, first to the east in the late 17th century, and eventually also north into the interior of modern South Africa after the British takeover of Cape Town in the early 19th century. Their Afrikaans therefore reflects the nasalization patterns typical of the earliest Dutch settlers. On the other hand, the non-White inhabitants of the Cape Town region for the most part did not migrate away from Cape Town. Once the Dutch settlement was firmly established by the early 18th century, Dutch settlers of higher socioeconomic status (hence coming from regions in the Netherlands where non-nasalization was the norm) came to South Africa.
Interested readers can refer to Den Besten (1989) and Roberge (1994) for two authoritative discussions in English.

Hypotheses
In this study, nasal airflow and eye-tracking methods are used to assess the production and perception of coarticulatory nasalization by speakers of Kleurling and White Afrikaans. We hypothesize, based on the existing impressionistic descriptions of Afrikaans, that speakers of White Afrikaans will produce more extensive coarticulatory nasalization than speakers of Kleurling Afrikaans. Our perceptual hypotheses are more nuanced and depend not only on the listeners' own socio-ethnic identity but also on whether they are listening to a Kleurling or White Afrikaans talker.
First, we expect that listeners will rely on coarticulatory information during perception and that their reliance will be sensitive to the time-varying patterns of that information.
Given that the stimuli for our study were created in such a way that nasalization onset occurs earlier in the tokens produced by the White than by the Kleurling Afrikaans talker (see [...]), [...]
Second, given results such as those reported by Beddor et al. (2018) for English, showing a link between the extent of coarticulatory nasalization produced by an individual and that individual's perceptual reliance on nasalization, we hypothesize that a similar pattern will be found for speakers of Afrikaans. However, given the social structuring of coarticulatory variation in the Afrikaans speech community and the evidence that listeners can adjust their perceptual strategies for socially structured variation (see Section 1.2), it is also possible that the link found by [...]
Third, we hypothesize that listeners might adjust their perceptual strategies based on the identity of the two talkers in the perceptual task (Kleurling versus White Afrikaans).
Our perceptual design (see Section 3.1.3) tests two possibilities in this regard. Listeners may bring to the perceptual task knowledge about the coarticulatory differences between the two varieties of Afrikaans, and may consequently use different perceptual strategies immediately upon identifying the specific variety. Alternatively, listeners may instead adapt their perceptual strategies over the course of the experiment, based on exposure to the coarticulatory patterns of the specific talkers in the experiment (Dahan et al., 2008; Trude & Brown-Schmidt, 2012; Trude et al., 2014; etc.).
Fourth, based on the social structure of the Afrikaans speech community, we expect potentially different perceptual results for White and Kleurling Afrikaans-speaking participants. Given the sociolinguistic situation in South Africa, speakers of Kleurling Afrikaans typically have extensive exposure to White Afrikaans. Not only is White Afrikaans the variety encountered most often in the media, it is also the variety used most often in professional and academic settings. It can thus be assumed that speakers of Kleurling Afrikaans will have a relatively high level of exposure to White Afrikaans, which is also the variety with higher social prestige. The average speaker of White Afrikaans, by contrast, would have less extensive exposure to Kleurling Afrikaans. This situation leads to the expectation that, relative to White Afrikaans-speaking listeners, Kleurling Afrikaans-speaking listeners might have stronger prior coarticulatory expectations or be able to more rapidly adjust their perceptual strategies when listening to stimuli from the other variety. That outcome would be in keeping with the finding of Sumner and Kataoka (2013) that American English listeners do not adjust their perceptual expectations to non-prestige New York City English stimuli. On the other hand, given South Africa's racio-political history, and the consequent prominence of race and ethnicity in South African society generally, it is possible that speakers of both varieties of Afrikaans may be attentive to speech patterns related to ethnic identity and hence that speakers of both varieties will adjust their perceptual strategies to a similar extent.

Production experiment
Data collection for the production and perception experiments was done over two sessions, typically scheduled one week apart. Both sessions included a perception component, while production data were collected only at the end of the second session. Although the production data were collected last, we present those results first since we investigate whether nasal airflow patterns of individual speakers may predict their reliance on nasalization during perception.

Based on the impressionistic descriptions of the patterns of nasal coarticulation in Kleurling and White Afrikaans, we expect both earlier onset and a higher overall volume of nasal airflow for speakers of White than Kleurling Afrikaans in the production of words that contain a nasal coda.

Participants
Participants were 81 native speakers of Afrikaans, between the ages of 18 and 30 years, recruited from among the student body at the North-West University, Potchefstroom, South Africa. Of the participants, 37 self-identified as 'Kleurling' (22 female, 15 male; see footnote 3) and 44 as 'White' (24 female, 20 male). All participants reported normal or corrected-to-normal vision, as well as no known speech or hearing deficits. Participants received 500 South African Rand for their participation. Twenty-six additional participants were disqualified for a variety of reasons: seven for failure to complete the full experiment, 11 for problems with accurate airflow measurement, two for poor eye-tracking accuracy, and six for poor performance in the perception task (defined as achieving less than 0.75 proportion target fixations during the time window of interest in any one of the conditions in the perception experiment).

Stimuli
Stimuli consisted of 10 pairs of Afrikaans words, given in Table 1.

Procedure
During airflow collection, participants positioned a hand-held pliable silicone mask against their faces, with instructions to create a secure but comfortable seal. For participants with smaller faces, a large metal clip was used to pinch the bottom edge of the mask in order to ensure a tight seal. Nasal airflow was captured via the Glottal Enterprises Oral-Nasal Airflow system using a split oral-nasal silicone mask with mesh port covers and two PT-2E airflow capture transducers. Prior to each block of airflow data collection, each transducer was calibrated by pushing 140 ml of air through a calibration box attached to the transducer; air escaped through a vented-mesh port identical to those in the mask. This produced a known volume pressure signal, which was then used to calculate a conversion factor to transform the electrical pressure response of the transducer into the volume of air (in ml) passing through the mask.
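The calibration step described above can be illustrated with a minimal sketch. It assumes that the transducer output is proportional to airflow, that it is sampled at a fixed rate, and that the conversion factor is simply the known volume (140 ml) divided by the time integral of the calibration signal. The function names, the sampling rate, and the sample values are hypothetical and are not taken from the airflow system's own software.

```python
def conversion_factor(calibration_signal, sample_rate_hz, known_volume_ml=140.0):
    """Return ml per (signal unit x second) from one calibration push.

    Assumption: the integral of the raw signal over the push corresponds
    to the known volume of air that was pushed through the transducer.
    """
    dt = 1.0 / sample_rate_hz
    integrated = sum(calibration_signal) * dt  # rectangle-rule integral
    if integrated <= 0:
        raise ValueError("calibration signal must have a positive integral")
    return known_volume_ml / integrated


def volume_ml(signal, sample_rate_hz, factor):
    """Convert a raw signal to flow (ml/s) and integrate it to a volume (ml)."""
    dt = 1.0 / sample_rate_hz
    return sum(factor * s for s in signal) * dt


# Made-up transducer readings for one calibration push, sampled at 100 Hz.
calibration_push = [0.0, 0.5, 1.0, 0.5, 0.0]
factor = conversion_factor(calibration_push, sample_rate_hz=100)
```

Applying `volume_ml` to the calibration push itself with this factor recovers the known 140 ml, which is a convenient sanity check on the conversion.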

Table 1: Stimuli used in the production study (columns: Oral CVC stimuli; Nasal CVN(C) stimuli).
Stimulus presentation and data collection were conducted using SR Research Experiment Builder software. Responses were elicited by presenting a professionally drawn black-and-white line sketch on the computer monitor. Participants were familiar with these images, since the same images were also used during the preceding perception experiment sessions. Even so, to ensure that participants produced the appropriate word, each image was accompanied by an orthographic representation of the relevant word beneath it. Upon presentation of a stimulus, participants produced the relevant word in the frame sentence X is die woord ('X is the word'). Once an image had been presented, participants had two seconds to respond.
Trials with incorrect productions or disfluencies were manually flagged by the experimenter for later repetition. Stimuli were presented in random order and repeated 10 times, resulting in 200 airflow samples per participant. After every 50 trials, participants were given a break and allowed to remove the mask from their faces for normal breathing.

Data analysis
Nasal airflow during the vowel portion of each signal was measured at 25 points across the duration of the vowel. Vowel and nasal consonant durations were also measured. Vowel and nasal boundaries were delimited using TextGrid annotations in Praat (Boersma & Weenink, 2013). As illustrated in Figure 1 for a token of bons [bɔns] 'bounce,' segmentation was based on the nasal and oral waveforms, and on spectrograms that were created from the residual acoustic data captured by the airflow transducers. Signals were low-pass filtered below 5,000 Hz (to remove extraneous acoustic information) and high-pass filtered above 40 Hz (to remove the nonacoustic airflow signal). Boundaries for vowel onset and offset were placed at the first and last visually identifiable pitch pulses of the vowel and were based primarily on the oral waveform.
Nasal consonant onset was identical to vowel offset, while the offset of the nasal consonant was determined largely on the basis of cessation of the periodic signal in the nasal waveform.
Despite precautions taken during recordings to minimize production errors, specific tokens were excluded from analysis due to speaker error (e.g., incorrect or disfluent production of a target word, or non-production of the carrier sentence), or an unanalyzable nasal waveform (due to mask slippage). Furthermore, to ensure that the data were not unduly affected by outliers, we applied a functional outlier detection method, from R's rainbow package, on a by-participant basis (Hyndman & Ullah, 2007; Shang & Hyndman, 2019). The outlier detection method calculates for each trial the integrated squared deviation from the mean airflow over time. Those trials that fall outside of the smallest area that captures 99% of the data were removed from further analysis.
We excluded 107 trials from 34 participants (16 speakers of Kleurling Afrikaans, 18 speakers of White Afrikaans) on this basis (1.4% of the total trials).
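The logic of this exclusion step can be sketched as follows. This is a simplified illustration and not the rainbow package's implementation: it assumes each trial is a list of equally spaced airflow samples, approximates the integrated squared deviation by a plain sum over samples, and treats the 99% region as a simple rank cutoff on that score. The function names and the toy data are hypothetical.

```python
def integrated_squared_deviation(curve, mean_curve):
    """Sum of squared pointwise deviations from the mean curve."""
    return sum((x - m) ** 2 for x, m in zip(curve, mean_curve))


def flag_outliers(curves, keep_fraction=0.99):
    """Return one boolean per trial; True marks a trial to exclude.

    `curves` holds the trials of ONE participant, since the method is
    described as being applied on a by-participant basis.
    """
    n_trials = len(curves)
    n_points = len(curves[0])
    mean_curve = [sum(c[i] for c in curves) / n_trials for i in range(n_points)]
    scores = [integrated_squared_deviation(c, mean_curve) for c in curves]
    # Keep the trials with the smallest scores; anything beyond the cutoff
    # rank is treated as an outlier in this simplified version.
    n_keep = max(1, round(keep_fraction * n_trials))
    cutoff = sorted(scores)[n_keep - 1]
    return [score > cutoff for score in scores]


# Toy data: 199 near-identical flat curves plus one aberrant trial.
toy_curves = [[0.2 + 0.001 * (i % 5)] * 25 for i in range(199)] + [[5.0] * 25]
toy_flags = flag_outliers(toy_curves)
```

Because ties at the cutoff are kept, this sketch can exclude fewer than 1% of trials, which is the conservative direction for an exclusion criterion.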

Results
We start by briefly describing the observed nasal airflow patterns in order to confirm that the expected differences between speakers of White and Kleurling Afrikaans were obtained. We then present an analysis of the nasal airflow patterns relying on Generalized Additive Mixed Modeling (GAMM) to capture the dynamic changes in nasal airflow over time. Finally, we conduct a functional principal component analysis of the airflow data to capture differences between speakers. The first principal component from this analysis will be used later (see Section 3) to make speaker-level predictions about perceptual reliance on nasal coarticulation.
The left panel in Figure 2 presents the raw average nasal airflow across normalized time for CVN(C) tokens, separately for speakers of Kleurling and White Afrikaans. As this image shows, the onset of nasal airflow is earlier and the overall volume is greater for the speakers of White Afrikaans.
Before statistical modeling, airflow measures were normalized on a by-trial basis by dividing each of the 25 raw nasal airflow measures in the vowel by the maximum nasal airflow value attested within the following nasal consonant. The normalized nasal airflow values therefore constitute the ratio of nasal airflow during the vowel to maximum nasal airflow during the nasal consonant. These normalized values adjust for both across-speaker differences and between-trial differences within a speaker. To give an indication of the structure of and variation in the nonnormalized data, we report measures of vowel length and nasal airflow in Table 2. As seen in this [...]
[...] for these same speakers in oral CVC words, and noting that there is no evidence of ingressive nasal airflow during their productions of CVC words. We hypothesize that ingressive nasal airflow may be the result of the lowering of the velum before the velic seal is broken, slightly increasing the volume of the nasal cavity, and hence resulting in weak ingressive nasal airflow. See Hayes and Stivers (2000) for evidence that such 'pumping action' of the velum can result in measurable ingressive airflow.
6 Speakers (and especially speakers of non-standardized/stigmatized language varieties) 'style-shift' based on the specific social communicative setting in which their language use occurs; see Scanlon and Wassink (2010) and Britt and Weldon (2015) about such style-shifting in African American English, for example. The PC1 values used here should therefore be interpreted as reflecting the nasalization patterns typical of these speakers in the specific social communicative setting in which the data were collected, that is, a fairly formal setting on a university campus where White Afrikaans is the majority language variety and the assumed standard. It is therefore possible that those Kleurling speakers with PC1 values typical of White Afrikaans may be style-shifting to accommodate to the specific social communicative setting of the experiment and that they may nasalize less in settings where White Afrikaans is not the socially normative variety of the language (Coetzee, 2018, p. 188

Perception Experiment
The perception experiment assessed listeners' perceptual reliance on the presence versus absence of coarticulatory nasalization using an eye-tracking design similar to that used by [...] (see also Beddor, McGowan, Boland, Coetzee, & Brasher, 2013). In this experiment, listeners were presented with an auditory CVC (kat [kɑt] 'cat,' pet [pɛt] 'baseball cap') or CVN(C) (kant [kɑnt] 'lace,' pen [pɛn] 'pen') stimulus, and two images corresponding to the presented auditory stimulus and its minimal pair competitor (kat-kant, pet-pen). Participants' task was to look at the image corresponding to the auditory stimulus. Auditory stimuli were produced either by a White or Kleurling Afrikaans talker, with relatively minor manipulation so that all tokens would show coarticulatory patterns typical of these two varieties of the language (see Section 3.1.2 for more on these manipulations).
For each talker condition (Kleurling or White Afrikaans), stimuli were presented according to a blocked design in which participants first heard only CVC auditory stimuli followed by CVC and CVN(C) stimuli intermixed (with both blocks also containing fillers). This design allowed us to test the two versions of our third perception hypothesis (Section 1.4) concerning adjustment of perceptual strategies based on talker identity. If listeners use prior knowledge about coarticulatory nasalization in the two varieties when hearing, say, kat and deciding between kat and kant, they should look more quickly to the kat image when hearing the token produced by the White Afrikaans talker than the Kleurling Afrikaans talker, even prior to hearing kant. This is because, on average, orality disambiguates kat and kant early in the vowel for White but not for Kleurling Afrikaans. Alternatively, if listeners only adapt their perceptual strategies over the course of the experiment, differences in responses to the stimuli produced by the two talkers should not emerge until after listeners hear CVN(C) stimuli.

Participants
The participants were the same individuals as those who participated in the production experiment.

Stimuli
Stimuli were the same 10 CVC-CVN(C) minimal pairs that were used in the production experiment.
Auditory stimuli were modified versions of the words as produced by two adult female talkers, one Kleurling Afrikaans talker and one White Afrikaans talker. In order to select talkers who could easily and reliably be identified as speaking the relevant variety of Afrikaans, we first conducted a talker norming experiment using the voices of 11 Kleurling Afrikaans (5 female, 6 male) and 13 White Afrikaans (8 female, 5 male) individuals, each reading the instruction sentences used during the eye-tracking experiment (see Section 3.1.3). These recordings were presented, through an online interface and in random order, to 19 Afrikaans listeners who were tasked with identifying the variety of Afrikaans spoken by each talker. The talkers' variety was generally identified accurately, with an average of 94% correct and a range of 86 to 100%. From these talkers, we selected one female talker of each variety whose variety was correctly identified by 100% of the listeners.
In addition to the instruction sentences, these two talkers also produced the CVC and CVN(C) target words (as well as some filler words). To ensure that the stimuli consistently had the coarticulatory nasalization patterns typical of Kleurling and White Afrikaans, the original stimuli were waveform edited in Praat. For each minimal CVC-CVN(C) word pair (kat-kant), the initial C and onset of V were taken from a token of the CVC word. To create the CVC stimulus (kat), this initial portion (ka onset) was then spliced onto the V offset C of a different token of the relevant CVC word (a offset t). The corresponding CVN(C) stimulus (kant) was created by using the same initial portion (ka onset), and splicing that onto the V offset N(C) portion of the relevant CVN(C) token (ã offset nt from kant). Splicing was done such that approximately the last 75% of the vowel was realized with nasalization in the White Afrikaans tokens, and approximately 20% for the Kleurling Afrikaans tokens. This editing (typically involving only a few pitch pulses per vowel) resulted in tokens with coarticulatory patterns characteristic of the two relevant varieties of Afrikaans. For all nasal vowel portions, nasalization was clearly audible, with acoustic correlates of the nasalization being a decrease in waveform amplitude and a flattening and broadening of the F1 region of FFT spectra, relative to the oral portion. Table 3 contains average durations of the oral and nasal portions of the vowel, and the nasal consonant in CVN(C) tokens in each of the two varieties. Filler stimuli were 10 minimal pairs differing in oral codas (e.g., tas /tɑs/ 'suitcase,' tak /tɑk/ 'branch').
This splicing results in stimuli in which the temporal onset of nasalization (the main difference of interest between White and Kleurling Afrikaans in this study) is carefully controlled. An alternative approach could have been used in which naturally produced tokens with the requisite timing patterns were selected as stimuli. We opted to use the splicing methodology in order to have more exact control over both the pre-nasalization portion of the stimuli (i.e., so that there would be no other information on which listeners might base their target looks) and the temporal onset of nasalization in the stimuli.
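As a quick check on the splicing targets, the share of the vowel that is nasalized can be computed from the average durations reported in Table 3. The sketch below is only a worked calculation; the dictionary layout and function name are illustrative choices, and the durations are the Table 3 averages (in ms).

```python
# Average durations in ms, as reported in Table 3.
TABLE_3_MS = {
    "White Afrikaans": {"oral_vowel": 41, "nasal_vowel": 123, "nasal_consonant": 101},
    "Kleurling Afrikaans": {"oral_vowel": 101, "nasal_vowel": 31, "nasal_consonant": 128},
}


def nasalized_share_of_vowel(durations):
    """Proportion of the total vowel duration realized with nasalization."""
    vowel_ms = durations["oral_vowel"] + durations["nasal_vowel"]
    return durations["nasal_vowel"] / vowel_ms


shares = {variety: nasalized_share_of_vowel(d) for variety, d in TABLE_3_MS.items()}
for variety, share in shares.items():
    print(f"{variety}: {share:.1%} of the vowel is nasalized")
```

On these averages the White Afrikaans tokens come out at 75% and the Kleurling Afrikaans tokens at roughly 23%, in line with the approximate 75% and 20% targets stated for the splicing.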
Visual stimuli were black and white line drawings corresponding to each of the 40 words (20 target stimuli and 20 fillers), which were used as prompts in both the production and perception studies.

Procedure
Data collection for the perception experiment was done over two sessions, usually scheduled a week apart. Talker identity was blocked by session such that each session consisted of only tokens produced by the Kleurling or White Afrikaans talker. The order of sessions was counterbalanced across participants. In the first session, prior to testing, participants learned the labels for each of the target images used for the eye-tracking study (and also the production study reported in Section 2). Participants first saw the randomly ordered images one at a time, with the corresponding word written below the image. To aid memorization, they read each label aloud to the experimenter and explained how the image related to the label. Participants were then shown, in a self-paced procedure, each of the images in random order, and had to produce the word corresponding to the image aloud. Each image had to be identified correctly twice before moving on to the main task. An incorrect answer resulted in the correct label being shown on the screen and the word being reentered into the randomization. The testing part of this familiarization procedure was repeated at the start of the second data collection session.

Table 3: Average durations (in ms) of relevant portions of the vowel in CVC and CVN(C) tokens, and of the nasal consonants in CVN(C) tokens.
                      Oral portion of vowel   Nasal portion of vowel   Nasal consonant
White Afrikaans       41                      123                      101
Kleurling Afrikaans   101                     31                       128
Eye movements were captured with a remote monocular eye-tracker (EyeLink 1000 Plus, SR Research), using a 25 mm lens and a sampling rate of 500 Hz. Participants were seated so that their eyes were between 550 and 650 mm from the camera and about 800 mm from the monitor.
During testing, auditory and visual stimuli were presented using SR Research Experiment Builder software; auditory stimuli were heard over AKG 271 Mk2 headphones. After familiarization but prior to testing, the experimenter performed a calibration.
In each test trial, participants were presented with two visual stimuli, arranged as in Figure 5. Participants then heard the instruction sentence (Kyk na die sketse 'Look at the drawings'), as [...]
Stimuli were organized into two blocks: The initial block was an 'oral only' block, and contained 40 oral target stimuli and 35 filler auditory stimuli, followed by a 'mixed' block containing 10 oral, 50 nasal, and 25 filler trials. Two stimulus randomizations were created (without mixing stimuli from the mixed and oral only blocks), and were alternated between consecutive participants.
Participants assigned to the first of the two orderings for the first perception experiment session were then assigned to the other ordering for the second session (i.e., a participant always heard different orderings for the White and Kleurling Afrikaans sessions, respectively). Participants were given a short break after every 50 eye-tracking trials.
Participants' eye movements were monitored during each trial, starting from the onset of the auditory stimulus and for a duration of 1000 ms. The computed measure was the proportion of fixations on the target image over time, beginning at 200 ms after stimulus onset, and for forty 20 ms temporal bins. The 200 ms delay is based on the standard assumption of the time required for the planning and execution of a saccade (Dahan, Magnuson, Tanenhaus, & Hogan, 2001; for a review of the cognitive bases for this delay, see Hutton, 2008). A fixation was counted as a target fixation if it fell within the target image's 'square' (as in Figure 5). Thus, a proportion of 0.50 for, say, the temporal bin 400-420 ms for auditory bons in bos-bons trials means that 50% of those trials included a fixation on visual bons at some point during that 20 ms interval.
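The binned measure described above can be sketched in a few lines. This is a minimal illustration, assuming each trial is represented as a list of fixations given as (start_ms, end_ms, on_target) tuples timed from stimulus onset; that representation, the function name, and the example trials are hypothetical and do not reflect the eye-tracker's actual output format.

```python
BIN_MS = 20              # width of each temporal bin
FIRST_BIN_START_MS = 200 # delay allowed for planning and executing a saccade
N_BINS = 40              # forty bins cover 200-1000 ms after stimulus onset


def target_fixation_proportions(trials):
    """Return, per bin, the proportion of trials with a target fixation.

    A trial counts toward a bin if any of its target fixations overlaps
    that bin at some point.
    """
    proportions = []
    for b in range(N_BINS):
        bin_start = FIRST_BIN_START_MS + b * BIN_MS
        bin_end = bin_start + BIN_MS
        hits = sum(
            any(on_target and start < bin_end and end > bin_start
                for start, end, on_target in trial)
            for trial in trials
        )
        proportions.append(hits / len(trials))
    return proportions


# Two made-up trials: one early and one later look to the target image.
example_trials = [
    [(350, 700, True)],
    [(0, 500, False), (500, 900, True)],
]
example_props = target_fixation_proportions(example_trials)
```

In this toy example the 400-420 ms bin (index 10) has a proportion of 0.50, since only the first trial includes a target fixation during that interval.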

Results
Results from the eye-tracking experiment were modeled with generalized additive mixed models [...] To avoid artefactual overfitting of the data, the non-linearity penalty, gamma, was increased to double its default value (see Baayen, Vasishth, Kliegl, & Bates, 2017; Wood, 2011). The remainder of our hypotheses are independent of the assumed link between perception and production, and were hence evaluated with a model that was identical to that described above, except that PC1 and its interactions with other factors were not included in the model. The full model structures and results are available in the supplementary materials.

Do listeners use coarticulatory nasalization?
Our first hypothesis is that listeners will rely on coarticulatory vowel nasalization to differentiate CVN(C) and CVC words (e.g., kant versus kat) and will not wait for the disambiguating postvocalic consonantal information (-nt versus -t). Since coarticulatory nasalization starts earlier in the CVN(C) words produced by the White than Kleurling Afrikaans talker, we expect that participants will fixate on the target CVN(C) image earlier in the White Afrikaans condition than in the Kleurling Afrikaans condition. The panels in the top row of Figure 6 show the average [...]
[...] show model-predicted fixation proportions for conditions excluding effects attributable to participant- and word-specific variation, as specified in the model random effect structure. In the relevant plots, this is indicated by the words "fitted values, excl. random" in the righthand margin of the plots.
8 Results in the difference plots (bottom panels) of this and following figures are given in log odds, since responses are binary (a participant either looks at an image or not), so that we have to rely on regression models with a logit link function to model participant looks. Log odds should be interpreted carefully given that the relationship between log odds and proportions is not linear. The same size change in log odds (a change of one unit from 0 to 1, and from 1 to 2) can correspond to very different size changes in proportion (here a difference of 0.23 from 0.5 to 0.73, and a difference of 0.15 from 0.73 to 0.88). The R packages used to model the data in this paper do not have the functionality to back-transform modeled differences in log odds to proportions, and we therefore opt to represent the difference plots in log odds rather than proportions. Readers interested in transforming the log odds to proportions can use the following formula: exp(logodds) / (1 + exp(logodds)).
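The back-transformation formula given above can be written out as a small worked example. The function name is an illustrative choice; the formula itself is the one stated in the text, and the three input values are the log odds of 0, 1, and 2 used there.

```python
import math


def log_odds_to_proportion(log_odds):
    """Inverse logit: exp(logodds) / (1 + exp(logodds))."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


p0 = log_odds_to_proportion(0.0)
p1 = log_odds_to_proportion(1.0)
p2 = log_odds_to_proportion(2.0)

# Equal one-unit steps in log odds give unequal steps in proportion.
print(f"0 -> 1: {p0:.2f} to {p1:.2f} (difference {p1 - p0:.2f})")
print(f"1 -> 2: {p1:.2f} to {p2:.2f} (difference {p2 - p1:.2f})")
```

Note that the formula applies to log-odds values, so converting a difference between two conditions requires transforming each condition's value separately and then subtracting.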
have completed the task for the White Afrikaans talker (where nasalization onset is earlier) and are hence starting to look away from the target image for this talker towards the end of the trial (see Beddor et al., 2018, pp. 954-955, for a similar 'look away' pattern).

Do individuals' patterns of produced coarticulatory nasalization predict their perceptual reliance on coarticulatory nasalization?
Having established that both Kleurling and White Afrikaans-speaking listeners rely perceptually on acoustic information for coarticulatory nasalization, we turn to the question of whether there is a relation between an individual's production of coarticulatory nasalization and that same individual's perceptual use of this information. To investigate this question, we rely on a GAMM that includes PC1 (and its interactions) as a fixed factor. As shown in Section 2.2, higher PC1 values correspond to speakers with both earlier onset and higher overall volume of nasal airflow in CVN(C) words. If an individual's produced coarticulatory nasalization predicts their perception, we would expect participants with higher PC1 values to fixate earlier on nasal CVN(C) target images. Based on the general assumption of a production-perception link, this effect is expected to hold irrespective of whether listeners are attending to the speech of the Kleurling or White Afrikaans talker.
The results given in Figure 7 and [...]
These results therefore provide support for the hypothesis that individuals who produce more extensive coarticulatory nasalization in CVN(C) tokens also rely more on coarticulatory nasalization perceptually, and hence add to the evidence for a link between individuals' production and perception patterns.

Do listeners adapt to the coarticulatory nasalization patterns of the talker?
As reviewed in Section 1.2, under certain conditions, listeners can rapidly adjust their perceptual strategies to the acoustic patterns present in the speech of a specific talker. In our study, stimuli were blocked such that participants were presented with only oral CVC auditory targets in the initial part of the experiment (Oral Only block), and with both oral CVC and nasal CVN(C) targets in the second part (Mixed block). It is therefore only in the Mixed block that participants get information about differences in the timing of coarticulatory nasalization for the two talkers (earlier onset for the White Afrikaans talker). Once participants reach the Mixed block they could hence, based on the timing patterns of nasalization in the White Afrikaans tokens, identify a token as CVC rather than CVN(C) with confidence relatively early in the vowel of the CVC token (if the vowel is still fully oral about 25% into the vowel, it can only be the oral CVC token). Consequently, we might expect earlier fixations on the CVC target image in the Mixed than Oral Only blocks for the White Afrikaans talker. Conversely, for the Kleurling Afrikaans talker, participants will get information in the Mixed block that CVC and CVN(C) tokens are ambiguous up to the very end of the vowel (due to late onset of nasalization in CVN(C) tokens).
Minimally, no change in the speed of looks to the CVC target images would be expected in the Kleurling Afrikaans condition. It is also possible that confirmation of the late disambiguation between CVC and CVN(C) tokens in the Kleurling Afrikaans condition can result in additional uncertainty on the part of the listeners, which could lead to a slow-down in fixations on the CVC target images in the Mixed versus Oral Only block in the Kleurling Afrikaans condition. As with earlier hypotheses, this hypothesis is assessed using a GAMM.
Afrikaans talker, the corresponding speed-up for the talker of White Afrikaans was not found.
We therefore find partial support for the hypothesis that listeners rapidly adjust their perceptual strategies based on speaker-specific acoustic timing patterns.

Do listeners anticipate differences between the Kleurling and White Afrikaans talkers?
We have found partial support in Section 3.2.3 for the hypothesis that listeners rapidly adjust their perceptual strategies, at least for the Kleurling Afrikaans talker, once they receive information about the specific timing patterns of this talker's coarticulatory nasalization in CVN(C) tokens.
We turn now to whether listeners anticipate these timing patterns. The timing of coarticulatory nasalization in the perception stimuli reflects typical patterns for Kleurling and White Afrikaans (cf. Section 2.2). Additionally, the two talkers who produced these stimuli are easily and unambiguously identified as speakers of these varieties based on the instruction sentences that introduce each stimulus (cf. Section 3.1.2). If the participants have pre-existing knowledge about the typical timing patterns of coarticulatory nasalization for these two varieties of Afrikaans, they may rely on this knowledge even during the initial Oral Only block of the experiment, before receiving information about coarticulatory nasalization for these specific talkers. In this case, listeners would fixate on the CVC target image in the initial Oral Only block earlier for the White Afrikaans talker because for the White, but not the Kleurling, Afrikaans talker listeners would, by hypothesis, be able to identify the auditory target as a CVC word early during the vowel based on the absence of acoustic evidence for nasalization.
As before, we rely on a GAMM to assess whether this effect is observed in our data. Figure 11 shows the proportion of fixations over time in the Oral Only block for the CVC auditory targets as produced by the White and Kleurling Afrikaans talkers. The left and right panels show the patterns for the Kleurling and White Afrikaans listeners, respectively. Panels in the top row show the average observed fixations, while those in the middle show model-predicted fixations. The bottom row shows model-predicted differences where a positive difference indicates more target fixations in response to the White Afrikaans stimuli. Differences would be expected fairly early during the trial, given that disambiguation between CVC and CVN(C) tokens happens early in the vowel in White Afrikaans. Inspection of the figure shows, however, that the observed differences happen comparatively late (between 400 and 600 ms after vowel onset) and, for both Kleurling and White Afrikaans listeners, the difference is in the opposite-to-expected direction (i.e., more looks to the CVC tokens for the Kleurling than White Afrikaans talker). The opposite-to-expected pattern is difficult to explain. However, given the lateness of this effect, as well as its direction, the current experiment does not provide support for the hypothesis that listeners adjust their perceptual strategies based on presumed pre-existing knowledge about the variety of Afrikaans being spoken.
Bottom: model-predicted differences, calculated such that a positive difference corresponds to more target fixations for the White than Kleurling Afrikaans stimuli. Middle and bottom panels include 95% confidence bands; temporal regions marked in red are regions of significant difference.

Summary
This study investigated patterns of produced nasal coarticulation and the perceptual reliance on this information by members of an Afrikaans speech community in which variation in nasalization is socially structured. Consistent with earlier impressionistic descriptions, we confirmed more extensive coarticulatory nasalization in White than Kleurling Afrikaans by showing that, in the production of CVN(C) words, nasal airflow both starts earlier and reaches higher overall volumes for speakers of White than for speakers of Kleurling Afrikaans (Figure 2). In addition, we documented variation in the amount of produced coarticulatory nasalization within each of the two varieties of Afrikaans through submitting the nasal airflow patterns to an fPCA. In this analysis, the first principal component (PC1) accounted for over 90% of observed variation, with higher PC1 values corresponding to earlier onset and higher overall volume of nasal airflow (Figure 3). Although variation was observed within both speaker groups, PC1 values for speakers of White Afrikaans were higher overall than those for speakers of Kleurling Afrikaans (Figure 4).
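The idea behind extracting PC1 from the airflow curves can be sketched as follows. This is a deliberately simplified stand-in for the fPCA reported here, not the analysis actually run: it assumes each curve is a list of equally spaced normalized airflow values, applies ordinary PCA to those samples without any basis smoothing, and finds only the first component by power iteration. All names and the toy curves are hypothetical.

```python
import math


def first_principal_component(curves, n_iter=500):
    """Return (loadings, scores, variance_share) for PC1 of sampled curves."""
    n, p = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(p)]
    centered = [[c[j] - mean[j] for j in range(p)] for c in curves]
    # Sample covariance between time points.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    if total_var == 0:
        raise ValueError("curves are identical; no variation to decompose")
    # Power iteration from a uniform start vector (assumes PC1 is not
    # orthogonal to it, which holds when PC1 reflects overall airflow level).
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # The sign of a component is arbitrary; orient it so that higher scores
    # mean more nasal airflow overall.
    if sum(vec) < 0:
        vec = [-v for v in vec]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores, eigenvalue / total_var


# Toy curves: rising airflow over 25 points, differing mainly in magnitude.
toy = [[a * (j / 24) ** 2 + 0.01 * ((k + j) % 3) for j in range(25)]
       for k, a in enumerate([0.2, 0.4, 0.6, 0.8, 1.0])]
loadings, pc1_scores, pc1_share = first_principal_component(toy)
```

In the toy data the curves differ almost entirely in overall magnitude, so PC1 captures nearly all of the variance and the scores order the curves from least to most airflow.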
On the perception side, we first demonstrated that both White and Kleurling Afrikaans listeners rely on nasal coarticulation in differentiating CVN(C) and CVC words. Specifically, given that nasalization started early during the vowel for the White Afrikaans stimuli and late for the Kleurling Afrikaans stimuli, systematically earlier fixations on the CVN(C) target image for the White Afrikaans stimuli would be evidence that listeners rely perceptually on coarticulatory nasalization. We found this pattern both for White and Kleurling Afrikaans listeners (Figure 6), consistent with our hypothesis that listeners' attention to coarticulation will be sensitive to the time-varying patterns of that information.
Replicating the finding of Beddor et al. (2018) for American English, and consistent with theoretical frameworks that assume a link between individual speakers' perception and production repertoires, we found that speakers who produce more extensive coarticulatory nasalization also rely more on this information as listeners. This was confirmed by showing that participants who, as speakers, produce more extensive nasalization (higher PC1 values) fixate earlier on CVN(C) target images than participants who produce less nasalization. This pattern was observed for both the Kleurling and White Afrikaans stimuli, and for both Kleurling and White Afrikaans listeners (Figure 7 and Figure 8).
We also tested the hypothesis that listeners would adjust their perceptual strategies based on coarticulatory timing differences in the speech of the Kleurling and White Afrikaans talkers by investigating listeners' responses to CVC auditory stimuli. We expected that listeners' exposure (in the Mixed perception block) to early onset of vowel nasalization for the White Afrikaans talker and late onset for the Kleurling Afrikaans talker would lead to faster and slower fixations, respectively, on the CVC target images (relative to fixations on the CVC target images in the Oral Only perception block). This hypothesis, tested by comparing the Oral Only and Mixed perception blocks, was partially supported. As expected, both White and Kleurling Afrikaans listeners were slower to fixate on oral CVC target images in the (later occurring) Mixed perception block for the Kleurling Afrikaans talker (Figure 9), that is, for the talker for whom disambiguation between CVC and CVN(C) auditory stimuli only happens towards the end of the vowel. However, contrary to expectations, for the White Afrikaans talker, for whom CVC-CVN(C) disambiguation happens early in the vowel, we did not find evidence of a speed-up in fixations for either the White or Kleurling Afrikaans listeners (Figure 10).
Lastly, we investigated whether listeners have knowledge of the difference in the typical timing patterns of coarticulatory nasalization in White and Kleurling Afrikaans, that is, pre-existing knowledge that they could potentially bring to the perceptual task. In this case, knowing that CVC and CVN(C) can, in general, be disambiguated earlier for White than for Kleurling Afrikaans voices might lead to faster identification of a word as CVC (rather than CVN(C)) for the White than for the Kleurling Afrikaans talker in the experiment, even prior to hearing any CVN(C) produced by that talker. However, as tested within the context of the Oral Only perception block (Figure 11), we did not find evidence for this pattern for either the White or Kleurling Afrikaans listeners.

Is the perception-production link socially mediated?
Although this study is situated within a theoretical framework that postulates a close link between perception and production, we asked whether the production-perception link might nonetheless be relatively weak within a speech community in which coarticulation is socially structured.
In asking this question, we have in mind that, similar to perception leading production or production leading perception for some community members in conditions of an ongoing sound change (e.g., Coetzee et al., 2018; Kuang & Cui, 2018; Pinget et al., 2020), socially structured coarticulatory variation might lead speaker-listeners to attend perceptually to the information encoded in that variation more than might be expected on the basis of their own production patterns.
As just summarized, though, for participants in this study, those individuals who produced more extensive coarticulatory nasalization also relied more on this information perceptually. Clearly, then, any possible social mediation of the production-perception link was not sufficiently strong to override the basic finding in this study that the extent of produced coarticulatory nasalization predicts perceptual reliance on this information, for both White and Kleurling Afrikaans-speaking individuals. Thus, our findings suggest that listeners are applying a perceptual strategy determined at least partially by their own production patterns when listening to either White or Kleurling Afrikaans talkers.
On the other hand, if perception were very closely tied to production, we would expect that Kleurling Afrikaans listeners, who overall produce limited coarticulatory nasalization, would exhibit relatively limited perceptual reliance on nasalization and would hence exhibit comparably weak influences of speaker ethnicity on their perceptual judgments. Instead, as shown in Figure 6 (left panels), these listeners fixated more on CVN(C) images when listening to the White Afrikaans talker than to the Kleurling Afrikaans talker across nearly the entire duration of the relevant eye-tracking trials. This effect is not simply driven by those Kleurling Afrikaans listeners who produce heavier vowel nasalization, since even the Kleurling Afrikaans participants with lower PC1 values fixated reliably earlier on the CVN(C) image when listening to the White Afrikaans talker than to the Kleurling Afrikaans talker. (For example, the dashed curves in the middle left panels of Figures 7 and 8 show that low-PC1 participants were estimated to fixate on the target 50% of the time at 525 ms after vowel onset for the White talker's stimuli but not until about 575 ms after onset for the Kleurling talker's stimuli.) What we cannot determine from these data, though, is whether the perceptual attention to nasalization by these Kleurling Afrikaans listeners is socially mediated. To determine this, we would need to present stimuli similar to those in this study to participants with at most minimal exposure to speakers of the other variety. If the perception-production link is weakened by the social structure of the speech community in which our study was conducted, we should find an even stronger relation between production and perception for participants with no or limited exposure to the other variety.

On the nature of talker-specific perceptual adjustment
Both Kleurling and White Afrikaans listeners attend to the talker-specific patterns of coarticulatory vowel nasalization. This outcome might emerge from perceptual learning of a talker's coarticulatory timing, but it could also follow more generally from listeners' close attention to the coarticulatory information as it becomes available in the unfolding acoustic signal, information that is available earlier for the CVN(C) words produced by the White than by the Kleurling Afrikaans talker. However, if listeners are adjusting to the talker's timing patterns for nasality, this perceptual adjustment should also emerge in their responses to the CVC words produced by these talkers. As reviewed in Section 1.2, listeners are adept at rapidly adjusting their perceptual strategies based on the specific acoustic properties of an interlocutor's speech. In this study, similar evidence of perceptual adjustment was expected to emerge in listeners' slower fixation on target images across the course of the experiment for the Kleurling Afrikaans talker's CVC words versus faster fixations for the White Afrikaans talker's CVC words.
That only the first pattern was found (Figures 9 and 10) is unexpected. Dahan et al. (2008) and Trude and Brown-Schmidt (2012), for instance, both show that listeners are faster to respond once they get evidence for early disambiguation of stimuli (in their case, faster to respond to back after learning that the speaker produced bag with a raised diphthong). This is exactly the pattern that we did not find: faster fixations for the White Afrikaans talker. Instead, we found evidence for a slow-down once listeners receive information for late disambiguation for the Kleurling Afrikaans talker. This difference is difficult to explain. We note, however, that the phenomenon that is the focus of the studies by Dahan et al. and Trude and Brown-Schmidt is above the level of consciousness. That is, American English listeners will most likely consciously notice the difference between a non-raised and raised production of a word like bag ([bæg] versus [beɪg]) because both vowels have phonemic status in English. The difference between early and late onset of coarticulatory nasalization in Afrikaans, however, is below the level of consciousness: coarticulatory nasalization is not phonemic in Afrikaans. These differences in the status of the phenomena may hence be relevant to the different patterns of perceptual adjustment seen in the Dahan et al. and Trude and Brown-Schmidt studies versus our study.

Differential perceptual strategies based on the identity of the talker
Contrary to results reported by Niedzielski (1999), Hay et al. (2006a, 2006b), Staum Casasanto (2008, 2009a, 2009b), Schertz et al. (2019), and others, which showed that listeners adjust their perceptual strategies based on their prior knowledge of a targeted speech variety, we did not find evidence that listeners adjust their perceptual strategies based on the assumed identity of the talker. Specifically, we did not find that either the White or the Kleurling Afrikaans listeners were faster to fixate on the CVC target images in the Oral Only block for the White than for the Kleurling Afrikaans talker (Figure 11). That is, we did not find evidence that these listeners anticipated talker-specific coarticulatory patterns based on pre-existing knowledge about nasalization patterns in different dialects of Afrikaans.
The absence of this effect may be at least in part methodological: Previous studies showing an influence of anticipated talker variety on listeners' judgments have used visual priming (e.g., orthographic label or purported picture of the talker). In spite of the fact that we found high accuracy in identifying the variety of Afrikaans spoken by the two talkers who provided stimuli for the perception study (see Section 3.1.2), it may be that the auditory instructions for each trial produced by the Kleurling and White Afrikaans talkers served as less explicit information about talker identity than the explicit visual or orthographic cues used in other studies. (In this regard, we note that Munson, Ryherd, & Kemper, 2017, found that explicit priming of talker sex with a picture of a woman or a man had stronger influences on linguistic judgments than implicit priming based on female-versus male-associated sentence content.) Another methodological difference between our study and some studies that found evidence of an influence of anticipated talker identity is that all auditory stimuli in our study were congruent with speaker variety; that is, all White Afrikaans CVN(C) stimuli had early onset nasalization and all corresponding Kleurling Afrikaans stimuli had late onset nasalization. In comparison, participants in some other studies were also presented with trials in which the acoustic properties of the stimuli were incongruent with the assumed identity of the speaker. In their study of the perception of New Zealand and Australian English vowels, Hay and Drager (2010), for instance, presented all stimuli (both those typical of New Zealand and Australian English) in both the New Zealand and Australian conditions of their study. The mismatch between the acoustic stimuli and the patterns expected based on the assumed identity of the talker could cause participants to attend more closely to these expected patterns.
Alternatively, or in addition, for the White Afrikaans listeners, absence of evidence of a difference in anticipatory response to White as opposed to Kleurling Afrikaans might be ascribed to the fact that, due to the structure of South African society, they have less exposure to Kleurling Afrikaans and so may not have sufficient knowledge of the differences between the two varieties of the language to adjust their perceptual strategies relative to the identity of the talker. Moreover, along the lines of the claim by Sumner, Kim, King, and McGowan (2014) that socially stigmatized varieties of a language receive less robust exemplar encoding, even if White Afrikaans listeners have sufficient exposure to Kleurling Afrikaans, they may not use this information to inform differential perceptual strategies in response to a speaker of Kleurling versus White Afrikaans. However, these explanations are not available for the lack of evidence of differential perceptual strategies for the Kleurling Afrikaans listeners. These listeners would have ample exposure not only to Kleurling Afrikaans (in the home and family context) but also to White Afrikaans (through the media and as students at a majority White Afrikaans university), which is also the prestige variety of the language, especially in the academic context where data collection for this study took place.
Yet another possible explanation for the absence of differential perceptual anticipation strategies may again (see Section 4.3 for perceptual adjustment strategies) be that the phenomenon of interest here (coarticulatory nasalization) is below the level of consciousness and is non-phonemic in Afrikaans. This differentiates this phenomenon from at least some of the phenomena for which such differential perceptual strategies have been documented.

Conclusion
Many phonetic theories assume a close relation between speech production and perception including, for some approaches, between the production and perception repertoires of individual language users. At the same time, successful communication depends on listeners being able to accurately perceive speech produced by speakers whose production patterns may be quite different from their own, implying a need for flexibility in the perception-production link. Understanding the factors that mediate this link at the level of the individual and the speech community is therefore central to phonetic theory. In this study, we investigated how the production-perception link may be mediated by socially structured variation in the extent of produced coarticulatory nasalization in an Afrikaans speech community. For this community, we found evidence for a production-perception link at the level of the individual, such that individuals who produce more coarticulatory nasalization also rely more on this information in perception, and they do so regardless of the talker's (predictably structured) pattern of nasalization. The persistence of the production-perception link, even in a context of socially structured variation, provides evidence for the robustness of this link. At the same time, although the relative perceptual usefulness of coarticulatory information is informed by listeners' own productions, our results also show that even language users who themselves produce little to no anticipatory nasalization are nonetheless adept at using that information in perception. The evidence provided in this study further shows, though, that listeners' perceptual adjustments for speaker-specific, real-time information occur only under certain circumstances. No clear evidence was found for the social mediation of the link between production and perception based on pre-existing knowledge of different coarticulatory patterns in different socio-ethnic varieties of Afrikaans.
The continuing challenge for phonetic theory is to determine how individual language users balance, from moment to moment, their reliance on the acoustic patterns in the speech of their interlocutors, and their reliance on their own production patterns.