Transphonologization of voicing in Chru: Studies in production and perception

Chru, a Chamic language of south-central Vietnam, has been described as combining contrastive obstruent voicing with incipient registral properties (Fuller, 1977). A production study reveals that obstruent voicing has already become optional and that the voicing contrast has been transphonologized into a register contrast based primarily on vowel height (F1). An identification study shows that perception roughly matches production in that F1 is the main perceptual cue associated with the contrast. Structured variation in production suggests a sound change still in progress: While younger speakers largely rely on vowel height to produce the register contrast, older male speakers maintain a variety of secondary properties, including optional closure voicing. Our results shed light on the initial stages of register formation and challenge the claim that register languages must go through a stage in which breathiness or aspiration is the primary contrastive property (Haudricourt, 1965; Wayland & Jongman, 2002; Thurgood, 2002). This article also complements several recent studies about the transphonologization of voicing in typologically diverse languages (Svantesson & House, 2006; Howe, 2017; Coetzee, Beddor, Shedden, Styler, & Wissing, 2018).


Introduction
Recent studies have revealed several cases of transphonologization of laryngeal contrasts in languages as diverse as Kammu, Malagasy, and Afrikaans (Svantesson & House, 2006; Howe, 2017; Coetzee et al., 2018). In these languages, low-level f0 perturbations induced by onset voicing or aspiration have become contrastive as VOT-based contrasts were neutralized, providing apparent-time evidence for a process that was diachronically and phonetically well-established (House & Fairbanks, 1953; Haudricourt, 1954; Hyman, 1976; Hombert, Ohala, & Ewan, 1979). In many Southeast Asian languages, however, the same type of onset voicing is transphonologized into a bundle of properties called register, characterized by contrastive differences in phonation type, vowel quality, and/or duration in addition to f0 (Henderson, 1952).
In this paper, we explore the development of register in Chru, a Chamic language (Austronesian) of Vietnam. Chru was described by Fuller (1977) as having a voicing contrast accompanied by redundant register, and was chosen as it could inform us about the earliest stages of register formation, when voicing is still contrastive but is gradually enhanced by non-automatic, or extrinsic, register properties (Hyman, 1976). Chru onsets (Table 2) have been described as preserving the Proto-Chamic voicing contrast (Phạm, 1955; Lee, 1966; Fuller, 1977). According to previous sources, there is a contrast between plain voiceless and plain voiced stops (there are also implosives, aspirated stops, and sonorants, but they do not contrast in voicing). However, Fuller (1977, p. 85) mentions that Chru "seems to have a non-contrastive feature of register in which the vowel and sometimes the syllable has a lax, breathy quality or a tense, clear quality. Often the breathy quality is concomitant of length in the vowel and voicing in the syllable initial stop." Based on this description, we set out to investigate the language with the goal of ascertaining whether voiced stops condition the expected acoustic properties on following vowels in a language at what may be an early stage of registrogenesis. The complicated relationship between voicing and phonation type mentioned in Section 1.1 is of special interest.
Fuller's observation that Chru has a non-contrastive register feature also raises the possibility that register has already started to phonologize without becoming contrastive (cf. Hyman, 1976). If this is the case, it could be generalized to non-contrastive contexts (Huffman, 1976). Such a generalization is in fact attested in Cham dialects closely related to Chru, in which there is a process of register spreading, i.e., a rightwards propagation of register in sonorant-initial syllables (Friberg & Hor, 1977; Thurgood, 1999). In Formal Eastern Cham, for instance, the second syllable of /ala/ 'snake' is realized with a high register, but in /ḳila/ 'stupid,' a word whose initial consonant bears a low register (marked with a subscript dot), the second syllable takes on a low register as well, yielding surface [ḳiḷa] (Brunelle, 2009b). Register spreading is not limited to Chamic: It is also attested in the orthography of Khmer and has evolved into long-distance vowel alternations in Madurese (Cohn, 1993; Cohn & Lockwood, 1994; Misnadin & Kirby, 2020).

Structured variation and change
Also important for understanding registrogenesis and transphonologization more generally is to understand how the acoustic distributions and perceptual salience of the individual phonetic cues evolve over time. In classical treatments (e.g., Hyman, 1976), a secondary cue such as f0 becomes phonologized first, giving rise to a period where the previously primary cue is still produced, but is perceptually redundant. The transphonologization of f0 in Malagasy (Howe, 2017) and Afrikaans (Coetzee et al., 2018) appears to have followed this pattern. However, it is also possible that the acoustic and/or perceptual relevance of one cue increases proportionally as another decreases, as appears to have been the case in Seoul Korean (Kang, 2014; Bang, Sonderegger, Kang, Clayards, & Yoon, 2018), or even that listeners' attention shifts to a secondary cue in perception before their production changes (Kuang & Cui, 2018). Here, we focus on three factors: covariation between cues; correlation between acoustic and perceptual salience; and individual differences in the production-perception relationship. First, is there a compensatory relationship between the presence of onset voicing and the vocalic properties of register? In particular, does there exist an inverse relationship between the frequency and/or temporal extent of closure voicing and spectral properties of register in vowels? Within-category covariation in production between primary and secondary cues has been proposed for other contrasts (Shultz, Francis, & Llanos, 2012; Kang, 2014; Kirby & Ladd, 2016; Howe, 2017; Bang et al., 2018), but has not always been found (Kirby & Ladd, 2016; Clayards, 2018).
Finally, recent studies suggest that one can date a cue shift by comparing the production and perception of individuals. Structured relations between production and perception are not always found (Shultz et al., 2012; Schertz, Cho, Lotto, & Warner, 2015; Brunelle, Hạ, & Grice, 2016), but the picture that emerges is that at an early stage, coarticulation and reduction biases alter the relative salience of the phonetic properties associated with a contrast (Ohala, 1989; Beddor, 2009; Bang et al., 2018). Cues that were ancillary then gain perceptual salience in some listeners (Ohala, 1981; Harrington, Kleber, & Reubold, 2008; Beddor, 2009; Kleber, Harrington, & Reubold, 2012; Ohala, 2012; Kuang & Cui, 2018), and in turn trigger a production shift in some individuals. At late stages, both innovative and conservative speakers exhibit some degree of sensitivity to all the relevant cues, until the production shift is completed in the entire community (Pinget, Kager, & Van de Velde, 2016; Howe, 2017; Kuang & Cui, 2018; Pinget, Kager, & Van de Velde, 2019). Although we did not specifically set out to study structured sociolinguistic variation in the speech community, we attempted to balance our speaker sample to the extent possible in order to see what differences, if any, obtain between men and women and between older and younger speakers.

Research questions
We therefore set out to answer the following research questions:
Q1) How is the voicing/register contrast of Chru realized acoustically? How much individual variation is there in the speech community?
a. Is there any evidence that Chru has already developed a register system, as suggested by Fuller (1977)? If register properties are already present, do they correspond to the secondary acoustic properties of voicing expected to be phonologized in register systems?
b. Is there any evidence that the voicing contrast has been neutralized? If register properties are already present, is there evidence that they are, or are becoming, contrastive?
c. If voicing and register still coexist, are they in a compensatory relation? Are the acoustic properties of register more or less distinct when prevoicing is absent?
Q2) What are the perceptual cues used by Chru listeners to identify the voicing/register contrast? Is there variation across listeners?
Q3) Do the weights of individual acoustic properties and perceptual cues of voicing/register correlate at the individual or group level? If there is structured variation across individuals, what does it reveal about registrogenesis in Chru and sound change in general?
In order to answer these questions, we undertook both production and perception studies in the Chru community. For practical reasons, data collection was staggered: We gathered the production data in June 2018 and analyzed it first, in order to have sufficient time to design and pilot sensible stimuli for the perception study, which was conducted in June 2019. We present the production study in Section 2 and the perception study in Section 3.

Data collection
Twenty-six speakers of Chru (15 women and 11 men) were recorded in the villages of Điom A and Proh, in the province of Lâm Đồng (about 50 km south of Đà Lạt). We chose to work in Điom A, a village where Fuller also worked in the 1960-70s, to ensure that our data are comparable with his (Fuller, 1977). Proh was selected because we had a reliable contact there. Our speakers were born between 1951 and 2000 (between 18 and 67 at the time of recording), were all highly proficient in Vietnamese, and were all born and raised in the district of Đơn Dương, home to the majority of Vietnam's Chru population. Participants were presented with selected target words in Vietnamese, had to translate them into Chru, and to produce them four times in the frame sentence in (1).
Simultaneous audio and electroglottographic (EGG) signals were recorded. Audio recordings were made with a Beyerdynamic 55.18 Mk II microphone connected to a Marantz PMD-660 digital recorder. EGG recordings were acquired through the MATLAB data acquisition toolbox with a Glottal Enterprise EG2-PCX laryngograph connected to a laptop through a National Instrument USB6210 data acquisition device. Three signals were acquired with the EGG: an electroglottograph signal, a larynx height signal, and an audio channel. This paper focuses exclusively on the high-quality audio recordings, which are available from the Pangloss collection (https://pangloss.cnrs.fr/corpus/list_rsc_en.php?lg=Chru&name=Chru) and from Cocoon (https://doi.org/10.24397/pangloss-0005939).

Acoustic and statistical analysis
Each target word was annotated in a Praat TextGrid. As illustrated in Figure 2, three major acoustic landmarks were used to make our measurements: the beginning and endpoint of onset stop closures, onset fricatives, or onset sonorants; the beginning and endpoint of the open phase, ranging from the burst to the endpoint of the vowel; and the voice onset time (VOT) associated with stops. VOT was calculated by subtracting the time at the beginning of the open phase (op) from the time at the onset of voicing (ov), so that prevoiced stops have a negative VOT. When there was no bleeding (i.e., no progressive voicing extending from a previous sonorant without reaching the burst; Davidson, 2016), ov was set at the beginning of the voice bar or, in the absence of a voice bar, of vowel phonation. If bleeding covered less than the first 50% of the closure, ov was positioned after it ([aṭa] in Figure 2). Finally, in a few tokens, most of the closure was voiced, but voicing stopped shortly before the burst because of the aerodynamic voicing constraint (second instance of [ada] in Figure 2). In such cases, ov was marked as soon as closure voicing began, but additional annotations were used to mark cessation of voicing (cv) and resumption of voicing (rv). As only 27 words contained cv and rv labels, they will not be reported here. Several types of acoustic measurements were obtained from these landmarks with PraatSauce, a Praat-based application for spectral measures based on VoiceSauce (Shue, Keating, Vicenik, & Yu, 2011; Kirby, 2018): The most relevant are the duration of onset stop closures and vowels, stop VOT, and the f0, first two formants, cepstral peak prominence (CPP), and H1-H2 at each 1 ms of the vowel (H1-A1 and H1-A3 were also measured but will not be reported as they do not distinguish registers as clearly as H1-H2). H1-H2 measures were corrected for formant frequencies and bandwidths, and will thus appear as H1*-H2* (Hawks & Miller, 1995; Iseli & Alwan, 2004).
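As an illustration of the VOT measure, the following minimal sketch computes VOT from the two landmarks, assuming the sign convention used in the rest of the paper (prevoiced stops have a negative VOT). The function name, the dictionary layout, and the landmark times are hypothetical and are not taken from the authors' scripts or recordings.

```python
def vot_ms(ov_s, op_s):
    """Return VOT in ms from two landmark times given in seconds.

    ov_s: onset of voicing (ov); op_s: beginning of the open phase (op).
    Voicing that starts during the closure (before the burst) gives a
    negative value; voicing that starts after the burst gives a positive one.
    """
    return (ov_s - op_s) * 1000.0


# Hypothetical landmark times (seconds), not taken from the Chru recordings.
tokens = {
    "voicing_during_closure": {"ov": 0.250, "op": 0.310},
    "voicing_after_burst": {"ov": 0.323, "op": 0.310},
}
vots = {name: vot_ms(t["ov"], t["op"]) for name, t in tokens.items()}
```

In this sketch the first token comes out with a negative VOT and the second with a short positive VOT.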
The data and the R script used for data processing (Script 1) are available as Supplementary Material, but some decisions need to be mentioned here. First, since 25 ms windows were used for acoustic measures, the first sampling point reported for each vowel corresponds to a window centered on its 12th ms, thus excluding any influence from the onset. Second, two algorithms were used to remove measurement errors. In order to remove sudden jumps in tracking, f0, F1, and F2 were z-normalized per speaker. Derivatives were then computed for consecutive sampling points and all measures with derivatives exceeding ±0.5 standard deviations were erased. Then, in order to remove tracking errors over longer time spans, we excluded all f0, F1, and F2 values deviating by more than 3 standard deviations from means computed for combinations of subject, vowel, and register. In total, 2.4% of f0 values, 4.5% of F1 values, and 4.4% of F2 values were removed. All H1*-H2* measures derived from excluded f0, F1, and F2 values were also deleted.
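The two error-removal passes can be sketched as follows. This is a rough Python illustration, not the authors' R script (Script 1): the function names are hypothetical, the choice to erase the later sample of each large step is an assumption, and so is the use of the sample standard deviation.

```python
from statistics import mean, stdev


def zscores(values):
    """z-normalize a list of measurements (e.g., all F1 values of one speaker)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def drop_jumps(track_z, max_step=0.5):
    """Pass 1: erase sudden jumps within one z-normalized track.

    Assumption: a sample is erased (set to None) when it differs from the
    immediately preceding sample by more than max_step standard deviations.
    Note that the sample following an isolated spike is erased as well.
    """
    out = [track_z[0]]
    for prev, cur in zip(track_z, track_z[1:]):
        out.append(None if abs(cur - prev) > max_step else cur)
    return out


def drop_far_from_group_mean(values, n_sd=3.0):
    """Pass 2: erase values more than n_sd SDs from the mean of their group.

    `values` is assumed to hold all remaining measurements for one
    combination of subject, vowel, and register.
    """
    kept = [v for v in values if v is not None]
    m, s = mean(kept), stdev(kept)
    return [None if v is None or abs(v - m) > n_sd * s else v for v in values]
```

A design note: applying the jump filter before the 3 SD filter keeps isolated tracking spikes from inflating the group standard deviation used in the second pass.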
To facilitate the comparison of acoustic measurements across participants and ensure convergence of statistical models, f0, formants, CPP, and H1*-H2* measurements were z-normalized by speaker a second time, after removing outliers. As z-scales are difficult to interpret, z-scores were converted back to familiar scales based on means and standard deviations obtained for all speakers in the groups under investigation (mean of all speakers + z-score * standard deviations for all speakers). These normalized scales are used in figures where data is pooled over groups of speakers and in statistical analyses.
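The normalization and back-conversion described above can be illustrated with a short sketch. It assumes that the group mean and standard deviation are computed over all measurements of all speakers in the group; the speaker labels and f0 values are invented for the example.

```python
from statistics import mean, stdev


def normalize_by_speaker(data):
    """z-normalize each speaker's values against that speaker's own mean and SD."""
    out = {}
    for speaker, values in data.items():
        m, s = mean(values), stdev(values)
        out[speaker] = [(v - m) / s for v in values]
    return out


def to_familiar_scale(z, group_mean, group_sd):
    """Convert a z-score back to a familiar scale: group mean + z * group SD."""
    return group_mean + z * group_sd


# Invented f0 values (Hz) for two hypothetical speakers.
f0 = {"S1": [100.0, 110.0, 120.0], "S2": [200.0, 220.0, 240.0]}
all_values = [v for values in f0.values() for v in values]
group_mean, group_sd = mean(all_values), stdev(all_values)

z = normalize_by_speaker(f0)
rescaled = {
    spk: [to_familiar_scale(v, group_mean, group_sd) for v in zs]
    for spk, zs in z.items()
}
```

After rescaling, the two hypothetical speakers occupy the same range in Hz, which is what makes values comparable when they are pooled in a figure.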
The statistical strength of linguistically meaningful differences was tested with mixed models using the R package lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017). Dependent variables will be indicated where relevant. Unless indicated otherwise, register, place, and vowel were used as fixed factors. All two-way interactions were included; three-way interactions were excluded to avoid overfitting, as combinations of the three fixed effects often contained a single word. Random effects included by-subject and by-word random intercepts. Random slopes were not included as this often resulted in singular (overfitted) models. Maximal models were simplified by dropping non-significant fixed effects if doing so yielded a significantly lower Akaike information criterion (AIC) score. Interactions were dropped before main effects (by decreasing order of F-values in ANOVA model comparisons) and effects were not dropped if they were a subset of a significant interaction. In the main text, we focus on visual displays and discussion of the most relevant model parameters; see Appendix B for the fixed effect estimates and the Supplementary Material for the R code and data files.

Onsets
The top row of Figure 3 reports VOT values for the series of stops described as 'voiced' in previous sources (Phạm, 1955; Lee, 1966; Fuller, 1977). Contrary to expectations, they have a bimodal VOT distribution: They are sometimes voiced, but more often voiceless. To avoid confusion, we will relabel them low-register stops, and will characterize them as prevoiced when they have a negative VOT, and as devoiced when they have a positive VOT. The second row corresponds to the series previously described as voiceless stops, which we will refer to as high-register stops. High-register stops have a unimodal distribution, centered around a 13 ms positive VOT.
A closer look at low-register stops reveals that the VOT of devoiced coronal stops is comparable to that of high-register coronal stops, but that devoiced velar stops have a slightly longer VOT than their high-register counterparts (RegisterLow: β = -1 ms, t = -.799, p = .507; RegisterLow:Placevelar: β = 5 ms, t = 3.714, p = .062; see Table 1 in Appendix B).
Figure 3: VOT of onset stops. Plain stops are split into high register (<voiceless) and low register (<voiced). Implosives and aspirates are given for comparison.
The distribution of VOT in Figure 3 hides important interspeaker variation in low-register stops. A breakdown by speaker is given in Figure 4, where the proportion of prevoiced low-register stops is plotted by age and sex. Seven out of 11 men prevoice more than 50% of their low-register stops, but only one out of 15 women does. There also seems to be an age gradient, illustrated by the significant regression line in Figure 4 (r = 0.45, p = 0.02): Younger speakers have lower proportions of prevoicing than their elders. Since this could be evidence for a change in progress, differences between the two groups will be tracked in the rest of Section 2.2: Speakers will be split into Voicers (8/26 participants), who prevoice 50% or more of their low-register stops, and Devoicers (18/26 participants), who devoice them more than 50% of the time.
Mean closure duration for high- and low-register stops is reported in Table 3. Devoiced velar stops have a shorter closure duration than their high-register counterparts in both Voicers and Devoicers (see statistical models in Table 2 of Appendix B). The last row of Table 3 also shows that Devoicers have comparable proportions of devoicing in low-register coronals and velars (87.6% versus 84%), but that Voicers devoice coronals more often than velars (33.3% versus 17%). Temporal measures obtained from onset stops thus suggest that the original voicing contrast is no longer consistently realized in Chru onset stops: Closure voicing has become optional (even for Voicers), and closure duration differences between high- and low-register stops are now restricted to velars.
Table 3 in Appendix B). F0 is not a reliable acoustic indicator of register either. In Figure 5, it appears higher after high-register than after low-register stops, but in Voicers, this effect is largely circumscribed to the vowel /iː/ (RegisterLow:Voweli β = -43 Hz, t = -3.156, p = .016), while in Devoicers, the effect of Register is not robust enough to be included in the final statistical model (see Table 4 in Appendix B).
The vowels of sonorant-initial syllables following presyllables headed by high-register stops also have a higher f0 than those following presyllables headed by low-register stops (sonorant-initial monosyllables, which are labelled as 'register neutral' in Figure 5 as they do not contrast in voicing and should not be affected by register spreading, fall in between), but this coarticulatory effect is again weak and limited to some combinations of vowels and places (see Table 5 in Appendix B). Finally, vowels headed by obstruents that do not contrast for register (implosive, fricative, and aspirated) all have high f0 contours, f0 following fricative /s/ being the highest. The fact that the f0 effects visible in Figure 5 are statistically weak can be attributed to individual variation in the f0 patterns of both Voicers and Devoicers (discussed further in Section 2.2.3 below).
Out of the three spectral slope indicators that were analyzed, only normalized H1*-H2* will be reported, as it shows the strongest effect. In Figure 6, register conditions a robust H1*-H2* difference at vowel onset, indicating a breathier phonation in the low register (Voicers: RegisterLow β = 3.25 dB, t = 4.248, p = .027; Devoicers: RegisterLow β = 9.58 dB, t = 5.151, p = .002, with much weaker effects in close vowels; see Table 6 in Appendix B). This effect of register on H1*-H2* is limited to the beginning of the vowel and does not extend to sonorant-initial syllables (see Table 7 in Appendix B). Finally, while fricatives and aspirates are associated with a high H1*-H2* because of their glottal opening, implosives are followed by a low H1*-H2* caused by their glottal closure.
Normalized CPP, shown in Figure 7, is another indicator of phonation type. It is expected to be high when phonation is modal, and to be low when phonation is non-modal. As such, the systematic rise in CPP at the beginning of vowels is an indication that phonation is perturbed by onsets. CPP is consistently lower after low-register than high-register stops in Voicers (RegisterLow β = -3.95 dB, t = -8.286, p < .001), but this effect is not as robust in Devoicers (RegisterLow β = -3.74 dB, t = -2.324, p = .137). CPP does not start as low in sonorants as in stops because they are not produced with a spread or constricted glottis, but the register of presyllables nonetheless exerts a weak coarticulatory effect on sonorant-initial syllables (see Table 7.2.9). Vowels following register-neutral obstruents all start with a low CPP because their onsets are either produced with an open (aspirates and fricatives) or a closed glottis (implosives), two settings that favor non-modal phonation.
F1 is plotted by vowel in Figure 8. Vowels have a lower F1 immediately after low-register than high-register stops (Voicers: RegisterLow β = -166 Hz, t = -9.743, p < 0.001; Devoicers: RegisterLow β = -240 Hz, t = -9.220, p < 0.006). However, the effect of register is much smaller in close vowels (see interactions of Register and Vowel in Table 10 in Appendix B). The difference between registers diminishes during the production of the vowel, but is maintained for at least 100 ms, longer than for any other spectral property. The coarticulatory influence of presyllables on the F1 of sonorant-initial syllables is weak at best (see Table 11 in Appendix B). There is little difference between vowels following register-neutral obstruents.
Figure 8 (caption, partial): The range of the y-axis is kept constant across vowels (500 Hz). Ribbons represent the 95% CI of the mean. The large confidence interval for /εː/ sonorants in Devoicers is due to the small number of tokens as the intended target word /bəŋε/ was produced with the high register by most speakers.
In Figure 9, the effect of register on F2 following stops is weak (statistical results are given in Table 12 of Appendix B). The formants of sonorant-initial syllables and syllables headed by register-neutral obstruents are highly variable and show no robust statistical pattern (see Table 13 in Appendix B). The large confidence interval for /εː/ sonorants in Voicers is due to the small number of tokens as the intended target word /bəŋε/ was produced with the high register by most speakers.
Inspection of the spectral properties of register does not reveal important qualitative differences between Voicers and Devoicers, contrary to what was found for VOT. The register contrast is primarily realized by differences in F1, especially (but not exclusively) in open and open-mid vowels. Phonation (H1*-H2* and CPP) is also a consistent indicator of register. On the other hand, registers do not differ consistently in terms of f0 or F2. The register of presyllables has a coarticulatory effect on syllables headed by sonorants, but this effect is smaller than that found in syllables headed by stops, suggesting that it is not categorical register spreading.

Relation between closure voicing and vocalic properties of register
A breakdown of syllables headed by low-register plain stops by phonetic voicing (devoiced versus prevoiced), plotted in Figure 10, shows that out of the four acoustic properties that are conditioned by register, f0 and H1*-H2* (and possibly CPP) are more distinct from the high register when onsets are prevoiced than when they are devoiced. If speakers enhanced register cues to compensate for the lack of prevoicing, we would expect these properties to be enhanced to a greater extent when prevoicing is absent. Crucially, F1, which was arguably the most robust register property in Section 2.2.2, behaves differently, in that it seems equivalent after prevoiced and devoiced stops. As there are relatively few tokens of low-register stops in the dataset (an average of 26.2 per speaker) and as the frequency of closure voicing is highly variable across speakers (see Figure 4), there is too little data for a meaningful statistical analysis comparing Voicers and Devoicers, but we note that similar results are obtained when the same figure is only plotted with the four speakers that have the most balanced proportions of prevoiced and devoiced stops.

Individual variation
Although we did not find significant qualitative differences between Voicers and Devoicers in Section 2.2.2 at the group level, the normalized mean values presented therein conceal non-negligible individual variation in each group. It is therefore essential to consider individual behavior, and more especially the magnitude of the difference between the two registers for each speaker and acoustic property. We did this by computing Cohen's d (Cohen, 1988), an effect size indicator, for individual speakers and relevant acoustic properties. We calculated Cohen's d as the vowel-weighted difference between the means of the two registers at the first sampling point after plain stops, divided by the pooled register-weighted mean of their standard deviations. While Cohen's d is simple to compute, we note that the scores must be interpreted with some caution, as this measure does not take into account possible correlations between cues.
The most striking regularity in Figure 11 is that women distinguish their registers primarily in terms of F1, only using other properties to a limited extent. Men's productions, on the other hand, tend to be distinguished by a wider range of cues. They largely base their register contrast on F1, like women, but many also have high Cohen's d for VOT, as well as slightly higher scores for H1*-H2*.
As was already foreshadowed in Section 2.2.2, Cohen's d scores for f0 vary unexpectedly across speakers. They tend to be close to 0, but while some speakers have a distinctly higher f0 in the high register (like most older men on the right of Figure 11), others have a consistently higher f0 in the low register, like F18, F33', and M39.
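The effect size used in this section can be illustrated with a simplified sketch. The version below is the textbook Cohen's d with a sample-size-weighted pooled standard deviation; it deliberately omits the vowel weighting described above, so it should be read as an approximation of the general idea rather than as the exact computation behind Figure 11. The F1 values are invented.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(high, low):
    """Cohen's d for high-register minus low-register values.

    Uses the pooled standard deviation weighted by sample size.
    Simplification (assumption): no weighting by vowel is applied.
    """
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / sqrt(pooled_var)


# Invented F1 values (Hz) at the first sampling point for one hypothetical speaker.
f1_high = [700.0, 720.0, 740.0, 760.0]
f1_low = [500.0, 520.0, 540.0, 560.0]
d_f1 = cohens_d(f1_high, f1_low)
```

With this sign convention, a positive d means that the property has a higher value in the high register, and a negative d means that it is higher in the low register, which is how the sign differences in f0 across speakers would show up.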

Summary of acoustic results
Our acoustic results confirm Fuller's (1977) intuition that Chru has already developed a register contrast and that the vocalic properties of this register system are analogous to those found in Austroasiatic and other Chamic register languages. After stops, F1 differences between registers are robust, last over the first 100-150 ms of the vowel, and tend to be greater in non-close vowels. Non-close vowels have a moderate falling on-glide in the low register: In the clearest cases, low register /aː/ and /ɔː/ sound like [ᵊa] and [ᵒɔ]. Close vowels, on the other hand, have a slightly higher F1 at vowel onset in the high register, but this effect is never large enough to be heard as an on-glide. H1*-H2* and CPP results also indicate that there is a moderate but consistent lax to breathy phonation in the initial 50 ms of low register vowels. Differences in f0 are also found, but are weaker and vary significantly across speakers. Besides the greater diphthongization of open vowels in the low register, which is common in Austroasiatic (Jenner, 1974; Ferlus, 1979; Huffman, 1985), the realization of register does not seem to vary significantly across vowels, contrary to what was found in Southern Yi (Kuang & Cui, 2018). H1*-H2* effects appear weaker in close vowels in Devoicers, but no such effect is found in Voicers. Register also seems to have a greater effect on f0 in vowel /iː/, but not in other vowels. What Fuller seems to have overlooked is that for many speakers, register may have already taken over the contrastive role of voicing. Most speakers (18/26) have prevoicing in less than 50% of 'voiced stops,' and only one speaker preserves it in all low-register stops. Synchronically, Chru therefore seems to have a register system combined with optional prevoicing, rather than a voicing contrast with redundant register.
Since some speakers with significant devoicing were already in their mid-20s when Fuller conducted his field research in the late 1960s, the fact that he did not describe this distribution cannot be attributed to a dramatic devoicing in the past half-century; it is probable that devoicing was already present in the community, but that it was overlooked because Fuller was mostly working with older men.
Another important observation concerning onsets is that the large majority of speakers show no evidence of aspiration in devoiced low-register stops. There is evidence that low-register stops have a longer VOT than high-register stops when they lose their closure voicing, but this effect is limited to velar stops and is probably too small to be audible (5.5 ms). Moreover, a small number of speakers (6/26) have a marginally longer VOT in low-register than high-register stops, as can be seen in Figure 11. While this is reminiscent of the aspiration (or 'breathy release') that is a keystone of most models of register formation (Haudricourt, 1965; Huffman, 1985; Thurgood, 2002; Wayland & Jongman, 2002), the small VOT difference found in some Chru speakers is certainly weaker and less systematic than expected. That F1 appears to be the primary acoustic correlate of register for all Chru speakers in our sample, despite the fact that only a minority shows signs of lengthened VOT, leads us to doubt the claim that aspiration is a necessary step in registrogenesis.
Speaker averages hide a certain amount of individual variation in the realization of register. In women, F1 is the best-defined acoustic property of register, with Cohen's d scores about five times greater than those of any other property. F1 is also a strong distinctive property among men, but it is less dominant: Many men also maintain large VOT and H1*-H2* differences, and a few have non-negligible Cohen's d scores for f0, CPP, and F2. However, when we look at individuals rather than groups, there is no significant compensatory relation between the weight of the various acoustic properties (for instance, there is no inverse relation between the use of voicing and the use of F1 to contrast registers). Furthermore, with the notable exception of F1, differences in f0, H1*-H2*, and CPP are more pronounced when stops are prevoiced than when they are devoiced. We interpret this as evidence that F1 is the primary, obligatory, acoustic property of the register contrast, but that other properties can be enhanced in clear speech contexts, where prevoicing is also most likely to be present.
Additional evidence about the phonological status of Chru register can be gathered from syllables in which register is non-contrastive. An inspection of the rightwards propagation of register properties from presyllables to syllables headed by sonorants reveals that there appear to be weak coarticulatory effects in f0, CPP, and F1, but that these effects are not indicative of the type of categorical spreading reported in Cham dialects or in Khmer. This shows that even if register has become contrastive in Chru, it is neither involved in productive phonological alternations nor generalized to sonorants. The second type of evidence comes from the acoustic properties of register-neutral obstruents (aspirated stops /tʰ, kʰ/, implosive /ɗ/, and fricative /s/). The acoustic properties of vowels following these obstruents do not clearly pattern with a specific register. Vowels following aspirated stops, for example, have a high initial f0 reminiscent of the high register, but their high H1*-H2* and low CPP at voicing onset indicate that they are breathier than the low register. Along the same lines, vowels following the implosive /ɗ/ start with a high f0 and a low H1*-H2*, just like vowels following high-register plain stops, but are initiated with a low CPP and a high F2, which is more similar to the low register. This suggests that register-neutral obstruents are not forced into phonologized register categories, but maintain their own idiosyncratic phonetic properties.

The perception experiment
In order to determine if the acoustic differences between registers uncovered in the production experiment are meaningful to Chru listeners, we returned to Điom A and Proh to conduct a perception experiment in June 2019.

Methods
The perception experiment was conducted with 41 listeners (21 women); two additional participants were excluded because they could not complete the task. They were all born between 1950 and 2001 (18 to 69 years old at the time of the experiment) in the district of Đơn Dương and raised there. Nineteen of these listeners had participated in the production experiment the previous year.
All listeners took part in a forced-choice identification task in which they had to listen to synthesized stimuli varying in F1, phonation type, f0, and VOT, and to identify them by pressing one of two keyboard buttons associated with images presented on a computer screen. For one set of stimuli, they had to choose between /mta/ 'eye' and /mda/ 'rich' (where /a/ does not contrast for length but is phonetically long, like all Chru vowels in open syllables), while for the other, the choices were /tuːʔ/ 'bamboo joint, section' and /duːʔ/ 'honey bee' (where /d/ can be realized as [d] or low-register [ṭ]). The experiment was run in OpenSesame (Mathôt, Schreij, & Theeuwes, 2012). Instructions were largely visual because few of the participants were fully literate in Chru.
Ideal stimulus pairs would have consisted of open syllables without presyllables. However, such minimal pairs are rare in Chru, and can be difficult to represent visually (one has to exclude function words and most abstract lexical words). The selection of our two minimal pairs is thus based on the assumption, backed up with acoustic evidence, that the final glottal stop of /tuːʔ~duːʔ/ only affects spectral balance towards the end of the vowel and that the effect of the nasal presyllable of /mta~mda/ on the spectral tilt of the vowel is blocked by the onset of the main syllable (see Styler, 2017, for an overview of the effects of nasality on spectral balance).
Stimuli were synthesized using KlattGrid synthesis in Praat (see Scripts 3 and 4 in the Supplementary Material). Parameters were set in such a way as to imitate natural tokens without including superfluous low-level variation. Spectrograms of sample resynthesized stimuli are given in Figure 12. The four parameters that were manipulated across stimuli are f0, open quotient, F1, and VOT. For each parameter, maximum and minimum values were selected based on the acoustic results presented in Section 3, and fine-tuned to maximize naturalness based on the natural productions of a middle-aged male speaker. 3 The four parameters were crossed so as to obtain all possible combinations of acoustic properties (Figure 13, plotted with Script 5 provided in the Supplementary Material). The four parameters were manipulated as follows:
- VOT: Three VOTs were synthesized: -50 ms, 10 ms, and 20 ms. The 20 ms VOT is exaggerated compared to values observed in Chru and was included to test the crosslinguistic hypothesis that there is a relation between obstruent devoicing and aspiration.
- f0: An initial pitch target was set at the beginning of the vowel, with three possible values: 130, 140, and 150 Hz. A fixed target was set to 140 Hz at 100 ms into the vowel for all stimuli.
- Phonation type (open quotient): An initial open quotient was set at the beginning of the vowel, with three possible values: 0.4, 0.5, and 0.6. A fixed target was set to 0.5 at 100 ms into the vowel. In order to increase naturalness and to modulate the CPP variation found in the production experiment, breathiness amplitude (BA) was added to the first 100 ms of the vowel: 60 dB of BA were added to the tokens with an initial open quotient of 0.6 and 30 dB to those with an open quotient of 0.5. No BA was added to tokens with an initial open quotient of 0.4.
- F1: An initial F1 target was set at the beginning of the vowel, with three possible values: 350, 400, and 450 Hz for the vowel /uː/ and 500, 600, and 700 Hz for the vowel /aː/. The differences in F1 steps between the two vowels mirror those found in natural tokens. Fixed targets were set to 400 Hz for /uː/ and 700 Hz for /aː/ at 200 ms into the vowel. Other formants were kept constant across tokens.
[Figure caption, continued] The ribbons correspond to one standard deviation above and below the mean (due to interactions between the various parameters, the acoustic properties of stimuli were not always exactly on target).
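The onset-to-fixed-target trajectories described for f0, open quotient, and F1 can be illustrated with a short Python sketch. It assumes that the synthesizer interpolates linearly between the initial target and the fixed target and holds the fixed value afterwards; the text does not state the interpolation scheme, and the function name `contour` is purely illustrative (this is not the authors' Praat script).

```python
def contour(initial, fixed, t_fixed_ms, t_ms):
    """Parameter value t_ms after vowel onset.

    Assumption: linear interpolation from `initial` (at 0 ms) to `fixed`
    (at t_fixed_ms), then constant. The source only specifies the targets.
    """
    if t_ms >= t_fixed_ms:
        return fixed
    return initial + (fixed - initial) * (t_ms / t_fixed_ms)


# f0 targets from the text: 130/140/150 Hz at onset, 140 Hz at 100 ms.
f0_tracks = {
    start: [contour(start, 140, 100, t) for t in (0, 50, 100, 150)]
    for start in (130, 140, 150)
}

# Open quotient: 0.4/0.5/0.6 at onset, 0.5 at 100 ms.
oq_at_200ms = contour(0.6, 0.5, 100, 200)

# F1 for /aː/: 500/600/700 Hz at onset, 700 Hz at 200 ms.
f1_a_midway = contour(500, 700, 200, 100)
```

Under this assumption, the three f0 steps differ only over the first 100 ms of the vowel and converge on the same value afterwards, which is why the manipulation targets the vowel onset rather than the whole vowel.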
The resulting stimuli sounded fairly natural, even if they sometimes combined acoustic parameter values that do not naturally cooccur in Chru. As far as we know, no participant came to realize that the stimuli had not been recorded from a real speaker and many participants asked us who the speaker was. 4 Before proceeding to the testing phase proper, all participants had to undergo 1) a training phase with the two tokens closest to natural productions (6 tokens per word pair), 2) a first test phase with the same stimuli (10 tokens for each word pair), and 3) a second test phase with random stimuli (10 tokens for each word pair). They had time to rest and ask for clarification between blocks. The real testing phases then started. Stimuli were presented in six blocks. There were three blocks per word pair, alternating between /mta~mda/ and /tuːʔ~duːʔ/ (henceforth a- and u-stimuli). Each block contained all 81 randomized stimuli for the relevant word pair, for a total of 486 tokens per listener. The entire experiment took 15-25 minutes per participant. Responses with reaction times above 2 seconds were not recorded.
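The stimulus counts follow directly from crossing the four three-level parameters. The Python sketch below reconstructs that design from the values given in the text; the variable names, the random seed, and the choice to start with an a-block are assumptions made for illustration (the experiment itself was run in OpenSesame).

```python
import random
from itertools import product

# Parameter levels as reported in the text.
VOT_MS = (-50, 10, 20)
F0_HZ = (130, 140, 150)
OPEN_QUOTIENT = (0.4, 0.5, 0.6)
F1_HZ = {"a": (500, 600, 700), "u": (350, 400, 450)}
BLOCKS_PER_PAIR = 3


def stimulus_grid(vowel):
    """All combinations of the four parameters for one word pair."""
    return [
        {"vowel": vowel, "vot_ms": vot, "f0_hz": f0, "oq": oq, "f1_hz": f1}
        for vot, f0, oq, f1 in product(VOT_MS, F0_HZ, OPEN_QUOTIENT, F1_HZ[vowel])
    ]


def build_session(seed=0):
    """Six blocks alternating between a- and u-stimuli, shuffled within block.

    Assumption: the session starts with an a-block; the text only says
    that the blocks alternate.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(BLOCKS_PER_PAIR):
        for vowel in ("a", "u"):
            block = stimulus_grid(vowel)
            rng.shuffle(block)
            blocks.append(block)
    return blocks


session = build_session()
n_trials = sum(len(block) for block in session)
```

Each grid holds 3 × 3 × 3 × 3 = 81 stimuli, and six blocks of 81 give the 486 tokens per listener reported above.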
Responses were analyzed using mixed logistic regressions with the R package lmerTest (Kuznetsova et al., 2017). The fixed effects were the f0, F1, OQ, and VOT of the synthesized stimuli, where f0, F1, and OQ were treated as centered and ordered categorical factors, and VOT was treated as an unordered categorical factor. Maximal models included all two-way interactions of fixed effects, and random effects included random intercepts for Subject and random slopes combining Subject and main fixed effects. Models were then simplified in a stepwise manner by dropping the fixed effect or interaction with the lowest F-value as long as this did not significantly increase the Akaike information criterion (AIC) score of the model.

Identification experiment
The pattern of responses obtained from the listeners is reported in Figure 14 and statistically summarized in Tables 4 and 5 (response data are available in the Supplementary Material). Note that the intercepts of the final models are not significantly different from 0, which means that we were successful in generating stimulus scales in which middle values have neither a high- nor a low-register bias. For both synthesized syllables, the factors that have the strongest effect on the results are F1 and the presence of a negative VOT. F1 is positively correlated with high register responses (i.e., syllables with /t/), while a negative VOT prompts more low register responses (i.e., a lower rate of /t/ responses). F1 weighs more than negative VOT for a-stimuli (β's of 2.88 and -2.19, respectively, in Table 4), while the opposite is observed for u-stimuli (β's of 1.50 versus 2.00 in Table 5). Positive VOTs (10 and 20 ms) elicited similar responses in a-stimuli, while a 20 ms long VOT seems to slightly favor low register responses in u-stimuli.
Other factors play a more limited, yet still significant role. F0 is positively correlated with high register responses in both sets of stimuli, but the magnitude of the effect is fairly small. Open quotient is correlated with low-register responses in a-stimuli, but is not significant in u-stimuli. Several weak interactions are also observed. In both sets of stimuli, an increase in F1 favors high-register responses to a more limited extent in stimuli with a negative than a positive VOT (F1:VOT neg). Moreover, in a-stimuli, simultaneous increases in F1 and breathiness (F1:OQ) favor low-register responses more than independent increments in F1 and breathiness, and an increase in breathiness yields more high-register responses when combined with a high f0 (f0:OQ).

Relation between production and perception
Finally, we explore the relation between production and perception for the 19 participants who completed both production and perception experiments. Identification weights were computed by fitting logistic regressions for each listener with the independent variables f0, F1, OQ, and VOT on the two sets of stimuli. Interactions were not included due to the small number of tokens tested with each individual participant. Our proxies for production weights are Cohen's d scores computed for the f0, F1, H1*-H2*, and VOT of the registers as realized on vowels /aː, uː/ at the first sampling point after plain stops for each participant.
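To make the two kinds of weights concrete, the following Python sketch computes a production weight (Cohen's d) and perception weights (|β| from a per-listener logistic regression) for one invented participant. Several things here are assumptions rather than the authors' procedure: the pooled-standard-deviation variant of Cohen's d, the -1/0/1 coding of the cue steps, the plain gradient-ascent fitting routine (which stands in for the R routines used in the study), and all the data values.

```python
from math import exp, log, sqrt
from statistics import mean, variance


def cohens_d(high, low):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / sqrt(pooled_var)


def fit_logistic(rows, responses, lr=1.0, n_iter=2000):
    """Intercept plus one slope per cue, fitted by gradient ascent
    on the mean log-likelihood (no interactions, as in the text)."""
    n_cues = len(rows[0])
    beta = [0.0] * (n_cues + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (n_cues + 1)
        for x, y in zip(rows, responses):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            p = 1.0 / (1.0 + exp(-eta))
            grad[0] += y - p
            for j, v in enumerate(x):
                grad[j + 1] += (y - p) * v
        beta = [b + lr * g / len(rows) for b, g in zip(beta, grad)]
    return beta


# Hypothetical production data: F1 (Hz) at vowel onset in each register.
f1_high = [700, 720, 740]
f1_low = [600, 620, 640]
production_weight = cohens_d(f1_high, f1_low)

# Hypothetical identification data: cue steps coded -1/0/1 (F1, f0);
# the share of high-register answers depends on F1 only.
high_count = {-1: 2, 0: 5, 1: 8}  # high-register answers out of 10 trials per cell
rows, responses = [], []
for f1 in (-1, 0, 1):
    for f0 in (-1, 0, 1):
        for k in range(10):
            rows.append((f1, f0))
            responses.append(1 if k < high_count[f1] else 0)

intercept, beta_f1, beta_f0 = fit_logistic(rows, responses)
perception_weights = {"F1": abs(beta_f1), "f0": abs(beta_f0)}
```

In this toy case the listener's F1 weight is large and the f0 weight is essentially zero, the kind of asymmetry that the per-listener estimates are meant to capture.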
Results are plotted in Figure 15. The weight of each relevant acoustic property (Cohen's d) is reported on the x-axis, while the weight of each perceptual cue (|β|) is reported on the y-axis. Each speaker is represented by five different symbols corresponding to the five phonetic properties of interest. F1 is the main acoustic property and perceptual cue of register across speakers, and distinguishes registers more efficiently in open vowels (/aː/) than close vowels (/uː/). Although VOT also plays a role in the production and perception of register for most participants, the production and perception weights of VOT do not always correlate at the individual level. Four women and two men who have a low production weight for VOT (Cohen's d below 2.5) in either /aː/ or /uː/ nevertheless attribute it a large perceptual weight (|β| above 1): In other words, while they recognize VOT to be a correlate of register, they do not use it to distinguish registers in their own production (cf. Coetzee et al., 2018). Also notable is a 63-year-old male who maintains a clear VOT distinction in production, but does not rely on it for identification (the blue square at the bottom right of both panels in Figure 15). Phonation type and f0 play a more limited role in both production and perception.
Figure 15: Correlation between perceptual weights (|β|) and acoustic weights (Cohen's d) in individual participants, for vowels /aː/ and /uː/. VQ stands for phonation type and represents the acoustic property H1*-H2* on the x-axis and the OQ of the synthesized stimuli on the y-axis. A token of F1 with a β of 22.8 is omitted from the left panel.
Globally, the only significant correlation between the Cohen's d's and β estimates of any of the phonetic properties across participants is found in f0 in the vowel /uː/ (r = .3, p = .008; see Script 2 in the Supplementary Material). This means that overall, the weight of participants' perceptual cues cannot be predicted from the weight of their corresponding production properties, which is not surprising as listeners have to accommodate speakers who produce the contrast differently from them. Moreover, no gender differences similar to those found in the weight of production properties (Figure 11) are apparent.
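The correlation between production and perception weights can be sketched as follows. The sketch assumes a Pearson product-moment correlation (the text reports r but does not name the coefficient), and the weights for the five participants are invented for illustration; the authors' own analysis is in their Script 2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical weights for five participants, one value per participant:
# production weight (Cohen's d) and perceptual weight (|beta|) for one cue.
cohen_d = [0.5, 1.0, 1.5, 2.0, 2.5]
abs_beta = [0.2, 0.1, 0.4, 0.3, 0.5]
r = pearson_r(cohen_d, abs_beta)
```

A coefficient near zero for a given cue would mean that knowing how strongly a participant uses that cue in production says little about how strongly they rely on it in identification.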

Summary of perception results
Pooled results show that low F1 and negative VOT are the primary cues associated with the low register. However, two noteworthy differences between a- and u-stimuli must be mentioned. First, the weight of F1 is greater for a- than u-stimuli. Although this may be due in part to the more restricted range of F1 in our u-stimuli, it mirrors the stronger acoustic weight of F1 in open vowels in the production experiment. Second, there is an association between the exaggerated positive VOT (+20 ms) and the low register in u-, but not in a-stimuli. While this could be further evidence of a weak association between a long positive VOT and low register, why it is only found in u-stimuli remains unclear.
Phonation (OQ) and f0 play a more limited role in perception, but both show an effect in the expected direction. That f0 is perceptually relevant despite its weak acoustic distinctiveness and individual variation in production probably indicates that listeners are aware that most speakers have a higher f0 in the high register, even if it is not a very reliable cue across speakers. It is also possible that the weak perceptual role of phonation type, which is not even significant in u-stimuli, is due to imperfect synthesis.
The β estimates obtained from individual logistic regressions (Figure 15) show that F1 is a crucial identification cue for all speakers. VOT is important for a significant minority of listeners, while other acoustic parameters have a much weaker effect on perception, which is in general accordance with the production results. However, there is no evidence that the weight of perceptual cues is structured along gender or age lines, and there are no systematic correlations between individual participants' production and perception weights. In fact, while a significant minority of participants are perceptually sensitive to VOT, they do not systematically use it to distinguish between the registers in production.

General discussion
Our production results clearly establish that the Proto-Chamic onset voicing contrast has been transphonologized into register in Đơn Dương Chru (Q1a). All our participants now realize the original contrast by means of register properties, while the voicing contrast is now optional for all but one speaker (Q1b). The register distinction is primarily realized as a modulation of F1 over the first 100 ms of the vowel: There is a clear F1 rise at the beginning of open vowels in the low register (sometimes strong enough to yield an audible falling onglide), and a more moderate F1 fall at the beginning of close vowels in the high register. These vowel trajectories are in line with what is found in other register languages (Jenner, 1974; Huffman, 1976; Ferlus, 1979; Huffman, 1985; Wayland & Jongman, 2002). Other acoustic properties are associated with the register contrast: Some laxness/breathiness is systematically present at the very beginning of low register vowels across speakers, and weak f0 and F2 differences are attested in most, but not all, speakers.
Chru could be categorized as a language in which there is 'phonemic vowel register' and optional 'retention of sub-phonemic differentiation in the stops vis-à-vis register' (Huffman, 1976, p. 587). However, in all but a handful of Chru speakers, this optional sub-phonemic differentiation in stops is realized not as an increased VOT, the typical scenario in Austroasiatic register, but rather as prevoicing. That prevoicing and register coexist in Chru suggests that a weak aspiration of devoiced stops may not always be a step in the development of the register contrast, as generally assumed (Haudricourt, 1965; Huffman, 1976; Wayland & Jongman, 2002). The fact that F1 distinguishes registers better than phonation type or positive VOT for all of our participants, including those with high rates of prevoicing, further challenges the idea that f0 and vowel quality differences must necessarily 5 develop out of breathiness or laxness (Thurgood, 2002; Wayland & Jongman, 2002).
If not mediated through breathy phonation, how could a system like Chru come about? Several authors have proposed that the phonetic properties of register are phonologized consequences of articulatory strategies for circumventing the aerodynamic voicing constraint by increasing the transglottal pressure differential (Gregerson, 1976; Ferlus, 1979; Thurgood, 2002; Brunelle, 2010). Two such strategies that would have a direct impact on F1 are tongue root advancement, which causes a raising and a forward movement of the tongue body, and larynx lowering, which lengthens the back cavity responsible for F1 resonances (Bell-Berti, 1975; Lindau, 1979; Tiede, 1996; Fulop, Kari, & Ladefoged, 1998; Ahn, 2018). The spectral slope and f0 differences between registers would be additional, parallel, consequences of these two gestures (Ohala, 1972; Bell-Berti, 1975; Lindau, 1979; Fulop et al., 1998; Honda et al., 1999; Hoole & Honda, 2011), rather than being directly responsible for the development of F1 differences. Whether a given register language attributes more weight to spectral slope, F1 or f0 would then be a consequence of this multidimensionality, as individual listeners may potentially assign each cue a perceptual weight differing from those of the speaker (Beddor, 2009).
A direct route from voicing to register signaled primarily by vowel height is in fact supported by our comparison of the acoustic properties of vowels following prevoiced and devoiced low-register stops (Section 2.2.3). We have seen that there is evidence for the opposite of a compensatory relation between prevoicing on the one hand and f0, H1*-H2*, and CPP on the other: Low-register stops pattern closer to high-register stops when they are devoiced than when they are prevoiced (Q1c). This suggests that pitch and phonation variation in the register system of Chru remain to some extent automatic effects of prevoicing, a phenomenon also seen in Tibeto-Burman languages such as Dzongkha (Kirby & Hyslop, 2019). On the other hand, F1, the primary registral property for all speakers, does not seem to differ after prevoiced and devoiced low-register stops, indicating that it has been phonologized. What remains unclear, as pointed out in Section 1.1, is the articulatory or auditory mechanism that links voicing and laxness/breathiness, especially in light of the fact that closure voicing does not typically condition this type of phonation on following vowels.
Although Chru seems to have developed a contrastive register system in the obstruent sub-system, there is no evidence that it has been generalized to non-contrastive contexts (sonorants, fricatives, implosives, aspirated stops), contrary to what has been observed in many Austroasiatic languages (Huffman, 1976). The acoustic properties following these onsets have not been forced into specific registers: /s-/, for example, is followed by a high f0 characteristic of the high register, but a steep spectral slope normally associated with the low register. There is also no indication that register is involved in phonological alternations like register spreading. This could be interpreted as a sign that Chru still has a relatively conservative form of register, but it should be emphasized that the phonologization of register does not entail its generalization to new environments: This can be compared to the evolution of tone splits, which can lead to a larger inventory of contrastive tones on just a subset of syllable types (unchecked syllables in Vietnamese: Haudricourt, 1954; obstruent-initial syllables in Kra-Dai: Pittayaporn, 2009; sonorant-initial syllables in Wu: Gao, 2015).
In terms of perception (Q2), F1 and VOT are the main cues used in register identification, but while F1 is a significant cue across listeners, VOT is weaker and appears more variable across individuals. F1 also weighs more in the identification of /aː/ than /uː/, which reflects its greater role in the register contrast in non-close vowels. Other cues bias responses in expected directions, but have smaller weights. The results of our perception study thus roughly mirror those of the production study. A comparison of the weights of perceptual and production cues (Q3) reveals no direct correlation between the acoustic weight (Cohen's d) and perceptual weight (β) of the participants who underwent both experiments. The only structured pattern that emerges is that some participants seem sensitive to VOT in perception even if they do not themselves consistently produce different VOTs for the two registers.
In light of these results, can we claim that the Chru register contrast is stable? A definitive answer to this question requires a more systematic sociophonetic study, with a larger participant sample and better controls for factors other than age and sex. However, we seem to be dealing with a situation in which all speakers have a contrastive register based on F1, but in which male speakers, especially older ones, maintain optional closure voicing in the low register and idiosyncratic use of other cues. On the other hand, female speakers, especially younger ones, appear to have mostly dropped the closure voicing cue and assign a systematically lesser weight to acoustic cues other than F1, thus leading the change in reinterpreting register as an exclusively vocalic contrast. Younger men seem similar to young women, suggesting that the attrition of prevoicing may be nearing completion. This is similar to what was recently described in Afrikaans, where the loss of voicing was transphonologized into an f0 contrast in utterance-initial position (Coetzee et al., 2018). Our perception results are compatible with this interpretation: Almost all participants primarily rely on F1 for identification, which should be enough for communicative purposes given that all speakers use F1 in production. However, several participants are making a more significant use of VOT in perception than production, which could be interpreted as evidence that some innovators preserve conservative perceptual cues to accommodate less advanced speakers (Pinget et al., 2016; Howe, 2017; Kuang & Cui, 2018; Pinget et al., 2019).