Vowel harmony and positional variation in Kyrgyz

While it is well known that the phonetic realization of a segment may differ by position, it is unclear how positional variation interacts with vowel harmony, the imperative that vowels be identical along some phonological dimension. Pearce (2008, 2012) contends that phonological harmony blocks phonetic reduction, suggesting that phonology dictates phonetic realization for this class of assimilatory patterns. This paper investigates harmony and vowel reduction in Kyrgyz, finding that non-initial vowels are more centralized than their initial-syllable counterparts. The potential sources for this reduction, including initial strengthening, supralaryngeal declination, predictability, and undershoot, are discussed. The proposed predictability-based analysis accounts for reduction in terms of phonological knowledge and representations.


Introduction
It is well known that the phonetic realization of a phonological element typically varies according to a range of factors, including position, speech rate, and speech style. Among these, research on positional variation has brought a number of key insights linking phonological and prosodic categories with phonetic variation. Interestingly, despite the range and depth of work on the topic, extant research on positional variation has not considered the potential interaction between phonetic forces and phonological harmony.
In phonological harmony, e.g., vowel or consonant harmony, the realization of some element is determined by some other element elsewhere in the word. Among harmony patterns, the most widely studied example is vowel harmony, which is a phonological restriction on vowel co-occurrence (see Rose & Walker, 2011 for an overview). In languages with vowel harmony, only certain classes of vowels may co-occur within a given domain, often the word. Most often, this is manifested in morphophonological alternations. As an example, the backness of suffixes in Turkish is generally determined by the backness of the initial-syllable vowel. More specifically, consider the realization of the plural suffix in [bel-ler] 'waist-pl' and [bɑl-lɑr] 'honey-pl.' If the preceding vowel is [+back], [-lɑr] is the appropriate allomorph of the plural suffix, but if the preceding vowel is [-back], [-ler] is the appropriate allomorph. Phonetic studies of vowel harmony have typically examined the acoustic or articulatory properties of vowels in a fixed position to determine what similarities exist within a particular harmonic class, and conversely, what differences distinguish each class of vowels (e.g., Fulop, Kari, & Ladefoged, 1998; Guion, Post, & Payne, 2004; Svantesson, 1985; Washington, 2016). These studies have compared the properties of vowels across classes, but not across positions. While a particular set of acoustic or articulatory features may characterize one class of vowels in some position, it does not necessarily follow that those same properties will be identical across all positions for the relevant class of vowels. For instance, after coarticulation from flanking consonants is taken into account, how similar or distinct are the [ɑ] vowels in a Turkish word like [bɑl-lɑr] 'honey-pl'?
In a language with vowel harmony, the mandate that some set of elements be identical along a given phonological dimension may conflict with phonetic trends that favor reduction in certain positions. How do these two forces interact? This interaction between phonologically-dictated sameness and phonetically-determined variation is the central topic of interest in this paper. Specifically, this paper examines positional variation in vowel harmony in Kyrgyz, an understudied Turkic language of Central Asia. The realization of words of up to four syllables in length is examined to determine the extent and nature of acoustic variation among the Kyrgyz vowels. In addition, the paper examines the potential sources of this variation, and in turn, the relationship between phonology and phonetics in the language.
[...] than root-internal vowels. Elsewhere in the family, McCollum (2015, 2019b, 2019c) argues that positional reduction is asymmetric in Kazakh and Uyghur. McCollum reports that back vowels undergo non-initial fronting without any comparable effects among the front vowels. However, Lanfranca (2012) reports that root and affix vowels in Turkish do not generally differ in F1 or F2.

Predictions
The goals of the paper are two-fold: to determine if and to what extent non-initial vowels differ acoustically from initial-syllable vowels, and how to analyze those potential differences. Throughout the paper, reduction is defined as centralization. Thus, in terms of raw formant values, this predicts high vowels will show increased F1 in reduced contexts, and low vowels will show decreased F1 in these contexts. Further, since centralization is equated with reduction throughout the paper, F2 of back vowels should increase in reduced contexts, while F2 of front vowels should decrease when reduced. If Kyrgyz vowels are subject to positional reduction, some general possibilities are schematized in Table 4. First, if alternating vowels are more centralized than nonalternating vowels, and vowels in each successive syllable are progressively more centralized, then this suggests an incremental pattern of reduction consistent with supralaryngeal declination. In other words, if centralization in syllable x is greater than centralization in syllable x-1 for all positions, this would support supralaryngeal declination. Second, if non-initial vowels are more centralized than initial vowels, but no differences in centralization are found across successive non-initial syllables, e.g., syllable two versus syllable three, this would support a binary distinction between non-initial (target) and initial (trigger) positions. Such a binary distinction is compatible with either an initial strengthening or predictability-based account.
To be clear, despite predicting similar patterns, the mechanisms that induce reduction (or alternatively, enhancement) under these two possible analyses are distinct. The initial strengthening account relies on prosodic boundaries while the predictability account relies on harmony to drive reduction. As a simple demonstration of the relationship between harmony and predictability, a corpus of online Kyrgyz was examined (kir_newscrawl_2016_1M from wortschatz.uni-leipzig.de; Goldhahn, Eckart, & Quasthoff, 2012; see Goldsmith & Riggle, 2012 for more on information theory and vowel harmony). To evaluate intersyllabic vowel sequences only, all consonants, punctuation, markup, and other text formatting were removed. The remaining corpus included vowels, which were supplemented by word-beginning and word-end boundary symbols. The corpus contained 13,070,669 words and 39,275,334 vowels, with an average of 3.00 vowels per word.
Third and finally, if centralization is best accounted for in terms of vowel duration rather than position, then variation might best be described in terms of vowel undershoot. Under this account, differences in vowel acoustics should be derivable from duration only, and not position.
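The corpus preprocessing described above (stripping everything but vowels and adding word-boundary symbols) and the link to predictability can be illustrated with a minimal sketch. It is not the procedure actually used for the corpus study: the vowel letter set, the boundary symbols, the function names, and the toy word list are all assumptions made for illustration, and surprisal (negative log conditional probability) is one common information-theoretic measure rather than a measure the text specifies.

```python
from collections import Counter
from math import log2

# Assumption: a simplified set of Kyrgyz Cyrillic vowel letters; the exact
# inventory used for the corpus study is not stated in the text.
VOWELS = set("аеиоуүөыэ")
START, END = "<", ">"  # hypothetical word-boundary symbols


def vowel_skeleton(word):
    """Keep only vowel letters and wrap them in boundary symbols."""
    vowels = [ch for ch in word.lower() if ch in VOWELS]
    return [START] + vowels + [END]


def bigram_counts(words):
    """Count adjacent vowel pairs (boundaries included) across all words."""
    counts = Counter()
    for word in words:
        seq = vowel_skeleton(word)
        if len(seq) == 2:  # word contained no vowels; skip it
            continue
        counts.update(zip(seq, seq[1:]))
    return counts


def conditional_surprisal(counts):
    """Return -log2 P(next | previous) for every observed pair."""
    context_totals = Counter()
    for (prev, _nxt), n in counts.items():
        context_totals[prev] += n
    return {
        pair: -log2(n / context_totals[pair[0]])
        for pair, n in counts.items()
    }


# Toy input (hypothetical word list, not corpus data).
toy_words = ["балдар", "белдер", "балда", "көлдөр"]
counts = bigram_counts(toy_words)
surprisal = conditional_surprisal(counts)
```

In a harmonic language, the surprisal of a non-initial vowel given the preceding vowel is expected to be low relative to the surprisal of the initial vowel given the word boundary, which is the sense in which harmony makes non-initial vowels predictable.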

Participants
Thirteen (11 females, mean age: 35.0 years, range: 18-57 years) Kyrgyz speakers living in Bishkek, Kyrgyzstan participated in the study. All participants reported native fluency in the target language. Most participants also reported fluency in Russian, and some speakers reported additional fluency in Uzbek. Speaker participation and informed consent were obtained in accordance with University of California San Diego Linguistic Fieldwork IRB protocol #141520.

Stimuli
During the recording phase, participants were presented a controlled set of target words containing all short vowel contrasts in the language. Target words were derived from monosyllabic and disyllabic roots, exemplifying all eight short vowels in the language. Two monosyllabic stimuli for each category were elicited. One ended in a lateral, e.g., /bɑl/ 'honey,' and the other ended in a sibilant, e.g., /bɑʃ/ 'head.' Among the disyllabic roots, root-internal vowels were identical, e.g., /moldo/ 'mullah' and /ilim/ 'science.' The attested lexical root /qurum/ 'soot' was prompted, but no speakers accepted this word. As a result, no disyllabic roots with root-internal /u/ were recorded. The full list of stimuli is presented in the Appendix.
Each lexical item was prompted in the nominative, locative, ablative, and accusative cases for both singular and plural numbers. Case-marking suffixes were also elicited in conjunction with the first- and third-person singular possessive suffixes. Example forms for the lexical root /bɑl/ 'honey' are shown in Table 5. With monosyllabic roots, target words were up to three syllables in length, and with disyllabic roots, target words were up to four syllables in length.

Procedure
Each session was divided into training and recording phases. During the training phase, participants were taught to identify a small set of lexical roots with pictorial prompts. After learning the set of roots, participants learned a set of pictorial-grammatical correspondences involving number, case, and possession. As an example, during training participants learned to associate two downward-facing arrows with the locative case and two outward-facing arrows with the ablative case. Thus, when presented with a picture of honey without any additional arrows (or other cues), the target word was /bɑl/ 'honey.' When the picture of honey was accompanied by two downward-facing arrows, the target word was /bɑl-dɑ/ 'honey-loc,' and when the picture of honey was accompanied by two outward-facing arrows, the target word was /bɑl-dɑn/ 'honey-abl.' The training phase typically lasted around five minutes.
After participants completed training, the recording phase began. Throughout each session, participants were presented images on a laptop computer screen that showed both a picture representing a lexical item and a pictorial prompt indicating number, case, and possession. When speakers were unable to guess the target word from the prompt, they were given either the equivalent Russian word or a paraphrase in the target language. Sessions were conducted in a quiet room. Participants wore a Shure-SM10A unidirectional head-mounted microphone, and all data were recorded to a Marantz PMD 661 MKII digital recorder at a sampling rate of 44.1 kHz. Each session lasted between 45 and 90 minutes.

Segmentation and statistical analysis
All sound files were segmented in Praat (Boersma & Weenink, 2015). The beginning and end of each vowel were aligned to the onset and offset of the second formant. In cases where the second formant persisted across flanking consonants (i.e., sonorants), abrupt shifts in the amount and distribution of spectral energy were used to indicate vowel onset and offset.
After segmentation, vowel duration and the first two formants (F1 and F2) at vowel midpoint were measured. Outliers were inspected for measurement errors. In particular, a number of errors were found with /u/, where the formant tracker in Praat failed to distinguish the first two formants. In these cases, formant frequencies were hand measured at the approximate vowel midpoint.
F1 and F2 were z-score normalized (Lobanov, 1971) to facilitate more meaningful between-speaker comparisons. The data for normalization consisted of four tokens of each vowel taken from monosyllabic words. If four tokens of a given vowel were not present in monosyllabic words, then the remaining tokens were taken from the initial syllable of disyllabic words. One benefit of z-score normalization is that it provides an estimate of the acoustic center of each speaker's vowel space. This, in turn, allows for a straightforward analysis of potential positional differences in vowel quality. In the analysis, the absolute values of normalized F1 and F2, |F1| and |F2|, are used to assess distance from the center of the acoustic vowel space, and as a consequence, the extent and degree of hypo- or hyperarticulation (see Bradlow et al., 1996 for a different metric). In addition, vowel durations were centered to across-speaker means for each phonemic vowel quality.
The data were divided into two groups for analysis: words derived from monosyllabic roots and words derived from disyllabic roots. Results from the monosyllabic roots should display the general pattern of phonetic realization for each vowel in words up to three syllables in length. Vowels produced in words deriving from disyllables can further inform the analysis through the comparison of initial- and second-syllable vowels. For instance, if affix vowels are reduced (Washington, 2008), but not root vowels, second-syllable vowels should be indistinguishable from initial-syllable vowels in disyllabic roots. On the other hand, if second-syllable vowels in disyllabic roots behave like suffix vowels after monosyllabic roots, this would suggest a distinction between the initial syllable and all non-initial syllables instead of a root-affix distinction. Data collected from disyllabic roots also provide information on variation in words of up to four syllables in length.
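As a rough illustration of the normalization step, the sketch below z-scores a formant value against a speaker's reference tokens and takes the absolute value as the distance from the estimated center of the vowel space. The formant values and function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def lobanov_params(reference_values):
    """Speaker-specific mean and standard deviation for one formant."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return mean(reference_values), stdev(reference_values)


def z_normalize(value, params):
    """Express a raw formant value in speaker-specific z-score units."""
    mu, sd = params
    return (value - mu) / sd


# Hypothetical F1 values (Hz) for one speaker's reference tokens.
reference_f1 = [300.0, 320.0, 450.0, 470.0, 700.0, 720.0]
params = lobanov_params(reference_f1)

# A hypothetical non-initial token of a high vowel.
token_z = z_normalize(350.0, params)
token_abs_f1 = abs(token_z)  # |F1|: distance from the estimated center

# Sanity check: the reference tokens themselves have mean 0 and SD 1.
reference_z = [z_normalize(v, params) for v in reference_f1]
```

Under this metric, a reduced (centralized) vowel is one whose |F1| or |F2| is smaller than that of the same vowel in the comparison position, regardless of whether the raw formant rose or fell.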
For both monosyllables and disyllables, |F1| and |F2| were predicted based on vowel quality, position (syllable number), duration, as well as preceding and following consonant place of articulation. All two- and three-way interaction terms for vowel quality, position, and duration were also included in the model. Position, preceding consonant place, and following consonant place were all treatment coded. The model, fit using the lme4 package in R (Bates, Machler, Bolker, & Walker, 2015; R Core Team, 2017), incorporated a random intercept for speaker. The significance of each predictor was assessed using the anova function in the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017), with Satterthwaite's method for degrees of freedom estimation. Post-hoc comparisons across syllables for each contrastive vowel quality were conducted using the emmeans package (Lenth, Singmann, & Love, 2018). Since pairwise comparisons are the main focus of the analysis, the anova function was used to make initial statistical reporting simpler, and because the F-test more closely aligns with the aims of the paper than the output of the summary function in lme4. Significance is reported in the main body of the text throughout, but ANOVA tables for each analysis are included in the Appendix.
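The model structure described above can be written schematically as follows. The notation is an assumption introduced here for readability and does not appear in the original: $i$ indexes tokens, $j$ indexes speakers, $V$, $S$, and $D$ stand for vowel quality, syllable number, and duration, and $P$ and $F$ stand for preceding and following consonant place. Categorical predictors are treatment coded, so each corresponding $\beta$ is shorthand for a set of coefficients.

```latex
\begin{equation*}
|F1|_{ij} = \beta_0
  + g(V_{ij}, S_{ij}, D_{ij})
  + \beta_{P} P_{ij} + \beta_{F} F_{ij}
  + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation*}
```

Here $g(V, S, D)$ abbreviates the three main effects together with all of their two- and three-way interactions, and $u_j$ is the by-speaker random intercept. The same structure applies with $|F2|$ as the dependent variable.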

Monosyllables
Preliminarily, observe the mean F1 and F2 vowel qualities by syllable in Figure 1 (n = 5,395 vowels). In terms of F1, most non-low vowels are characterized by increasing F1 in non-initial syllables while /ɑ/ and /u/ show no obvious differences in F1 by position. As for F2, note that vowels tend to exhibit less peripheral F2 in second and third syllables. For instance, F2 of front vowels /i/ and /e/ diminishes in second and third syllables. However, F2 of back vowels /u/ and /o/ increases in non-initial syllables, suggesting contraction of the F2 dimension of the vowel space in non-initial syllables.
In addition to vowel quality differences across positions, duration also differs by position. In Figure 2, observe that vowels tend to be longer in second and third syllables. This may be due to stress; descriptive work reports that primary stress falls on the final syllable (Yunusaliev, 1966; Kirchner, 1998; Johanson, 1998). Interestingly, when Figure 1 and Figure 2 are compared, notice that more central vowel qualities are produced in syllables that tend to be longer. This generalization immediately suggests that variation in Kyrgyz is not primarily derivable from undershoot, since vowels are most centralized in positions where vowel length is greatest. Moreover, centralization of stressed (i.e., final-syllable) vowels supports the case that stress does not prevent reduction in Kyrgyz (cf. Delattre, 1969; De Jong, 1995; Cho, 2005).
Vowel quality was a significant predictor of |F1| [F(7, 5332.7) = 1610.2, p < .001], which is not surprising, since the language exhibits a phonological height distinction. More importantly, both syllable number and duration exerted a significant effect on |F1| [Syllable: F(2, 5334.2) = 33.51, p < .001; Duration: F(1, 5111.0) = 4.40, p = .04]. Let us first examine more closely the effect of position on |F1|. In Figure 3, |F1| is plotted by position for each vowel. Reduced vowels should exhibit diminished |F1|, which is consistent with data for /i y ɯ/. However, the non-high vowels and /u/ do not show the same trend. The mid vowels /e ø o/ exhibit increasing |F1| by syllable, while /u/ and /ɑ/ vary little in terms of |F1|. These vowel-specific differences are reflected in the model by the significant interaction between vowel quality and syllable number [F(14, 5331.2) = 34.56, p < .001].
Pairwise comparisons of each vowel quality in syllables one, two, and three were then conducted. Results from these are shown in Table 6 with a Bonferroni-corrected alpha criterion (α = .05/24 = .002).
The data in Table 6 allow for a detailed comparison of the predictions of both incremental and binary analyses of reduction. Reduced |F1| is indicated by a positive estimate and t-value. If reduction is incremental, all pairwise comparisons are expected to be significant and in the right direction. If reduction is binary (or alternatively, if initial syllables are strengthened), no significant differences are expected between second- and third-syllable means.
Of the 24 comparisons, 13 were significant; 10 were significant and in the right direction. Additionally, no vowels showed significant pairwise differences among all three comparisons. Thus, these data for |F1| do not lend immediate support for incremental reduction. Instead, the four high vowels /i y ɯ u/ exhibited significant positive differences between |F1| in initial and non-initial syllables, but no differences between second- and third-syllable means. Among the non-high vowels /ɑ o e ø/, /ɑ/ and /ø/ exhibited significant positive differences between first- and second-syllable |F1| while all other comparisons were either non-significant or in the wrong direction. Observe also that only /ø/ displayed a significant difference between second- and third-syllable means, albeit in the wrong direction; all other second- and third-syllable differences failed to reach significance.
In Figure 4, we can see the effect of duration for |F1|. Generally, most vowels in most positions do not show a significant connection between increased duration and increased |F1|. Observe, however, that a positive correlation between duration and |F1| is more common among the non-high vowels. In fact, the low vowel is the only vowel that exhibits a substantial positive correlation between duration and |F1| across all positions. These patterns are consistent with the significant interaction between vowel quality and duration [F(7, 5333.9) = 6.94, p < .001]. Further, the interaction between syllable number and duration indicates that duration-related effects are modulated by position in the word [Syllable:Duration: F(2, 5332.0) = 7.47, p < .001]. The significance of the three-way interaction between vowel, syllable number, and duration further demonstrates that variation for |F1| is a complex relation between individual vowels, position, and duration.
Thus, there is some evidence in favor of a binary rather than incremental conception of reduction in Kyrgyz. To further address this question, two models were constructed to predict |F1| that were minimally different from the model above. In the first, position was treated as a continuous predictor, and in the second, position was treated binarily, distinguishing initial from non-initial syllables. Differences in Akaike Information Criterion (AIC; Akaike, 1974) were used to compare the two models. AIC is a metric for model selection based on Kullback-Leibler divergence, and in terms of AIC, lower values indicate better model fit. Following Burnham and Anderson (2004, p. 271), it is assumed that differences in AIC (ΔAIC) less than two indicate a non-significant difference, differences between four and seven indicate a moderately significant difference, and models with ΔAIC greater than 10 represent a highly significant difference in model fit.
When the two models were compared, the model with binary position provided better fit than the model with continuous position [ΔAIC = 98.1]. This is taken as evidence that reduction of |F1| is binary rather than incremental.
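The model comparison logic can be sketched as follows. The AIC formula (twice the number of estimated parameters minus twice the log-likelihood) is standard; the log-likelihoods and parameter counts below are invented placeholders, not values from the study, and the category labels simply restate the ΔAIC thresholds adopted in the text. Differences falling between those thresholds are left unclassified, since the text does not address them.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def describe_delta_aic(delta):
    """Label a ΔAIC value using the thresholds adopted in the text."""
    if delta < 2:
        return "non-significant"
    if 4 <= delta <= 7:
        return "moderately significant"
    if delta > 10:
        return "highly significant"
    return "unclassified"  # ranges the text does not address


# Hypothetical fits: position coded as continuous versus binary.
aic_continuous = aic(log_likelihood=-2150.0, n_params=40)
aic_binary = aic(log_likelihood=-2101.0, n_params=40)

delta = aic_continuous - aic_binary  # positive: binary model fits better
label = describe_delta_aic(abs(delta))
```

Because lower AIC indicates better fit, a large positive difference in this direction favors the binary coding of position, as with the reported ΔAIC of 98.1.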
The positional lowering of mid vowels in Figure 1 is an issue worthy of some discussion. If positional variation is related to centralization, then lowering is expected for the high vowels, but somewhat unexpected for the mid vowels, as F1 of /e ø o/ in Figure 1 shifts toward a value notably higher than 0, around 0.7z. Is this centralization, even though these vowels do not shift toward 0z? Recall that the zero value derived via z-score normalization approximates the center of each speaker's acoustic vowel space. It is conceivable that the acoustic center of the vowel space derived during normalization is not identical to the acoustic quality that reduced vowels shift toward. There are two points worth noting here. First, z-score normalization predicts a central acoustic value based crucially on the number and quality of contrastive vowels in the language. For instance, a language with an inventory of /ɪ ɛ a ɔ ʊ/ would likely have a lower normalized center than a language with /i e a o u/. Therefore, the normalization process can only provide a rough estimate of what traits a centralized vowel might have in a given language. Second, vowel reduction patterns in a number of languages show trends toward distinct central vowel qualities (e.g., Delattre, 1969; Gendrot & Adda-Decker, 2006; Mooshammer & Geng, 2008; Pearce, 2008). If the exact quality of a centralized vowel may differ across languages, then it is possible to interpret the data in Figure 1 as evidence that reduction in Kyrgyz yields a more open central vowel that is lower than the predicted center of the vowel space.
Observe in Figure 5 that, in general, non-initial vowels show smaller |F2| than initial-syllable vowels. The only obvious exception to this is /ɯ/, which, as shown in Figure 1, is realized with higher |F2| in final syllables. Differences in positional variation across vowels in Figure 5 are consistent with the significant interaction between vowel and syllable number [F(14, 5331.2) = 30.35, p < .001].
Since |F2| differs across syllables, the question is whether positional variation is incremental or binary. First, the pairwise comparisons in Table 7 support a binary distinction between initial and non-initial vowels. Recall from above that a binary distinction predicts differences between syllables one and two, and syllables one and three, but no difference between syllables two and three. In contrast, an incremental model of reduction predicts that all pairwise comparisons should be significant. Of the eight syllable one versus syllable two comparisons, six were significant, with decreased |F2| in second syllables. Similarly, six of eight syllable one versus syllable three comparisons were also significant and in the right direction. In contrast, only one of eight syllable two versus syllable three comparisons was significant. This supports the conclusion that initial-syllable vowels are realized with more peripheral |F2|, while non-initial vowels, regardless of syllable number, are realized with similar |F2|.
In addition to the effects of vowel quality and syllable number, duration exerted a significant effect on |F2| [Duration: F(1) = 101.63, p < .001]. The relationship between duration and |F2| is manifest in Figure 6. Note in Figure 6 that all vowels except /ø y/ generally show a positive correlation between duration and |F2|. As one might expect based on the vowel-specific differences in Figure 6, [...]
When an alternate model with position encoded binarily (initial versus non-initial) was compared with a model encoding position as a continuous variable, the binary model fit the data significantly better than the continuous model [ΔAIC = 108.0]. This is consistent with a binary rather than incremental distinction in position. This interpretation is further supported by the data in Figure 1 and Figure 5, where the F2/|F2| differences between a given vowel phoneme in second and third syllables are typically very small, in contrast to the larger differences between initial- and second-syllable variants.

Summary
Results from Sections 4.1.1 and 4.1.2 suggest that |F1| and |F2| vary according to position and duration. Further, results point toward a binary distinction between more peripheral initial-syllable vowels and reduced non-initial vowels. Also, the fact that increased duration variably correlates with increased |F1| and |F2| supports undershoot as a low-level factor in Kyrgyz, in addition to a more pervasive effect of position.

Disyllabic roots
Results from Section 4.1 predict that |F1| and |F2| of suffix vowels should be distinct from initial-syllable vowels in words derived from disyllabic roots. However, these words also allow for the comparison of root-internal vowels. If position is truly binary, this predicts that second-syllable vowels should be realized like suffix vowels, and not like initial-syllable vowels. If the effect is morphologically conditioned, then second-syllable vowels should be realized more like first-syllable vowels, since they are both root-internal. General results from this set of words are shown in Figure 7 (n = 3,979 vowels). Note that as above, F1 tends to increase in non-initial syllables. For F2, front vowels are characterized by decreasing F2 in non-initial syllables while back vowels are characterized by increasing F2 in non-initial syllables. For most vowels, observe that second-syllable means are more similar to third- and fourth-syllable means than to first-syllable means. The only exception to this generalization is /e/. Recall also that no disyllabic roots with /u/ were produced, so there are no first- or second-syllable tokens of /u/ in Figure 7. Tentatively, Figure 7 supports the conclusion that surface vowel realization is not principally conditioned by morphology.
Figure 8 presents duration results from the disyllabic root dataset; vowels are longer in second, third, and fourth syllables. The same generalization made concerning the monosyllabic roots holds when the disyllabic root data are considered: vowels are reduced in non-initial syllables even though they are longer. This lends further support to the conclusion that vowel reduction is not primarily due to undershoot in the language.
By-syllable comparisons for each vowel quality are shown in Table 8. Since words were one syllable longer in the disyllabic root dataset, an additional three pairwise comparisons were possible for each vowel quality (modulo the absence of root-internal /u/; Bonferroni-adjusted α = .05/43 = .001). Pairwise comparisons suggest the superiority of a binary analysis of reduction. Fourteen of twenty-one comparisons involving the initial syllable (i.e., the left half of the table) were significant and in the right direction, while only one comparison not involving the initial syllable (i.e., the right half of the table) was significant and in the right direction.
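The comparison counts behind the two Bonferroni-adjusted criteria (24 for the monosyllabic roots, 43 for the disyllabic roots) follow directly from the number of syllable positions attested per vowel. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative choices, and the only facts taken from the text are the eight vowel qualities, the word lengths, and the absence of first- and second-syllable /u/ in the disyllabic root data.

```python
from itertools import combinations

VOWEL_QUALITIES = ["i", "y", "ɯ", "u", "e", "ø", "o", "ɑ"]


def count_pairwise(positions_by_vowel):
    """Total number of within-vowel pairwise position comparisons."""
    return sum(
        len(list(combinations(positions, 2)))
        for positions in positions_by_vowel.values()
    )


# Monosyllabic roots: every vowel occurs in syllables one through three.
mono = {v: [1, 2, 3] for v in VOWEL_QUALITIES}

# Disyllabic roots: syllables one through four, except that /u/ occurs
# only in suffixes (syllables three and four).
di = {v: [1, 2, 3, 4] for v in VOWEL_QUALITIES}
di["u"] = [3, 4]

n_mono = count_pairwise(mono)
n_di = count_pairwise(di)

alpha_mono = 0.05 / n_mono
alpha_di = 0.05 / n_di
```

Of the 43 disyllabic-root comparisons, the seven vowels with initial-syllable tokens contribute three comparisons each that involve the initial syllable, giving the 21 comparisons in the left half of the table and leaving 22 that do not involve the initial syllable.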
Duration was a significant predictor of |F1| [...].
To test whether positional variation for |F1| is best analyzed as incremental or binary among these words, two models were compared. These models were identical to the model above, except that in one, position was encoded as a continuous predictor while in the other, position was encoded binarily, as a distinction between initial and non-initial syllables. When variation for |F1| is compared in these two models, the model with binary position provided significantly better model fit [ΔAIC = 88.2].
Observe that for all vowels except /ɯ/, non-initial positions exhibited smaller |F2| than their initial-syllable counterparts. Despite this general trend, there is notable variation between vowels. For instance, observe that second-syllable /e/ is more similar to initial-syllable /e/ while second-syllable /o/ is more similar to third- and fourth-syllable /o/. These vowel-specific effects are reflected by the significant interaction between vowel and syllable number [F(19, 3902.8) = 17.75, p < .001].
Recall from Section 4.1.2 that positional variation for |F2| was best analyzed as a binary distinction between initial and non-initial syllables rather than an incremental difference between each syllable. For the data from disyllabic roots, the pairwise comparisons in Table 9 likewise support equivalent reduction of all non-initial syllables. This binary account of positional variation predicts that all comparisons involving the initial syllable (left side of the table) will be significant; all 21 are significant and in the right direction. Moreover, this analysis predicts that all comparisons that do not involve the initial syllable should be non-significant. Of the 22 comparisons on the right side of the table, only five are significant, and of these, only one is in the right direction.
Furthermore, I tested the binary and incremental analyses of reduction via model comparison. Again, the model that treats position as binary provided significantly better model fit than the model with position encoded continuously [ΔAIC = 255.3]. When the data in Figure 10 are considered, this result is unsurprising. In Figure 10, there is a clear difference between mean |F2| of initial and non-initial positions, but there is no clear difference between different non-initial means for each vowel quality.

Summary
Data from words deriving from disyllabic roots confirm the observations from Section 4.1, indicating that |F1| is reduced in non-initial syllables. As above, this positional shift produces a more open vowel quality than the zero value derived from normalization. In addition, results from Section 4.2 have further demonstrated a significant reduction of |F2| in non-initial syllables. In addition to positional effects, duration exerts a variable effect on vowel realization: shorter vowels are produced with less peripheral vowel qualities. Moreover, the specific pattern of reduction reported in Sections 4.1 and 4.2 supports a binary distinction between initial syllables, which are immune to reduction, and non-initial syllables, which are targets for reduction. Treating all non-initial vowels as equally reduced (or alternatively, initial vowels as strengthened) is supported by both pairwise and model comparisons.

General discussion
The acoustic properties of Kyrgyz vowels demonstrate significant positional variation. Two different patterns emerged in the data. For F1, non-low vowels exhibit higher F1 in non-initial positions. I have argued that this is centralization, with both high and mid vowels lowering toward a more open central vowel quality. However, this shift appears to have little effect on the low vowel. While F1 and |F1| of /ɑ/ do vary somewhat, the evidence in Figure 4 is more consistent with low vowel raising as undershoot, since the correlation between duration and |F1| was most substantial for /ɑ/.
At this point, it is worthwhile to consider the role of stress. One might wonder if F1 increases are due to final stress in the language. Under this interpretation, stress induces vowel lowering, rather than centralization. Increased F1 typically correlates with increased intensity, a common manifestation of stress (Ladefoged, 2003). Crucially, however, lowering also occurs in non-final positions, as seen in Figure 3 and Figure 9. The fact that lowering appears to occur in all non-initial syllables suggests that this is not stress-induced. More generally, the effect of stress on vowel F1 and F2 appears to be minimal. In some languages, stressed positions host a larger range of contrasts (e.g., in Italian, /ɛ/ and /ɔ/ are found in stressed syllables only). No such patterns are found in Kyrgyz, although this study has not investigated other potential acoustic correlates of stress, including intensity, f0 and its relation to intonation, or phonation differences. The fact that words were elicited in isolation precludes some of these, particularly the relationship between stress and intonation. Further work will likely have more specifics to contribute here; at present, I claim only that stress has no obvious effect on the first two formants.
The relative density of the vowel space, in particular the existence of several acoustically central vowels, /y ø ɯ/, may influence the extent of reduction in Kyrgyz. A densely packed central region of the vowel space could be dispreferred for functional reasons. At the same time, harmony may influence the extent of reduction, preventing more substantial reduction in non-initial syllables. The relationship between harmony and reduction is thus potentially unstable, as noted in Binnick (1991). If vowel harmony renders certain vowels more predictable, in turn these vowels may be more reduced. However, if vowel reduction is large enough, the effect of harmony can be obscured, the very point made in Pearce (2008, 2012), and even lost over time. Binnick (1991) argues that the result of harmony, i.e., predictability, sows the seeds for the eventual decay of the pattern. In many dialects of the related language Uzbek, vowel harmony has been almost entirely lost, and interestingly, non-initial vowels are generally limited to [ə] and [ɑ] (McCollum, 2019b, ch. 4). Data from more languages would be necessary to properly evaluate the connection between harmony and reduction, but the contrast between Kyrgyz and Uzbek is suggestive: harmony in Kyrgyz is robust, and reduction is smaller; harmony is generally absent in contemporary Uzbek, and reduction is larger.

The nature of reduction in Kyrgyz
The results above support a binary distinction between initial and non-initial vowels. Patterns of F1 and F2 variation clearly support a two-way distinction in Kyrgyz. F1 and F2 reduction, when taken together, also suggest that variation is not due to incremental reduction. As one further comparison, Johnson and Martin (2001) show that the vowel space in Creek is reduced from initial to final positions by comparing the size of vowel polygons across positions. Thus, reduction is evident when the area of the vowel polygon in some position is smaller than in a known non-reduced position. If supralaryngeal declination results in a gradual diminution of the vowel space, then the polygons of the vowel space in Kyrgyz should contract in each successive syllable. Vowel polygons for all syllables are shown in Figure 11, and their areas are compared in Table 10. Since no root-internal tokens of /u/ were produced in the disyllabic root data, all /u/ were excluded from the rightmost polygons below. In Figure 11, the size of the vowel space stays relatively constant in all non-initial syllables, suggesting that the global pattern of variation does not result from supralaryngeal declination. In contrast, there is a notable difference between the initial-syllable polygon and the non-initial polygons. On average, the vowel space contracts by 21% in non-initial syllables of words derived from monosyllabic roots, and by 24% in non-initial syllables of words derived from disyllabic roots.
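The polygon comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Figure 11 or Table 10: it assumes each polygon is given as perimeter-ordered (F2, F1) vertices in normalized units, uses the standard shoelace formula for the area, and the coordinate values are invented toy numbers rather than the Kyrgyz measurements. The names `polygon_area` and `percent_contraction` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (F2, F1) in any consistent, normalized unit


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula.

    Vertices must be listed in order around the perimeter.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_contraction(reference: List[Point], other: List[Point]) -> float:
    """Percentage by which `other` is smaller than `reference`."""
    ref_area = polygon_area(reference)
    return 100.0 * (ref_area - polygon_area(other)) / ref_area


# Toy vowel spaces (assumed values, NOT data from this study).
initial = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
non_initial = [(-0.9, -0.9), (0.9, -0.9), (0.9, 0.9), (-0.9, 0.9)]

contraction = percent_contraction(initial, non_initial)
print(f"vowel space contracts by {contraction:.1f}%")
```

Under a declination account, applying such a comparison syllable by syllable should yield steadily growing contraction; under a binary account, the contraction should appear between the first and second syllables and then level off.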
If the reduction of vowel quality distinctions is binary rather than incremental, this supports either the initial strengthening or the predictability-based accounts. One point to be made here is that although variation has been discussed in terms of reduction, it is eminently plausible that the binary phonetic distinction between initial and non-initial syllables is more appropriately construed as enhanced versus non-enhanced positions. Although the distinction between initial and non-initial is generally consistent with an initial strengthening account, there are reasons to suspect other factors are at work. First, extant research has found relatively small effects of initial strengthening on initial-syllable vowels, especially when a consonant is in domain-initial position (Fougeron & Keating, 1997; Cho, 2005; Cho & Keating, 2009). Since most words collected contained an initial consonant, the effect of initial strengthening on initial-syllable vowels is likely small. In contrast, the acoustic differences reported above are quite large. In order to more definitively assess whether variation is due to initial strengthening, it would be necessary to examine the realization of domain-initial consonants. Since initial strengthening is often localized to the initial segment in a prosodic domain, it should trigger hyperarticulation of domain-initial consonants as well.
[Figure 11 caption, partial: ... (right) roots. Due to the lack of root-internal /u/ in the disyllabic root data, all /u/ is excluded from the right-hand plot. The acoustically central /ø ɯ/ are removed from both plots.]
A reviewer points out that some Bantu languages exhibit relatively large strengthening effects on both stem-initial consonants and immediately following vowels (Idiatov & Van de Velde, 2016). In other languages, stem- or word-initial syllables exhibit distributional asymmetries that suggest the phonologization of a pre-existing strengthening effect (Lionnet, 2017; Lionnet & Hyman, 2018).
For instance, in Esimbi, the entire range of contrastive vowel qualities is licensed on the word-initial syllable; only high vowels occur elsewhere (Hyman, 1988; see also Walker, 2011). However, despite the existence of strengthening effects in both the phonetics and phonology of these languages, in most if not all cases, these effects are linked to word-initial accent (Lionnet & Hyman, 2018, pp. 651-655). Since stress is final in Kyrgyz, such an analysis is less tenable.
While the size of the effects reported here is potentially inconsistent with some previous results on domain-initial strengthening, the Kyrgyz results fit quite well with a predictability-based account. Since the predictability of a given element, e.g., a word or phoneme, may partially condition its phonetic realization, the presence of harmony in the language may enable reduction of non-initial vowels. For instance, if the initial-syllable vowel is /ɑ/, then the next vowel is most likely to be /ɑ/ or /ɯ/, since the front vowels are generally banned after back vowels, and the round vowels are generally banned after unrounded vowels. In general, only two vowels readily occur after a given vowel x: that same vowel x, or the vowel that differs from x only in its [high] feature. In a language with eight vowel qualities, like Kyrgyz, one might expect a given vowel to occur roughly 12.5% of the time (1/8), but once the quality of vowel x in syllable n is known, the likelihood of x in syllable n+1 increases dramatically. To illustrate this, the unigram probability of /ø/ in the 2016 one-million sentence corpus of Kyrgyz newspapers in the Leipzig Corpora Collection is 0.05 (Goldhahn et al., 2012). However, after /ø/, the probability of another /ø/ increases almost tenfold, to 0.50. This general trend holds not just for identical vowels, but for vowels that agree in backness and rounding with the preceding vowel. The unigram probability of /i/ is 0.11, but the probability of /i/ given a previous /e/ is 0.34, a threefold increase. This increase in predictability emerging from harmony renders vowels in non-initial positions susceptible to reduction. If cast in terms of Lindblom's (1990) H&H theory, the listener, based on their history with the language, expects sequences of harmonic vowels. The speaker is able to exploit this expectation and reduce effort without compromising the efficient communication of the message.
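The contrast between unigram and conditional vowel probabilities can be made concrete with a short sketch. The following is an illustration under stated assumptions, not the procedure used to obtain the corpus figures above: it assumes words are already transcribed with one character per vowel, counts only vowel-to-vowel transitions within a word, and runs on a handful of invented toy strings rather than the Leipzig data. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# The eight Kyrgyz vowel qualities, one character each (assumed transcription).
VOWELS = set("ieyøɯɑuo")


def vowel_tier(word: str) -> List[str]:
    """Return the sequence of vowels in a word, ignoring consonants."""
    return [ch for ch in word if ch in VOWELS]


def vowel_probabilities(
    words: Iterable[str],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Estimate P(v) and P(next vowel | previous vowel) from a word list."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for word in words:
        tier = vowel_tier(word)
        unigrams.update(tier)
        bigrams.update(zip(tier, tier[1:]))  # adjacent vowels within the word

    total = sum(unigrams.values())
    p_unigram = {v: count / total for v, count in unigrams.items()}

    # Condition only on vowels that are actually followed by another vowel.
    context: Counter = Counter()
    for (prev, _nxt), count in bigrams.items():
        context[prev] += count
    p_conditional = {
        (prev, nxt): count / context[prev]
        for (prev, nxt), count in bigrams.items()
    }
    return p_unigram, p_conditional


# Toy strings for illustration only; not attested forms or corpus data.
toy_words = ["bɑlɑ", "bɑlɯ", "køl", "køldø", "ele", "eli"]
p_uni, p_cond = vowel_probabilities(toy_words)

print(f"P(ø) = {p_uni['ø']:.2f}")
print(f"P(ø | ø) = {p_cond[('ø', 'ø')]:.2f}")
```

In a harmonic lexicon, the conditional estimate for a harmonic continuation should sit well above the corresponding unigram estimate, which is the asymmetry the predictability-based account relies on.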

The relationship between phonology and phonetics
Returning to Pearce (2008, 2012) and the claim that harmony produces triggers and targets that are phonetically indistinguishable, it is clear that F2 varies significantly by position in Kyrgyz. Underlying Pearce's (2008, 2012) claim that harmony blocks reduction is an assumption that the substitution of phonological symbols exerts a limiting force on phonetic variation. Under this view, though, the results reported in the previous section are unexpected.
It is not controversial that phonological knowledge plays a crucial role in phonetic realization. Backness and rounding harmony in Kyrgyz render backness and rounding predictable in non-initial syllables. If predictability motivates reduction, this offers a relatively direct link between phonetic variation and phonological representations, in particular, theories of underspecification (e.g., Archangeli, 1988; Steriade, 1995). Typically, underspecified (non-initial) vowels are representationally distinguished in the phonology. As noted in previous work, this phonological distinction has potential phonetic ramifications, too (Zsiga, 1997; Lanfranca, 2012; see also Washington, 2006, 2008). Since targets of harmony are more predictable, and triggers are less predictable, this predicts that, all else being equal, triggers should not be phonetically reduced relative to targets.
In other words, acoustic distinctions in the vowel space should be maximally preserved among triggers, but potentially reduced among targets. To state it differently, regardless of the direction of harmony, all else being equal, targets of harmony should never exploit a larger phonetic vowel space than triggers of harmony.
One related prediction of the proposed account is that disharmonic vowels should be realized with more peripheral vowel qualities than alternating vowels because they are not predictable based on context. This prediction critically differs from Zsiga's (1997) prediction. Under Zsiga's analysis, all else being equal, both alternating and non-alternating vowels should be indistinguishable. The predictability-based prediction is borne out in the language most closely related to Kyrgyz, Kazakh. In Kazakh, the comitative suffix has an invariantly front vowel, while the question enclitic in colloquial speech has an invariantly back vowel. McCollum (2019a) reports that F2 is more peripheral for each of these morphemes than that of alternating vowels in the language. This result from Kazakh further supports a predictability-based analysis of vowel reduction in Turkic.

Conclusion
This paper has investigated positional variation of Kyrgyz vowels, finding centralization of vowels in non-initial syllables. The nature and extent of this phonetic reduction suggest that prosodic effects are not the primary source of explanation. Rather, these results support the claim that reduction in languages with vowel harmony may depend on predictability. This proposal links reduction at the featural level with other forms of predictability-based phonetic reduction. The findings here are suggestive, offering harmony patterns as a new testing ground for examining theories of positional variation, as well as the relationship between phonology and phonetics.

Additional file
The additional file for this article can be found as follows:
• Appendix. A PDF file containing the full list of stimuli. DOI: https://doi.org/10.5334/labphon.247.s1