Phonetic neutralization in Palestinian Arabic vowel shortening, with implications for lexical organization

This study acoustically compares lexically short vowels in Palestinian Arabic to vowels that are underlyingly long, but have undergone closed syllable shortening, a phonological process affecting certain CV:CC sequences (as in /faːq-ʃ/ → faqʃ ‘woke-negative’; /ӡaːb-l-ak/ → ӡablak ‘brought to you’). In a study of word pairs produced by 74 speakers, the two vowel types were found to be indistinguishable in duration. Speakers differ as to the contexts in which they apply shortening: some shorten before the negative suffix /-ʃ/, but not the dative suffix /-l/, likely due to paradigm leveling. The results are compared to earlier studies finding incomplete neutralization in Arabic vowel epenthesis, to identify factors that affect completeness of neutralization. It appears that orthography is not an important factor, but that morphologized or lexicalized processes produce more complete neutralization. It is proposed that allomorphs produced by more fossilized processes are more weakly linked in the mental lexicon, and that incomplete neutralization of vowel quantity in particular would require linkage through the abstract CV word pattern.


Complete and incomplete neutralization
Incomplete neutralization is a phenomenon in which a phonological process apparently neutralizes some underlying contrast, yet subtle phonetic differences remain. For example, Dutch has a process of final devoicing, so that verwijd 'widen' and verwijt 'reproach' are both pronounced [vɛrʋɛit]. The underlying distinction between stem-final /d/ and /t/ is generally described as being realized only when the sounds are not final in the word, as in the related infinitive forms, [vɛrʋɛidən] and [vɛrʋɛitən] (Ernestus & Baayen 2006). Yet a series of studies has found that supposedly homophonous pairs like this may have slightly different phonetic realizations (see Almihmadi 2011: 296-297 for an extensive list; Yu 2011 for a review). Devoiced final obstruents often retain fine phonetic cues typically associated with voicing, such as shorter closures, longer preceding vowels, and shorter release bursts. In perception studies, speakers do better than chance at distinguishing minimal pairs, although accuracy is low enough (for example, 60% in Port & O'Dell 1985) to indicate that the phonetic cues are indeed subtle.
Not all phonologically neutralizing processes leave phonetic traces. Some studies do find complete neutralization of contrasts; for example, Kim & Jongman (1996) argue that manner features are completely neutralized in Korean codas. Surveying the literature, Almihmadi (2011) finds 32 studies reporting incomplete neutralization, 13 reporting complete neutralization, and 17 reporting neutralization that is complete only under certain linguistic or experimental conditions. These numbers should not be taken to indicate the actual prevalence of complete vs. incomplete neutralization, however. Some apparent cases of complete neutralization may be due to low power in experiment design; conversely, genuine cases of complete neutralization may go unreported due to bias against publishing null results. Also, the current body of literature is disproportionately focused on a small number of processes in primarily Indo-European languages. A better typology of both complete and incomplete neutralization is needed in order to understand why each occurs.
The cause of incomplete neutralization is much debated, but at least two ideas are prominent. The first is that the phenomenon is at least partly an artefact of experimental conditions, such as the use of written stimuli and minimal pairs. Both of these tend to focus speakers' attention on underlying distinctions (Fourakis & Iverson 1984; Ernestus & Baayen 2006; Warner et al. 2006). However, this is not likely to be the sole cause. Incomplete neutralization has been found under diverse experimental methods, and in cases where there is no orthographic indication of the underlying contrast, such as Catalan final devoicing (Dinnsen & Charles-Luce 1984).
The second idea is that incomplete neutralization is an effect of morphological paradigms. For example, Ernestus & Baayen (2006: 44-46) sketch how incomplete neutralization could be implemented in a localist spreading activation model of speech production. They propose that Dutch words, including inflected forms, are mentally stored as complete auditory and visual form representations as shown in (1) (rather than being actively derived from abstract underlying forms). When speakers activate one member of a paradigm, such as [vɛrʋɛit] 'widen', they also activate inflectionally related forms such as [vɛrʋɛidən]. The actual production is influenced by all activated forms, so the [d] of [vɛrʋɛidən] can bring about a slight voicing on the [t] of [vɛrʋɛit]. Presumably, speakers must perceive a segment-level correspondence between the [d] and [t] in order for this influence to occur. Paradigmatic effects have been implemented in other frameworks as well (e.g., Steriade 2000; Braver 2013), but this paper will build primarily on Ernestus & Baayen's formulation.
One issue in evaluating the role of paradigms is the fact that the existing data come mostly from Indo-European (IE) languages, whose morphological systems are fairly similar to one another. For example, 60 out of 71 studies Almihmadi (2011) reviews are on IE languages. Languages from this family tend to have moderate amounts of inflection, primarily through affixation, and few allomorphs per stem. If morphological relations within the mental lexicon are crucial to incomplete neutralization, we might predict different phonetic neutralization patterns in languages whose morphology and lexical organization are non-IE-like. In Arabic, for example, a single verb base can have hundreds of inflected forms, making the paradigm far more complex than a typical IE paradigm. It seems likely that many Arabic inflected forms are not stored as wholes, but rather are actively composed during production and decomposed during perception. Arabic paradigms also involve very extensive alternations; a single verb paradigm might feature vowel epenthesis, vowel deletion, vowel shortening, vowel raising, and stress shifts. It is not obvious how forms are mentally linked within such a complex paradigm: which forms are co-activated, and which sounds are considered to be in correspondence? A series of experiments by Boudelaa and Marslen-Wilson (e.g., Boudelaa & Marslen-Wilson 2012) has found evidence that the perceptual lexicon is organized around discontinuous triconsonantal roots and CV skeletal patterns, both phonological entities that do not exist in IE languages.
It is also an open question whether incomplete neutralization is equally common for different types of alternations, and what it would look like for each. By far the largest number of studies concern just one process, final obstruent devoicing. I am aware of only a few phonetic studies of vowel quantity neutralization. In Egyptian Arabic, Broselow et al. (1995) present durational data from a small number of tokens suggesting that the shortened vowels have a duration roughly comparable to that of lexically short vowels, based on the pair [kitabna] /kitaːb-na/ 'our book' vs. [ʕinabna] /ʕinab-na/ 'our grapes'. In Dutch, Lahiri et al. (1987) found that long vowels derived by an open-syllable lengthening rule are identical in duration to underlyingly long vowels. For example, [baːlen] 'bags' has the same vowel durations as [daːlen] 'valleys', although their singular forms, [baːl] and [dal], differ in vowel length. In Japanese, however, Braver & Kawahara (2015) found that long vowels derived through a rule lengthening monomoraic Prosodic Words were 32 ms shorter than underlyingly long vowels, despite behaving as long in the phonology. Finally, a pilot study by Rudin (1980) found that Turkish long vowels derived from two short vowels, through a rule of intervocalic /g/-deletion, were 31 ms longer than underlyingly long vowels (292 vs 261 ms). These differences are much larger than those typically found when vowel duration is involved in near-neutralization of laryngeal contrasts. For example, Warner, Jongman, et al. (2004) found vowels to be just 3.5 ms longer before devoiced /d/ than before /t/. However, it is impossible to know from only two studies whether differences of this magnitude are the norm in near-neutralization of quantity distinctions.

Phonetic neutralization in Levantine Arabic
The study reported here is part of a wider line of research examining the phonetic realization of (supposedly) phonologically neutralizing processes in colloquial Levantine Arabic, a dialect group that includes varieties spoken in Lebanon, Syria, Jordan, Israel and Palestine. The morphology of this language displays numerous alternations due to processes of vowel epenthesis, vowel shortening, degemination, stress shifts, vowel syncope, and vowel raising, which often interact in opaque ways. These opaque interactions have been a central source of evidence for the phonological cycle (Brame 1974), as well as a challenge that has driven the development of new forms of Optimality Theory (see, for example, McCarthy 2007). However, there has been very little phonetic study of these alternations. In theoretical phonology, it is generally assumed that each of these processes results in full phonetic neutralization of underlying contrasts.
Our previous work (Gouskova & Hall 2009; Hall 2013) found what is arguably a case of incomplete neutralization in some speakers' production of Levantine vowel epenthesis. We compared words with vowel epenthesis, such as /libs/ [libis] 'clothing', to similar words containing lexical vowels, such as /libis/ [libis] 'he wore'. These two words are generally transcribed as being homophonous. However, for most Lebanese speakers, epenthetic vowels were significantly more centralized than lexical vowels. The speakers who differentiated the vowels fell into two groups. For one group, the vowels were categorically distinct in the vowel space, to the point that they would best be transcribed as lexical [i] and epenthetic [ə]. This is not a case of incomplete neutralization, because the phonetic cues involved are not subtle (they are of the magnitude typically seen in phonemic vowel contrasts). But for the second group of speakers, the clouds of tokens for the two vowel types overlapped very heavily in the vowel space. Their mean differences in F2 were typically on the order of 50 Hz, far less than would be expected for ordinary vowel contrasts, although statistically significant. We argue that these slightly centralized epenthetic vowels represent an incomplete neutralization of the underlying contrast between [i] and zero.
The process studied here, closed syllable shortening, differs from vowel epenthesis in several ways. First, it neutralizes a difference between short and long vowels, rather than between presence and absence of a vowel. Second, it applies at a more abstract level of representation, in the sense that its conditioning environment is relatively hard to reconstruct from surface level forms. In the terminology of Lexical Phonology (Mohanan 1982), Levantine epenthesis is a post-lexical process: it applies between as well as within words, it is to some extent optional, and it counter-feeds and counter-bleeds other processes. Closed syllable shortening, on the other hand, applies only within words, is obligatory, and is itself counter-bled by epenthesis and by cyclic resyllabification. These characteristics are described in the following section.

Closed syllable shortening
Palestinian Arabic contrasts five long vowels /iː eː aː oː uː/ and at least three short vowels /i a u/. Duration is a strong cue to vowel quantity in Levantine dialects. Allatif & Abry (2004) and Alhussein Almbark (2008) find long/short ratios around 1.7 for Syrian; data from Broselow et al. (1997) show ratios ranging from 1.51 to 2.01, with shorter ratios in closed syllables than open, in Lebanese, Syrian, and Jordanian. Short vowels are also more centralized than long vowels. Allatif (2008) argues that in the closely related Syrian dialect, the formant differences are not a result of articulatory undershoot, but are actively controlled.
The distribution of long vowels is subject to complex restrictions, with both phonological and morphological conditioning. Abu-Salim (1986) and Younes (1995) analyze one set of length alternations as resulting from a process called closed syllable shortening, which applies to syllables that have the form /CV:CC/ at some point in the derivation. There are two suffixes in Palestinian that can produce such shortening, as discussed below.

Shortening triggered by negative /-ʃ/
One morphological environment where CV:CC syllables arise is when the negation suffix /-ʃ/ attaches to a verb stem ending in /V:C/. For example, the negative form of [ӡaːb] 'he brought' is [ӡabʃ] rather than *[ӡaːbʃ].
Forms like [ӡabʃ] are also subject to an optional but pervasive process of vowel epenthesis in final CC clusters. Epenthesis applies after shortening, producing [ˈӡabeʃ] as shown in (2).
This shortening is generally considered to be obligatory, although Elihay (2011: 35) reports occasionally hearing forms like [biˈӡiːbʃ] for 'he doesn't bring', instead of the expected [biˈӡibʃ]. This is surprising, given that CV:CC syllables are widely agreed to be banned in Levantine and other Arabic dialects. In the data collected in the current study, there were no examples of non-shortening before /-ʃ/.

Shortening triggered by dative /-l/
Shortening also applies in verb stems before the dative suffix /-l/. For example, /ӡaːb-l-i/ 'he brought to me' is pronounced [ӡabli]. Abu-Salim's analysis of such forms rests crucially on the idea that phonological rules apply cyclically, as shown in (3). Shortening applies not to the form /ӡaːb-l-i/, which would contain no CV:CC syllable, but to the intermediate form /ӡaːb-l/. By contrast, shortening does not apply in words like /ӡaːb-ni/ 'he brought me', because there is no intermediate form where the [n] is a coda.
(3)   'he brought to me'   'he brought me'
      /ӡaːb-l-i/           /ӡaːb-ni/
Examples (2) and (3) show that shortening applies at a relatively abstract level of representation; its conditioning environment is never directly reconstructable from the surface form before /-l/, and only sometimes before /-ʃ/. The experiments reported below examine the degree of shortening before both the [-ʃ] and [-l] suffixes. Neither alternation has previously been examined phonetically, to my knowledge.

Methods
The data for the two experiments reported here were collected together, using the same methods, as part of a word list including items for other experiments not reported here. Experiment 1 examines shortening before /-ʃ/; Experiment 2 examines shortening before /-l/. Methods were approved by the Institutional Review Board of California State University Long Beach.
Both experiments test only a small number of items, but a large number of speakers. This approach is informed by our previous work on the phonetics of Levantine epenthesis (Gouskova & Hall 2009; Hall 2013), which found that levels of neutralization did not differ significantly across items, but did differ significantly and qualitatively across speakers. For an initial exploration of a phenomenon within a dialect area exhibiting great microvariation, there is a practical tradeoff between the number of speakers recorded and the amount of data collected from each. It seems most useful at this stage to maximize the number of speakers and gauge the level of intradialectal variability, as a guide to future studies.

Elicitation
Words were elicited using a technique developed by Hall (2013). In a quiet room, participants viewed a self-paced PowerPoint presentation on a laptop computer. Each slide contained a written target word and four colloquial frame sentences, which were consistent across target words. Each slide also auto-played an audio recording of a colloquial "context" sentence containing the target word, spoken by a female resident of Nahif in her 20s. This context sentence was not one of the frame sentences, and was different for each target word, as it was designed to place the word in a natural context. The target word itself was removed from the recording and replaced with 400 ms of white noise. Contents of a sample slide are shown in (4). In the frame sentences, the target words were written in red type to set them apart.
(4) Audio stimulus: Ɂamiːr ma [white noise] isˤ sˤuboħ ʕala madrasto "Amir [white noise] this morning for school."
This elicitation method is designed to discourage participants from using Standard Arabic, which has a very different lexicon, phonology and morphology than the colloquial dialect. Hearing colloquial audio recordings helps speakers to stay in a colloquial register. The frame sentences include distinctively colloquial words and verbal morphology, some of which were adjusted to the participant's dialect. Palestinian dialects differ in whether the first plural prefix is [men-] or [ben-], so that 'we say' could be either [men-ʔuːl] or [ben-ʔuːl]. Also, speakers from different regions prefer different words for 'now', such as [ˈʔissa], [ˈhallaʔ], and [ˈhassaʕ]. Participants were asked which form they typically used for these two items, and shown a version of the PowerPoint that used their preferred forms in the frame sentences. 84 items were elicited, of which 9 belonged to the experiments reported here: 6 to Experiment 1 as shown in (5) below, and 3 to Experiment 2 as shown in (6). Pairs of items to be compared were separated in the list, typically by about 40 items. The entire list was read twice, for a total of 8 tokens (4 frames × 2 repetitions) per item. Participants' speech was recorded in .WAV format with a Marantz PMD 660 solid state recorder and a Shure PG81 cardioid condenser microphone.

Speakers
The data analyzed here come from 74 native speakers of Palestinian Arabic, including 32 females and 42 males, with a mean age of 23. Nearly all were students at Tel Aviv University or the University of Haifa. 52 were Muslim, 13 Christian, 1 Druze, and 8 did not indicate religion (which is often an important correlate of dialect in Arabic; see Blanc 1964). All were highly proficient in Hebrew and had at least some proficiency in English. Geographically, 46 speakers were from the Galilee (including 17 from Nazareth), 12 from the Wadi 'Ara region, 6 from the Little Triangle, 3 from Jerusalem, 3 from Akko, 1 from Lod, and 1 from Haifa. 2 did not indicate origin.

Measurements
Each recording was reviewed by a native Arabic speaker, who marked disfluencies and transcribed variations in pronunciation. The author decided, based on these transcriptions, which tokens were usable. Disfluent tokens were excluded, as well as those with phonological variations that would invalidate comparison: for example, if a word intended to have two syllables was produced as one syllable, due to suppression of optional vowel epenthesis, it would be excluded.
Words were segmented in Praat (Boersma & Weenink 2009) using a semi-automated method. After research assistants manually marked word boundaries, the EasyAlign plugin for Praat (Goldman 2011) was used to segment the word into phonemes. Textgrids were checked by research assistants and segment boundaries were manually adjusted. Most CV or VC boundaries were straightforwardly identified as the point of a sharp change of intensity, plus onset or offset of formant structure. For [aʔ] boundaries, the criterion was the appearance of creaky voicing. The boundary of [aʕ] was sometimes challenging to segment, particularly as not all speakers produced [ʕ] alike. For some speakers, a decrease in intensity proved to be the most reliable marker of the transition, while for others, it was easier to identify a change in formant structure. Each speaker's recordings were segmented by a single research assistant, so that where segmentation criteria had to be adjusted for the speaker, a consistent set of segmentation criteria would be employed across pairs of words to be compared. The research assistants were unaware of the structure of the experiment, to avoid bias. A Praat script collected vowel duration measurements, as well as formant values at each vowel midpoint, using Praat's "To Formant (burg)…" function. Vowels with outlying durations (more than two standard deviations from the mean) were checked by the author, and boundaries adjusted if appropriate.
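The outlier check described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the function name, the sample durations, and the choice of the sample standard deviation are all assumptions, and the text does not say whether outliers were computed per speaker, per item, or over the whole dataset, so the grouping is left to the caller.

```python
from statistics import mean, stdev

def flag_duration_outliers(durations_ms, n_sd=2.0):
    """Return indices of durations more than n_sd standard deviations from the mean.

    Hypothetical helper: uses the sample standard deviation of whatever
    group of tokens the caller passes in.
    """
    m = mean(durations_ms)
    sd = stdev(durations_ms)
    return [i for i, d in enumerate(durations_ms) if abs(d - m) > n_sd * sd]

# Invented vowel durations (ms) for one group of tokens; the last is suspicious.
durations = [82.0, 90.0, 85.0, 88.0, 79.0, 91.0, 86.0, 84.0, 87.0, 160.0]
flagged = flag_duration_outliers(durations)
print(flagged)  # indices of tokens whose boundaries should be re-checked by hand
```

Flagged tokens are candidates for manual inspection only; as in the procedure described above, a boundary would be adjusted only if re-examination showed it to be misplaced.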

Materials
To examine the effects of shortening before the negative /-ʃ/ suffix, the first vowels of three /CaCC/ nouns were compared to those of three /Ca:C-ʃ/ verbs, as shown in (5). The latter are expected to undergo shortening. Within each noun-verb pair, the postvocalic consonant and following syllable are identical. The prevocalic consonant varies, but this is expected to have a negligible effect on vowel duration (Peterson & Lehiste 1960). In both word types, an epenthetic vowel breaks the final CC cluster, making the words bisyllabic on the surface. The consonant transcribed here as /q/ has several realizations in different subdialects of Palestinian, including [q], [ʔ], a backed [k], and [g]. These variations were combined in the analysis, as they are not expected to have any effect on the degree of neutralization.
Underlying vowel length distinction was indicated orthographically in the written part of the stimulus materials. The words in the right column of (5) were spelled with the letter for long [aː] (ا), while the underlyingly short [a]s in the left column were not written (as is normal in Arabic orthography). This orthographic confound could potentially reduce the amount of neutralization.

Exclusions
Two issues led to the exclusion of a number of tokens. First, many speakers (unexpectedly) produced the negative verbs preceded by the optional unstressed particle [ma], which is an additional marker of negation. This particle was not written in the frame sentences that speakers were instructed to read, but was included in parentheses in the title of each slide, as shown in (4). This was intended to make the colloquial negative form more recognizable out of context, but it may have caused speakers to think that they should include the particle when reading the frames. The addition of this particle increases the length of the phrase by one syllable, which could potentially decrease syllable durations throughout the phrase. Accordingly, productions with [ma] were excluded from the analysis.
Second, for some speakers it was not possible to reliably segment the sequence [aʕ], found in the pair [naʕeʃ] / [baʕeʃ]. Tokens were excluded if there was not a clear change in intensity and/or formants to mark this transition.

Results of experiment 1
For this experiment, only the analysis of durational measurements is presented. The materials are not suited to comparison of formants, due to the variation in prevocalic consonant, which can affect vowel formants.
Tokens were paired by speaker, item, and repetition, and tokens without matches were discarded. 36 speakers produced at least some usable pairs of /CVCC/ - /CV:CC/ words from the set in (5). The dataset includes 742 tokens (371 pairs), with a mean of 20.6 tokens (10.3 pairs) per speaker. Table 1 presents the duration means for the three pairs of /CVCC/ vs. /CV:CC/ items. Since speakers produced unequal numbers of usable repetitions per item, durations for each item were first averaged by speaker, then the speaker means were averaged to produce item means. This method equalizes the contributions of different speakers to each item mean. For all three pairs, the durations of the two vowels are essentially identical, with tiny differences in inconsistent directions.
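The two-stage averaging described above (within speaker, then across speakers) can be illustrated with a short sketch. The speaker labels, item label, and durations below are invented for illustration, and the function name is hypothetical; the point is only to show why a mean of speaker means differs from a pooled mean when speakers contribute unequal numbers of tokens.

```python
from collections import defaultdict
from statistics import mean

def item_means_by_speaker(tokens):
    """tokens: iterable of (speaker, item, duration_ms).

    Average durations within each speaker first, then average the
    speaker means, so every speaker contributes equally to an item mean.
    """
    per_speaker = defaultdict(list)
    for speaker, item, dur in tokens:
        per_speaker[(item, speaker)].append(dur)
    speaker_means = defaultdict(list)
    for (item, _speaker), durs in per_speaker.items():
        speaker_means[item].append(mean(durs))
    return {item: mean(vals) for item, vals in speaker_means.items()}

# Invented data: speaker S1 has four usable tokens, speaker S2 only one.
tokens = [
    ("S1", "CVCC", 80.0), ("S1", "CVCC", 84.0),
    ("S1", "CVCC", 82.0), ("S1", "CVCC", 82.0),
    ("S2", "CVCC", 100.0),
]
means = item_means_by_speaker(tokens)
pooled = mean(d for _, _, d in tokens)
print(means["CVCC"], pooled)
```

In this toy case the pooled mean is pulled towards the speaker with more usable tokens, whereas the mean of speaker means weights both speakers equally.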
For the purposes of statistical analysis, results were averaged into type means by speaker, item, and frame sentence (486 type means). A 3-way ANOVA on these type means, with independent variables of (underlying) Length, Speaker, and Frame sentence, and dependent variable Duration, found highly significant main effects of Speaker and Frame sentence (p < .001), but no main effect of Length (p = 0.27), and no significant interactions.
In short, no significant durational differences were found between underlyingly short vowels versus underlyingly long vowels that have undergone shortening triggered by the /-ʃ/ suffix.

Materials
To examine shortening before the dative suffix /-l/, this experiment compared the first vowels of three words shown in (6). 2
(6) Long: /ӡaːb-ha/ ӡaːbha 'brought her'
    Short: /ӡabha/ ӡabha 'front, forehead'
    Shortened: /ӡaːb-l-ak/ ӡablak 'brought to you'
The consonants preceding and following the vowel are held constant, as is the vowel of the following syllable. As in Experiment 1, the underlyingly long vowels were spelled differently than underlyingly short vowels.
2 [ӡaːbha] was originally recorded for use in a different experiment, not reported here, on the realization of vowel length. When, as described below, it turned out that some speakers did not shorten in /ӡaːblak/, results from /ӡaːbha/ were included to provide a baseline for a non-alternating long vowel.
Tokens were matched by speaker and repetition across the three items, and tokens not part of a matching triplet were excluded. 72 speakers produced at least some matching usable triplets. There were a total of 1668 tokens, forming 556 triplets, with an average of 7.7 repetitions per item per speaker (minimum 4, maximum 8).

Excluding speakers who do not shorten /ӡaːblak/
Initial exploration of the data revealed that not all speakers apply the phonological shortening rule in /ӡaːblak/. To demonstrate this graphically, Figure 1 shows a kernel density plot of speakers' mean ratios between the durations of the first vowels in /ӡaːbha/, /ӡabha/, and /ӡaːblak/ ([ӡablak]), produced using the kernel density estimation function of R (R Development Core Team 2009). Each line represents a ratio between two forms; the x-axis shows the range of ratios produced by speakers; the height of the line estimates how many speakers produced each ratio. We can see that speakers' ratios of /ӡaːbha/ : /ӡabha/ (top graph, solid line) display a close to normal distribution around a peak of approximately 1.7. This indicates that the speakers are fairly homogeneous in the extent to which they distinguish long and short vowels. Yet the ratio of /ӡaːbha/ : /ӡaːblak/ (top graph, dotted line) shows a bimodal distribution: there is one peak in the same region as the /ӡaːbha/ : /ӡabha/ peak, but also a smaller peak close to 1. This suggests that there is a smaller population that produces the first vowels of /ӡaːbha/ and /ӡaːblak/ with roughly the same duration, rather than shortening the latter. The ratio of /ӡaːblak/ to /ӡabha/ (bottom graph) also shows a bimodal distribution: most speakers have ratios clustered around a peak at approximately 1, as expected, but a second group have ratios forming a wide peak around approximately 1.6, which is typical of surface long : short ratios. In short, it appears that for most speakers, /ӡaːblak/ is similar to /ӡabha/, but for a sizeable minority it is instead similar to /ӡaːbha/.
Figure 1: Density distribution of 72 speakers' mean durational ratios between /ӡaːbha, ӡabha, ӡaːblak/ (the last undergoes shortening for most speakers). The x-axis represents the ratio between two forms; the y-axis represents an estimate of the number of speakers producing that ratio.
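For readers unfamiliar with kernel density estimation, the idea behind such a plot can be sketched in a few lines. This is a plain Gaussian kernel estimator written for illustration, not R's implementation; the ratios are invented, and the bandwidth is a hand-picked assumption (R's density function selects one automatically by its own rule).

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x.

    Each sample contributes a Gaussian bump of width `bandwidth`;
    the bumps are summed and normalized to integrate to 1.
    """
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

# Invented speaker ratios: a large cluster near 1.0 and a small one near 1.6.
ratios = [0.95, 0.98, 1.0, 1.02, 1.05, 0.97, 1.03, 1.0, 1.55, 1.6, 1.65]
density = gaussian_kde(ratios, bandwidth=0.08)  # bandwidth chosen by hand

for x in (1.0, 1.3, 1.6):
    print(x, round(density(x), 3))
```

With these invented values the estimated density is high near 1.0, dips between the clusters, and rises again near 1.6, which is the kind of two-peaked shape described above. The visibility of the dip depends on the bandwidth: a much larger value would smooth the two peaks into one.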
To check whether this is a reasonable interpretation of the distributions, models were fitted using the R package mixtools (Benaglia et al. 2009), which uses the normalmixEM procedure for fitting normal mixture densities. Table 2 gives the estimated parameters for each component, along with the actual parameters of the (unimodal) /ӡaːbha/ : /ӡabha/ ratio for comparison. The analysis confirms that within each distribution, there are two populations: a larger group (79-87% of the total) who produce /ӡaːblak/ as similar to /ӡabha/, and a smaller group who produce /ӡaːblak/ as similar to /ӡaːbha/.
Importantly, the component analysis finds that the smaller group has a /ӡaːbha/ : /ӡaːblak/ ratio centered near 1. This suggests that they do not apply the shortening rule at all, as opposed to applying a phonetically incomplete shortening.
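The logic of fitting a two-component normal mixture by expectation-maximization can be sketched as follows. This is a bare-bones illustration and not the mixtools implementation: the ratios, the starting values, the fixed iteration count, and the function names are all assumptions made for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normals(xs, mu_init, sigma_init=(0.1, 0.1), lam_init=0.5, n_iter=200):
    """Fit a two-component normal mixture by EM (illustrative sketch).

    Returns means (mu), standard deviations (sigma), and mixing
    proportions (lambda) for the two components.
    """
    mu1, mu2 = mu_init
    s1, s2 = sigma_init
    lam = lam_init  # mixing proportion of component 1
    n = len(xs)
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to component 1.
        resp = []
        for x in xs:
            p1 = lam * normal_pdf(x, mu1, s1)
            p2 = (1 - lam) * normal_pdf(x, mu2, s2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the weighted observations.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        s1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2), 1e-6)
        lam = n1 / n
    return {"mu": (mu1, mu2), "sigma": (s1, s2), "lambda": (lam, 1 - lam)}

# Invented ratios: most speakers near 1.0, a minority near 1.6.
ratios = [0.95, 0.98, 1.0, 1.02, 1.05, 0.97, 1.03, 1.0, 1.55, 1.6, 1.65]
fit = fit_two_normals(ratios, mu_init=(1.0, 1.6))
print(fit)
```

The estimated mixing proportion plays the role of λ in the parameter summary, and the component means indicate where each sub-population is centered. A sketch like this is sensitive to starting values and assumes the two clusters are reasonably well separated.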
It is clear that there are two populations: shorteners and non-shorteners. However, ratios alone cannot tell us which speakers belong to each, since there is overlap between the distributions. In order to separate out the non-shorteners, a linguist proficient in Arabic made an auditory assessment of the vowel length in /ӡaːblak/ for each of the 19 speakers whose mean /ӡaːblak/ : /ӡabha/ ratio was higher than 1.1. His judgement, based on both vowel quality and duration, was that 11 of the speakers consistently produced /ӡaːblak/ with a long vowel, and another 3 speakers produced it with a long vowel at least most of the time, although possibly inconsistently. There were no cases where he felt certain that these speakers produced a short vowel, but at least one of their faster repetitions (typically on frame 3 or 4) was difficult to judge. The remaining 5 speakers applied shortening consistently.
Lack of shortening appears to be at least partly a regional feature. The 14 speakers who did not consistently apply shortening include all 3 of the speakers from Jerusalem, 4 of 6 speakers from the Little Triangle area, 1 of 3 speakers from Baqa Jatt which is near the Little Triangle, and only 4 of the 45 Galilee speakers included in experiment 2. Two non-shorteners did not indicate their locations, but had linguistic features typical of the Jerusalem and Little Triangle groups (the choice of [hallaɁ] for 'now', and producing /q/ as [k], respectively).
Table 2: Summary of parameters for models of bimodal distributions of /ӡaːbha/ : /ӡaːblak/ and /ӡaːblak/ : /ӡabha/ ratios, compared to parameters for unimodal distribution of /ӡaːbha/ : /ӡabha/ ratios. μ = mean; σ = standard deviation; λ = mixing proportion.
These 14 speakers are excluded from the remainder of the analysis below, since they lack the phonological rule being studied. This demonstrates, incidentally, the value of a large subject pool: had there been only a few speakers who did not apply shortening, it might not have been obvious that they represented a separate population. They might have been included in the analysis, with misleading consequences.

Results of experiment 2

Durations
After exclusion of non-shorteners, there remained 58 speakers who produced 1335 tokens (445 triplets, 7.7 repetitions per speaker, minimum 4, maximum 8). Results were first averaged into type means by speaker and item, and these type means were the basis for all further analysis. Table 3 shows the mean durations of each item. The first vowel of [ӡaːbha] has 1.7 times the duration of the first vowel of [ӡabha], consistent with the long/short ratios typically reported for Levantine dialects. The first vowels of [ӡabha] and [ӡablak] are nearly identical in duration, with a very small difference (1.8 ms) in the opposite of the direction that would be expected from any influence of underlying length.

Formants in [ӡaːbha, ӡabha, ӡablak]
As noted earlier, short vowels in Levantine Arabic are more centralized than long vowels, and there is some evidence that duration and quality are separately controlled (Allatif 2008). It is conceivable that vowels undergoing shortening might take on the duration typical of short vowels, but retain a quality closer to that of long vowels. Since the first vowels of [ӡabha] and [ӡablak] occur between the same consonants, and with the same vowel in the following syllable, they are reasonably suited for comparison of formants. They are not a perfect minimal pair, since the following syllables contain different consonants, but we can gain at least a rough idea of whether shortened vowels have formant values typical of long or short vowels.
The same centralization is found in these data: as Figure 2 shows, [ӡaːbha] has higher mean F1 and F2 values than either of the short vowels.
[ӡablak] and [ӡabha] have very similar F1 values, but [ӡablak] has a lower F2. This difference in F2 values is in the opposite of the direction that would be predicted by incomplete neutralization, as seen graphically in Figure 2.
Paired t-tests on the speaker formant means found no significant difference between [ӡabha] and [ӡablak] in F1, t(57) = -.7, p = .48. For F2, the difference approaches significance, t(57) [...] second vowels is caused by the difference in flanking consonants and/or the difference in syllable shapes. In any case, the trend towards an F2 difference seen in the first vowels is straightforwardly explainable as anticipatory vowel-to-vowel coarticulation. It is possible, of course, that this coarticulation could mask any incomplete neutralization that might otherwise exist; a study using true minimal pairs would be necessary to conclusively show that quality neutralization is complete.
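For readers unfamiliar with the test, a paired t-test on speaker means can be sketched as below. This is an illustrative sketch under stated assumptions: the formant values are invented, the function name `paired_t` is hypothetical, and only the t statistic and degrees of freedom are computed (a p-value would additionally need the t distribution, for example via SciPy's `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equal-length samples.

    Each position in x and y must come from the same speaker.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-speaker mean F1 values in Hz (not the study's data).
f1_short = [700.0, 650.0, 720.0, 680.0]      # first vowel of [ӡabha]
f1_shortened = [705.0, 648.0, 730.0, 677.0]  # first vowel of [ӡablak]

t_stat, df = paired_t(f1_short, f1_shortened)
print(f"t({df}) = {t_stat:.2f}")
```

The degrees of freedom equal the number of speakers minus one, which is why 58 speakers yield t(57) in the results reported above.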

Discussion
[Figure: First vowel formant means and standard deviations by item, averaged by speaker, then gender.]
The main findings of these experiments are that 1) no durational or quality difference is found between underlyingly short vowels and vowels produced by closed syllable shortening; and 2) for a minority of Palestinian Arabic speakers, closed syllable shortening applies before /-ʃ/ but not before /-l/. As always, a null result does not prove the absence of an effect; incomplete neutralization could potentially emerge with more data or under a different methodology. However, the lack of significant results stands in contrast to our previous studies of neutralization in Levantine vowel epenthesis (Gouskova & Hall 2009; Hall 2013), in which significant differences emerged under highly comparable experimental paradigms. The differences in these studies' outcomes do not seem to be attributable to differences in experiment size, which are summarized in Table 4. Gouskova & Hall (2009) found significant effects of underlying contrasts in a dataset far smaller than that of the current experiments. While Hall (2013) used a larger dataset, it should be noted that significant effects emerged not only for the entire group in that study, but for most of the speakers analyzed individually, although speakers produced only an average of 106 tokens each.
Thus, the current experiments should have a good chance of picking up any effect with a magnitude similar to that of the effect of epenthetic status. I conclude that Levantine closed syllable shortening is, at the very least, much closer to phonetically neutralizing than Levantine vowel epenthesis. This finding has several interesting implications for our understanding of the factors influencing levels of neutralization. As noted in the Introduction, incomplete neutralization is often explained as an effect of orthographic confounds, or as an effect of alternations across morphological paradigms. The implications of these results for each theory are discussed below in turn.

Orthography does not determine completeness of neutralization
Incomplete neutralization in Arabic does not seem to be caused by orthographic distinctions, which are often proposed as a cause of incomplete final devoicing in European languages.
In the stimuli for Gouskova & Hall's (2009) and Hall's (2013) studies of Levantine epenthesis, neither epenthetic nor lexical vowels were represented in the orthography (e.g., [libis] from /libs/ and /libis/ were both written لبس, lbs), and yet incomplete neutralization occurred. In the current study, there was a clear orthographic indication of underlying length: underlyingly long vowels were written with the letter for long /aː/, while short vowels were not written (as is normal), and yet they were pronounced with no discernible difference. Both findings run contrary to the predictions of theories that explain incomplete neutralization as an effect of orthography.

[Table 4, surviving fragment: Vowel shortening pre-/l/ | Experiment 2 | 890 | no]
It is worth noting, however, that the role of orthography is different in Levantine Arabic than in most languages where incomplete neutralization has been studied. Colloquial Arabic is rarely written, so orthography may be weakly represented in the mental lexicon for distinctively colloquial words like [ӡaːb] 'brought', which does not exist in Standard Arabic.

Implications for lexical organization
As noted earlier, incomplete neutralization is often explained as an effect of morphological paradigms. For example, Ernestus & Baayen (2006) propose that when one morphologically complete wordform is activated for production, activation spreads to morphologically related forms. Segments from these forms that are perceived to be in correspondence may affect one another's phonetic realization, as discussed above (1). Following this theory, we might predict that when producing [ӡablak], a speaker will also activate related forms from its paradigm (7).
Contrary to this prediction, the current study found no evidence that vowel length alternations in verb paradigms affect vowel duration or quality. This might be interpreted as evidence against the paradigm effects theory. However, I suggest that the lack of effect is compatible with the theory when we take into account a) certain psycholinguistic findings on lexical organization in Arabic, and b) the relatively fossilized nature of the closed syllable shortening process. To preview the argument, I will propose that different members of a verb paradigm are linked in different ways within the mental lexicon, and that verb forms with long and short vowels are not linked through the morpheme that determines vowel timing, a conclusion that is compatible with findings from lexical priming studies in Semitic languages.
Some background on Arabic morphology is necessary here. Verbal paradigms are complex: finite forms obligatorily inflect for person/gender/number (8 categories) and tense/aspect (3 categories), and if semantically appropriate, may take suffixes for direct objects (8 categories) or indirect objects (8 categories), as well as negation. Imperatives and active participles also inflect for gender and number, and take the same array of suffixes. A verb like /ӡaːb/ 'bring', which can be transitive or ditransitive, has hundreds of affixed forms. Affixes trigger stress shifts and alternations of vowel quantity and quality, leading to considerable stem allomorphy.
To assess whether co-activation of paradigm members with different stem allomorphs affects the phonetic realization of alternating segments in the stem, we must determine which members of the paradigm are linked to one another and hence co-activated, and which segments in them are considered to be in correspondence. Perceptual priming studies can shed some light on the first question. Priming, like incomplete neutralization, is believed to result from the spreading of activation from one lexical item to another. Although priming results from co-activation during perception, while incomplete neutralization would reflect co-activation during production, it is reasonable to hypothesize that the lexical linkages affecting the two phenomena are similar.
For Semitic languages, there seem to be two distinct types of morphological priming. The most robust priming occurs between words sharing a root. This root morpheme, which carries lexical meaning, usually consists of three consonants. For example, the Arabic root for 'write' is /k-t-b/, reflected in forms like [katab] 'wrote', [kaːtib] 'writer', [maktab] 'office', [kitaːb] 'book', and [kutub] 'books'. Such words prime one another. Boudelaa & Marslen-Wilson (2004: 111) find that this effect is independent of semantic relatedness: for example, writing-related words in Standard Arabic prime [katiːbatun] 'battalion', which shares their /k-t-b/ root but not their meaning.
Weaker priming occurs through the "word pattern" morpheme. This morpheme consists of an abstract timing frame such as CaCaC or Ca:CiC, where 'C' indicates where to insert a root consonant. It carries morpho-syntactic information such as lexical class.
As an example of word pattern priming, Standard Arabic [Ɂintaʃara] 'spread' primes [Ɂiħtamala] 'endure', because both belong to the ɁiCtaCaCa verbal class (Boudelaa & Marslen-Wilson 2004). Importantly, the word pattern morpheme determines the quality and quantity of a word's vowels. For example, word patterns dictate that 'write', like the other hundred-plus verbs of its class, will have a short first syllable vowel in the perfective ([katab], in the CaCaC pattern), but a long vowel in the active participle ([kaːtib], in the Ca:CiC pattern).
The existence of root priming and word pattern priming suggests that words are decomposed into roots and word patterns during processing, and that activation spreads to other forms sharing the same morphemes, perhaps through linkages to abstract representations of these morphemes. A rough schema of this lexical organization is shown in Figure 3. A form like [katab] activates two distinct sets of related words: words with /k-t-b/ roots, and words with CaCaC patterns. Root and pattern priming occur at different time periods (inter-stimulus intervals) during processing, suggesting that they reflect separate activation processes.
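The decomposition into root and word pattern can be illustrated with a deliberately naive sketch. It assumes a broad transcription in which vowels are drawn from a small fixed set and length is marked with ː, and it treats every consonant as a root consonant. The function name `decompose` and the vowel inventory are assumptions made for illustration; this is not a model of how speakers or morphological analyzers actually parse words.

```python
# Assumed vowel inventory for this toy transcription (an illustrative choice).
VOWELS = set("aiueo")
LENGTH_MARK = "ː"


def decompose(word):
    """Split a transcribed word into (root, pattern).

    Every non-vowel symbol is treated as a root consonant and replaced by
    'C' in the pattern; vowels and length marks are kept in the pattern.
    """
    root = []
    pattern = []
    for symbol in word:
        if symbol in VOWELS or symbol == LENGTH_MARK:
            pattern.append(symbol)
        else:
            root.append(symbol)
            pattern.append("C")
    return "-".join(root), "".join(pattern)


for form in ["katab", "kaːtib", "kitaːb", "kutub"]:
    root, pattern = decompose(form)
    print(f"{form}: root {root}, pattern {pattern}")
```

All four forms yield the root k-t-b with different patterns (CaCaC, CaːCiC, CiCaːC, CuCuC), which mirrors the two separate sets of lexical links described above. The sketch writes length as ː rather than the colon used in the pattern labels in the text, and it overgenerates for forms with affixal consonants: [maktab] would be assigned the spurious root m-k-t-b, because the m belongs to the pattern rather than the root.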
Turning to the types of words studied in the current experiments, we might initially hypothesize that words with a shared stem, like [ӡaːb] 'brought', [ӡablak] 'brought to you' and [ӡabeʃ] 'brought-neg.', would be linked in two ways: their stems are based on the Ca:C word pattern (which defines 3ms perfective for a class of several dozen verbs), and they share a root. (Their root is usually analyzed as /ӡ j b/, although the /j/ is very abstract and rarely surfaces). This hypothetical organization is shown in Figure 4a. We would expect, then, that activation flows from [ӡablak] or [ӡabeʃ] to [ӡaːb] through two routes.
However, there is reason to doubt that [ӡaːb] is actually linked to [ӡablak] and [ӡabeʃ] through word pattern. Two studies on Semitic have found that word pattern priming is disrupted when a phonological process alters the CV timing pattern of one or both forms. For example, Standard Arabic [Ɂaθnaa] 'praise', from /Ɂaθnaja/, and [Ɂalγaa] 'cancel', from /Ɂalγaja/, are both based on the ɁaCCaCa pattern, but surface as ɁaCCaa due to elision of a glide. Despite sharing an underlying word pattern, and having an identical CV-pattern on the surface, they do not prime one another (Boudelaa & Marslen-Wilson 2004). The authors suggest that "access to components of internal representations of primes and targets has been disrupted" by word pattern allomorphy. Frost, Deutsch & Forster (2000) find similar results for verbs in Hebrew, another language with a root and pattern morphology. Priming occurs between words that belong to the same verbal pattern, such as [hisrit] 'he filmed' and [hispik] 'he managed' from the pattern hiCCiC, but the priming effect disappears when prime and/or target undergoes consonant deletion, as in [hipil] 'he overthrew' from /hinpil/.
The lack of priming in such cases suggests that lexical linkages through word pattern may weaken or disappear when a phonological process alters the surface realization of the word pattern. Closed syllable shortening is an example of such a process: it alters vowel quantity, which is an essential part of the word pattern. If this alteration breaks the link between the shortened form and the Ca:C pattern, then the lexical organization will be as in Figure [...].
Under this proposal, [ӡaːb] and [ӡablak] are of course still linked, through their root. To complete the argument as to why there are no subphonemic effects on vowel duration in this situation, we must introduce one additional conjecture: that when two forms are linked only through the CCC root, no correspondence relations are created between their vowels, and hence their vowels do not influence one another in production. The very nature of the lexicon gives reason to think this might be true. Words that share a root can vary greatly in the number, locations, length, and qualities of their vowels, as illustrated in (8).
It is not clear how segment-level vowel correspondence relations could be formed between words with different numbers of stem vowels. Even where the number of vowels matches, it is hard to imagine what the mutual activation of the vowel set [a aː i iː] would mean. It seems more likely that speakers simply do not perceive these disparate vowels (most of which, remember, belong to different morphemes) as being in any kind of correspondence relationship. Hence, any activation spreading through the CCC root would spread only to the root consonants in related words, not to the vowels.
In the case of forms sharing a word pattern, the situation is reversed. There is a clear correspondence between the vowel of [ӡaːb] and that of related verbs like [ʕaːd] 'repeated', [faːq] 'woke', etc. These vowels' pronunciation could be reinforced by examples from other verbs sharing the word pattern. On the other hand, it is hard to see how there could be mutual influence between, say, the onset consonants of such words, given that they could potentially comprise all the consonants of the language. I suggest that when words are connected only through their patterns, there is no co-activation of root consonants in production.
In short, the proposal is that vowels of different words affect one another only when the word pattern is shared; consonants affect one another only when the root is shared. Speakers mentally link [ӡablak] to [ӡaːb] through the shared root, but not through the (theoretically) shared word pattern of the stems. Therefore no correspondence is drawn between the [a] of [ӡablak] and the [aː] of [ӡaːb], so there is no tendency to lengthen the vowel in [ӡablak].

Closed syllable shortening vs. vowel epenthesis
Under this approach, it is necessary to explain why incomplete neutralization does occur with vowel epenthesis, which also changes CV timing patterns. For example, nouns of the CiCC word pattern become CiCiC on the surface if epenthesis applies: /libs/ → [libis] 'clothing'. Yet when compared to words from an underlying CiCiC pattern, like /libis/ [libis] 'wore', both the first and second syllable vowels in epenthesized nouns show subtle quality and timing differences in the phonetic direction of CiCC forms. I suggest that the different outcomes for the two processes are related to a distinction drawn in section 1.2: vowel epenthesis is a newer, more "postlexical" process than closed syllable shortening. It is possible that the break between a surface form and its original word pattern occurs only when the pattern-altering process has become somewhat fossilized.
The types of word-pattern allomorphy that disrupted priming in Frost, Deutsch & Forster (2000) [...] 'accessible'. Closed syllable shortening is not as ancient or fossilized as either of these processes, but it could certainly be analyzed as morphologically restricted, given that it occurs only before certain suffixes (and for reasons that are not obvious from the surface forms; see (2)).
Levantine vowel epenthesis, by contrast, shows few if any restrictions on its application; it seems to be a process at a relatively early stage of its life-cycle, in the sense of Bermúdez-Otero (2014). It applies to new loanwords ([filim] 'film'), and between words in novel phrases (/dist ӡdiːd/ → [distiӡdiːd] 'new cauldron'). I suggest that a synchronically active process like this does not result in delinking of the morphologically complete surface form from the underlying word-pattern. Rather, the link between the CiCiC surface form and the CiCC word-pattern remains, as shown in Figure 5a.
It could also be relevant that vowel epenthesis is an optional process, resulting in two surface forms for a single morpho-lexical word. A speaker will often vary between producing words like 'clothing' with one or two syllables: [libis] or [libs]. Perhaps in this situation, there is some direct linkage between the two alternative surface forms, not mediated through root or pattern morphemes. This possibility is shown in Figure 5b. However, it should be noted that Hall (2013) found no correlation between the frequency of epenthesis in a given item and the phonetic realization of the epenthetic vowel. Some CiCC pattern words always surface with epenthesis, while in others epenthesis is rare; yet the degree of incomplete neutralization did not differ significantly across items. If the existence of a CiCC surface form were the cause of incomplete neutralization, we might expect neutralization to become complete in the case of nouns that are always CiCiC on the surface. For this reason, I prefer the analysis in Figure 5a for explaining the existence of incomplete neutralization.
In conclusion, the finding of incomplete neutralization in epenthesis but not closed syllable shortening is compatible with the theory that morphological relations drive incomplete neutralization, if we accept that a) morphological linkages through roots vs. word patterns have asymmetric effects on the phonetic realization of vowels and consonants, and b) the morphologization or lexicalization of a phonological process results in delinking of forms from their (historic / underlying) abstract word-pattern.

Functional role of incomplete neutralization
Another interesting claim of Ernestus & Baayen (2006) is that incomplete neutralization can support speech perception. In Dutch, final devoicing collapses many lexical distinctions. Listeners may need to recover the (underlying) voicing of final obstruents in order to identify a known word or predict the inflection of a new word. In this way, incomplete neutralization is functionally useful in perception, and Ernestus and Baayen demonstrate that the speech perception system does exploit it.
By contrast, Arabic vowel length alternations cause virtually no lexical ambiguity. In constructing stimuli for this experiment, I was unable to find even one true minimal pair between a /CV:C-ʃ/ verb and a /CVCʃ/ noun. /CV:C-l-/ verbs occasionally have minimal pairs in suffixed /CVCl/ nouns, such as /ӡaːb-l-ak/ 'he brought to you' versus /ӡabl-ak/ 'your mountain', but such pairs are rare and seem unlikely to cause confusion in context. In short, it is not clear that subtle phonetic cues to underlying vowel quantity would even be useful in processing.
Vowel epenthesis, on the other hand, does cause neutralization of some distinctions, especially between noun-verb pairs like /libs/ 'clothes' versus /libis/ 'wore'. A sentence beginning [huwwe libis…] is temporarily ambiguous between 'he wore…' and 'it (is) clothing…'. There is some indication that this homophony is undesirable. In Makkan Arabic, Abu-Mansour (1991) reports that a similar epenthesis pattern is blocked just where it would create homophonous noun / verb pairs. For example, there is usually epenthesis in /sr/ codas, but not in the noun /Ɂasr/ 'capture' because there is already a verb /Ɂasar/ 'capture'. Levantine does not outright block the formation of homophonous noun / verb pairs, but some speakers do have incomplete neutralization of the epenthetic and lexical vowels that distinguish such pairs. Perhaps this serves a similar functional purpose in disambiguation.
In short, it is possible that the maintenance of incomplete neutralization depends partly on its level of functional usefulness for perception. Perhaps neutralization is more complete in shortening than in epenthesis in part because only the latter produces surface ambiguities.

Why is shortening being lost before /l/?
Some theoretical work on Palestinian (Abu-Salim 1986; Younes 1995) treats shortening in words like /faːq-ʃ/ 'didn't wake' and /ӡaːb-l-ak/ 'brought to you' as a single process, expressible as a single phonological rule. This study found, however, that speakers from some regions (Jerusalem and Little Triangle) apply shortening before /-ʃ/ but not before /-l/.
The development of a long vowel in /ӡaːblak/ fits a well-attested type of change called paradigm leveling, in which a morpheme shifts towards a more uniform pronunciation across a paradigm so as to better fulfill a paradigm uniformity condition as formalized in (9).
(9) Paradigm Uniformity: All surface realizations of μ, where μ is the morpheme shared by the members of paradigm x, must have identical values for property P. (Steriade 2000)
In the paradigm [ӡaːb - ӡablak], the morpheme 'bring' has non-uniform vowel length, but in the innovated paradigm [ӡaːb - ӡaːblak], it is uniform.
Why does paradigm leveling not also block shortening before /-ʃ/? A likely explanation for the divergence in treatment of the two suffixes lies in the fact that words like [faqeʃ] 'didn't wake' have another realization as [faqʃ], without the optional vowel epenthesis. In [faqʃ], a long vowel would be phonotactically impossible in Levantine, where surface CV:CC syllables are systematically disallowed. Let us suppose that the phonological pressure for paradigm uniformity is stronger between variants of the same word than between morphologically different forms. This would mean that [faqeʃ] is more constrained by similarity to [faqʃ] than to [faːq] 'woke'. Forms like /ӡaːblak/, on the other hand, have no output variant where the vowel would appear in a doubly closed syllable on the surface. Since shortening is phonotactically unnecessary in every output variant of the word, speakers can improve paradigm uniformity by altering their grammars to produce [ӡaːblak], which better resembles the base form [ӡaːb] 'brought'.
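The condition in (9) can be made concrete with a small sketch that checks whether all members of a paradigm agree on one property, here the length of the stem vowel. This is an illustrative simplification and not Steriade's formalization: the function names are hypothetical, the vowel inventory is assumed, and the first vowel of each form is taken to be the stem vowel, which holds for the forms discussed here but not in general.

```python
# Assumed vowel inventory and length mark for this toy transcription.
VOWELS = set("aiueo")
LENGTH_MARK = "ː"


def stem_vowel_is_long(form):
    """Return True if the first vowel of the form carries a length mark.

    Assumption: the first vowel is the stem vowel, as in the forms below.
    """
    for index, symbol in enumerate(form):
        if symbol in VOWELS:
            return form[index + 1:index + 2] == LENGTH_MARK
    raise ValueError(f"no vowel found in {form!r}")


def is_uniform(paradigm, prop):
    """Check that every form in the paradigm has the same value for prop."""
    return len({prop(form) for form in paradigm}) == 1


conservative = ["ӡaːb", "ӡablak"]   # shortening before /-l/
innovated = ["ӡaːb", "ӡaːblak"]     # leveled paradigm
variants = ["faqeʃ", "faqʃ"]        # two outputs of the same word

print(is_uniform(conservative, stem_vowel_is_long))  # False
print(is_uniform(innovated, stem_vowel_is_long))     # True
print(is_uniform(variants, stem_vowel_is_long))      # True
```

On this toy check, the innovated paradigm satisfies uniformity for vowel length while the conservative one does not, and the two variants of 'didn't wake' are uniform with each other, in line with the suggestion that variant-to-variant uniformity is what keeps the short vowel in [faqeʃ].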
This explanation is in line with proposals by Kawahara (2002), Anttila (2006), and Ettinger (2007), all of whom argue that the historical survival of opaque phonological interactions is connected to the existence of transparent surface variants. The form [faqʃ] constitutes a transparent variant to support the opaque interaction of shortening and epenthesis in [faqeʃ], but no such transparent variant exists to support the opaque interaction of shortening and cyclic resyllabification in [ӡablak].

Conclusion and future directions
To understand the causes of incomplete neutralization, we need a more complete typology both of where it does happen, and where it does not happen. This paper represents a step towards such a typology. It examines the degree of neutralization in a type of alternation (vowel quantity) that has been little studied in this light, in a type of morphological system (root and pattern) that is very differently organized from those of Indo-European languages. The results of the experiments reported here suggest that Levantine Arabic closed syllable shortening produces complete phonetic neutralization, unlike previous findings for Levantine vowel epenthesis.
Orthography cannot explain why shortening produces complete neutralization while epenthesis does not. The difference between the two processes may lie in the fact that epenthesis is a more synchronically active process, and moreover causes more ambiguity, which incomplete neutralization can mitigate. The present analysis builds on psycholinguistic theories of Semitic lexical organization, in which words are linked through consonantal roots and/or CV word patterns. I have suggested that incomplete neutralization of timing contrasts is possible only when alternants are linked through the word pattern. Closed syllable shortening, which alters the word pattern, breaks pattern-mediated links in the mental lexicon between paradigm members with short and long vowels in their stems, so that they do not influence one another in production. While speculative, this proposal illustrates a way in which the distinctive morphology of Semitic languages could potentially shape their phonetic neutralization patterns.