Gradient phonological relationships: Evidence from vowels in French

The dichotomy of contrastive and allophonic phonological relationships has a long-standing tradition in phonology, but there is growing research that points to phonological relationships that fall between contrastive and allophonic. Measures of lexical distinction (minimal pair counts) and predictability of distribution were applied to Laurentian French vowels to quantify three degrees of contrast between pairs: high, mid, and low contrast. According to traditional definitions, both the high and mid contrast pairs are classified as phonologically contrastive, and low contrast pairs as allophonic. As such, a binary view of contrast (contrastive vs. non-contrastive) predicted that high and mid contrast pairs would pattern together on tasks of speech perception, and low contrast pairs would show a different pattern. The gradient view predicted all vowel pairs would fall along a continuum. Thirty-two speakers of Laurentian French participated in two experiments: an AX task and a similarity rating task. The results did not support a strict binary interpretation of contrast, since the high, mid, and low contrast vowel pairs pattern differently across the experiments. Instead, the results support a gradient view of phonological relationships.


Introduction
The concept of contrast is at the heart of phonological analysis (Avery et al. 2008). In phonological theory, the relationships between speech sounds serve to differentiate words. The difference between the initial consonants of fast [fæst] and vast [væst], for example, signals a difference in meaning and distinguishes lexical items. Segments that distinguish between lexemes are considered to be in a contrastive relationship, and have traditionally been viewed as belonging to a stored inventory of underlying phonological representations. Segments that are not contrastive are in an allophonic relationship with each other. Recent research, however, suggests that the traditional dichotomy of contrastive versus allophonic phonological relationships is far more complex (Hall 2013). This is because the criteria that are commonly applied in phonological analyses to determine whether or not two sounds are contrastive do not account for intermediate phonological relationships, which fall between fully contrastive and fully allophonic. While the concept of gradient contrast is not new (Goldsmith 1995; Cohn 2006; Ladd 2006; Scobbie & Stuart-Smith 2008), an increasing number of authors employ terms to describe intermediate relationships, such as quasi-contrastive, semi-allophonic, and mushy contrasts (see Hall 2013). Researchers have begun to re-examine the way phonological relationships are defined (Dresher 2008; Ernestus 2011; Hall 2009, 2013; Lu 2014; Hall 2015; Hall & Hall 2016), with the recent development of a model of contrast described along a continuum (Probabilistic Model of Phonological Relationships, PPRM; Hall 2009, 2015). Despite this, there has been little research on how to precisely define phonological relationships, and the little experimental research that has tested phonological relationships has produced varied results.
The goals of this research are to explore criteria for establishing phonological relationships, to apply these criteria to identify various degrees of contrast in Laurentian French (LF) vowels, and lastly, to test for evidence of gradient contrasts in two speech perception experiments. Laurentian French refers to dialects of French spoken in Canada excluding Acadian French (Côté 2012).
A typical phonological analysis begins by determining the relationships between speech sounds. In generative frameworks, contrast is often approached with an all-or-nothing view: two sounds either contrast or they do not, and gradience tends to fall under the domain of phonetics and not phonology (e.g. Chomsky & Halle 1968). Saying that two segments contrast indicates that they participate in a specific type of phonological relationship. It is taken to indicate that the two sounds are members of a phonological inventory and have distinct underlying representations, except when it can be shown that what appears to be a contrast on the surface is derived from other phonemes. For (Boomershine et al. 2008). This is an example of a surface contrast, where an apparent contrast exists but one of the sounds involved is derived from an underlying phoneme and is not itself part of the phonemic inventory of the language. Because segments can appear to contrast in some word positions and not in others, creating surface contrasts such as the one above, the question has been raised as to whether contrast is binary and an all-or-nothing type of relationship, or gradient (Hall 2013).
The question of whether contrast is gradient or binary directly impacts the way in which contrasts and underlying representations are arrived at (for example, by comparing minimal pairs) and can have a significant effect on the outcome of a given analysis, as well as on the implications of what is assumed to be stored in an underlying representation. For example, exemplar theories, which allow for a gradient view of contrast due to the nature of phonetic categories, have different assumptions of what constitutes a category of speech sound and how the relationships between these categories are expressed and evaluated. Phonetic categories in an exemplar theory can be viewed as tokens of experience organized in a mental map of phonetic distributions, parameterized with acoustic, articulatory, and perceptual information (Pierrehumbert 2003). When similar remembered tokens reach a large enough number, this group is generalized and a category is formed (Pierrehumbert 2000, 2001; Bybee 2006). A category is more robust when its associated exemplar tokens are more frequent, since every new token mapped to an existing category strengthens that category by grouping more and more similar exemplars together (Pierrehumbert 2001; Bybee 2006). For example, high-frequency exemplars are more resistant to change (Bybee 2006), suggesting that frequency contributes to category robustness. Frequently and recently experienced exemplars will also have higher resting activation levels than infrequent exemplars (Pierrehumbert 2001), and as such, frequency effects will directly impact the processing of a sound. Frequently encountered categories will also be favoured in speech perception because this involves resolving a competition between possible alternative classifications; the cumulative force of the more frequent exemplars will steer the resolution of that competition in one direction or another (Pierrehumbert 2006). The frequency of speech sounds in their various environments therefore influences categories and speech processing.
Similar tokens are organized in terms of members that are more or less central to the category rather than in terms of features (Bybee 2006). These facts are not easily captured by more abstract conceptions of underlying representations. This view of phonetic categories coincides with models where contrast is described along a continuum, such as the Probabilistic Model of Phonological Relationships (Hall 2009, 2015), which focuses predominantly on the continuum of predictability of distribution. The critical role that frequency plays in establishing phonetic categories and the relationships between them will be captured in the measurable criteria used below to determine levels of contrast in our experimental stimuli, something which cannot be captured by a binary approach to contrast.

Criteria for contrastive relationships
There are multiple criteria that are typically used in a binary approach to contrast to determine whether two sounds are in an allophonic or contrastive relationship; however, the formulation and application of the criteria are not always clear, and the default expectation is that there are only two types of phonological relationships. The most commonly used criteria to determine phonological relationships are outlined in Hall (2013: 223-225). Briefly, two sounds are typically considered contrastive if they define a lexical distinction, if they do not have a predictable distribution, or if they are written with different graphemes. Two sounds are allophonic if they participate in allophonic alternations conditioned by a specific phonemic environment, are judged to be the same sound by native speakers, and are written with the same grapheme. In addition to work by Hall, other authors have also begun to define contrast using a variety of phonetic and usage-based metrics, such as frequency and functional load; see work by Renwick (2014) and Renwick et al. (2016). The two most important and often-used criteria in a binary approach are lexical distinction (also called the distinctive function) and predictability of distribution. We refer to a "binary approach" because the number of lexemes that are differentiated, or the degree to which a distribution is predictable, is typically ignored; as it pertains to phonological theory, all that matters for establishing a contrastive relationship is that at least one pair of lexemes is distinguished, or that a pair of sounds is unpredictably distributed. 'cool'. It is not clear under a binary view of contrast whether only a handful of contrasts involving tense vowels is sufficient to make these phones legitimately contrastive, and thus perhaps part of the phonological inventory. This raises the question of whether they are somehow less contrastive, or exhibit a weaker contrastive relationship, than high-frequency contrasts.
These examples illustrate problems in establishing contrast because of conflicting or unclear criteria. This research aims to quantify two of the most used criteria to determine phonological relationships: lexical distinction and predictability of distribution. In doing so, a scale of contrast can be established and move beyond the all-or-nothing binary approach to classifying phonological relationships. As these criteria form the basis for the current research, they will be discussed in greater detail, along with previous experimental results examining these criteria.

Determining contrast based on the distinctive function
If two sounds serve a distinctive function, i.e. they are used to distinguish two otherwise identical lexemes or morphemes, they are considered to be in a contrastive relationship, an approach used in both structural (Saussure et al. 1916; Twaddell 1935) and generative phonology (Chomsky & Halle 1968). However, different minimal comparisons can yield different conclusions about what to include in an underlying representation or phoneme inventory. Indeed, for such a common criterion, there are few attested formalizations of how minimal comparisons should be carried out, and there is a lack of commonality across the approaches. These formalizations have focused on the feature level (as opposed to the phonemic level), but nevertheless use lexical contrast to determine contrastive features. For example, Contrastive Specification posits that all and only contrastive features are specified underlyingly, and predictable feature values are eliminated (Steriade 1995), whereas Radical Underspecification claims that all and only unpredictable features are specified (Archangeli 1988). Some algorithms have been created as a way of determining underlying features, such as the Pairwise Algorithm, which relies on minimal pair contrasts (Dresher et al. 1994), and the Successive Division Algorithm (Dresher 2008), which does not depend solely on minimal pairs for determining contrastive features.
Even with the aid of an algorithm, the minimal pair test by itself is insufficient to entirely determine what is contrastive. Take differential substitution, where speakers of different first languages (L1s) will produce different phones for the same second language (L2) phone. (Lombardi 2003). If the assumption is correct that L1 feature matrices are being transferred onto a novel L2 phone, then European French speakers and LF speakers should substitute the same consonants. Using any kind of algorithm would yield the same features and underlying representations for both dialects of French, and would not be able to account for the differences in the L2 substituted phone (see Jesney 2005 for an account of these cases by dialect-specific active phonological processes). The minimal pair test also leads to disagreements on the members of a phonemic inventory, depending on whether loanwords are considered part of the L1 lexicon. In Japanese, [ɸ] and [ts] only occur before [ɯ]; however, in foreign words, [ɸ] and [ts] may occur before other vowels (Vance 1987; Ito & Mester 1995; Brown 1997). Also see the example of Laurentian French vowel alternations described above (Côté 2010). These cases bring into question whether a single minimal pair is sufficient to classify the relationship between two phones as contrastive and whether loanwords should be among the lexical items being compared.
One way to carry out minimal pair comparisons in a quantifiable way is to calculate the functional load of a language's contrasts. Functional load measures the frequencies of two contrastive sounds and the degree to which those two sounds contrast in all possible environments. This is to evaluate how much work the contrast does as compared to other contrasts (King 1967; Brown 1988; Wedel et al. 2013). Unlike the distinctive function, functional load is able to take the simple yes or no answer of whether or not two sounds contrast and place the relative importance of that contrast on a scale as compared to other contrasts in a given language. This allows for a more objective assessment of the contribution of a contrast to the overall phonological system of a language. Functional load was used in this study as a means to measure the degree to which two sounds are contrastive. The number of minimal pairs between two specific sounds was counted, as well as the number of minimal pairs in which a single sound participated. (This methodology is discussed in greater detail below.) Sounds that participated in a high number of contrasts were dubbed High Contrast, those with a small number were dubbed Low Contrast, and those in between were dubbed Mid Contrast (not to be confused with high, mid and low vowels in terms of tongue height). In addition, we recognize that the vowels used to exemplify High, Mid and Low Contrast pairs also differ in other ways, such as their acoustic properties. In order to rule out the possibility that perceived differences were due solely to these acoustic properties as opposed to phonological factors, we measured these differences using multiple methodologies and discuss below how there was no consistent effect of the acoustic properties on the results.

Determining contrast based on the predictability of distributions
The predictability of segmental distributions in a given language is also used to define contrast: "Two segments X and Y are traditionally considered to be contrastive if, in at least one phonological environment in the language, it is impossible to tell which segment will occur. If in every phonological environment where at least one of the segments can occur, it is possible to predict which of the two segments will occur, then X and Y are allophonic" (Hall 2009: 2). Rather than treating this as an all-or-nothing criterion, Hall quantifies predictability by three probabilistic measures: bias, environment-specific contrastiveness, and systemic contrastiveness. Bias and environment-specific contrastiveness reflect the likelihood of one sound or another occurring in a given phonological environment, while systemic contrastiveness reflects how much uncertainty there is when choosing one sound or another across all environments. Using type and token frequencies, Hall devises algorithms to calculate the uncertainty (i.e. the entropy) of the distribution of segments, allowing for a gradient comparison of the effect that individual words have on the phonological relationship between two sounds across a phonological system.
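The entropy-based idea described above can be illustrated with a minimal sketch. It assumes that uncertainty within one environment is the Shannon entropy of the choice between the two segments, and that uncertainty across the system is an average of those values weighted by how often each environment occurs. The function names, environment labels and counts are hypothetical and invented for illustration; this is not a reproduction of Hall's own implementation.

```python
import math

def environment_entropy(count_x, count_y):
    """Shannon entropy (in bits) of the choice between two segments
    in a single environment, given their occurrence counts."""
    total = count_x + count_y
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (count_x, count_y):
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

def systemic_entropy(env_counts):
    """Average of per-environment entropies, weighted by the share of
    all occurrences found in each environment (an assumption of this
    sketch). env_counts maps an environment label to (count_x, count_y)."""
    grand_total = sum(x + y for x, y in env_counts.values())
    if grand_total == 0:
        return 0.0
    return sum(
        (x + y) / grand_total * environment_entropy(x, y)
        for x, y in env_counts.values()
    )

# Hypothetical type counts, invented for illustration only.
overlapping = {"word-final": (10, 10), "before consonant": (10, 10)}
complementary = {"word-final": (15, 0), "before consonant": (0, 25)}

print(systemic_entropy(overlapping))    # 1.0 (fully unpredictable)
print(systemic_entropy(complementary))  # 0.0 (fully predictable)
```

Under these assumptions, a value near 1 corresponds to an overlapping distribution (associated with contrast) and a value near 0 to a complementary distribution (associated with allophony), with intermediate values falling in between.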
Because some treat the predictability of distribution as an all-or-nothing criterion while others acknowledge its gradient nature, issues also arise with the application of this criterion. Determining phonological relationships becomes more complicated when the criterion of distinctiveness overlaps with the criterion of predictability of distribution. For example, in Laurentian French, the lexemes saute [sot] 'jump.3p.sg.prs' and sotte [sɔt] 'stupid.f' are differentiated solely by their vowels. This satisfies the criterion of distinctiveness. Furthermore, in this example, their distribution is also unpredictable since the environment does not condition one or the other vowel. With these two criteria taken into account, these vowels would traditionally be viewed as contrastive sounds. and never in open final syllables. Since their distribution is sometimes unpredictable (associated with contrast) and sometimes predictable (associated with allophony), it is not clear whether the relationship between these sounds should be classified as contrastive, allophonic or as something between the two. When the indicators for typically contrastive relationships contradict each other in this way, the criterion of predictability of distribution is often ignored as long as lexical distinctions exist. However, one might question whether sound pairs that do not satisfy all criteria should be classified as having a relationship that is intermediary to contrastive and allophonic. Rather than forcing a classification of these relationships as fully contrastive or fully allophonic, such cases may be indicative of intermediate levels of contrast.
Allophones have also been found to be perceived as more similar to one another than phonemes (Boomershine et al. 2008). Allophonic alternations entail a change in phonetic category, but not phonological category. In a similarity rating task, it is therefore expected that differences between sounds that do not cue a contrast should be difficult to perceive, and that such sounds should therefore be judged as being more similar. A similarity rating task is thought to be able to show subtleties in the range of belonging to a category; e.g., if a listener judges [t] and [tʰ] as being very different, this is believed to reflect a phonological relationship and two separate categories, whereas if a listener judges them as being very similar, this reflects an allophonic relationship, or belonging to the same phonetic category. Boomershine et al. tested whether allophones are perceived as less distinct than contrastive sounds within an L1. They used similarity ratings as well as reaction times (RTs) from a speeded AX discrimination task, where longer RTs were associated with greater similarity and shorter RTs were associated with less similarity. Hall (2009). Using predictability of distribution as the main criterion, as represented by the entropy of the segments tested, Hall tested four pairs of German consonants exhibiting different levels of predictability of distribution. She hypothesized that pairs with greater predictability would be perceived as more similar. The results were inconclusive, and Hall provides a variety of potential causes for this, such as the entropy values between pairs being too close to one another and differences in the phonotactic licitness of the contexts in which the consonants occurred, among others.

Current research
Multiple authors have found the need to appeal to terminology beyond the terms contrastive or allophonic. It appears that phonological relationships more likely fall on a scale from allophonic to contrastive, as opposed to the traditional view that contrast is an all-or-nothing phonological status. However, there has been very little experimental research supporting the view that contrast is gradient (see the section above that describes the few studies that have been done). In addition to testing the extremes on the scale of contrastive to allophonic relationships, the current study will also test intermediate relationships of contrast; specifically, vowel pairs that exemplify High, Mid and Low degrees of contrast. These contrast types are tested in two studies, an AX task (Experiment 1) and a similarity rating task (Experiment 2), to facilitate comparisons across experimental paradigms and with previous research, such as Boomershine et al. (2008). Different results are expected depending on whether phonological relationships are binary in nature (i.e. fully contrastive or fully allophonic) or gradient in nature (intermediate relationships between contrastive and allophonic). If a binary view of contrast is supported, it is predicted that High and Mid Contrasts will yield similar results of higher accuracy and faster RTs, while Low Contrast will pattern differently since those vowels are in an allophonic relationship (H = M > L). If the gradient view of contrast is supported, it is predicted that for our experimental variable of Contrast (High, Mid, Low), the High Contrast vowel pair should be the easiest to discriminate, resulting in high accuracy and shorter RTs; the Mid Contrast vowel pair should result in lower accuracy scores and longer RTs; and the Low Contrast vowel pair should result in the lowest accuracy scores and longest RTs (H > M > L). We also looked at the acoustic differences between the specific vowel pairs to see whether acoustics could also account for the accuracy and speed of participants' responses.
We expand on this in our discussion of the stimuli below.

Stimuli
There were two main criteria used to determine the phonological relationship between two sounds: lexical distinction and the predictability of their distributions.

Lexical distinction stimuli selection criteria
The OMNILEX database (Desrochers 2006) was used to establish a word list of French one-syllable words of CV, VC, CVC and CCV syllable structure. The database includes approximately 102,000 lexical entries originating from multiple French dictionaries and the Lexique corpus (New et al. 2004), and phonetic transcriptions are based on European French. Although the database provides minimal pair counts and neighbourhood density values, these could not be used for Laurentian French. The database was therefore re-transcribed to reflect a standard Laurentian pronunciation by a native speaker and expert in Laurentian French phonetics and phonology, paying particular attention to the vowels [ɪ], [ʏ], [ʊ] and [ɜ] because these vowels do not occur in European French.
The resulting corpus was then processed using the software Phonological Corpus Tools, version 1.1.1. The corpus was uploaded into the software, which counts the number of minimal pairs in the corpus, in this case using type frequency. The number of minimal pairs for each of 20 vowels was calculated by counting, for example, how many times a given vowel occurred after [b] in a monosyllable, then how many times that vowel occurred after [d], and so on, for every consonant and consonant combination. From this, it was calculated (a) how many minimal pairs a single vowel participated in with all other vowels (referred to as individual count), and (b) how many minimal pairs existed between two specific vowels (referred to as shared count). This method of calculating minimal pairs was developed based on Brown (1988). Appendix A summarizes these results. To represent a scale of contrast, the final selection of vowels was chosen from the high-end, middle, and low-end range of minimal pair counts, both in terms of individual vowel counts as well as shared counts (Table 1).
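The individual and shared counts described above can be sketched as follows. This is an illustrative approximation only: it assumes each word is a sequence of phones, treats two words as a minimal pair when they are identical except for one vowel, and counts over word types. The function name and the toy lexicon are hypothetical; the sketch does not reproduce the Phonological Corpus Tools implementation or the OMNILEX data.

```python
from collections import defaultdict
from itertools import combinations

def minimal_pair_counts(words, vowels):
    """Return (shared, individual) minimal pair counts over word types.

    shared[(v1, v2)]  -> number of frames in which v1 and v2 both occur
    individual[v]     -> number of minimal pairs v enters into with
                         all other vowels
    """
    # Group word types by their consonantal frame, e.g. ('s', '_', 't').
    frames = defaultdict(set)
    for word in set(map(tuple, words)):
        for i, phone in enumerate(word):
            if phone in vowels:
                frame = word[:i] + ("_",) + word[i + 1:]
                frames[frame].add(phone)

    shared = defaultdict(int)
    individual = defaultdict(int)
    for vowel_set in frames.values():
        # Every pair of vowels sharing a frame is one minimal pair.
        for v1, v2 in combinations(sorted(vowel_set), 2):
            shared[(v1, v2)] += 1
            individual[v1] += 1
            individual[v2] += 1
    return dict(shared), dict(individual)

# Toy lexicon of phone sequences, invented for illustration only.
toy_vowels = {"a", "o", "ɔ"}
toy_lexicon = [
    ("s", "o", "t"), ("s", "ɔ", "t"),
    ("p", "a", "t"), ("p", "o", "t"), ("p", "ɔ", "t"),
    ("b", "a", "l"),
]

shared, individual = minimal_pair_counts(toy_lexicon, toy_vowels)
print(shared)
print(individual)
```

In this toy lexicon, [o] and [ɔ] share two frames (s_t and p_t), so their shared count is 2, while [a] enters into only two minimal pairs overall.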
Note that the difference between High and Mid in terms of minimal pairs is not proportional to the difference between Mid and Low. There was a necessary trade-off between choosing tokens that matched equally well for number of minimal pairs and acoustic similarity (described below   was used to calculate the predictability of distribution of the vowels tested in the current experiment. The vowels' predictability was calculated according to functional load based on four local environments: before the end of a word, before another vowel (0 for this environment), before a consonant that is [v, z, ʒ] or [ʁ], and before a consonant that is not [v, z, ʒ] or [ʁ]. The predictability of distribution over the four environments is provided as entropy (uncertainty) as a number out of 1, where 1 indicates a perfectly overlapping distribution and therefore unpredictability, associated with contrast, and 0 indicates a perfectly complementary distribution. The specific algorithms are provided in the Phonological Corpus Tools help file. The High Contrast vowel pair exhibits the greatest level of uncertainty, which corroborates the level of contrast assigned to that pair, followed by the Mid Contrast vowel pair. The Low Contrast vowel pair has complementary distributions, which is associated with allophony (Table 1). Predictability of distribution is tied in part to frequency, since it is based on the number of times a sound occurs in a particular phonotactic environment. The number of minimal pairs and the relative type frequency have been shown to be correlated with robustness of contrast and speed of processing (Vitevitch & Luce 1999; Wedel et al. 2012). Therefore, the relative frequency of a sound was a controlled factor. Vowel frequencies in Laurentian French were based on the OMNILEX database and were calculated by counting the number of times each vowel occurred in monosyllabic words. Since there is no corpus with lexical frequencies for LF, only type frequency calculations were done.
The stimuli's type frequencies based on the OMNILEX corpus are provided in Table 1. As with minimal pair counts, the high-contrast vowel pair [a-ɔ] consists of high type-frequency vowels; the low-contrast vowel pair [y-ʏ] consists of low type-frequency vowels; the mid-level contrast vowel pair [o-ʊ] falls between the two.

Acoustic measurements
Acoustic similarity was taken into consideration when choosing High, Mid and Low vowel stimuli pairs used in all experiments, so that the vowels were roughly acoustically matched by tongue position and lip rounding. This was done so as not to introduce a confound with the other measures and thus inadvertently favour one condition over the other. In other words, this was done to avoid having the High Contrast pair of vowels be maximally acoustically different compared to the pair of Low Contrast vowels. For example, nasalization in English is allophonic but in French is contrastive. Research has shown that divergence along an acoustic cue is more distinct when the cue signals a contrast (Desmeules-Trudel 2015, 2016; Versteegh et al. 2014). The matching was based on the phonetic properties of the chosen vowels and was further verified with acoustic analyses. The stimuli's F1 and F2 measurements were taken in Praat from a steady-state portion of the vowel as close as possible to the mid-point. Figure 1 plots F1 and F2 for the stimuli, and Table 2 provides the F1 and F2 values. Values represent the mean of the two tokens that were selected as stimuli. The values are similar to Martin's (2002: 84) vowel space for male speakers of the Quebec dialect, indicating that the experimental stimuli are representative of LF vowels, and that no vowels in the stimuli were anomalies of their phonetic category or unrecognizable as a member of their category.
Table 2 shows that on average, the largest F1 and F2 differences are found for the High Contrast pair [a-ɔ], followed by the Mid Contrast pair [o-ʊ], followed by the Low Contrast pair [ʏ-y]. Based on these average values, it is not possible to draw a clear line between judgments based on F1-F2 values and strength of contrast as calculated by minimal pair counts and relative frequency: result predictions appear to go in the same direction whether based on average F1 and F2 differences or level of contrast. However, when comparing across the specific consonant frames, F1 and F2 differences do not yield  . It is not clear in this case, based on absolute Hz differences, whether participants should find it easier to distinguish one vowel pair over the other. In order to discourage an acoustic mode of perception, the interstimulus intervals (ISIs) were set to 1500 ms. Previous research has shown that a longer ISI encourages a phonological mode of processing, obscuring finer acoustic differences, and that shorter ISIs encourage a more phonetic/auditory mode of processing (Werker & Logan 1985). Our use of a 1500 ms ISI should encourage participants to perform the task more in line with phonological relationships. When one takes a closer look at the individual F1 and F2 differences for specific pairs, the sizes of the F1 and F2 differences are not always in line with the contrast category. For example, for the [ʃ_ʃ] frame, the greatest F1 difference between the pairs was for High, followed by Low, followed by Mid, and for F2 the greatest difference was for Mid, followed by Low, followed by High. If the acoustic differences between the specific pairs are the most important factor for the accuracy and speed of participants' responses, we would predict a correlation between F1 and F2 difference scores and performance on the task. We ran statistical tests to explore this possible interaction and return to this issue in the results.

Machine-assisted calculations of acoustic similarity
The above is only one possible way of evaluating acoustic similarity. Another way of measuring acoustic similarity was developed by Mielke (2012), who uses a phonetically based metric to assess the similarity of sounds. This metric combines multiple sources of acoustic and articulatory data, including nasal and oral airflow, vocal fold activity, larynx height, and ultrasound video of the tongue and lips. Spectral information and vocal tract shape are also used to calculate phonetic distances between phones. For the present study, the acoustic distance was measured between the six vowels selected as stimuli ([a, ɔ, o, ʊ, y, ʏ]) using the same methods as those developed for acoustic comparisons in Mielke (2012). The waveforms of the stimuli were converted into matrices of 12 Mel-frequency cepstral coefficients (MFCCs) in Praat, and then a dynamic time warping (DTW) technique was used to quantify acoustic similarities between vowels. This provides a weighted acoustic distance measure between vowels in the stimuli used in this study, where a lower number indicates less acoustic distance (i.e. greater similarity) and a higher number indicates greater acoustic distance (i.e. less similarity). The distances were as follows:
A further caveat to interpreting these results is that humans do not perceive all acoustic differences in proportion; for example, absolute differences in pitch are more difficult to perceive in the higher frequencies than in the lower frequencies (Yip 2002). Therefore, it cannot be expected that participants will perceive the vowels according to the absolute acoustic differences provided above. It cannot be predicted from this analysis that, for example, participants would perceive [o]-[ʊ] as the most similar pair, followed by [a]-[ɔ] as the second most similar pair, followed by [y]-[ʏ], because their phonological system will still play a role in how these vowels are perceived. The pattern found in the above results would be predicted if participants were performing the experimental tasks as if with non-speech stimuli; if this is not obtained in the results, this would likely be indicative of phonological structure being imposed on the acoustic information, or else that the cues that are perceptually salient to participants are other than the cues measured in the weighted acoustic distances.
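As a rough illustration of how a DTW-based distance between two MFCC matrices might be computed, the sketch below implements textbook dynamic time warping with a Euclidean cost between frames. It is an assumption-laden simplification: the function name is hypothetical, the input frames are invented toy vectors rather than real MFCCs from Praat, and it does not include whatever weighting or normalization Mielke (2012) applies.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences of
    feature frames (each frame a list of floats of equal length).
    The local cost is the Euclidean distance between frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of seq_a repeated in time
                cost[i][j - 1],      # frame of seq_b repeated in time
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Toy two-coefficient "frames", invented for illustration only.
vowel_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
vowel_a_slow = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [2.0, 2.0]]
vowel_b = [[0.0, 3.0], [1.0, 4.0], [2.0, 5.0]]

print(dtw_distance(vowel_a, vowel_a_slow))  # 0.0: same shape, different duration
print(dtw_distance(vowel_a, vowel_b))       # 9.0: three frames, each 3.0 apart
```

The time-stretched copy has a distance of zero because DTW aligns frames regardless of duration, which is why the technique suits comparisons of vowel tokens of unequal length.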
Given that higher frequencies are less relevant to human speech perception and these frequencies may have played a role in the resulting acoustic distances, the stimuli were downsampled to 11,000 Hz to eliminate periodicity above 5,500 Hz and were then re-analyzed. Figure 2 shows the outcome of the re-analysis. Even with downsampling, the Low Contrast pair [y]-[ʏ] still remained the most dissimilar. If participants perceive stimuli according to their weighted acoustic differences, this would suggest that they are performing the task as a non-speech task (i.e. in a purely acoustic/auditory manner).

Procedure
Stimuli were presented using PsyScope software. Participants were told they would hear one syllable followed by another syllable. They were instructed to press one key if they thought the two syllables they heard were the same, or another key if they thought the two syllables were different. Response keys were labelled, and counterbalanced across participants for whether Same or Different corresponded to the left or right side of the keyboard. The beginning of each trial was indicated by a tone. Participants wore headphones in a quiet room and were allowed to adjust the volume to a comfortable listening level.
Based on a gradient view of contrast, it was predicted that the level of Contrast (High, Mid, Low) would be reflected in accuracy scores and RTs, with highest accuracy and shortest RTs for the High condition, followed by the Mid, then Low condition. This prediction was partially borne out. Significant differences in accuracy scores were found between High-Mid, High-Low, and Mid-Low pairs. Results from RTs showed that High-Mid and High-Low Contrast pairs were significantly different. Overall, the results from the different trials demonstrate a facility for High Contrast stimuli. If contrast were strictly binary in nature, it was expected that High and Mid pairs would yield similar results, but this was not the case.

Experiment 2
Experiment 2 used a different methodology of similarity ratings to test High, Mid and Low degrees of contrast. If the contrast between vowels is binary, it is predicted that High and Mid Contrast would both yield similar results of faster RTs and higher accuracy, while Low would not since they are in an allophonic relationship (H = M > L). If contrast is more gradient, it is predicted that participants would rate the similarity between contrasts on a scale (H > M > L), where > means "is more different than".

Participants
Participants were the same as in Experiment 1. All participants first completed the AX task (Experiment 1), followed by a 4IAX task, and then the Similarity-Rating Task (Experiment 2). The results from the 4IAX task corroborated the results from the AX task and are not presented here for brevity (see Stevenson 2014).

Procedure
Participants were told that they would hear two different syllables and their task was to decide how similar or how different the two syllables were on a scale of 1 to 6 with "1" being "Not very similar" ("Peu similaire") and "6" being "Very similar" ("Très similaire"). A six-point scale was used so as to avoid the use of a middle number as a placeholder when uncertain, as sometimes happens with odd-numbered scales (Matell & Jacoby 1971). They were told that no two syllables were the same so that using a "6" did not mean that stimuli were identical. Stickers with numbers were affixed to the lower letters of the keyboard (keys "x" to "m" were labelled "1" to "6") along with a reminder of what the extreme numbers meant. Everyone had the same scale, so that "x" was always "1 - Not very similar" and "m" was always "6 - Very similar". Participants were told that there was no time limit, that there was no correct answer, and to trust their own spontaneous judgment. It was not possible to replay any of the trials. The task lasted approximately five minutes and there were no breaks. As with the AX task, a tone was used to draw participants' attention to the beginning of each trial.
High Contrast pairs were judged to be the most different; Low Contrast pairs were judged to be the most similar; and Mid Contrast pairs fell between High and Low in terms of similarity (H > M > L). These findings are consistent with the view of gradient levels of contrast. No support for the binary view of contrast was found; under that view, High and Mid vowels should have yielded similar results.

General discussion
This research examined the notion that phonological relationships do not always perfectly match the criteria for being wholly contrastive or allophonic. High, Mid, and Low levels of contrast were tested to see whether phonological relationships are perceived as binary (i.e. only contrastive vs. allophonic), or whether degrees of contrast can be perceived (i.e. on a scale from contrastive to allophonic). In Experiment 1, results on the Different trials yielded differentiation between High, Mid, and Low conditions. On the Same trials, RT differences were found between High-Low and Mid-Low pairs. The likely reason why the results were not mirrored on Different and Same trials lies in the nature of the task being asked of the participants. Different trials tested vowel contrasts, while Same trials tested participants' ability to judge acoustic similarities between two tokens of the same vowel. In Experiment 2, High Contrast stimuli were judged as being the least similar; Low Contrast (allophonic) stimuli were judged as being the most similar; and ratings for Mid Contrast stimuli fell between the other two pairs. While Boomershine et al. (2008) used a five-point scale and a 1000 ms ISI and the present study used a six-point scale and a 1500 ms ISI, both studies show that phones in an allophonic relationship were perceived to be more similar than those in a phonemic relationship. The Boomershine et al. study did not, however, test segments in an intermediate relationship and therefore only presents evidence from two extremes of the scale of possible contrasts. The present study included stimuli from three strengths of contrast as quantified by predictability of distribution and functional load.
Although the results do not perfectly support the prediction based on a gradient view of contrast, they clearly do not support a purely binary view of contrast where a relationship can be considered contrastive as long as one criterion for contrast is satisfied (such as lexical distinction). For the purely binary view to have been supported, there should have been no difference between High and Mid Contrast conditions, regardless of acoustic differences between the vowels. For example, in terms of similarity ratings in Experiment 2, if the binary view of contrast held, the High and Mid Contrast vowels should have been perceived as equally different or similar as compared to the allophonic Low vowels. However, results showed that the three vowel pairs were classified in distinct ranges of similarity, with High Contrast vowels being perceived as more different from one another than Mid Contrast vowels, despite the fact that both pairs are considered contrastive under a binary view.
The results corroborate previous literature regarding purely allophonic and contrastive relationships: phones in an allophonic relationship are more difficult to perceive than those in a contrastive relationship (e.g. Dupoux et al. 1997; Boomershine et al. 2008; Johnson & Babel 2010). Moreover, the current study provides new data supporting the hypothesis that there are phonological relationships between these two extremes. How can these findings be incorporated into current theoretical frameworks used to define and describe phonological relationships? Classifying segments as contrastive or not can influence how a phonological analysis proceeds. When segments contrast in some contexts and not in others, this can create disagreement about whether those segments should be included in an underlying phonemic inventory (described in Larson-Hall 2004). Determining the set of underlying phonemes in an inventory is often a first step to determining what features are active in a language's phonological processes, and so this can impact how feature sets and specifications are determined as well, which are critical elements in any analysis of speech patterns. Cohn (2006) explores various aspects of gradient phonology and suggests that often the grey areas of determining what is phonological in a language are due to difficulties in drawing a line between the traditional generativist modules of phonetics and phonology. For example, lengthening of vowels before voiced consonants in English is systematic, but it is unclear whether a length distinction between vowels has been phonologized or if this lengthening is more properly the domain of phonetics. Cohn argues that whether there needs to be a line drawn between phonetics and phonology should be an empirical question, determined by which approach provides the best fit for the range of more categorical to more gradient phenomena.
Indeed, a modular view of phonology and phonetics, as well as a modular view of contrast and allophony, is inadequate in describing phenomena which fall between one and the other (see Hall 2013 for an extensive list of authors that use terminology such as "quasi-phonemic" and "mushy contrast"). Hall's PPRM focuses on the factor of predictability of distribution to quantify the continuum of phonological relationships, measured as entropy (also see Hall 2015). While Hall's (2009) study did not yield definitive results, the idea of quantifying phonological relationships was extended in this paper to the measure of functional load, in addition to that of predictability of distribution, and evidence of phonological relationships between contrastive and allophonic was found. However, since the two measures did not offer different predictions from one another, our results cannot serve to distinguish between these two measures as one being a greater predictor of results over the other, or as a stronger measure of contrast. Further research is needed to determine whether these two factors are too highly correlated to be distinguishable from one another, or whether they can be isolated as independent factors. It may also be that if sound pairs are too close to one another in their measures (which is to say, too close in strength of contrast), no significant differences will be found.
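To make the entropy-based measure of predictability of distribution concrete, the following is a minimal sketch, not the computation used in this study or in Hall's work. It assumes that the entropy of a sound pair is computed per environment from occurrence counts, and that an overall value is obtained by weighting each environment by its share of all occurrences. The function names, the environment labels, and the counts are all hypothetical.

```python
import math


def environment_entropy(count_a: int, count_b: int) -> float:
    """Shannon entropy (in bits) of the choice between two sounds in one environment.

    0.0 means the choice is fully predictable (only one sound occurs there);
    1.0 means both sounds are equally likely (fully unpredictable).
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("environment has no occurrences of either sound")
    entropy = 0.0
    for count in (count_a, count_b):
        if count > 0:  # a zero count contributes nothing to the sum
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def weighted_entropy(env_counts: dict[str, tuple[int, int]]) -> float:
    """Average of per-environment entropies, weighted by environment frequency.

    This weighting scheme is an assumption made for illustration.
    """
    grand_total = sum(a + b for a, b in env_counts.values())
    if grand_total == 0:
        raise ValueError("no occurrences in any environment")
    return sum(
        (a + b) / grand_total * environment_entropy(a, b)
        for a, b in env_counts.values()
        if a + b > 0  # skip environments where neither sound occurs
    )


# Hypothetical counts: (occurrences of sound A, occurrences of sound B)
complementary = {"env_1": (10, 0), "env_2": (0, 10)}  # allophonic-like pattern
overlapping = {"env_1": (5, 5), "env_2": (5, 5)}      # contrastive-like pattern
partial = {"env_1": (5, 5), "env_2": (10, 0)}         # intermediate pattern

print(weighted_entropy(complementary))
print(weighted_entropy(overlapping))
print(weighted_entropy(partial))
```

Under these assumptions, a pair in complementary distribution sits at the low end of the scale, a pair that overlaps freely in every environment sits at the high end, and a pair that overlaps in only some environments falls in between, which is the kind of intermediate relationship at issue here.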
As this is one of the first experimental studies to test gradient levels of contrast based on specific measures, it provides a reference from which different languages and experimental paradigms can be compared. The testing of contrast should not stop at the two ends of the scale of allophony and contrast, and these two ends of the scale cannot be taken as representative of all possible phonological relationships. Based on the present study, it should be possible to apply the same measures to segments that occur in other languages and arrive at comparable results. One would predict that speakers of another language would yield results that represent the lexical distinction and predictability of distributions between segments in their own language. For example, speakers of French from other dialects and for whom [y] and [ʏ] are not in an allophonic relationship should yield different results from Laurentian French speakers. Applied in reverse, this methodology may have the potential to be used as a diagnostic for phonological relationships. One limitation of the current study is that it only examines processing of vowels. It has been argued that consonants and vowels may be processed differently, and that consonants play a greater role than vowels with regard to lexical processing (Nespor et al. 2003; Havy et al. 2014). Thus, one might find less evidence for gradient contrast when examining the processing of consonants, given their preferential status in lexical processing. In addition, although no evidence was found of a direct correlation between acoustic differences and results, it would be ideal for future studies to tease these apart, using stimuli that are equally acoustically different and of different phonological relationships. Unfortunately, many previous studies do not include measurements of acoustic differences between stimuli.
In summary, this work provides experimental evidence for what is being more frequently acknowledged in the theoretical literature, namely that there are phonological relationships that fall between purely allophonic and purely contrastive. An all-or-nothing view has proven problematic in analyses where some criteria for contrast are satisfied while other criteria are not, or where one criterion is partially satisfied. The resulting ambiguities in phonological status may be resolved by using quantifiable measures for the criteria traditionally used to evaluate phonological relationships. In doing so, we may better represent the range of relationships between categories of speech sounds and further our understanding of sound patterns in human language.