Marginal contrasts and the Contrastivist Hypothesis

The Contrastivist Hypothesis (CH; Hall 2007; Dresher 2009) holds that the only features that can be phonologically active in any language are those that serve to distinguish phonemes, which presupposes that phonemic status is categorical. Many researchers, however, demonstrate the existence of gradient relations. For instance, Hall (2009) quantifies these using the information-theoretic measure of entropy (unpredictability of distribution) and shows that a pair of sounds may have an entropy between 0 (totally predictable) and 1 (totally unpredictable). We argue that the existence of such intermediate degrees of contrastiveness does not make the CH untenable, but rather offers insight into contrastive hierarchies. The existence of a continuum does not preclude categorical distinctions: a categorical line can be drawn between zero entropy (entirely predictable, and thus by the CH phonologically inactive) and non-zero entropy (at least partially contrastive, and thus potentially phonologically active). But this does not mean that intermediate degrees of surface contrastiveness are entirely irrelevant to the CH; rather, we argue, they can shed light on how deeply ingrained a phonemic distinction is in the phonological system. As an example, we provide a case study from Pulaar [ATR] harmony, which has previously been claimed to be problematic for the CH.


Introduction
The Contrastivist Hypothesis (CH; Hall 2007; Dresher 2009) holds that the only features that can be phonologically active in any language are those that serve to distinguish the members of the underlying inventory (i.e., the phonemes) of that language from one another. Many phonological processes appear to ignore non-contrastive features (see, for example, Kiparsky 1985; Calabrese 2005; and Dresher 2009, among many others); the CH, in its strongest version, predicts that this should always be the case. In this context, the Successive Division Algorithm (SDA; Dresher et al. 1994; Dresher 2009) has been proposed as a means of specifying contrastive (and hence potentially active) features. In essence, the SDA starts with all sounds assumed to be allophones of a single phoneme, and assigns contrastive feature specifications only when there is evidence that phones need to be distinguished underlyingly. In order to implement the SDA, then, one must be able to unambiguously determine what is a (separate) contrastive phoneme and what is not. This method thus presupposes that phonemic status is categorical.
A number of researchers, however, have pointed out the existence of phones in a language that are not easily categorized as either phonemic or allophonic (e.g., Gleason 1961; Crothers 1978; Goldsmith 1995; Hill 1998; Hualde 2005; Ladd 2006; Scobbie & Stuart-Smith 2008; Kager 2008; Hall 2009; Bye 2009; Ferragne et al. 2011; Boulenger et al. 2011). Hall (2013b) provides a comprehensive overview of such intermediate relationships, and in particular provides a typology illustrating the many different ways in which contrasts can be 'marginal.' In this paper, we explore the question of whether the existence of such intermediate degrees of contrastiveness makes the Contrastivist Hypothesis untenable, or even meaningless. We propose that it does not, and that furthermore, marginal contrasts may offer insights into how contrastive hierarchies determined by the SDA change diachronically, and what it takes for learners to acquire them. In particular, we examine the case of Pulaar [±ATR] mid vowels, which have been claimed to be problematic for the CH (Archangeli & Pulleyblank 1994; Campos Astorkiza 2007), and show that while there is good evidence for the contrast between [+ATR] and [−ATR] to be marginal, it should nonetheless be considered a contrast, and is therefore not a counter-example to the CH.
In Section 2 we provide background, first on the Contrastivist Hypothesis and the Successive Division Algorithm, and then on marginal contrasts and how to measure them. Section 3 discusses the differences in how contrastiveness is treated by the SDA and by Hall's (2009) entropy-based approach to marginal contrasts. In Section 4, we show what each of these two theories of contrast has to say about Pulaar; Section 5 concludes.

The Contrastivist Hypothesis and the Successive Division Algorithm
The impetus behind the Contrastivist Hypothesis, and behind contrast-based theories of underspecification more generally, is the observation that redundant feature values often appear to be ignored by phonological patterns. For example, voicing is contrastive on obstruents but predictable on sonorants in Japanese (as in many other languages), and Japanese rendaku, which voices the initial consonant of the second member of a compound, is blocked by the presence of another voiced obstruent, but not by sonorants (Itô & Mester 1986). Accordingly, many phonologists have proposed that redundant features are omitted from some or all of the phonological computation, or that contrastive features have some special status (e.g., Kiparsky 1982a; Archangeli 1984; Itô & Mester 1986; Clements 1987; Pulleyblank 1988; Avery & Rice 1989; Piggott 1992; Dresher et al. 1994; Itô et al. 1995; Calabrese 1995; 2005; Dyck 1995; Ghini 2001; Hall 2007; 2011b; Dresher 2009; 2015; Mackenzie 2009; 2013; Nevins 2010; 2015; Iosad 2012; Oxford 2015); see Hall (2011a; forthcoming) for a concise overview and Dresher (2009) for a more detailed one. Hall (2007) formulates the Contrastivist Hypothesis as in (1):
(1) Contrastivist Hypothesis (Hall 2007: 20)
The phonological component of a language L operates only on those features which are necessary to distinguish the phonemes of L from one another.
This is arguably the simplest and strongest possible hypothesis that assigns a special role to contrastive features. At a minimum, the phonological features included in lexical representations must suffice to distinguish all underlyingly contrastive segments. The hypothesis in (1) holds that these are the only features that can be phonologically active, not only in underlying representations, but also in the phonological computation.
1 In other approaches, redundant features may be filled in during the course of the derivation and thereby affect the application of subsequent rules (e.g., Archangeli 1985), or contrastive and redundant features may both be present from the start but be treated differently by the grammar (e.g., Calabrese 1995; 2005).
Any theory along these lines requires a means of identifying which features are contrastive. One intuitive approach works analogously to the process of identifying contrasting phonemes by finding minimal pairs of words. First, all phonemes are fully specified; then, in every pair of phonemes that differ by only a single feature value, that feature is identified as contrastive on those phonemes. However, as Archangeli (1988) and Dresher (2009) have pointed out, the features identified as contrastive by this procedure do not always suffice to distinguish all the phonemes in the inventory. For example, consider the inventory /i ɛ a ɔ u/, which has been given as the underlying vowel system of Pulaar. Full specifications of these vowels for five commonly used binary features are shown in (2):
(2)   i ɛ a ɔ u
      high [...]
In (2), any two vowels differ by at least two feature specifications (e.g., /i/ and /ɛ/ are distinguished by both [±high] and [±ATR]; /ɛ/ and /a/ are distinguished by both [±low] and [±back]; and so on). The minimal pairs test will therefore fail to designate any feature values as contrastive on any of these vowels. In a framework such as that of Calabrese (1995; 2005), in which both contrastive and redundant features are specified, but some rules apply only to contrastive features, this is not necessarily problematic; it simply means that all rules in languages with inventories like the one in (2) must be able to see 'redundant' features. However, in theories that posit that redundant features are systematically absent from at least some levels of phonological representation, the minimal pairs test is untenable; if all features that are non-contrastive according to this test are omitted, then it will not always be possible to differentiate all (or even any) of the underlying phonemes.
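The failure of the minimal pairs test on this inventory can be checked mechanically. The following Python sketch is an illustration only: the feature set (high, low, back, round, ATR) and the individual values are assumptions chosen to be consistent with the differences described in the text, not a reproduction of the original table, and the function names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: illustrative full specifications for /i ɛ a ɔ u/.
# The choice of five features and their values is hypothetical.
FEATURES = ["high", "low", "back", "round", "ATR"]
VOWELS = {
    "i": {"high": "+", "low": "-", "back": "-", "round": "-", "ATR": "+"},
    "ɛ": {"high": "-", "low": "-", "back": "-", "round": "-", "ATR": "-"},
    "a": {"high": "-", "low": "+", "back": "+", "round": "-", "ATR": "-"},
    "ɔ": {"high": "-", "low": "-", "back": "+", "round": "+", "ATR": "-"},
    "u": {"high": "+", "low": "-", "back": "+", "round": "+", "ATR": "+"},
}


def differing_features(a, b):
    """List the features on which phonemes a and b have different values."""
    return [f for f in FEATURES if VOWELS[a][f] != VOWELS[b][f]]


def minimal_pairs_contrastive():
    """Mark a feature as contrastive on two phonemes only if it is the
    single feature that distinguishes them (the minimal pairs test)."""
    contrastive = {v: set() for v in VOWELS}
    for a, b in combinations(VOWELS, 2):
        diff = differing_features(a, b)
        if len(diff) == 1:
            contrastive[a].add(diff[0])
            contrastive[b].add(diff[0])
    return contrastive


result = minimal_pairs_contrastive()
for vowel, feats in result.items():
    print(vowel, sorted(feats))
```

Under these assumed specifications every pair of vowels differs on at least two features, so the test marks no feature as contrastive on any vowel.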
Accordingly, Dresher et al. (1994) propose an alternative procedure, inspired by Cherry et al. (1953), which Dresher (2009) gives in a more general form as the Successive Division Algorithm (SDA):
(3) Successive Division Algorithm (Dresher 2009: 16)
a. Begin with no feature specifications: assume all sounds are allophones of a single undifferentiated phoneme.
b. If the set is found to consist of more than one contrasting member, select a feature and divide the set into as many subsets as the feature allows for.
c. Repeat step (b) in each subset: keep dividing up the inventory into sets, applying successive features in turn, until every set has only one member.
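The procedure in (3) can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: it assumes a single fixed ordering of features supplied by the analyst (the algorithm itself says only "select a feature"), and the full specifications used to look up feature values are hypothetical ones for the five-vowel inventory /i ɛ a ɔ u/.

```python
# ASSUMPTION: hypothetical full specifications, used only to look up values.
SPECS = {
    "i": {"high": "+", "low": "-", "back": "-", "round": "-", "ATR": "+"},
    "ɛ": {"high": "-", "low": "-", "back": "-", "round": "-", "ATR": "-"},
    "a": {"high": "-", "low": "+", "back": "+", "round": "-", "ATR": "-"},
    "ɔ": {"high": "-", "low": "-", "back": "+", "round": "+", "ATR": "-"},
    "u": {"high": "+", "low": "-", "back": "+", "round": "+", "ATR": "+"},
}


def successive_division(phonemes, specs, ordering):
    """Return {phoneme: {feature: value}} with contrastive features only."""
    # Step (a): start with no feature specifications at all.
    assigned = {p: {} for p in phonemes}

    def divide(subset, remaining):
        if len(subset) <= 1:
            return  # a singleton set needs no further division
        if not remaining:
            raise ValueError(f"ordering cannot distinguish {subset}")
        feature, rest = remaining[0], remaining[1:]
        # Step (b): divide the set according to the values of this feature.
        groups = {}
        for p in subset:
            groups.setdefault(specs[p][feature], []).append(p)
        if len(groups) > 1:
            # The feature is contrastive here, so record it on each member.
            for value, members in groups.items():
                for p in members:
                    assigned[p][feature] = value
        # Step (c): repeat within each subset using the remaining features.
        for members in groups.values():
            divide(members, rest)

    divide(list(phonemes), list(ordering))
    return assigned


atr_first = successive_division(list(SPECS), SPECS, ["low", "ATR", "back"])
high_first = successive_division(list(SPECS), SPECS, ["low", "high", "back"])
print(atr_first)
print(high_first)
```

With the assumed ordering low > ATR > back, [ATR] is assigned to all four non-low vowels; with low > high > back, [ATR] is never assigned. The same inventory thus receives different contrastive specifications depending on the ordering of features.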
Unlike the minimal pairs test, the SDA will necessarily produce specifications that are sufficient to differentiate any phonemic inventory, because it does not terminate until all phonemes have been distinguished. Because it proceeds by adding specifications rather than by pruning them, it does not require a universal set of features to determine what the full specification for each phoneme would be, as the minimal pairs test does; it is thus equally compatible with universal feature theories (such as those of Jakobson et al. 1952; Chomsky & Halle 1968; or Clements & Hume 1995) or emergent features (as proposed by Mielke 2008). The primary purpose of the SDA is to provide a formal definition of contrastiveness, and thus to make predictions about which features can be phonologically active in any given system. However, it is also framed as a procedure that could be part of the acquisition process, with learners assigning features as they discover the phonemic contrasts that define the underlying inventory. A complete learning algorithm based on the SDA would need to elaborate what it takes to identify the presence of a phonemic contrast and how the learner selects the features to assign, and it would also need to take into account how the acquisition of the phoneme inventory interacts with the acquisition of the system of rules or constraints operating on that inventory. (See Dresher (2009: §7.8) for discussion of the role of contrastive hierarchies in theories of acquisition.)
The SDA relies on a hierarchical ordering of features, capturing the intuition that features are contrastive or redundant not in isolation, but with reference to the other features whose values have already been determined. For example, in the inventory in (2), [±ATR] is predictable on any vowel whose value for [±high] is known, and [±high] is predictable on any vowel already specified for [±ATR].
For the minimal pairs test, this means that neither [±high] nor [±ATR] will be identified as contrastive on any vowel in this inventory, but the SDA resolves the problem by requiring that features be hierarchically ordered. Because the SDA itself does not stipulate any particular ordering of features, it allows for the possibility that similar inventories may have different contrastive feature specifications in different languages. 2 For example, Yoruba has a phonemic seven-vowel inventory /i e ɛ a ɔ o u/. In Ifẹ Yoruba, the (predictably tense) high vowels are ignored by tongue root harmony, while in Standard Yoruba, a high vowel requires a preceding mid vowel to be tense (Nevins 2010; Dresher 2013). Dresher (2013) [...] [±RTR] is contrastive only on the mid vowels /e o ɛ ɔ/. For Dresher, who uses the SDA, contrastiveness depends not on the inventory alone, but also on the language-specific hierarchical ordering of the features. 4 In this approach, there is greater room for variability in feature specifications, but no feature is assigned unless it serves to mark a phonemic contrast, and the total number of features predicted to be potentially phonologically active by the Contrastivist Hypothesis is constrained by the phonemic inventory.

Marginal contrasts and entropy
The Successive Division Algorithm assigns contrastive features with reference to phonemic inventories, and thus presupposes that there is some reliable way of determining which segments need to be distinguished at the underlying level of representation, i.e., which segments are phonemic. At the same time, it has been widely noted that segments are not always clearly and categorically either in contrast with one another or in a relation of allophony; rather, there appears to be a continuum of intermediate possibilities (e.g., Gleason 1961; Crothers 1978; Goldsmith 1995; Hill 1998; Hualde 2005; Ladd 2006; Kager 2008; Scobbie & Stuart-Smith 2008; Bye 2009; Hall 2009; Boulenger et al. 2011; Ferragne et al. 2011).
There can be many reasons for relationships to be intermediate, as Hall (2013b) outlines. One particularly prevalent case is based on intermediate degrees of predictability of distribution (see, e.g., Hall 2009). In many cases, the phonological relation between two sounds can be determined by the extent to which they are predictably distributed in the language, with allophony defined as occurring if the sounds are complementarily distributed, and contrast occurring otherwise. 5 Trubetzkoy ([1939] 1969: 239) predicted, however, that partial predictability of distribution might also be important, noting that neutralizing a contrast can lead to a reduction in its 'distinctive force.' The psychological validity of this claim has been empirically verified by, e.g., Huang (2001) and Hume & Johnson (2003), who found that tone pairs in Mandarin that neutralize in certain contexts are perceived as being more similar to each other than non-neutralizing pairs, to an extent beyond what might be expected due to acoustic characteristics. Hall (2009) proposes that there is in fact a continuum of possible relations that hold between two phones, based on the degree to which they are predictably distributed. This continuum can be measured using the information-theoretic quantity of entropy (Shannon & Weaver 1949), a measure of uncertainty. Given a relation between exactly two sounds, entropy will range from 0 to 1, measured in bits. An entropy of 0 bits indicates a complete absence of uncertainty, i.e., perfect predictability of distribution, and is analogous to perfect allophony. An entropy of 1 bit, on the other hand, indicates total uncertainty, i.e., complete unpredictability in all environments and hence total contrast. This is illustrated in Figure 1, where black triangles schematically represent individual phones, and circles indicate the set of environments each phone can occur in.
At the left-hand side of the continuum, the circles are entirely non-overlapping; given a particular environment, there is no uncertainty as to which sound would occur, and so the entropy is 0. At the right-hand side of the continuum, the circles are entirely overlapping; both phones occur (with equal frequency) in all the same environments, and so there is perfect uncertainty about which might occur in any given environment, i.e., an entropy of 1. The two phones will have an entropy between 0 and 1 when they have partially overlapping distributions. As hinted at above, the frequency of occurrence of the phones also matters: if one phone is significantly more or less frequent than the other, that will decrease the uncertainty about which one will occur in a given environment. Thus, even if the two phones have entirely overlapping distributions, if one occurs more frequently than the other in those environments, the entropy of the pair will still be less than 1. This reflects, for example, the claim in Goldsmith (1995) that asymmetries in frequency of occurrence are relevant for understanding what he terms a 'cline' of contrast; see also discussion in Hall (2009).
The entropy of a pair of phones in a language (their systemic entropy) is calculated as in (5), as proposed by Hall (2009) and implemented in the Phonological CorpusTools software (Hall et al. 2015a). First, the entropy in each individual environment in which at least one of the two phones occurs is calculated, using the formula in (5a). 6 Here, the probability of each phone's occurring in that environment is multiplied by the log of that same probability, and the products for both phones are summed. To calculate the systemic entropy, then, the entropy of each environment, $H(e)$, is multiplied by the probability of that environment's occurring, $p(e)$, and these products are summed for all environments. The probabilities of the environments are calculated as in (5b), where $N_e$ is the number of occurrences of a given environment and $\sum_{e \in E} N_e$ is the number of occurrences of all environments in which at least one of the phones occurs.
In French, for example, tense and lax mid vowels such as [e] and [ɛ] are subject to the Loi de position (Grammont 1913; Walker 2001; Féry 2003; Durand & Lyche 2004), such that the tense vowels tend to occur in open syllables while the lax vowels tend to occur in closed syllables. The vowels are generally still considered contrastive, though, because there are certain environments in which both vowels can occur. Note that this same formula for calculating the degree of contrastiveness can be applied to different types of data. At one extreme, it can be applied to a corpus of running speech in a language, such that the probability of occurrence of an individual phone in a given environment is based on its token frequency in that environment in the corpus. The token frequency of the sound [e] occurring in word-final position in the Lexique corpus of French film subtitles (New et al. 2004) [...]; the probability of [ɛ] is 0.32; and the entropy of [e, ɛ] in this environment is 0.9 bits.
This number is relatively high (close to the maximum of 1), but not at 1; this suggests that while these sounds fall slightly short of complete unpredictability, they are more contrastive than not.
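For reference, the calculation described in prose above can be written out as follows. This is a reconstruction from the verbal description, not a quotation of the original formulas: the negative sign and the base-2 logarithm are assumptions based on the standard Shannon definition and on the fact that the values are reported in bits, and the phone labels $x$ and $y$ are placeholders.

```latex
% Entropy of the pair {x, y} in a single environment e
H(e) = -\sum_{i \in \{x, y\}} p_i \log_2 p_i

% Systemic entropy: environment entropies weighted by environment probability
H = \sum_{e \in E} H(e)\, p(e)

% Probability of an environment
p(e) = \frac{N_e}{\sum_{e \in E} N_e}

% Check against the word-final token figures, assuming p([e]) = 1 - 0.32 = 0.68
H(\text{word-final}) = -\left(0.68 \log_2 0.68 + 0.32 \log_2 0.32\right) \approx 0.90 \text{ bits}
```

The last line assumes that [e] and [ɛ] are the only two phones counted in this environment, so that their probabilities sum to 1; on that assumption the result agrees with the reported value of 0.9 bits.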
Alternatively, one could determine the probabilities of occurrence based on a lexicon of the language rather than a corpus, such that each word containing a phone in a particular environment is counted only once; i.e., using type frequency. The type frequency of word-final [e] in Lexique is 31,601, while that of [ɛ] is 15,981. The probabilities of each phone are thus 0.66 and 0.34, respectively, and the entropy in this environment is 0.92 bits. In this case, the choice between type and token frequency doesn't make much difference, but in cases where these two measurements are different, the entropy values will obviously follow suit.
A third way of applying this formula, however, would be in terms of type occurrences rather than type frequencies; that is, if there is at least one occurrence of a given phone in a given environment, it would be counted as having a type occurrence of 1; if there is not at least one occurrence, its type occurrence would be 0. In this case, both word-final [e] and word-final [ɛ] have type occurrences of 1 in Lexique; their probabilities would each then be 0.5, and the entropy would be that of perfect contrastivity, i.e., 1 bit. This third way of applying the entropy formula is most similar to traditional approaches to contrast; it allows for entropy values of only 0 and 1, i.e., allophony and contrast, with no intermediate relations possible. Under such an application, the notion of 'marginal' contrasts (at least, those based on predictability of distribution) disappears, and what appears to be a gradient mathematical formula gives rise to binary, categorical distinctions.
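The difference between counting type frequencies and counting type occurrences can be illustrated with a short Python sketch. It uses the word-final type frequencies for [e] and [ɛ] given above; the function name `pair_entropy` is hypothetical, and the base-2 logarithm is assumed because the values are reported in bits.

```python
import math


def pair_entropy(count_a, count_b):
    """Entropy in bits of a two-phone choice in one environment,
    given a count for each phone in that environment."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("neither phone occurs in this environment")
    entropy = 0.0
    for count in (count_a, count_b):
        if count > 0:  # a phone that never occurs contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Type frequency: word-final [e] and [ɛ] counts cited from Lexique.
type_frequency = pair_entropy(31601, 15981)

# Type occurrence: each phone is counted as 1 if it occurs at all, else 0.
type_occurrence = pair_entropy(1, 1)

# Perfect allophony under type occurrence: only one phone is attested.
allophony = pair_entropy(1, 0)

print(round(type_frequency, 2), type_occurrence, allophony)
```

Reducing the counts to presence or absence leaves only two possible outcomes, 0 and 1, which is why this way of counting collapses the continuum into a categorical distinction.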
In addition to being able to apply this formula to different kinds of frequency measures, one can, of course, apply it to different levels of representation. In the examples above, token frequency, type frequency, and type occurrences were counted at the level of surface representations, but it would be possible to count any of them on the basis of underlying representations instead, if one had independent evidence for such representations. Thus, the formula itself makes no inherent claim about what level of representation is relevant; it simply provides a means for objectively quantifying the extent to which two phones are predictably distributed in a language, given some measure of probability.
Recall, however, that predictability of distribution is not the only reason two phones might be considered to be intermediate between contrast and allophony. Other reasons include (1) cases where the sounds are predictably distributed, but where crucial reference is made to non-phonological information, such as morphological boundaries (e.g., the marginal contrast between [ai] and [ʌi] in tied vs. tide in Scottish English; see, e.g., Scobbie & Stuart-Smith 2008); (2) cases where one or both of the sounds is foreign, specialized, or belongs to some distinct stratum of the language (e.g., the marginally contrastive status of [x] in English due to words like Bach and loch; see, e.g., Ladd 2006); or (3) cases where the sounds are highly variable in their phonetic realization (e.g., the marginal contrast between [e] and [ɛ] in Italian; see Ladd 2006).
Thus, as Steriade (2007: 140) points out, "the very existence of a clear cut between contrastive and non-contrastive categories" is "called into question." If this is true, then what is to become of theories, such as the Contrastivist Hypothesis and the SDA, that crucially rely on such a clear cut being possible? As we will argue in the next section, the answer lies in understanding what kind of binary distinction the CH actually assumes, and at what level of representation.

What counts as contrastive, and where
As mentioned in §2.2, the use of a continuous quantification metric does not preclude the possibility of categorical distinctions. In the case of using entropy to define a continuum of contrastiveness, there is an obvious line to be drawn between pairs of phones with zero entropy and pairs of phones with non-zero entropy: if two phones are unpredictable in at least some contexts, then they will have an entropy greater than zero, and the system of phonological representations must have some means of distinguishing them. This is analogous to the case of applying the entropy formula to type occurrences; rather than counting the frequency of occurrence, one simply determines whether the phones occur at all in each environment.
In a theory of phonology that adopts the Contrastivist Hypothesis and the Successive Division Algorithm, the most immediately relevant form of contrastiveness is not the gradient degree to which two sounds contrast at the surface, but the categorical question of whether the phonological system needs to distinguish them in lexical representations. That is, if a phone is at least marginally contrastive (i.e., if there are at least some contexts in which it cannot be predictably derived from other information independently known to be present in the representation), then it must be treated as a separate phoneme for the purposes of the SDA. To take an extreme example for the sake of clarity, if a language adopts a single loanword containing a potentially contrastive new phone, then the novel segment must either be treated as an exception (and thus not integrated into the regular phonological system) or be regarded as phonemic and included in the SDA.
As an approximation, then, one could say that the SDA assigns distinct featural representations to any pair of phones with non-zero entropy. But this is only an approximation. Measuring the entropy of phones at the surface level gives an index of their functional contrastiveness in an information-theoretic sense, 7 but the SDA is concerned with formal, not functional, contrast, and at the level of underlying (lexical) representations, not surface forms. The set of phonemes to be distinguished by the SDA can be identified only by analysis of the phonological system, and is thus not trivially calculable from surface data. 8 In cases of absolute neutralization, two phonemes that appear identical on the surface may need to be distinguished underlyingly because they exhibit different phonological patterning. For instance, 'strong i' and 'weak i' in some Inuit dialects are both realized phonetically as [i], but one triggers palatalization and the other does not (Archangeli & Pulleyblank 1994; Compton & Dresher 2011). 9 Conversely, some surface contrasts between phones can be analyzed as arising from differences in underlying structure that are not segmental. For example, European Portuguese has a marginal surface contrast between [a] and [ɐ], as in the minimal pair in (6). [...] However, Spahr (2016) argues that there is no underlying phonemic contrast between these two vowels; rather, the surface difference between (6a) and (6b) is derived from a morphological contrast in the presence or absence of a perfect suffix consisting of an empty V slot, as in (7). In (7b), the theme vowel /a/ at the end of the verb stem spreads to fill this empty position. Singly linked /a/ raises to [ɐ] [...] Harris (1990; 1994), Carr (2008), Bye (2009), and Hall (2013b). For the purposes of the SDA, then, contrastiveness is a categorical property, and one that depends on the analysis of the underlying phonological system.
This does not mean, however, that the study of gradient surface contrastiveness as measured by type frequency or token frequency is untenable or meaningless. As mentioned above, there is empirical evidence that intermediate degrees of predictability affect the perception of phones (e.g., Huang 2001; Hume & Johnson 2003; Hall 2009; Hall & Hume 2015). Quantifying predictability of distribution can also be useful in tracking and interpreting phonological change (e.g., Hall 2013a). Furthermore, even within the context of the CH, the existence of a categorical split into 'contrastive' (assigned by the SDA, and therefore potentially phonologically active) and 'predictable' (and thus not available in the phonological computation) does not preclude further study into gradient contrastiveness and its possible relation to differing degrees of phonological activity. In the context of the SDA, a contrast might be considered 'marginal' if it is relatively low in the contrastive hierarchy, particularly if it appears on only a small number of branches. Contrasts with low scope might, in turn, be expected to be diachronically unstable, either emerging or disappearing (Dresher et al. 2014; Oxford 2015), and synchronically more susceptible to neutralization (Spahr 2014). Similarly, but potentially independently, contrasts with low entropy may offer less evidence to the learner, and consequently also be diachronically precarious. Although the hierarchical and entropy-based notions of marginality will not necessarily always align, the study of marginal contrasts and the study of the Contrastivist Hypothesis can be entirely symbiotic.

Pulaar [ATR] harmony patterns
Pulaar ATR harmony provides an example of the importance of marginal contrasts for the Contrastivist Hypothesis (see also Hall 2000; 2007). As mentioned in §2.1, the underlying vowel inventory is as in (8a). [...] If [±ATR] is not the feature that distinguishes /i u/ from /ɛ ɔ/, as in (12b), where [±high] is used instead, then [±ATR] has no contrastive role in the system; in that case, ATR harmony would entail that a non-contrastive feature is phonologically active, contra the Contrastivist Hypothesis.
Indeed, the Pulaar facts have been cited in arguing against contrastive underspecification, although not with specific reference to a contrastive hierarchy. Archangeli & Pulleyblank (1994: 134) say that "although completely predictable, [ATR] values play an active role in the phonology of Pulaar," and Campos Astorkiza (2007: 194) claims that "[a]ccording to underspecification theories, the [ATR] value for high and low vowels should not play an active role in the phonology of the language."

A solution
This apparent paradox can be solved, however, by recognizing that [e o] not only occur on the surface, but are marginally contrastive. According to Paradis (1992: 90), there are three morphemes that counterexemplify the surface generalization that [+ATR] mid vowels occur only to the left of high vowels; these are given in (13).
(13) a. [fof] 'all'
b. [-(ɡ)el] diminutive singular
c. [-(ɡ)ol] noun class marker
Two of these, [-(ɡ)ol] and [-(ɡ)el], are suffixes and therefore appear in a variety of longer words; they then trigger harmony in the stems that precede them, as illustrated in (14) and (15). [...] Paradis (1992) avoids positing underlying /e o/ by analyzing these morphemes as /fɔuf/, /-ɡɛil/, and /-ɡɔul/, with the high vowels underlyingly lacking any association to a timing slot. The floating high vowels trigger harmony, but are not themselves pronounced. Archangeli & Pulleyblank (1994: 192) [...] (16), harmony can be unproblematically represented as the leftward spreading of [+ATR] from one non-low vowel to the next. 12 But if /e o/ are indeed phonemic, then they might be expected to appear in more than three morphemes. A reexamination of the Pulaar system in §4.3 will provide evidence that this is indeed the case. Given that other related languages such as Wolof, Kisi, and Diola-Fogny have a more robust [ATR] contrast (Casali 2003), Pulaar as described by Paradis might represent a diachronic change from an earlier stage in which [+ATR] mid vowels were more prevalent. The Contrastivist Hypothesis predicts that if the marginal underlying contrast between /e o/ and /ɛ ɔ/ is lost, the harmony pattern could not remain productive in its present form. A learner acquiring Pulaar phonology must acquire the feature [ATR] in order to generate the harmony pattern. According to the SDA, a feature can be assigned only if it serves to mark some contrast in the underlying inventory. In the hierarchy in (16), [ATR] serves to distinguish /ɛ ɔ/ not only from /e o/, but also from /i u/.
If the acquisition of contrasts proceeds monotonically from the root of the contrastive hierarchy to its leaves, with no backtracking, and if a child acquires harmony before identifying /e o/ as distinct underlying segments (that is, before making the lowest of the three divisions in (16)), then we might expect the child to pass through a stage in which harmony raises /ɛ ɔ/ to [i u]. On the other hand, if phonemes are not acquired in perfect hierarchical order, or if the full inventory is acquired before the harmony rule, then no such stage is predicted. Note that the hierarchical rank of contrasts in (16) does not directly reflect their degree of prominence or marginality: while the marginal status of /e o/ relates to their relation with /ɛ ɔ/, (16) groups them more closely with /i u/, suggesting that the acquisition path may not be wholly straightforward.
In any case, the complete set of underlying phonemes must be learned in order to arrive at the adult grammar's harmony patterns. If there are indeed only three morphemes that provide evidence for the phonemic status of /e o/, harmony would seem to rest on startlingly unsteady ground, as if, for example, there were some phonological rule of English that depended on the learner's exposure to Bach and loch, though in the Pulaar case the crucial morphemes are affixes that can combine with multiple stems, rather than low-frequency free roots. In §4.3, then, we turn to entropy as a means of quantifying the degree to which /e o/ contrast with /ɛ ɔ/ on the surface, and thus as a potential approximation of the robustness of the evidence available to the learner.

Quantifying the contrastiveness of [ATR] in Pulaar
As described in §2.2, the extent to which two phones are contrastive can be quantified using entropy, with an entropy of 0 indicating perfect predictability and an entropy of 1 indicating perfect unpredictability. To get a sense of the extent to which [ATR] distinctions might actually be contrastive in Pulaar, at least on the surface, the formulae in (5) were applied to the lexicon as represented by Niang (1997), a Pulaar-English and English-Pulaar dictionary based on "the Pulaar dialect spoken essentially in Mauritania, Senegal, and The Gambia" (Niang 1997: x), giving us a measure of type frequency. 13 All distinct forms appearing as headwords in the Pulaar-English section, a total of 6332 words, were taken as input. The entropy measure was calculated using Phonological CorpusTools (Hall et al. 2015a).
Interestingly, in Niang's orthography, [+ATR] mid vowels are represented as 〈é, ó〉, and [−ATR] ones as 〈e, o〉. The fact that Niang represents this distinction at all is itself suggestive, as entirely predictable variations are typically excluded from orthographic representations.
In order to calculate entropy, one must decide which environments are relevant. This is not a trivial decision: the selection of environments can change the results substantially, as shown in Hall (2012), where environments are chosen both to emphasize and to minimize the apparent contrasts between vowels in Canadian English. In the limit, one can always simply assume that there are no conditioning environments and calculate entropy based on the relative frequencies of the sounds in question across all words in the lexicon (or tokens in the corpus). This move actively ignores the entire enterprise of phonological analysis by ignoring the visible patterns of conditioning that are at the heart of phonological rules or constraints. Using a mathematical formula does not make the analysis of a phonological pattern deterministic; one still must bring active meta-knowledge of the phonological patterns being analyzed to bear. What is crucial for the application of the formula is that (1) whatever environments are chosen, they exhaustively cover all the individual tokens of the sounds being analyzed and (2) they do so in a unique way, i.e., not double-counting any such tokens. For example, it would be meaningless to include both the environments 'word-initial before a vowel' and 'word-initial before [i]' because these environments overlap and would double-count any segment occurring word-initially before [i]. Thus, in order to calculate a sensible number, the environments must be exhaustive and unique, but they still must be selected by the analyst in a transparent way (see more discussion in Hall 2009).
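The two requirements on environments, exhaustiveness and uniqueness, can be checked mechanically before any entropy is computed. The following Python sketch is only an illustration of that check, not a description of how Phonological CorpusTools works: the token representation, the simplified vowel set, the environment predicates, and the function name `partition_problems` are all hypothetical, and the example strings are invented rather than drawn from any lexicon.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical token representation: (word, index of the segment within the word).
Token = Tuple[str, int]
Environment = Callable[[Token], bool]

VOWELS = set("aeiou")  # simplified, assumed vowel set for this illustration only


def word_initial_before_vowel(token: Token) -> bool:
    word, i = token
    return i == 0 and i + 1 < len(word) and word[i + 1] in VOWELS


def word_initial_before_i(token: Token) -> bool:
    word, i = token
    return i == 0 and i + 1 < len(word) and word[i + 1] == "i"


def elsewhere(token: Token) -> bool:
    # Complement of 'word-initial before a vowel'.
    return not word_initial_before_vowel(token)


def partition_problems(
    tokens: List[Token], environments: Dict[str, Environment]
) -> Tuple[List[Token], List[Tuple[Token, List[str]]]]:
    """Return (tokens in no environment, tokens in more than one environment)."""
    uncovered = []
    double_counted = []
    for token in tokens:
        matches = [name for name, test in environments.items() if test(token)]
        if not matches:
            uncovered.append(token)          # violates exhaustiveness
        elif len(matches) > 1:
            double_counted.append((token, matches))  # violates uniqueness
    return uncovered, double_counted


# Invented strings; the token is the segment at the given index.
tokens = [("pita", 0), ("pata", 0), ("apa", 1)]

overlapping = {
    "#_V": word_initial_before_vowel,
    "#_i": word_initial_before_i,
    "elsewhere": elsewhere,
}
disjoint = {"#_V": word_initial_before_vowel, "elsewhere": elsewhere}

print(partition_problems(tokens, overlapping))
print(partition_problems(tokens, disjoint))
```

With the overlapping set, the initial segment of the invented form "pita" is reported as falling under both 'word-initial before a vowel' and 'word-initial before [i]', which is exactly the double-counting described above; with the disjoint set, no token is flagged.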
For the calculation of the entropy between [+ATR] mid vowels on the one hand and [−ATR] mid vowels on the other, three mutually exclusive and comprehensive environments were chosen. These environments were picked so as to maximize the possibility of perfect allophony, such that the expected entropy in each environment is 0 given the typical descriptions of Pulaar vowel harmony in the literature. The three environments were: (1) before [+ATR] vowels (where only [+ATR] mid vowels are predicted to occur under a harmony account); (2) before [−ATR] vowels (where only [−ATR] mid vowels should occur); and (3) where the vowel is the final vowel in the word (where again, only [−ATR] mid vowels are predicted to occur). A non-zero entropy in any one of these three environments would indicate at least a partial surface contrast in that environment.
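As a rough illustration of how tokens might be sorted into these three environments, the sketch below classifies each mid vowel letter in an orthographic form by the vowel that follows it. It rests on several assumptions that are not taken from the source: that 〈i, u, é, ó〉 count as [+ATR] and 〈a, e, o〉 as [−ATR], that each vowel letter is one token (so a doubled letter for a long vowel would count twice), and that input can be normalized to precomposed characters. The function names and the example strings are hypothetical; the strings are invented and are not Pulaar words.

```python
import unicodedata
from collections import Counter

E_ACUTE = "\u00e9"  # precomposed 'e' with acute accent
O_ACUTE = "\u00f3"  # precomposed 'o' with acute accent

# Assumed vowel classes for this sketch (see the lead-in for the caveats).
PLUS_ATR = {"i", "u", E_ACUTE, O_ACUTE}
MINUS_ATR = {"a", "e", "o"}
MID = {E_ACUTE: "+ATR", O_ACUTE: "+ATR", "e": "-ATR", "o": "-ATR"}


def classify_mid_vowels(word):
    """Return a list of (mid vowel class, environment) pairs for one word."""
    # Normalize so that 'e' + combining acute is treated like precomposed 'é'.
    word = unicodedata.normalize("NFC", word.lower())
    vowels = [ch for ch in word if ch in PLUS_ATR or ch in MINUS_ATR]
    result = []
    for k, v in enumerate(vowels):
        if v not in MID:
            continue  # only mid vowels are being analyzed
        # The if/elif/else guarantees exactly one environment per token.
        if k == len(vowels) - 1:
            env = "final"
        elif vowels[k + 1] in PLUS_ATR:
            env = "before +ATR"
        else:
            env = "before -ATR"
        result.append((MID[v], env))
    return result


def tally(words):
    """Count (mid vowel class, environment) pairs over a list of words."""
    counts = Counter()
    for w in words:
        counts.update(classify_mid_vowels(w))
    return counts


# Invented strings, for illustration only.
toy_lexicon = ["t\u00e9tu", "teti", "tota", "t\u00f3ta", "tate"]
for pair, n in sorted(tally(toy_lexicon).items()):
    print(pair, n)
```

Because every mid vowel is either the last vowel of its word or is followed by a vowel belonging to exactly one of the two assumed classes, the three environments in this sketch are exhaustive and non-overlapping by construction.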
As it turned out, all three environments had a non-zero entropy. Specific numbers will be reported shortly. However, this meant that there were a number of words that were 'surprising' from the perspective of a harmony account: there were 401 words in which a [+ATR] mid vowel occurred without a [+ATR] trigger to its right (or with such a trigger, but only to the right of an /a/), and there were 101 words in which a [−ATR] mid vowel occurred even though there was a [+ATR] trigger available to its right. Before continuing, it was important to verify that these were in fact real instances of non-harmonic [ATR] mid vowels rather than, say, typos in the dictionary. The 101 cases of [−ATR] vowels before a [+ATR] trigger, in particular, seemed likely candidates for typographical errors, as they could potentially be attributed to a simple lack of an acute accent mark on the first vowel. Therefore, all of these words were checked by a native speaker of Pulaar from Senegal.¹⁴ Although not a trained linguist, this speaker had studied Pulaar at university and spontaneously used the terms "open" and "closed" to identify the [−ATR] and [+ATR] mid vowels, respectively; he had very little trouble in identifying which vowels occurred in any given word. While several typos were indeed discovered, there were still 397 words containing a [+ATR] mid vowel without an available [+ATR] trigger, and 88 words containing a [−ATR] mid vowel despite the presence of an available [+ATR] trigger. All entropy calculations given in the text are based on these verified words.¹⁵

Table 1 shows the total numbers of occurrences of the [±ATR] mid vowels in each of the three contexts: before [+ATR] vowels, before [−ATR] vowels, and word-finally. Numbers in bold are ones for which a non-zero value indicates a deviation from the expected pattern, viz., mid vowels followed by another vowel with the opposite value for [±ATR], or [+ATR] mid vowels in the final syllable of the word.
Table 2 shows the entropy calculations in the three environments for [±ATR] mid vowels, along with the systemic entropy (the weighted average entropy across the three environments). As can be seen, the entropy is non-zero in all three environments, and the overall weighted average entropy is 0.27. This is clearly in the realm of being contrastive, i.e., it is not 0, but it is also clearly a somewhat 'marginal' contrast, in that it has a relatively low entropy value considering the scale of 0-1.
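To make the arithmetic behind a systemic entropy value concrete, the following sketch computes a per-environment entropy and a weighted average from counts. It assumes that the formulae in (5) amount to base-2 Shannon entropy over the two sounds in each environment, with each environment weighted by its share of all tokens; this is consistent with the 0 to 1 scale described above but is an assumption here. The function names are hypothetical, and the counts are invented for illustration; they are not the counts in Table 1.

```python
import math


def entropy(count_a, count_b):
    """Base-2 Shannon entropy of a two-way choice, computed from raw counts."""
    total = count_a + count_b
    if total == 0:
        return 0.0
    h = 0.0
    for c in (count_a, count_b):
        if c > 0:  # a zero count contributes nothing to the sum
            p = c / total
            h -= p * math.log2(p)
    return h


def systemic_entropy(counts_by_env):
    """Average of per-environment entropies, weighted by environment size."""
    grand_total = sum(a + b for a, b in counts_by_env.values())
    if grand_total == 0:
        return 0.0
    return sum(
        (a + b) / grand_total * entropy(a, b)
        for a, b in counts_by_env.values()
    )


# Invented counts of (sound A, sound B) in each environment; NOT the Pulaar data.
hypothetical_counts = {
    "before +ATR": (90, 10),   # mostly predictable, with some exceptions
    "before -ATR": (0, 100),   # fully predictable: entropy 0
    "final": (50, 50),         # fully unpredictable: entropy 1
}

for env, (a, b) in hypothetical_counts.items():
    print(env, round(entropy(a, b), 3))
print("systemic", round(systemic_entropy(hypothetical_counts), 3))
```

On this view, a single environment with even a few exceptional tokens is enough to make the systemic value non-zero, while the weighting keeps a small number of exceptions from inflating it.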
To make this marginality clearer, consider the contrast between [±back] mid vowels, in the analogous conditioning environments of before [+back] vowels, before [−back] vowels, and word-finally. There has never been, as far as we know, any suggestion of vowel harmony or allophony in this domain. Thus, we would predict much higher entropy values in all three environments and systemically; Table 3 shows the actual calculations.
As predicted, the [±back] mid vowels are much more clearly contrastive across all three environments, and the weighted average entropy is 0.84.
Thus, there is clearly a difference between the degree of contrastiveness for [±ATR] mid vowels on the one hand and [±back] mid vowels on the other, with a tendency for the [±ATR] mid vowels to be harmonically distributed. At the same time, both are indeed squarely in the 'contrastive' range of the continuum.¹⁶ As mentioned above, there are 397 words in the 6331-word lexicon that contain [+ATR] mid vowels that are not immediately followed by another [+ATR] vowel and are thus 'contrastively' (unpredictably) [+ATR]. While many of these do involve the suffixes in (13), they also include borrowings (largely from French) and a number of other apparently native words that are transcribed with [+ATR] mid vowels, some of which are

16 Note that at present, there is no mathematical way of (1) categorically classifying contrasts as 'marginal' or 'non-marginal,' or (2) calculating whether two entropy values are even statistically significantly different from each other. The former is a question of phonological analysis; as we argue here, for example, there really is not a clear case to be made for making such a distinction in the application of the SDA, for the purposes of which the relevant question is simply whether sounds do or do not need to have distinct representations underlyingly. Whether there is a specific role for 'marginal' contrasts as a unified category remains to be seen; other work illustrating the relevance of gradience in phonological relationships has largely focused instead on showing a correlation between degree of contrastiveness and degree of perceived similarity (e.g., Hall 2009; Hall & Hume 2015; Hall et al. 2015b).

These contribute to the surface unpredictability of [±ATR] on mid vowels, and suggest that this contrast extends beyond the suffixes, borrowings, and handful of native words containing unexpected [+ATR] mid vowels.
They also suggest that the harmony pattern is less than fully productive, at least in the sense that it appears to have exceptions. Interestingly, despite the clearly contrastive nature of [±ATR] mid vowels, the contrast has a nearly null functional load. There was exactly one minimal pair listed in Niang (1997), shown in (19).

Paradis's (1992) floating-vowel analysis could still be tenable. The surface data demonstrate only that there is a phonologically necessary distinction in underlying representations; by themselves, they do not tell us whether that distinction is encoded paradigmatically, as an underlying [ATR] contrast among mid vowels, or syntagmatically, as a contrast between the presence and absence of a floating high vowel. Paradis (1992: 92) writes that "it is preferable to put the burden on a few underlying forms […] than to expand the whole phonological system." The larger the number of 'exceptional' forms, then, the less reason there is to treat the unpredictable [+ATR] mid vowels as anything other than underlyingly contrastive phonemes. To the extent that there is any empirical difference between positing marginal underlying /e o/ and positing floating high vowels, we might expect that there could be phonotactic principles or morpheme structure constraints restricting vowel sequences but not individual vowels, or that floating vowels might interact with other nearby vowels or glides. The forms in (17) are quite varied: some have long vowels, others short; some have a following glide /j/ or /w/, others not. If there are floating high vowels in the underlying representations of these words, it is not obvious that they have any consistent effect other than giving rise to the [±ATR] mid vowels.

17 Page references in (17), (18), and (19) are to Niang (1997). Niang lists source languages of loanwords where they are known; of the words in (17), only (17a) is marked as a borrowing. A linguistically informed native speaker of a related Fula dialect (A. Alkali, p.c.) confirms that (17b, c, d, f, g, h) are native words, but speculates that (17e) pulóók may not be, because the final plosive is phonotactically unexpected. Bah (2014) indicates that this form is specific to the Fouta Toro dialect, the more general word being bantara.

Conclusion
In sum, we argue that there is no incompatibility between the Contrastivist Hypothesis and recognizing the existence of a continuum between pure contrast and pure allophony.
To some extent, these two approaches to phonological contrast are simply orthogonal.
The CH is about the information that may or must be present in underlying representations; Hall's (2009) entropy measure quantifies contrastiveness in terms of unpredictability of distribution at any level for which independent evidence is available for representations.
One might then conclude that the entropy measure could be applied within a framework embracing the CH only by using type occurrence measures on underlying representations, as this would result in the relevant categorical distinction between non-contrastive and contrastive segments. We argue, however, that this misses the potential for a deeper understanding of phonological phenomena. While there may not be a formal role for a distinction between fully contrastive and marginally contrastive phonemes under the CH, the distinction is certainly useful from an analytic perspective.
In the case of Pulaar, earlier researchers had noted the existence of morphemes containing [+ATR] mid vowels not followed by high vowels, but had discounted them as not reflective of an underlying [±ATR] contrast on these vowels. The Contrastivist Hypothesis, however, predicts that the Pulaar harmony pattern requires precisely such a contrast to exist. Both from the perspective of the learner and of the analyst, once a contrastive featural specification is posited, it becomes available to be used contrastively throughout the system, and might be expected to expand. For the learner, this might mean that loanwords containing underlying /e/ or /o/ are adoptable without adaptation, or that small acoustic differences are more easily interpreted as being a reflection of an actual vowel contrast (cf. Ohala 1981). For the analyst, the realization that a feature is unpredictable in one instance should lead one to examine whether the distinction is in fact contrastive elsewhere in the lexicon. Calculating the entropy of the sounds provides a way of exploring exactly how robust the contrast is at the surface level. Our results also suggest how the measure of entropy can contribute to measuring the progress of changes in segmental inventories (see also Hall 2013a), though we do not yet have the historical data that would be needed to chart the diachronic course of the Pulaar vowels. Here, we went from the observation that the CH predicts a phonemic [±ATR] contrast to the verification of that contrast as being present but less robust than other contrasts in Pulaar. One could also go in the other direction, that is, from noting that there is a marginal surface contrast to exploring whether the feature specifications that might formally encode that contrast are in fact phonologically active.
It is of course important to keep in mind, though, that surface contrasts do not always correspond to underlying contrasts, as noted in Section 3. Even in the present case, as discussed in Section 4, we have no definitive proof that the distinction between [±ATR] vowels in Pulaar must be represented by adding underlying /e/ and /o/ to the inventory instead of something like Paradis's (1992) abstract high vowel analysis.
Under the strictest interpretation of the Contrastivist Hypothesis, any evidence that a feature is phonologically active, even in only a single form, implies that that feature should be categorically contrastive. Even in the contrastivist framework, though, isolated examples are more plausibly treated as exceptions than as reflective of fundamental properties of the system of representations; particularly from the perspective of learnability, the more robustly attested a contrast is, the more reasonable it is to posit that it is systematically encoded. It is therefore tremendously useful to have mathematical methods for quantifying gradient contrastiveness (Hall 2009) and software for applying those methods (Hall et al. 2015a), even in a theoretical framework that is primarily concerned with contrast as a categorical phenomenon (Hall 2007; Dresher 2009).