Zero and triviality

This paper takes issue with Bylinina & Nouwen’s (2018) hypothesis that the numeral zero has the basic weak meaning of ‘zero or more.’ We argue, on the basis of empirical observation and theoretical consideration, (i) that this hypothesis implies that exhaustification can circumvent L-triviality, and (ii) that exhaustification cannot in fact circumvent L-triviality. We also provide some experimental results to support our argument.


Exhaustification
The aim of this squib is to discuss some issues regarding the hypothesis about zero put forward by Bylinina & Nouwen (2018). These authors assume that numerals have the weak, 'at least' meaning as basic and the strengthened, 'exactly' meaning as derived by way of exhaustification, and propose that zero be considered no different from other numerals in this respect. 1 This is illustrated by (1a) and (1b).
(1) a. Zero students smoked ⟺ | ⟦students⟧ ⋂ ⟦smoked⟧ | ≥ 0
       'the number of students who smoked is zero or greater'
    b. EXH [Zero students smoked] ⟺ | ⟦students⟧ ⋂ ⟦smoked⟧ | ≥ 0 ∧ | ⟦students⟧ ⋂ ⟦smoked⟧ | ≱ 1
       'the number of students who smoked is zero and not greater'

Before we go on, let us be explicit about the notion of exhaustification which we will adopt for the purpose of this discussion. First, [EXH φ] is true iff φ is true and every excludable alternative of φ is false. Second, an alternative of φ is a sentence derivable from φ by replacing certain scalar items in φ with their scale mates. And third, ψ is an excludable alternative of φ iff ψ is an alternative of φ, ψ is stronger than φ, and φ ∧ ¬ψ does not entail any other alternative of φ. 2 These premises, together with the standard view that numerals form a scale (cf. Horn 1989), will derive (1b). 3

Coming back now to the sentences in (1), we can see not only that (1a) is entailed by (1b), but also that (1a) is entailed by every sentence. In other words, (1a) is trivially true: the cardinality of the set of students who smoked, or in fact, of any set, is necessarily 0 or greater than 0. Given this consequence of their analysis, and given the assumption that trivial sentences are "semantically defective," Bylinina and Nouwen claim that zero, unlike other numerals, always requires semantic strengthening by exhaustification. 4 We will call this claim exhaustification.
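The three premises about EXH stated above can be checked mechanically on a small finite model. The following Python sketch is an illustration only and is not part of Bylinina & Nouwen's proposal. It assumes that a world can be identified with the number of students who smoked, that this number is bounded by a hypothetical MAX_N, that propositions are sets of such worlds, and that the alternatives of zero students smoked are the corresponding sentences with one through MAX_N. All names in the code are ours.

```python
# Assumption: a world is just the number of students who smoked, 0..MAX_N.
MAX_N = 5  # hypothetical finite bound, chosen for illustration only
WORLDS = frozenset(range(MAX_N + 1))


def at_least(k):
    """Weak numeral meaning: true in every world where the count is k or greater."""
    return frozenset(w for w in WORLDS if w >= k)


def entails(p, q):
    """p entails q iff every p-world is a q-world."""
    return p <= q


def stronger(p, q):
    """p is stronger than q iff p entails q but not vice versa."""
    return p < q


def excludable(phi, alternatives):
    """Alternatives psi that are stronger than phi and such that
    (phi and not psi) does not entail any other alternative."""
    result = []
    for psi in alternatives:
        if not stronger(psi, phi):
            continue
        remainder = phi - psi  # the worlds of (phi and not psi)
        others = [a for a in alternatives if a != psi]
        if not any(entails(remainder, a) for a in others):
            result.append(psi)
    return result


def exh(phi, alternatives):
    """[EXH phi]: phi is true and every excludable alternative is false."""
    strengthened = phi
    for psi in excludable(phi, alternatives):
        strengthened = strengthened - psi
    return strengthened


zero = at_least(0)  # the weak meaning in (1a)
zero_alternatives = [at_least(k) for k in range(1, MAX_N + 1)]
exh_zero = exh(zero, zero_alternatives)  # the strengthened meaning in (1b)

print(zero == WORLDS)    # is (1a) true in every world of the model?
print(sorted(exh_zero))  # worlds in which (1b) is true
```

Under these assumptions, zero denotes the set of all worlds, every alternative comes out excludable, and the exhaustified meaning contains only the world in which no student smoked, which mirrors the relation between (1a) and (1b).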

Prevention
The exhaustification claim may be said to receive supporting evidence from the deviance of (3), which, to the best of our knowledge, is a novel observation.

(3) #At least zero students smoked
2 We say ψ is "stronger" than φ iff ψ entails φ but not vice versa. That excludable alternatives are stronger than the prejacent, i.e. the sister of EXH, is a standard Neo-Gricean view (cf. Horn 1989; Sauerland 2004) and covers most of the facts about scalar inferences. For arguments for replacing "stronger" with "nonweaker" see e.g. Spector (2006); Schlenker (2012). In this squib we stick to "stronger," as that simplifies the exposition and serves the purposes at hand. Our position, we believe, is consistent with what is said in Bylinina & Nouwen (2018), although these authors were not as explicit about the interpretation of [EXH φ] in their paper as we are here (cf. Bylinina & Nouwen 2018: 10). For more discussion on exhaustification, see Magri (2009). In this connection, we would also point out something else which our paper has in common with Bylinina & Nouwen (2018): both works appeal to alternatives, but neither work provides a theory of scales. It is common practice to propose accounts which are premised on specific claims about alternatives but which, at the same time, remain agnostic about how these claims are derived from deeper principles or, in fact, whether such principles exist at all. The validity of such accounts should remain even if it turns out that scales are just "given to us" as primitives of grammar (Gazdar 1979). There is a research program towards an intensional characterization of alternatives (cf. e.g. Matsumoto 1995; Katzir 2007; Fox & Katzir 2011; Katzir 2014; Trinh & Haida 2015; Breheny et al. 2017; Trinh 2018; 2019). However, we believe that the evaluation of proposals which appeal to alternatives but do not take part in this research program, such as Bylinina & Nouwen (2018) and this paper, can be independent of how successful the program is. We thank an anonymous reviewer for raising our awareness of the need to be clear about this issue.

3 Here is how. As the scale mates of zero are one, two, three etc., the alternatives of zero students smoked will be one student smoked, two students smoked, three students smoked, etc. The conjunction of zero students smoked with the negation of any of these alternatives does not entail any other alternative. For example, zero students smoked ∧ ¬two students smoked does not entail one student smoked and does not entail three students smoked. Thus, all of these alternatives are excludable, which means that [EXH [zero students smoked]] is true iff zero students smoked and not one student smoked and not two students smoked etc., i.e. true iff the number of students who smoked is zero and not greater.

4 A semantic representation of zero students smoked which would more immediately reflect Bylinina and Nouwen's analysis is (i). These authors assume that the linguistic ontology, i.e. D e , contains a "bottom element" ⊥ such that ⟦zero⟧ (⊥) = 1 and × ⟦α⟧(⊥) = 1 for each expression α of type ⟨e,t⟩. We prefer the equivalent but ontologically less controversial representation in (1), as the discussion below does not hinge on the existence of ⊥. Note, also, that in both representations, zero ends up being trivially downward and upward entailing in its NP as well as in its VP argument. This means that both representations are compatible with Bylinina and Nouwen's explanation for zero's inability to license NPIs.

Here is how the reasoning may go. It has been noticed that modification of numerals by at least prevents semantic strengthening of the numeral (cf. Krifka 1999; Nouwen 2010; Kennedy 2015; Schwarz 2016; Buccola & Haida 2017). For example, (4a) cannot be read as (4b).
(4) a. At least two students smoked
    b. At least two students smoked and it is not the case that at least three students smoked
    c. (4a) ⇎ (4b)

Let us state the generalization, which we will call prevention. Although what is crucial for our discussion is that prevention holds, not how it is explained, here is one way this generalization can be accounted for. Suppose that the alternatives of a sentence containing at least are derived from it by replacing at least with its scale mates exactly and more than (cf. Kennedy 2015; Buccola & Haida 2017). This means the alternatives of φ = at least two students smoked are ψ = exactly two students smoked and χ = more than two students smoked. And because (φ ∧ ¬ ψ) ⟹ χ and (φ ∧ ¬ χ) ⟹ ψ, neither ψ nor χ is an excludable alternative of φ, which means that [EXH φ] ⟺ φ. Modification of a numeral by at least therefore makes EXH vacuous, preventing the numeral from being semantically strengthened by exhaustification. 5

Given that modification of a numeral by at least prevents semantic strengthening by exhaustification of the numeral, exhaustification, which says zero requires such strengthening, leads to the prediction that modification of zero by at least will result in deviance. This prediction is confirmed by the deviance of (3). Thus, the deviance of (3) constitutes supporting evidence for exhaustification. 6
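The reasoning behind prevention can be replayed on a small finite model. The sketch below is ours and only illustrative: it assumes that worlds are counts of students who smoked, bounded by a hypothetical MAX_N, and that the only alternatives of an at least sentence are its exactly and more than counterparts, as in the account just described.

```python
# Assumption: a world is the number of students who smoked, 0..MAX_N.
MAX_N = 5  # hypothetical bound, for illustration only
WORLDS = frozenset(range(MAX_N + 1))


def excludable(phi, alternatives):
    """psi is excludable iff it is stronger than phi and
    (phi and not psi) does not entail any other alternative."""
    result = []
    for psi in alternatives:
        if not psi < phi:  # psi must be strictly stronger than phi
            continue
        remainder = phi - psi
        others = [a for a in alternatives if a != psi]
        if not any(remainder <= a for a in others):
            result.append(psi)
    return result


def exh(phi, alternatives):
    """[EXH phi]: phi minus the worlds of every excludable alternative."""
    strengthened = phi
    for psi in excludable(phi, alternatives):
        strengthened = strengthened - psi
    return strengthened


def modified_numeral(k):
    """Return the meanings of 'at least k', 'exactly k' and 'more than k'."""
    at_least_k = frozenset(w for w in WORLDS if w >= k)
    exactly_k = frozenset(w for w in WORLDS if w == k)
    more_than_k = frozenset(w for w in WORLDS if w > k)
    return at_least_k, exactly_k, more_than_k


# 'At least two students smoked': neither alternative is excludable.
phi2, psi2, chi2 = modified_numeral(2)
print(excludable(phi2, [psi2, chi2]))
print(exh(phi2, [psi2, chi2]) == phi2)

# 'At least zero students smoked': EXH is vacuous, so the sentence stays trivial.
phi0, psi0, chi0 = modified_numeral(0)
print(exh(phi0, [psi0, chi0]) == WORLDS)
```

In this model, negating either alternative forces the other one to be true, so EXH has nothing to exclude. For at least zero, the exhaustified sentence remains true in every world, which is the configuration that the deviance of (3) is attributed to.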

Pragmaticism
Now, starting from the supposition that exhaustification is correct, a question that arises is (6).

(6) Is the requirement of zero for semantic strengthening by exhaustification pragmatic or grammatical?
The answer given by Bylinina and Nouwen is that it is pragmatic. To quote from Bylinina & Nouwen (2018: 10): "Unlike other numerals, zero invokes exhaustification obligatorily. This is for purely pragmatic reasons." We will call this claim pragmaticism.
(7) pragmaticism
Semantic strengthening by exhaustification is required by zero for purely pragmatic reasons

Under the perspective of pragmaticism, one would say that the sentence in (8) cannot be parsed as (8a) because (8a) is trivially true, hence uninformative, hence pragmatically odd. The reason (8) is acceptable is that there is another parse, (8b), which is informative and thus pragmatically appropriate. We believe that pragmaticism can be challenged. If the deviance of (9), repeated in (10a), comes about by its being uninformative, then all sentences which express the same meaning, and hence are equally uninformative, should be perceived as deviant in the same way. This is not the case, as evidenced by the acceptability of (10b), which is semantically equivalent to (10a).

(10) a. #At least zero students smoked
     b. Zero or more students smoked

Both (10a) and (10b) are trivial and thus equally uninformative, but only (10a) is perceived as deviant. 8 This suggests that the contrast between these two sentences is grammatical in nature. The question now is (11).

L-triviality
To address this question, let us consider (12a) and (12b), which are possible parses of (10a) and (10b), respectively. 9,10 Note the underlining in (12). Its purpose is to distinguish between the logical expressions, which are underlined, and the non-logical expressions, which are not underlined. For the purpose of this squib, we take logical expressions to be those whose semantic contribution depends only on facts about language, and the non-logical expressions to be those whose semantic contribution depends on both facts about language and facts about the world. To illustrate, suppose we are to know whether (12b) is true in a given possible world w. 11 For EXH, zero, or and more than, we would have to know what they mean in English. For students and smoked, however, we would have to know not only what they mean in English, but also, who the students are in w and among them who smoked in w.
Given the classification of expressions into the logical and the non-logical, we now have a way to describe a crucial difference between (12a) and (12b) regarding the source of their triviality: (12a) is trivial by virtue of its logical vocabulary alone, while (12b) is trivial by virtue of both its logical and non-logical vocabulary. Here is what we mean in more concrete terms. Suppose we replace the word students in the second disjunct of (12b) with the word professors. The result, given in (13), is not trivial: (13) is true in case no students smoked or some professors smoked, and false in case some students smoked and no professors smoked.

(13) [[EXH [Zero students smoked]] or [more than zero professors smoked]]
In other words, had the non-logical part of (12b) been different, the sentence would not have been trivial. The same, however, does not hold for (12a): there is no way to make (12a) non-trivial by changing its non-logical part. It remains trivial no matter which noun replaces students and which verb replaces smoked.
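The substitution test just described can be emulated by brute force over a small domain. The Python sketch below is ours and rests on several assumptions: (12a) is taken to have the truth condition of (1a), (12b) is taken to be (13) with students in both disjuncts, predicates are modeled as subsets of a hypothetical three-element domain, and every occurrence of a non-logical word may be replaced independently. A check over one finite domain is of course an illustration, not a proof.

```python
from itertools import combinations, product

# Hypothetical three-individual domain; predicates are subsets of it.
DOMAIN = ("a", "b", "c")
PREDICATES = [
    frozenset(c) for r in range(len(DOMAIN) + 1) for c in combinations(DOMAIN, r)
]


def sentence_12a(noun, verb):
    """Assumed truth condition of 'at least zero NOUN VERB': |noun & verb| >= 0."""
    return len(noun & verb) >= 0


def sentence_12b(noun1, verb1, noun2, verb2):
    """Assumed truth condition of
    [[EXH [zero NOUN1 VERB1]] or [more than zero NOUN2 VERB2]]."""
    return len(noun1 & verb1) == 0 or len(noun2 & verb2) > 0


# (12a): true whichever predicates replace 'students' and 'smoked'.
l_trivial_12a = all(
    sentence_12a(n, v) for n, v in product(PREDICATES, repeat=2)
)

# (12b) as it stands: the same noun and verb occur in both disjuncts.
trivial_12b = all(
    sentence_12b(n, v, n, v) for n, v in product(PREDICATES, repeat=2)
)

# (12b) with each occurrence replaced independently, as in (13).
l_trivial_12b = all(
    sentence_12b(*choice) for choice in product(PREDICATES, repeat=4)
)

print(l_trivial_12a, trivial_12b, l_trivial_12b)
```

Under these assumptions, (12a) stays true under every substitution, whereas (12b) is true only as long as the two disjuncts share their noun and verb. Replacing one occurrence, as (13) does with professors, yields a sentence that can be false.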
Let us call sentences such as (12a) "L-trivial," borrowing from Gajewski (2009). Now, L-triviality has been argued to cause deviance (Barwise & Cooper 1981; Fintel 1993; Gajewski 2003; Abrusán 2007; Gajewski 2009). To the extent that that argument is convincing, the fact that (12a) is L-trivial while (12b) is not can be taken to explain the contrast between these two structures, and hence, explain the contrast between (10a) and (10b). 12 Thus, we have come to a tentative answer to the question in (11), given below. 13

(14) The grammatical criterion which distinguishes between (10a) and (10b) is L-triviality

11 Under the standard assumption that knowing the meaning of the sentence means knowing its truth condition, which means knowing what has to be the case for it to be true, which means knowing whether it is true in a given possible world (cf. Wittgenstein 1921). The distinction between facts about language and facts about the world is of course open to debate. When the question is raised as to whether this distinction has any justification, the situation becomes murky (cf. Quine 1951). But this squib is not the place for philosophical contemplation at this level.

12 The following contrast might raise doubt about the deviance of #at least zero being due to L-triviality, as it is implausible that Celsius and Kelvin are part of the logical vocabulary.

(i) a. The temperature is at least zero degrees Celsius
    b. #The temperature is at least zero degrees Kelvin

However, we do not need to assume that Celsius and Kelvin are logical terms to explain this contrast. What we can say is that the presence of Celsius vs. Kelvin affects whether zero denotes the lower endpoint of the scale or not. Our discussion on zero is premised on the understanding that it does denote such a point. Nonstandard readings in which it does not, as in (ia), are not relevant.

In this connection, we would also acknowledge a problem which mathematical statements pose for this whole approach. The problem is illustrated by the fact that (ii) is not perceived as deviant.

(ii) One plus two equals three

Given our notion of "logical expression," all words in (ii) belong to the logical vocabulary of English. Thus, the sentence is L-trivial, which means it is predicted to be deviant, contrary to intuition. At this point, we have nothing to say about mathematical statements. Our conjecture is that mathematical discourse invokes a different understanding of "facts about the world," so that mathematical statements can be perceived as contingent and informative even though they are either trivially true or trivially false. We thank an anonymous reviewer for drawing our attention to this problem.

13 L-triviality is a "grammatical" criterion in the sense that whether a sentence is L-trivial can be determined entirely on the basis of its syntax and semantics, without any reference to contextual factors such as discourse participants or common knowledge.

Circumvention
The argument for (14) relies, crucially, on the assumption that EXH can rescue a sentence from L-triviality. To see this, consider (15a), which has the truth condition in (15b).
The first disjunct is trivial. Moreover, it is L-trivial, as substituting ⟦students⟧ or ⟦smoked⟧ with any other predicate will not change the fact that it is trivial. This means that the whole disjunction is L-trivial. But (15a) is just (12b) without EXH, and since the argument for (14) depends on the claim that (12b) is not L-trivial, this means that that argument depends on the claim that EXH can circumvent L-triviality. Let us call this claim circumvention.

(16) circumvention
EXH can rescue a sentence from L-triviality

We believe circumvention can be challenged. Consider (17) and its possible parses in (17a) and (17b). This sentence is perceived as deviant in the same way (10a) is. The parse without EXH, (17a), is L-trivial. However, the parse with EXH, (17b), is not. Assuming, again, that at least alternates with exactly and more than, the truth condition of (17b) will be (18a), which is equivalent to (18b), a contingent proposition. 14

(18) a. (Every human has zero or more children) ∧ ¬ (every human has exactly zero children) ∧ ¬ (every human has more than zero children)
     b. Some humans have children and some humans have no children

If circumvention is correct, we predict (17) to be non-deviant, because it has a parse which is not L-trivial. Thus, the fact that (17) is perceived as deviant suggests that circumvention is not correct, i.e. that EXH cannot rescue a sentence from L-triviality. This is perhaps not a surprising result. We expect a sentence to be ungrammatical if it contains a constituent which in isolation would itself be ungrammatical. If we take L-triviality to be a grammatical criterion, we expect a sentence with an L-trivial constituent to be deviant regardless of what other linguistic material it may contain. This means that we cannot append EXH to an L-trivial sentence φ to "save" it, even if [EXH φ] is not L-trivial.

14 An anonymous reviewer raises two questions about (17).

(i) a. What if EXH takes scope below every?
    b. What if the alternatives also include those generated by replacing every with some?

Here are our answers. First, if EXH takes scope below every, it would be semantically vacuous, and the parse would be L-trivial. The existence of such a parse, of course, has no bearing on our argument against circumvention, namely that (17) is deviant even though there is a parse for it which is not L-trivial. Our answer to the second question is that it makes no difference whether the some alternatives are included in the domain of EXH or not, as these would not be stronger than the prejacent and hence would not be excludable.
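The equivalence between (18a) and (18b) can be verified on a toy model. The following Python sketch is ours and purely illustrative: it assumes a hypothetical population of two humans, each with zero to two children, so that a world is a pair of child counts, and it takes the alternatives of the prejacent to be the exactly and more than variants under every.

```python
from itertools import product

# Assumption: two humans, each with 0..MAX_CHILDREN children; a world is a
# tuple of child counts, one per human. Both bounds are hypothetical.
HUMANS = 2
MAX_CHILDREN = 2
WORLDS = frozenset(product(range(MAX_CHILDREN + 1), repeat=HUMANS))


def every(condition):
    """'Every human has N children', where N is given by condition."""
    return frozenset(w for w in WORLDS if all(condition(c) for c in w))


def excludable(phi, alternatives):
    """psi is excludable iff it is stronger than phi and
    (phi and not psi) does not entail any other alternative."""
    result = []
    for psi in alternatives:
        if not psi < phi:
            continue
        remainder = phi - psi
        others = [a for a in alternatives if a != psi]
        if not any(remainder <= a for a in others):
            result.append(psi)
    return result


def exh(phi, alternatives):
    """[EXH phi]: phi minus the worlds of every excludable alternative."""
    strengthened = phi
    for psi in excludable(phi, alternatives):
        strengthened = strengthened - psi
    return strengthened


prejacent = every(lambda c: c >= 0)    # every human has at least zero children
alt_exactly = every(lambda c: c == 0)  # every human has exactly zero children
alt_more = every(lambda c: c > 0)      # every human has more than zero children

reading_18a = exh(prejacent, [alt_exactly, alt_more])
reading_18b = frozenset(
    w for w in WORLDS if any(c == 0 for c in w) and any(c > 0 for c in w)
)

print(prejacent == WORLDS)         # is the parse without EXH trivial?
print(reading_18a == reading_18b)  # does EXH yield the contingent reading?
```

In this model both alternatives are excludable, because negating one of them does not force the other to be true. The exhaustified parse is therefore contingent, which is what makes the persistent deviance of (17) problematic for circumvention.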
Another fact which speaks against circumvention is that (19) is perceived as deviant (Barwise & Cooper 1981).
(19) #There is every student

As every expresses the subset relation, the truth condition of (19) can be stated as ⟦student⟧ ⊆ ⟦there⟧, where ⟦there⟧ = D e , i.e. the set of all things that exist. Since every set is a subset of D e , the sentence is L-trivial, hence deviant. Now, given the standard assumption that every has some as its scale mate, a parse of (19) with EXH, (20a), would express the proposition in (20b), which is equivalent to (20c).

(20) a. [EXH [There is every student]]
     b. (There is every student) ∧ ¬ (there is some student)
     c. There is no student

Thus, appending EXH to (19) results in the proposition that there is no student, which is not trivial. If circumvention is correct, (19) should be acceptable and should express this contingent proposition. This is not what is observed.

The dialectical situation
Let us recap. We start with a claim about the meaning of the numeral zero, made by Bylinina & Nouwen (2018), which says that zero has the basic weak meaning of 'zero or more.' The dialectic then goes as follows. If Bylinina and Nouwen's claim is correct, then zero requires EXH. This requirement is either pragmatic or grammatical. Observation suggests that it is not pragmatic, hence must be grammatical. Saying that the requirement for EXH by zero is grammatical, however, implies that EXH can rescue a sentence from L-triviality. But observation suggests that EXH cannot rescue a sentence from L-triviality. Thus, we are led to the conclusion that Bylinina and Nouwen's claim is not correct, i.e. that zero does not mean 'zero or more.' The conclusion is tentative, pending alternative explanations for the facts that Bylinina and Nouwen provided to support their claim. We would note, ending this section, that all of the data we discussed would be consistent with the view that zero has the strong meaning as basic and that EXH cannot circumvent L-triviality. Specifically, (21a) would mean 'no students smoked,' (21b) would be trivial but not L-trivial, and both (21c) […]

However, noting such facts is far from proposing a theory of zero. What we hope to have achieved is to raise some issues, present some novel observations, and share some ideas which can stimulate and provide some guiding intuition for further research.

Existential sentences
To corroborate the intuition that zero, unlike other numerals, cannot be modified by the adverb at least, we conducted an experiment, hosted on Amazon MTurk, in which participants were asked to rate the naturalness of eight English sentences. The sentences were derived from the sentence frames in (22) by replacing the placeholder n with the numeral zero, for a set of four deviant sentences, and the numeral two, for a set of four non-deviant sentences.
(22) a. There are at least n students in the seminar room
     b. The drawer contains at least n towels
     c. The company hired at least n employees
     d. The bartender served at least n guests

We asked participants to rate the resulting sentences, presented in pseudo-randomized order, on the following four-point Likert scale: 4 (natural), 3 (relatively natural), 2 (relatively weird), 1 (weird). To illustrate, in advance of the trials, how these values may be associated with English sentences, we gave participants the four sentences in (23) and commented: "You may agree that (23a) is natural, (23b) is weird, while (23c) and (23d) are possibly somewhere in between." 16

(23) a. Everyone but John came to the party
     b. Someone but John came to the party
     c. Everyone who is not John came to the party
     d. Someone who is not John came to the party

We used an even-numbered scale to force participants to discriminate between sentences which they may be unable to judge as 4 ('natural') or 1 ('weird'), and avoided suggesting that there is agreement on the score of such sentences in order not to stifle participants' trust in their own judgments. We believe that the use of a four-point scale, excluding a neutral value, does not have an adverse effect on the interpretation of the expected findings of our study: even if there was perceived social or other pressure (not) to classify sentences as '(relatively) natural' or '(relatively) weird', potential skewing would not have a bearing on the expected difference in the score for sentences with at least zero and sentences with at least two. We conducted our experiment with 32 Amazon MTurk workers who, after the trials, identified as being native speakers of English. Thus, overall we received 32 scores for each of our eight sentences and hence 128 scores per sentence type.

Figure 1 shows that sentences with at least two received the highest score, 4 ('natural'), from ≥50% of all subjects, while sentences with at least zero received the two lowest scores, 2 ('relatively weird') and 1 ('weird'), from ≥50% of all subjects. The difference in the means of the scores (3.4 vs. 2.0), depicted in Figure 2, is highly significant (p < 2.2 × 10⁻¹⁶).
This result supports our empirical claim that numerical statements containing at least zero, unlike statements containing other superlative modified numerals, are deviant.

Universally quantified sentences
If the deviance induced by the triviality of zero is pragmatic, it should be alleviated by exhaustification, and at least zero should be acceptable under universal quantification, as such quantification renders exhaustification non-vacuous. Example (17) suggests that the deviance of at least zero persists under universal quantification. To corroborate this intuition, we conducted another experiment, again hosted on Amazon MTurk. This experiment targeted the contrast between #every … at least zero and its counterpart containing zero or more as well as the non-zero counterparts of the former two. We used the eight sentences in (24) as stimuli.
(24) a. (i) #Every human has at least zero children (true)
        (ii) #Every human has at least zero biological mothers (false)
     b. (i) Every human has zero or more children true
        (ii) Every human has zero or more biological mothers false
     c. (i) Every human has at least one biological mother false
        (ii) Every human has at least two relatives true
     d. (i) Every human has one or more biological mothers false
        (ii) Every human has two or more relatives true

Two of these sentences, viz. the sentences in (24a), are deviant, while the others are non-deviant. Abstracting from the deviance of (24a-i) and (24a-ii), 17 we can characterize the meaning of the sentences in (24) in the following way. All sentences license a distributive inference. 18 For instance, the sentence in (24b-i) licenses the inference that some humans have (exactly) zero children and some humans have one or more children. The distributive inference renders (24b-i), as well as the other sentences (24a) and (24b), nontautological. The distributive inference of (24b-i) does not, however, render this sentence false in the actual world. In this regard, it differs from e.g. the sentence in (24a-ii): in the actual world, it is false that some humans have (exactly) zero biological mothers. Overall, as indicated, if we take the distributive inference into account, four of the eight sentences in (24) are true in the actual world, while the other four are false. 19 We instructed participants to classify the sentences in (24) as either 'true', 'false', or 'weird' 20 by giving them the following instruction in advance of the trials: "Imagine you are helping an alien who wants to learn English and also learn about humans in general. Your job is to say for several English sentences whether they are true, false, or sound weird." The scenario we sketched in this instruction served to introduce an addressee without common knowledge.

17 That is, we consider these sentences as if language was not sensitive to L-triviality.
With this, we aimed for subjects to be less drawn to classify non-deviant sentences such as the sentences in (24b-i) or (24c-ii) as weird despite the fact that they express common knowledge with little informational content. 21 At the same time, the instruction aimed at making subjects more inclined to classify the deviant sentences in (24a) as weird by pointing out that the addressee also wants to learn English.
We expect that the proportion of 'weird' responses is higher for the sentences in (24a) than for the sentences in (24b), in accordance with our empirical claim that the former, in contrast to the latter, are deviant. Furthermore, we expect that the proportion of 'weird' responses is higher for the sentences in (24a) and (24b) than for the sentences in (24c) and (24d). This expectation is based on two factors: (i) the assumption that tautological sentences are more prone to being judged as weird than contingent sentences, and (ii) the experimental finding that there tends to be a substantial subpopulation of subjects that do not compute distributive inferences (Crnič et al. 2015). Such a subpopulation derives tautological truth conditions for the sentences in (24a) and (24b), while the sentences in (24c) and (24d) are contingent with or without the distributive inference.
We conducted our experiment with 157 Amazon MTurk workers who, after the trials, identified as being native speakers of English. Thus, we received 157 scores for each of our eight sentences (1256 observations in total). Figure 3 shows the proportion of 'true', 'false', and 'weird' responses for the degree phrases in (24a), (24b), (24c), and (24d), respectively.
In the subsequent statistical analysis, we consider as dependent variable the binary distinction between 'weird' responses and 'true' or 'false' responses ('non-weird' responses). We used binomial logistic regression to analyze the relationship between this binary response variable and the following four independent variables: modifier with values 'at least' and 'or more', numeral with values 'zero' and 'non-zero', at least zero with values '+' and '-', and truth value with values 'true' or 'false'. These variables describe whether the stimulus sentence contains the modifier at least or the modifier or more, whether the stimulus sentence contains the numeral zero or the numeral one or two, whether the stimulus sentence contains the degree phrase at least zero or any other degree phrase, and whether the stimulus sentence is true or false in the actual world, respectively. All four variables turn out to be significant predictors of the response variable with p values of 0.01449, 0.00273, 0.01984, and 0.00995, respectively. Specifically, the modifier at least and the truth value 'true' decrease the odds of the 'weird' response (reduction of the odds by 43% and 28%, respectively), while the numeral zero and the degree phrase at least zero increase the odds of the 'weird' response (increase of the odds by 78% and 110%, respectively).

18 As pointed out in Büring (2008), at least n licenses the same inferences as n or more.

19 Again, we are abstracting from the deviance of the sentences in (24a).

20 A pilot study using the four-point Likert scale in §2.1 failed to elicit discriminative judgments for the contrast between (24a-i) and (24b-i). That is, both sentences received equally low scores. We believe that different factors are at play for the two sentences to be judged as (relatively) weird on the four-point scale we used. See the discussion below.

21 Recall that (24b-i) has a distributive inference which renders this sentence non-tautological.

The results of the ANOVA between this model and the null model show that, as suggested by the p values of the logistic regression, the numeral variable is by far the biggest contributor to the fit of the model (reduction of deviance by 44.4 from a null deviance of 1347.7, p = 2.686 × 10⁻¹¹), followed by the at least zero variable (reduction of 6.7, p = 0.0094), and truth value (reduction of 5.2, p = 0.0221), while the modifier variable does not contribute significantly to the fit of the model (p = 0.5).

These results confirm the expectation that the proportion of 'weird' responses is higher for the sentences in (24a) and (24b) than for the sentences in (24c) and (24d), and higher for the sentences in (24a) than for the sentences in (24b). Thus, overall the results are compatible with our empirical claim that the sentences in (24a) are deviant, while those in (24b), (24c), and (24d) are non-deviant. 22 Questions remain about the influence of the truth or falsity of the stimulus sentence on the odds of the 'weird' response and about the reason for the overall tendency for the modifier or more to raise the odds of the 'weird' response compared to the modifier at least. 23 We hope to address these questions in future research.

22 For further corroboration, we note that an informal Google search on September 14, 2019, shows a clear difference in the production of the expressions under consideration: a search for, e.g., the phrase zero or more times gives 237,000 results, while a search for at least zero times gives only 453 results.

23 Recall that, importantly, our results also show that the odds are reversed for at least zero relative to other degree phrases.

Bylinina & Nouwen (2018) propose the 'at least' meaning as basic for zero to account for a number of intuitions about this numeral. This squib discusses some issues which arise from this proposal. At the center of the discussion is the observation that at least zero leads to deviance while the semantically equivalent zero or more does not, as well as the concept of L-triviality and its place in the linguistic system. We present some experimental data to corroborate our observation.
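The reported figures can be related to one another with a little arithmetic. The Python sketch below is ours and makes two assumptions that are not stated in the text: that each predictor contributes one degree of freedom, and that the p values of the model comparison come from a chi-square test on the reduction in deviance. It converts the reported percentage changes in odds into odds ratios and log-odds coefficients, and recomputes approximate p values from the reported (rounded) deviance reductions.

```python
import math


def odds_ratio(percent_change):
    """Odds ratio for a percentage change in odds (e.g. -43 means 43% lower)."""
    return 1.0 + percent_change / 100.0


def log_odds_coefficient(percent_change):
    """Logistic regression coefficient corresponding to that odds ratio."""
    return math.log(odds_ratio(percent_change))


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with one degree of freedom.
    Assumption: one degree of freedom per predictor."""
    return math.erfc(math.sqrt(x / 2.0))


# Percentage changes in the odds of a 'weird' response, as reported.
reported_changes = {
    "modifier = at least": -43,
    "truth value = true": -28,
    "numeral = zero": 78,
    "at least zero = +": 110,
}

# Reductions in deviance, as reported (rounded to one decimal).
reported_reductions = {
    "numeral": 44.4,
    "at least zero": 6.7,
    "truth value": 5.2,
}

for name, change in reported_changes.items():
    print(name, round(odds_ratio(change), 2), round(log_odds_coefficient(change), 3))

for name, reduction in reported_reductions.items():
    print(name, chi2_upper_tail_1df(reduction))
```

Because the deviance reductions are rounded, the recomputed p values can only be expected to approximate the reported ones. They fall in the same range, which is consistent with the assumed one-degree-of-freedom chi-square comparison.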