An investigation of numeral quantifiers in English

There exists a broad range of theories which model quantifier scope (Ruys & Winter 2011). The empirical coverage and predictions of three theories, Quantifier Raising (May 1985) with Scope Economy (Fox 2000), Feature Checking (Beghelli & Stowell 1997), and Choice Functions (Winter 1997), are tested using a sentence-picture verification task. Speakers rated the quality of sentences with bare and modified numeral quantifiers in surface and inverse scope conditions. Experiment 1 found that bare numerals marginally resisted inverse scope, contra the predictions of Beghelli & Stowell (1997). Experiment 2 revealed that the scope of numerals is clause-bound, an argument against the theory of Choice Functions, which predicts extra-clausal scope for existential quantifiers. The results suggest that a modified QR theory best accounts for the observed data.


Introduction
It is well known that multiply quantified utterances in English display scopal ambiguity (Montague 1973; May 1977; Szabolcsi 2010). That is, the linear order of the quantifiers does not always determine their respective scopes. Consider (1); this utterance is ambiguous, as it has two possible interpretations.
First, this sentence may be interpreted using its surface scope, where the linear order of the quantifiers mirrors their relative scopes. The first quantifier takes the widest scope and the following quantifier takes narrow scope. The surface scope reading means that there is a single, unique ceramicist who likes all of the vases. In predicate logic, this interpretation would be encoded as in (2). The expression a ceramicist is quantified by the existential quantifier (∃) "there exists". The expression every vase is quantified by the universal quantifier (∀) "for all".

(2) ∃x[Ceramicist(x) ∧ ∀y[Vase(y) → Likes(x, y)]]
One may describe and model the ambiguity of a sentence such as (1) by referencing the relative scopes of the quantifiers in the sentence. The two scopal configurations for (1) are truth-conditionally distinct, which establishes their status as two distinct meanings. In formal semantics, to know the meaning of a sentence is to know what a world must look like in order for the sentence to be true; that is, to know the sentence's truth conditions (Nouwen 2015). In order for (2) to be true in some world, there must be a single ceramicist who has the property of liking every vase. The truth conditions for (3) are different; it is only required that each vase has at least one ceramicist who likes it, and the ceramicist need not be the same for each vase. The literature on quantification contains a number of theoretical mechanisms which attempt to explain and model the apparent flexibility of quantifier scope, among them: Quantifier Raising (May 1985), Feature Checking (Beghelli & Stowell 1997), and Choice Functions (Winter 1997).
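The truth-conditional difference between the two readings can be made concrete with a small model. The following is a minimal sketch, not part of the original study: the individuals, the `likes` relation, and the function names are all assumptions chosen for illustration. It evaluates the surface reading in (2) and the inverse reading as described in the prose (each vase has at least one ceramicist who likes it) in a world where each vase is liked by a different ceramicist.

```python
# Toy model (an illustrative assumption, not data from the study).
ceramicists = {"c1", "c2"}
vases = {"v1", "v2"}
# Each pair is (ceramicist, vase); here each vase is liked by a different ceramicist.
likes = {("c1", "v1"), ("c2", "v2")}


def surface_scope(ceramicists, vases, likes):
    """(2): some single ceramicist likes every vase (existential > universal)."""
    return any(all((c, v) in likes for v in vases) for c in ceramicists)


def inverse_scope(ceramicists, vases, likes):
    """Inverse reading: every vase is liked by some ceramicist (universal > existential)."""
    return all(any((c, v) in likes for c in ceramicists) for v in vases)


print(surface_scope(ceramicists, vases, likes))  # False
print(inverse_scope(ceramicists, vases, likes))  # True
```

Because the same world makes one reading false and the other true, the two readings cannot be the same meaning.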
As seen in (2) and (3), quantifiers may be broken into two classes: universal/distributive quantifiers (∀: each, every) and existential quantifiers (∃: a, some, two). Beghelli & Stowell (1997) and Liu (1997) argue for additional quantifier classes to account for an asymmetry in the scope of numerals. Namely, it has been claimed that modified numerals (MNs) cannot take inverse scope in all of the environments that bare numeral quantifiers (BNs) can. Consider (4) and (5). The inverse scope interpretation is claimed to be ungrammatical for modified numeral quantifiers (5). Judgments on scope are notoriously difficult to obtain (Szabolcsi 2010) and, to date, there has been no work which empirically demonstrates that this difference exists. The results of two experiments on the scope of BNs and MNs reveal that their scope is clause-bound and that the claims in the literature may be incorrect. The results of Experiment 1 reveal that bare numerals resist inverse scope, not modified numerals as predicted.
There have been several experimental investigations of quantifier scope; however, the stimuli were confounded and failed to differentiate between surface and inverse scope. Ruys (2002) discusses a pernicious scope confound dubbed "Reinhart's Riddle", in honor of Reinhart (1976), who originally discussed the entailment pattern. The entailment pattern is dubbed a "riddle" because it is often confused in the literature. Generally, the inverse scope reading entails the surface scope reading because the inverse scope reading describes one specific situation among those allowed by the surface scope reading. Reinhart (1976) and Ruys (2002) show that it cannot be determined whether the inverse scope reading ever obtains in such utterances.
The confound works as follows: let (i) ∀x∃y[φ(x, y)] represent a doubly quantified sentence on its surface scope, where φ is some arbitrary transitive predicate, and let (ii) ∃y∀x[φ(x, y)] represent the same sentence on its inverse scope. Then (ii) entails (i) because the meaning of (ii) satisfies the truth conditions of (i). Using a real world example, consider (6). The surface scope reading (6a) asserts that all artists are such that they love a painting. Importantly, the painting does not need to be the same for each artist. If we have 3 artists (A, B, C) and 3 paintings (p0, p1, p2), then (6a) can describe a situation where A loves p0, B loves p1, and C loves p2. (6a) may also describe a situation where A loves p0, B loves p0, and C loves p0. This reading, where all artists love the same painting, is exactly the interpretation asserted by the inverse scope reading (6b).

This confound is present in the stimuli of earlier experimental investigations of scope. Ionin (2010) investigates the scope of various indefinite expressions in English. The experiments used a truth-value judgment task where subjects read a brief story and then reported whether or not a continuation sentence was true. An example story-continuation is given below. As can be confirmed, the continuation sentence, which is intended to elicit an inverse scope reading for the indefinite expression a reviewer, entails the surface scope reading. Thus, we cannot be sure that an acceptance of this sentence with this story means that subjects were interpreting the sentence with inverse scope for the indefinite.

(7) Example (13) from Ionin (2010: 238)
The teenagers who live in this neighborhood are film buffs, and closely follow the film reviews in the local newspaper. The newspaper has two reviewers, Paige and Robert, and the teenagers tend to trust Paige's judgment more. This week, for instance, the teenagers watched all the movies recommended by Paige, but they completely ignored Robert's recommendations.
a. Every teenager watched every film that a reviewer had recommended.

Raffray & Pickering (2010) and Chemla & Bott (2015) also investigate quantifier scope using confounded stimuli. Specifically, their studies sought to investigate whether scope can be primed; however, the images used to disambiguate surface scope from inverse scope were such that the inverse scope pictures were always compatible with a surface scope interpretation. As can be seen in (8), the "Existential-Wide" (inverse scope) condition for the sentence every hiker climbed a hill is compatible with a surface scope reading. Similarly, in (9), the "Universal-Narrow" (inverse scope) condition is compatible with a surface scope interpretation for the sentence every square is below a heart.

(8) Stimuli from Raffray & Pickering (2010)
(9) Stimuli from Chemla & Bott (2015)

In order to investigate quantifier scope experimentally, the stimuli must not succumb to "Reinhart's Riddle". To that end, the stimuli used in Experiments 1 and 2 were created such that they were able to differentiate surface scope from inverse scope while avoiding the problem of confounding entailment. The data from Experiments 1 and 2 are used to infer which theory has the best empirical coverage. This determination is given in section 6. Extensions and suggestions for future research are provided in the conclusion (section 7).
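The entailment behind "Reinhart's Riddle" can be checked mechanically for the artists-and-paintings scenario described for (6). The sketch below is an illustration under stated assumptions: the function names and the set-of-pairs encoding of the relation are hypothetical. It enumerates every possible "loves" relation over three artists and three paintings and confirms that each situation verifying the inverse reading (6b) also verifies the surface reading (6a), while the reverse does not hold.

```python
from itertools import product

artists = ["A", "B", "C"]
paintings = ["p0", "p1", "p2"]
pairs = [(a, p) for a in artists for p in paintings]


def surface(loves):
    """(6a): every artist loves some painting (universal > existential)."""
    return all(any((a, p) in loves for p in paintings) for a in artists)


def inverse(loves):
    """(6b): some painting is loved by every artist (existential > universal)."""
    return any(all((a, p) in loves for a in artists) for p in paintings)


def all_models():
    """Yield every possible 'loves' relation over the 3 x 3 domain (2**9 relations)."""
    for bits in product([False, True], repeat=len(pairs)):
        yield {pair for pair, bit in zip(pairs, bits) if bit}


# Every situation verifying the inverse reading also verifies the surface reading...
inverse_entails_surface = all(surface(m) for m in all_models() if inverse(m))
# ...but some situations verify the surface reading only.
surface_only = [m for m in all_models() if surface(m) and not inverse(m)]

print(inverse_entails_surface)  # True
print(len(surface_only) > 0)    # True
```

This is why a picture or story that makes the inverse reading true can never, on its own, show that a subject accessed inverse scope in such sentences.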

Scope in formal linguistics
As was shown in (1), multiply quantified sentences in English are often ambiguous. Of particular interest is the inverse scope reading, where the lowest quantifier takes the widest scope. Under inverse scope readings, speakers are interpreting a quantifier in a position different from its apparent surface position. In order to account for this fact, several theories have been proposed. In this section, the details of three theories are provided: Quantifier Raising with Scope Economy, Feature Checking, and Choice Functions.

Quantifier Raising with Scope Economy: QR
May (1977) discusses Quantifier Raising (QR), a syntactic movement operation that reorders quantifiers at Logical Form (LF). QR works by adjoining quantifiers to phrasal nodes (e.g., S, VP). The relative scope of the quantifier is its c-command domain at LF. Consider the ambiguous sentence in (10), below.

(10) A ceramicist likes every plate.

(10) may either be interpreted on its surface scope (there is only one, unique ceramicist who likes every plate) or its inverse scope (all plates are such that a ceramicist likes them). As QR is a syntactic theory, scope is determined through structural manipulation. Before QR has applied, (10) would be assigned the following (simplified) syntactic structure. To derive the surface scope interpretation, QR moves the lowest phrase containing the quantifier every, and adjoins it to the VP node. The subject quantifier c-commands the object quantifier adjoined to VP, and therefore takes wider scope. The resulting simplified LF is shown in (12). To interpret this sentence on its inverse scope, QR will instead adjoin the phrase containing the quantifier every to the highest S node such that it c-commands the subject. The resulting simplified LF is shown in (13), below.
As a syntactic theory which supplies LF representations to the semantic component of grammar, QR is constrained by various locality restrictions, both syntactic and semantic. See, e.g., Ross (1967) for a discussion of syntactic "Islands", or Chomsky (2008) for a discussion of syntactic "Phases". Fox (2000) suggests that syntactic operations are restricted by a semantic "economy" constraint which prohibits movements which are "semantically vacuous".
(14) Scope Economy SSOs (scope-shifting operations) which are not forced for type considerations must have a semantic effect.
QR coupled with Scope Economy predicts that BNs and MNs are only able to take inverse scope if there is a meaning difference. Additionally, the theory of Scope Economy posits an additional rule: Shortest Move.
(15) Shortest Move QR must move a quantifier phrase (QP) to the closest position in which it is interpretable. In other words, a QP must always move to the closest clause-denoting element that dominates it.
This restriction on movement predicts that numeral quantifiers should only be able to take exceptionally wide scope if there is a new meaning as a result. When an indefinite takes scope outside of its own clause, it is interpreted as a presupposed and unique entity (Diesing 1992). That is, it is imbued with a referential reading.

Scope in QR with Scope Economy is equivalent to the c-command domain of the quantifier at LF. The QR theory applies to all quantifiers equally, so it predicts that scope taking for bare and modified numerals should be identical. QR only applies when it will affect the semantic interpretation. Additionally, QR is restricted to the local clause unless necessary for referential interpretation.

Feature Checking: FC
Beghelli & Stowell (1997) introduce the second syntactic theory of scope taking: Feature Checking (FC). The theory is distinguished by three basic architectural settings:

(16) a. Quantifiers are divided into five classes based on their inherent properties.
b. There exist distinct structural positions where quantifiers must move for scope and feature checking (these positions are determined by quantifier class membership).
c. Scope is calculated at LF by examining each quantifier's c-command domain in the structure.
The most salient design feature of the Beghelli & Stowell system is the classification of quantifiers into five types. This division can be seen as an extension of Liu (1997), who was among the first to realize that the scope taking ability of a quantifier may depend on its type (Kiss & Pafel 2017). Beghelli & Stowell (1997) frame this difference as a direct response to the QR theory of scope. They state that the QR theory is too free. Namely, it is able to move any quantified phrase in any order, thereby generating unattested scopal readings. They suggest that all quantifiers should not be treated equally, and that the mechanism responsible for assigning scopal interpretations is sensitive to the diversity of the quantifiers. Beghelli & Stowell's solution is to posit a series of functional heads, each of which serves as a landing site for a specific class of quantifier. In this way, their theory is similar in spirit to that of "cartography" in syntax.

There are five classes of quantifiers identified in Beghelli & Stowell's system. Each is discussed below.

d. Counting QPs (CQPs): The quantifiers belonging to this class are marked by the following properties, "…[T]hey count individuals with a given property, have very local scope (take scope essentially in situ) and resist specific interpretations." Members of this class include decreasing quantifiers such as few and fewer and modified numerals like at most 10 and at least 2. CQPs take scope from their Case positions: AgrSP for CQP subjects and AgrOP for CQP objects.

e. Group-Denoting QPs (GQPs): This very large quantifier class subsumes indefinite QPs such as a and some, bare numerals such as 8 and 67, and definite expressions, e.g., the vase. The authors note the following fundamental property of GQPs: they easily allow for wide-scope readings, even when c-commanded by other quantifiers. These quantifiers readily allow for a variety of readings. Depending on the reading they manifest, GQPs take scope in a variety of positions.
When referential, GQPs take scope in the specifier of RefP. E.g., [ RefP [A certain celebrity] was seen skiing]. When interpreted as "specific" in the sense of Diesing (1992), GQPs take scope from the specifier of ShareP. E.g., [Every tennis player likes [ ShareP some brand] of racket.] Lastly, when GQPs are interpreted non-specifically, they take scope in their case positions, just like CQPs. E.g., [Every artist likes [ AgrOP a paintbrush]].
The final class, the CQP, is posited to account for an asymmetry in the scope of numeral quantifiers. The authors claim, "A CQP in object position should never be able to take inverse scope over a GQP or DQP in subject position" (Beghelli & Stowell 1997: 80, ex. 3(d)). The claim that a CQP in object position should never be able to take scope over a DQP in subject position is not trivial to prove. As discussed in the introduction, whenever there is a universal quantifier in subject position and an existential quantifier in object position, it is impossible to detect any inverse scope reading for the sentence ("Reinhart's Riddle"). Experiment 1 provides an empirical test of this claim.
To determine scope, Beghelli & Stowell's quantifier types must move to specific structural positions at LF. The positions are shown in (19).

(20) A ceramicist threw every pot.
On an inverse scope interpretation, (20) means that every pot is such that a (potentially different) ceramicist threw it. There are two quantifier phrases in this sentence, a GQP "a ceramicist", and a DQP "every pot". Each QP will move to its respective scope-checking position at LF, thus deriving all possible scopal configurations. For the inverse scope reading, the final (simplified) LF will be as in (21), below.
Brendel: Numeral quantifiers in English Art. 104, page 9 of 25

As shown in (21), each quantifier must move to its respective feature checking position at LF in order to derive the various scopal configurations. Notice how the indefinite expression a ceramicist takes scope from the specifier of ShareP, rather than at the two other possible positions: the specifier of RefP or its case checking position, AgrSP. The reason for this is as follows. If the GQP were to take scope from the Spec of RefP, then the indefinite would be interpreted as taking the widest scope, and it would be endowed with a referential reading making it a specific/unique individual. The meaning would be, roughly: a single, unique ceramicist (known to both speaker and hearer) has the property that they threw every pot. If the indefinite were to take scope from its case checking position, Spec of AgrSP, then it would not be out-scoped by the DQP, and it would be assigned a non-specific interpretation. On this interpretation, the sentence would roughly mean: some ceramicist has the property of throwing every pot. This is the most uninformative reading. If the GQP takes scope from the Spec of ShareP, then the correct interpretation and configuration obtains. The GQP is out-scoped by the DQP, and it is assigned a "presupposed individual" interpretation. The meaning is roughly: every pot is such that some ceramicist (that the speaker assumes the hearer knows) threw it.

The Beghelli & Stowell system has three positions for GQPs which encode their relative scope and correlate with an interpretation. The details of how these readings emerge from these positions, how the system knows which indefinites must be assigned certain readings, and why these readings necessitate these scopal configurations are left open. The authors state: "We will not be concerned here with the issue of how referential readings (cf. Fodor & Sag 1982) of indefinite QPs should be generated. We refer the reader to Kratzer (1995) for a recent proposal." (Beghelli & Stowell 1997: 76).
Unlike the QR theory, the feature checking theory is not free: it does not allow any quantifier to appear in any order. The scope taking potential for each class of quantifier is different; this system predicts that there will be differences between bare and modified numeral quantifiers. Specifically, it is predicted that modified numeral quantifiers (CQPs) should not be able to take inverse scope over a bare numeral quantifier, because bare numeral quantifiers are GQPs, and GQPs always out-scope object CQPs.

Choice Functions: CH
Indefinite expressions such as some and a are capable of taking scope outside of a finite clause, while distributive/universal quantifiers such as every may not.
(22) Some art teacher said that a certain/every student glazed their sculpture.
With every, (22) cannot mean that for each student, there was a (potentially different) teacher who said that they glazed their sculpture. However, if the indefinite expression a certain takes wide scope, then the sentence can mean that there is a single, unique student such that some teacher said that the student glazed his/her sculpture. This ability has been widely acknowledged by a number of researchers; the reader is referred to Fodor & Sag (1982) for the seminal discussion of these facts. Within Generative Linguistics, the most popular mechanism for modeling the scope of an indefinite expression is the choice function. Choice functions, as described in Reinhart (1997), are functions which return a member from any non-empty predication set. More formally, Winter (1997) defines choice functions as in (23).

(23) The choice condition (Winter 1997)
A function f is a choice function (i.e., CH(f) holds) only if for every non-empty predicate P, f(P) is defined and it is in the extension of P (i.e., P(f(P)) holds).
A choice function selects some element from a set, and the element chosen by the function is capable of participating in predication. Consider the ambiguous sentence in (24).
(24) If some student glazes their pot, Chablis will be glad.
The sentence in (24) has two interpretations, which can be translated using choice functions.
(24a) means that if any student glazes their own pot, then Chablis will be glad. This is so because the scope of the choice function is contained within the protasis of the conditional. This is the reading assigned to the sentence when the indefinite takes scope within its local clause. (24b) means that Chablis will be glad if a specific student glazes their own pot; this reading corresponds to the widest scope reading for the indefinite. The formula captures this fact by having the choice function scope over the entire conditional. Namely, the formula requires that a specific student is chosen: the student who would make Chablis glad if s/he glazes their own pot.

Choice functions work by returning a member of some non-empty set of predication, and the scope of the choice function in the semantic formula determines the scope of the indefinite. This theory does not require any "movement" of indefinites in the syntax, unlike the QR or Feature Checking theory. In this way, choice functions are a purely semantic theory of scope. They have no reflex in the syntax, and thus the curious ability of indefinites to "escape" scope islands (see (24)) can be explained in a principled way. Indefinites are not subject to clause boundary and island restrictions because indefinites never "move" in the syntax; they always remain in situ, and their scope is calculated in the semantics alone using choice functions. The choice function theory predicts that bare numerals and modified numerals should not differ in their scope taking ability, as they are both indefinite quantifiers. Additionally, choice functions predict that exceptionally wide scope will always be available for indefinite expressions like bare numerals and modified numerals, so their scope should not be clause-bound.
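The choice condition in (23) and the two readings of (24) can be illustrated with a short sketch. Several simplifications are assumptions made here for illustration only: choice functions are restricted to a finite list of predicate extensions rather than every non-empty predicate, the conditional is treated as a material conditional, and the model (two students, one of whom glazes a pot) and all names are hypothetical.

```python
from itertools import product


def choice_functions(predicates):
    """Yield every choice function over the given non-empty predicate extensions.

    Each function is a dict mapping an extension (a frozenset) to one of its members,
    so P(f(P)) holds by construction, as the choice condition in (23) requires.
    """
    predicates = list(predicates)
    for picks in product(*(sorted(p) for p in predicates)):
        yield dict(zip(predicates, picks))


# Toy model (assumed): two students, only Ana glazes her pot, Chablis is not glad.
student = frozenset({"ana", "ben"})
glazes = {"ana"}
chablis_glad = False


def narrow_scope():
    """(24a): the choice function is existentially closed inside the if-clause."""
    some_student_glazes = any(f[student] in glazes for f in choice_functions([student]))
    return (not some_student_glazes) or chablis_glad


def wide_scope():
    """(24b): the choice function is existentially closed over the whole conditional."""
    return any(
        (f[student] not in glazes) or chablis_glad
        for f in choice_functions([student])
    )


print(narrow_scope())  # False
print(wide_scope())    # True
```

The indefinite never changes position in either function; only the point at which the choice function is existentially closed differs, which is the sense in which the theory is purely semantic.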

Predictions of each theory
I provide a brief summary of each theory highlighting the most salient aspects as well as the predictions each theory makes regarding the scope of bare and modified numerals (Table 1).

Methods
To investigate the scope taking behavior of modified and bare numeral quantifiers, a sentence-picture verification task (SPVT) was chosen. In this SPVT, subjects were presented with sentence-picture pairs and were tasked with reporting whether or not the sentence was a good description of the picture. This experimental paradigm was chosen for its simplicity and for maximal similarity to previous work on quantifiers (e.g., Chemla & Bott 2015; Raffray & Pickering 2010). Scope is an abstract concept and elicitation of reliable judgments is notoriously difficult (Szabolcsi 2010), so it was decided that the simplest methods should be used. Namely, when presented with a sentence-picture pair, the meaning of the sentence can be quickly and intuitively checked against the picture.

Subjects
A total of 20 subjects participated in this study. All subjects included in analysis were native speakers of English aged 18 or older with no reported vision disorders. Subjects were recruited from the linguistics research participation pool at a large American university. Compensation for participation was either course credit or $5.00.

Stimuli
The stimuli used in this study were identical to those used in the original pilot study; thus, this discussion follows the original in form. The stimuli consist of solid, black, geometric shapes (circles, squares, and triangles) arranged in 3 × 3 grids and a sentence describing the picture. There were a total of 20 test items and 20 fillers. All fillers were excluded in analysis. Fillers used geometric shapes in 3 × 3 grids and were paired with sentences that used quantifiers. None were scopally ambiguous. 10 of the fillers appeared with sentences which were good/true for their picture, and the other 10 appeared with sentences which were bad/false for their picture. All 5 test stimuli pictures used in the study are shown in Figure 1. Each stimulus picture appeared with a sentence that was true of the picture. The four sentences investigated the surface (S) and inverse (I) scope of two different quantifiers, two and at least two, a bare numeral (BN) and a modified numeral (MN) respectively, yielding a 2 × 2 design with quantifier type [BN-MN] and scope order [S-I] as factors. The sentence frames testing surface and inverse scope for the two quantifier types are presented in (25).

(25) Sentence frames for testing scope and quantifier type.
a. Surface scope:
(i) There are two x directly above a y. [BN]
(ii) There are at least two x directly above a y. [MN]
b. Inverse scope:
(i) There is an x directly above two y. [BN]
(ii) There is an x directly above at least two y. [MN]

The sentences above, when paired with one of the test images in Figure 1, allow for the detection of surface and inverse scope by quantifier type. Recall the specific claim in Beghelli & Stowell (1997: 80, ex. 3(d)) regarding the distinction between bare and modified numeral quantifiers: "A CQP [modified numeral] in object position should never be able to take inverse scope over a GQP [indefinite] or DQP [distributive/universal] in subject position." Given that it is impossible to detect inverse scope whenever we have a universal quantifier in subject position and an existential quantifier in object position, the sentences tested all employed two existentially quantified expressions, thus avoiding the confound. Given the findings of Michaelis & Francis (2007), all sentence frames are existential constructions. The authors report that indefinites in sentence initial position are marked in English. By using "there" in initial position, the markedness can be overcome. So, instead of reading, "A circle is directly above two squares", subjects read, "There is a circle directly above two squares." An example trial with all four sentence conditions is presented in Figure 2.

[Figure 2: Example trial. BN-S: There are two circles directly above a triangle, MN-S: There are at least two circles directly above a triangle, BN-I: There is a circle directly above two triangles, MN-I: There is a circle directly above at least two triangles.]

One will notice that the sentence frames and pictures used in this study require distributive interpretations for the numeral quantifiers across all conditions. This decision was made for the following reason. Consider the image in Figure 2: it is identical for all four conditions, and it is capable of distinguishing both scope interpretations. If one tries to construct images where the numerals are interpreted collectively, one will discover that a single image cannot be used for both surface and inverse conditions. This is illustrated below in (26).
(26) Collective interpretation for numeral quantifiers cannot distinguish surface scope from inverse scope.
a. There is a circle that is directly above two triangles. b. There is a circle that is directly above at least two triangles.
Sentences (26a) and (26b), which are supposed to isolate inverse scope interpretations, are in fact compatible with the image in (26) on a surface scope reading. That is, there exists a circle such that it is directly above two/at least two triangles. In order to keep stimuli design consistent (that is, not require separate images for surface scope and separate images for inverse scope), it was decided that stimuli which are consistent across scope conditions should be used; this required distributive interpretations throughout. While distributive interpretations were forced in this study, all conditions were equally affected. So, if any differences are observed, we can be sure that distributivity is not the sole reason for there being differences (see footnote 6). It thus remains an open question whether or not the results of this study could be replicated if collective stimuli were used. If one wishes to replicate this study using collective interpretations for the stimuli, then care must be taken to ensure that the images unambiguously isolate scope. That is, the stimuli must be able to distinguish surface and inverse scope while avoiding the confounding entailments discussed previously.
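The way a single distributive image can separate the two scope orders for the frames in (25) can be sketched as a small verifier. This is an illustration under explicit assumptions, not the study's materials: the grid below is hypothetical, "directly above" is taken to mean "in the cell immediately above", and both numerals are treated with an "at least n" threshold.

```python
def directly_above_pairs(grid, upper, lower):
    """Return (upper_cell, lower_cell) coordinates where `upper` sits immediately over `lower`."""
    pairs = []
    for r in range(len(grid) - 1):
        for c in range(len(grid[r])):
            if grid[r][c] == upper and grid[r + 1][c] == lower:
                pairs.append(((r, c), (r + 1, c)))
    return pairs


def readings(grid, upper, lower, n, numeral_on):
    """Evaluate both scope orders for a numeral (threshold n) and a singular indefinite.

    numeral_on is "upper" for frame (25a) and "lower" for frame (25b).
    """
    pairs = directly_above_pairs(grid, upper, lower)
    side = 0 if numeral_on == "upper" else 1
    numeral_cells = {p[side] for p in pairs}
    other_cells = {p[1 - side] for p in pairs}
    # Numeral wide: n distinct shapes each stand in the relation to some (possibly different) shape.
    numeral_wide = len(numeral_cells) >= n
    # Indefinite wide: one single shape stands in the relation to n distinct shapes.
    indefinite_wide = any(
        len({p[side] for p in pairs if p[1 - side] == cell}) >= n for cell in other_cells
    )
    return {"numeral_wide": numeral_wide, "indefinite_wide": indefinite_wide}


# Hypothetical 3 x 3 grid: two circles, each immediately above its own triangle.
grid = [
    ["circle", "circle", "square"],
    ["triangle", "triangle", "square"],
    ["square", "square", "square"],
]

# (25a.i) "There are two circles directly above a triangle": surface scope = numeral wide.
print(readings(grid, "circle", "triangle", 2, "upper"))
# (25b.i) "There is a circle directly above two triangles": inverse scope = numeral wide.
print(readings(grid, "circle", "triangle", 2, "lower"))
```

On this grid only the numeral-wide order is true for both frames, so the same picture verifies (25a) on its surface scope and (25b) only on its inverse scope.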

Procedure
Subjects took the experiment while seated at a desktop computer in a quiet room. The experiment was presented using DMDX (Forster & Forster 2003). Before the experiment began, consent was acquired, and the subjects were shown the following text on the screen:

Introductory Text for Experiment
In the following study you will be shown pictures that are paired with a sentence. Your task is to judge how well each sentence describes the picture. You will rank each sentence on a 4 point scale. A score of 1 indicates that the sentence describes the picture badly, a score of 4 indicates that the sentence describes the picture well. The test should take no longer than 10 minutes. Please complete the survey in one sitting and try your best!

All subjects were then shown a total of 40 sentence-picture pairs in random order (see footnote 7). All subjects saw the same 40 items. 20 sentence-picture pairs were fillers and were discarded in the final analysis. 20 sentence-picture pairs were test items whose corresponding ratings were submitted for analysis. Subjects had a total of 10 seconds to assign each sentence-picture pair a rating before the test automatically advanced. Upon assigning a rating, the test would automatically advance.

Footnote 6: It is important to note, however, that collective interpretations are claimed to be preferred over distributive interpretations for numerals: Ussery (2008) states that collective/cumulative interpretations are preferred over distributive interpretations because the derivation for collective interpretations is simpler than a derivation for distributive interpretations, as distributive interpretations require more syntactic structure (Kratzer 2002; 2005). If economy considerations which favor simpler structures have an influence in grammar, then we should expect that such uneconomical structures will incur some processing difficulty (Reinhart 2006). Further discussion on this issue would take us too far afield; however, such considerations remain important for future research on quantifier scope.

Results
For each subject, the ratings assigned to each condition were submitted for analysis in Python: IPython (Pérez & Granger 2007), Pandas (McKinney 2011), NumPy (Van Der Walt et al. 2011), SciPy (Jones et al. 2001), Seaborn (Waskom et al. 2018), StatsModels (Seabold & Perktold 2010), and Scikit-learn (Pedregosa et al. 2011). Analysis consisted of a linear mixed-effects model with scope (surface vs. inverse) and quantifier type (bare numeral vs. modified numeral) as the factors; additionally, each subject was incorporated as a random effect. Bootstrap confidence intervals at 95% were calculated using 10k samples (Efron 1992; see footnote 8). The mean ratings with confidence intervals at 95% in each condition are presented below:

Consideration of the means in Experiment 1 reveals that bare numerals in the inverse scope condition received the lowest rating, while all other confidence intervals overlap. These results are visualized in bar plots and violin plots (Figure 3). By examining the distributions of the scores in each condition, we can see that the low scores assigned to bare numerals in inverse scope conditions are concentrated in the middle range of the scale, while all other conditions have ratings distributed across the top half of the scale.

Upon submitting these data to a mixed-effects model with subject as a random factor, we found two main effects and no statistical interactions. This was a surprising finding given the confidence intervals computed above. There was a main effect of quantifier type (p = 0.002) such that bare numerals were assigned lower ratings than modified numerals, and a main effect of scope (p < 0.001) such that inverse scope conditions were rated lower than surface scope conditions. The interaction between scope and quantifier type approached but did not reach significance (p = 0.058). These results are summarized in Figure 4.
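The bootstrap confidence intervals mentioned above can be sketched as follows. This is a minimal illustration and not the study's analysis code: it uses only the Python standard library rather than NumPy, the ratings are synthetic, and the final step (adding the percentile differences back to the original mean to obtain the interval bounds) is an assumption about how the sorted differences are turned into an interval.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=10_000, confidence=0.95, seed=0):
    """Bootstrap confidence interval for the mean, built from sorted mean differences."""
    rng = random.Random(seed)
    original_mean = mean(scores)
    diffs = []
    for _ in range(n_resamples):
        resample = rng.choices(scores, k=len(scores))  # sample with replacement
        diffs.append(original_mean - mean(resample))
    diffs.sort()
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    # Assumption: the percentile differences are added back to the original mean.
    return original_mean + diffs[lo_idx], original_mean + diffs[hi_idx]


# Synthetic ratings on the 1-4 scale (NOT the study's data).
ratings = [4, 3, 4, 2, 3, 4, 3, 3, 2, 4, 4, 3, 2, 3, 4, 3, 4, 2, 3, 3]
low, high = bootstrap_ci(ratings)
print(round(mean(ratings), 2), round(low, 2), round(high, 2))
```

Fixing the seed makes the interval reproducible; in practice one such interval would be computed per condition (BN-S, MN-S, BN-I, MN-I) and compared for overlap.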
These results are surprising given the claims in the literature. We would expect that modified numerals should resist inverse scope and bare numerals should allow for inverse scope interpretation; however, we found that bare numerals were rated lower than modified numerals across both surface and inverse scope conditions. This suggests that the theory of Feature Checking in Beghelli & Stowell (1997) is incorrect: modified numerals should not be barred from taking inverse scope. Nonetheless, the idea that quantifiers might differ is on the right track. It is useful to consider subject-level performance on this task, as there was considerable subject-level variation (see Figure 5). Consider the main effect of quantifier type. Only subjects 15, 25, 29, 31, 33, 37, 39, 43, 45, 47, 57, and 61 show an obvious preference for modified numerals, but this is only within the inverse scope condition. Only subjects 35, 45, 47, and 61 show a slight preference for modified numerals in surface scope.

Footnote 7: Each subject saw a different random order.

Footnote 8: Bootstrap sampling is a statistical re-sampling technique where the data in each condition are randomly sampled with replacement, so that thousands of new datasets can be generated conforming to the empirical distribution of the scores originally collected. The original sample's mean M_og is calculated, then the mean M_n of every sample acquired through re-sampling is calculated, and their differences diff_n = M_og - M_n are recorded and sorted in ascending order [d_0, d_1, …, d_n]. The percentile values in this list of differences correspond to the confidence interval bounds for the measure at the desired level of confidence. All bootstrap CIs were calculated using the "bootstrap" method (Efron 1992) with 10k samples. All code was implemented in Python with statistical features from the NumPy library.
With our model and subject-level plots in hand, the preference for modified numerals seems to result from subject-level differences; more data is ultimately needed to fully understand this effect.
These data suggest that the claims in the literature regarding the inability of modified numerals to take inverse scope are unwarranted: it is clear that modified numerals may take inverse scope. This result argues against the Feature Checking theory, which posits a separate class of quantifiers to account for this supposed asymmetry. While the specific prediction regarding modified numerals in Beghelli & Stowell (1997) was not borne out in these data, it appears that quantifiers might differ in their scope-taking behavior, but more data is needed to fully understand what may be responsible for such differences.
These data do not clearly distinguish between Choice Functions and QR. Both theories predict that numeral quantifiers should not differ in their scope-taking behavior; however, we discovered statistically significant differences between BNs and MNs. We observed that bare numerals were rated lower than modified numerals across both surface and inverse scope. We thus require more data to decide between QR and Choice Functions.

Experiment 2
Experiment 2 was identical to Experiment 1 except that the second quantifier was placed within a relative clause. Other minor differences are noted below. If the theory of Choice Functions is correct, numeral quantifiers should be able to take extra-clausal scope. The theory of QR with Scope Economy predicts that numeral quantifiers can only take extra-clausal scope if doing so affects the interpretation by yielding a "referential" (Diesing 1992) reading.

Methods
Identical to Experiment 1.

Stimuli
The stimuli in Experiment 2 differ from those in Experiment 1 in that the second quantifier is placed within a relative clause. This was done as a "test control". That is, it is assumed that subjects will never access inverse scope readings for quantifiers embedded within a relative clause; but, given that this ban has no empirical validation, it may be the case that subjects will access inverse scope, thus generalizing the findings of Syrett (2015), who showed that quantifiers may take scope out of their clause in ACD constructions. All fillers used in Experiment 2 were the same as those in Experiment 1, as were all test image conditions. An example trial with all sentence conditions is provided in Figure 6. As one can see, everything is identical save for the addition of relative clauses.

Procedure
The procedure was identical to that in Experiment 1.

Results
For each subject, the ratings assigned to each condition were submitted for analysis in Python (see 4.2). The analysis consisted of a linear mixed-effects model with quantifier type (BN vs. MN) and scope (surface vs. inverse) as the only fixed factors and subject as a random effect. Bootstrap confidence intervals at 95% were calculated using 10k samples. Mean ratings for each condition are reported below.⁹ From these means and their confidence intervals, it is clear that inverse scope conditions were dispreferred regardless of quantifier. Consideration of the raw values indicates that subjects found the differences between the surface and inverse conditions in Experiment 2 to be more prominent than the differences found between bare and modified numerals in Experiment 1. Consideration of bar and violin plots (Figure 7) confirms the above. The scores assigned to sentences in the surface scope conditions are concentrated at the top of the scale, while scores in the inverse scope conditions are clustered at the lower end of the scale. Analysis revealed a significant main effect of scope (p < 0.001) such that inverse scope conditions were rated lower than surface scope conditions. No other comparisons emerged as significant. These results are summarized in Figure 8.
BN-S: There are two circles that are directly above a triangle; MN-S: There are at least two circles that are directly above a triangle; BN-I: There is a circle that is directly above two triangles; MN-I: There is a circle that is directly above at least two triangles.

Summary
The results of Experiment 2 demonstrate that quantifier scope is clause-bound. This pattern was robust both between and within subjects (Figure 9), save for subject 62, who assigned low ratings regardless of quantifier type or scope, and subjects 48 and 54, who appear to have slightly preferred the inverse scope sentences. The results of Experiment 2 argue strongly against the theory of Choice Functions, which allows indefinite expressions to take extra-clausal scope. It was clear that the majority of subjects (18/20) did not accept extra-clausal scope for numeral quantifiers. Together with the results of Experiment 1, the accumulated evidence suggests that the theory of QR with Scope Economy best accounts for the facts.

Discussion
The results of Experiment 1 suggest that modified numerals allow for inverse scope interpretations. This finding conflicts with the claims in Beghelli & Stowell (1997) regarding the difference between bare and modified numerals. With these data in hand, we can conclude that the Feature Checking theory makes the wrong predictions. The results of Experiment 2 clearly argue against the Choice Function theory. Choice Functions predict that extra-clausal scope should be available for indefinite expressions, yet both BNs and MNs strongly resisted extra-clausal scope. This leaves QR with Scope Economy as the theory which most accurately models the empirical data at hand.
Why then did we observe that bare numerals and modified numerals differ (Exp. 1) and that numerals resisted extra-clausal scope (Exp. 2)?
I suggest that distributivity and referentiality, coupled with the principles of Scope Economy and Shortest Move (Fox 2000), are responsible. I will argue that the results observed in Experiment 1 follow from the results obtained in Experiment 2. I follow the claim in Fodor & Sag (1982: 356) that indefinites (bare numerals and modified numerals) may have a quantificational and a referential reading. The referential reading of an indefinite expression is truth-conditionally equivalent to the reading where it takes the widest scope; however, it may still be distinguished on pragmatic grounds. Namely, a referential reading results in a specific interpretation for the indefinite.
If Fodor & Sag (1982) are correct in their argument that indefinites have a specific/referential reading which allows these expressions to take exceptionally wide scope, then it is perhaps unsurprising that the numerals in Experiment 2 resisted inverse scope interpretations. Because the stimuli forced distributive interpretations for the quantifiers (see sub-subsection 4.1.2), a specific, referential reading would be inaccessible for the numeral quantifiers in the study. In example (30), I provide an example picture from Experiment 2, along with its two inverse scope sentences. I have boxed in red the two areas of interest in the image.
Figure 9: … quantifier type (BN, MN) in Exp 2 across all subjects. The plots show that each subject rated surface scope sentences higher than inverse scope sentences, except for subject 62, who rated all sentences poorly, and subjects 48 and 54, who appear to have slightly preferred the inverse scope sentences.
(30) Picture with inverse scope conditions (Experiment 2)
a. There is a circle that is directly above two triangles.
b. There is a circle that is directly above at least two triangles.
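The difference between the two readings of the sentences in (30) can be made concrete with a small truth-condition check. The sketch below is purely illustrative: the scene (two triangles, each with its own circle directly above it) is a hypothetical stand-in for the experimental picture, and treating both "two" and "at least two" as lower-bounded ("at least n") is a simplifying assumption.

```python
# Hypothetical scene: two triangles, each with a different circle
# directly above it. The shapes of interest are spatially separated.
circles = ["c1", "c2"]
triangles = ["t1", "t2"]
directly_above = {("c1", "t1"), ("c2", "t2")}


def surface_reading(n=2):
    """Surface scope: a single circle is directly above at least n triangles."""
    return any(
        sum((c, t) in directly_above for t in triangles) >= n
        for c in circles
    )


def inverse_reading(n=2):
    """Inverse scope: at least n triangles each have some circle
    (possibly a different one per triangle) directly above them."""
    covered = [
        t for t in triangles
        if any((c, t) in directly_above for c in circles)
    ]
    return len(covered) >= n


print("surface:", surface_reading())  # no single circle is above both triangles
print("inverse:", inverse_reading())  # each triangle has its own circle
```

In a scene of this kind the sentence is false on the surface reading and true on the inverse reading, which is why acceptance of the sentence can be taken as evidence that the inverse reading was accessed.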
Because the shapes of interest were always separated, subjects were in all cases forced to interpret the referents of the quantifiers distributively. Thus, (30a) and (30b), which can only be true of the picture on an inverse scope interpretation, would have necessitated exceptionally wide scope for a non-specific quantifier. The inability to assign a specific/referential interpretation to the numerals in Experiment 2 accounts for their lack of wide scope. This is predicted by the theory of Scope Economy and Shortest Move, which allows scope-shifting operations only in case they affect the semantic interpretation. Because the necessary semantic representation could not be accessed from the pictures, the scope-shifting operation necessary to allow inverse scope could not be applied.
The effects of distributivity and referentiality, coupled with Scope Economy, help to explain the effect seen in Experiment 1, where BNs were rated lower than MNs (especially in inverse scope). Ussery (2008) notes that cumulative interpretations are preferred for numerals, and the stimuli used in the study forced distributive interpretations. It is possible that the preference for cumulative readings is stronger for bare numerals than for modified numerals. I leave this as an area of active research. Recall the simple main effect of quantifier type in Experiment 1, where MNs were rated higher than BNs in inverse scope conditions. A similar pattern was seen in Experiment 2, where MNs (mean rating = 2.79) were rated higher than BNs (mean rating = 2.56); however, this did not reach significance (p = 0.320). Ultimately, more data is needed to understand this asymmetry. With these results in hand, it is clear that the QR theory with Scope Economy predicts the observed data most accurately.

Conclusion
Three theories of quantifier scope in natural language, Quantifier Raising (May 1985) with Scope Economy (Fox 2000), Feature Checking (Beghelli & Stowell 1997), and Choice Functions (Winter 1997), were tested for their empirical coverage of the scope of bare and modified numerals. The results of Experiment 1 argue against the Feature Checking theory, which predicts that MNs cannot take inverse scope, and the results of Experiment 2 revealed that the scope of numeral quantifiers is clause-bound. This finding argues against the theory of Choice Functions, which does not restrict extra-clausal scope. It is argued that exceptionally wide scope did not obtain because of an inability to assign referential/specific interpretations to the referents of these quantifiers. This leaves the theory of QR with Scope Economy as most compatible with the observed facts. Scope Economy predicts that extra-clausal scope should not obtain unless there is a meaning difference, and the stimuli restricted the interpretation of the numeral quantifier to be non-referential.
The effect detected in Experiment 1 requires more data. It was suggested that the difference between BNs and MNs was due to a difference in their preference for cumulative interpretations (Ussery 2008). While the Feature Checking theory made the incorrect prediction, the spirit of the theory may be on the right track. Namely, quantifier diversity may be real and quantifiers may not behave as a homogeneous set (Liu 1997;Beghelli & Stowell 1997). Ultimately more data is needed. Such data is necessary if we wish to have empirically accurate models of quantifier scope. These areas are a matter of active research.