The effects of quantifier size on the construction of discourse models

Sentences with quantified expressions involve mental representations of sets of individuals for which some property holds (the reference set), as well as of sets for which the property does not hold (the complement set). With negative quantifiers, both sets can receive discourse focus. With positive quantifiers, the reference set is strongly preferred, although complement set focus is possible if contextually motivated. In an offline semantic plausibility study and two online EEG studies, we investigated whether the complement set is an inherently available discourse entity for positive quantifiers, as it is for negative quantifiers. The results show that while the default focus patterns induced by positive and negative quantifiers are robust, both the complement set and the reference set are represented as discourse entities. To our knowledge, this is the first study to show that even positive quantifiers make both sets mentally represented during discourse processing without contextual influence. We also discuss the impact that the results from the two ERP studies have on the functional interpretation of two well-known ERP effects: the N400 and the P600.


Introduction
This paper looks at an essential property of quantifiers, namely that of picking out proportions of different size of some known or implied total quantity. We call this QUANTIFIER SIZE and the question we investigate is what importance this property of quantifiers has in the construction of the mental discourse model (see e.g. von Heusinger & Schumacher, 2019). To interpret a quantified statement such as Most teachers attended the meeting, the relation between 'teachers attending the meeting' and 'teachers not attending the meeting' has to be considered. At the discourse level, however, usually only one of these groups of 'teachers', those attending the meeting, is prominent, salient, or focused, in the discourse (in the sense of Ariel, 1988;Brocher & von Heusinger, 2018;Gundel, Hedberg, & Zacharski, 1993, see also Moxey & Sanford, 1987). Positive and negative quantifiers have been claimed to differ in what discourse entities they make available, and these differences have been argued to be lexical in nature (Kibble, 1997;Nouwen, 2003), or to stem from differences in contextual licensing (Moxey, 2006;Moxey, Sanford, & Dawydiak, 2001). In this paper we explore this question further, asking what discourse entities quantified expressions make available in the discourse model.
We are particularly interested in the quantified expression's inherent property of size, i.e. to what extent the proportion between 'teachers attending' and 'teachers not attending' plays any role when the mental discourse model is built up. We investigate whether there are indications that both these groups are mentally represented in the discourse model, even though usually only one is salient enough to be explicit in discourse. We address this question in one offline judgement study and two online experiments using event-related potentials (ERP). At the same time, the results from the ERP experiments will touch on the question of the functional interpretation of the ERP components N400 and P600. Are they both related to contextual integration of lexical items, or is that a function reserved for the P600 (see discussion and references in Delogu, Brouwer, & Crocker, 2019)?
One of the functions of quantified expressions (e.g. some, no, most, henceforth QEs) is to specify the proportion or number of entities for which some property holds (e.g. Barwise & Cooper, 1981; Keenan & Stavi, 1986). In Fig. 1, we illustrate this using sets: Set A contains 'teachers', Set B contains 'people attending the meeting', and the intersection between them contains 'teachers attending the meeting'. When we use the QE most, we say that there are more members in the intersection between the two sets, the striped area in Fig. 1, than in the part of Set A that is not in Set B, the shaded area of Set A. If we instead use the QE almost no (almost no teachers attended the meeting), there will be very few members in the intersection compared to members in the rest of Set A. Following Moxey and Sanford (1987), we refer to the intersection as the REFERENCE SET (REFSET) and the part of Set A that is not in Set B as the COMPLEMENT SET (COMPSET).
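The set relations just described can be made concrete with a small sketch. The names in the two sets and the helper `is_most` are hypothetical illustrations, not materials from the study; the condition for most simply follows the description above (more members in the intersection than in the rest of Set A).

```python
# Hypothetical toy model of Fig. 1; the names are invented for illustration.
teachers = {"Anna", "Bo", "Cecilia", "David", "Eva"}      # Set A
attendees = {"Anna", "Bo", "Cecilia", "David", "Frida"}   # Set B

refset = teachers & attendees    # teachers who attended (REFSET)
compset = teachers - attendees   # teachers who did not attend (COMPSET)


def is_most(refset, compset):
    """'Most A are B' holds when the REFSET outnumbers the COMPSET."""
    return len(refset) > len(compset)


print(sorted(refset))
print(sorted(compset))
print(is_most(refset, compset))
```

In this toy model the REFSET has four members and the COMPSET one, so a statement with most would be true, while the single absent teacher is the potential COMPSET referent.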
At the discourse level, most, which is a positive QE, and almost no, a negative QE, behave quite differently. 1 Both of the sentences in (1) again talk about teachers attending a meeting. When an anaphoric pronoun (they) refers back to the quantified subject, negative QEs such as almost no make available both the set initially talked about, the REFSET, as in (2a), and the set not mentioned before, the COMPSET, as in (2b). Positive QEs, such as most, in contrast, do not readily allow this switch in reference, generally making only the REFSET, (2a), available for reference (e.g. Moxey & Sanford, 1987; Sanford, Moxey, & Paterson, 1996).
(1) a. Almost no teachers attended the meeting …
    b. Most teachers attended the meeting …
(2) a. … and that they were present was noteworthy.
    b. … and that they were absent was noteworthy.
Previous research on QEs in Swedish using offline methods has shown that speaker preferences are even more pronounced than in English: for both positive and negative QEs, only one set is focused at the discourse level, namely the REFSET for positive QEs and the COMPSET for negative QEs (Klingvall & Heinat, submitted). In an online processing study, however, there was evidence that both sets indeed compete for focus following negative QEs (Heinat & Klingvall, 2020). Whether that is also the case for positive QEs has not been tested before. As a first step in investigating whether QE size is relevant in processing, we conducted an offline semantic plausibility experiment, Experiment 1, the results of which indicated that online studies of relative set size may reveal something about the discourse models of QEs. In two ERP experiments, we then investigated the online processing of positive QEs, Experiment 2, and negative QEs, Experiment 3.

Experiment 1: Semantic plausibility study
From previous research using acceptability ratings, it is clear that positive QEs in Swedish as a group focus the REFSET, while negative QEs focus the COMPSET. In Experiment 1, we wanted to investigate whether such ratings are modulated when the relative set size is manipulated. The broader aim of the study was to learn more about what factors play a role in the discourse model construction of quantified statements.
A number of factors can have an effect on how speakers judge the acceptability of sentences. In addition to syntactic and semantic factors, also processing related factors can play some role for acceptability judgements (Häussler & Juzek, 2017;Juzek, 2015;Schütze, 2011;Schütze & Sprouse, 2013). In the present experiment, the syntactic structure was identical in relevant respects in the material. While lexical semantic or pragmatic factors may always play some role, we used a large number of experimental items to reduce the risk of such factors resulting in a consistent pattern in the ratings. Processing, in contrast, was one factor that we predicted would have a particular effect on the ratings. Processing is known to affect acceptability judgements in the way that a condition that requires more processing than some other condition is likely to lead to lower acceptability (Hofmeister, Casasanto, & Sag, 2012;Hofmeister, Culicover, & Winkler, 2015;Hofmeister & Sag, 2010;Miller & Chomsky, 1963). If set size manipulations play a role in processing, we hypothesized that this would also be reflected in the ratings of the sentences. To our knowledge there are no other studies investigating this and we therefore remained open to how set size would affect the ratings. From research in psychology, it has been shown that our mental representations of the size of objects affect our attentional focus (Collegio et al., 2019). If these effects of size are also relevant in linguistic processing, two scenarios suggest themselves. In the first, a large set will be more salient than a small set, simply because of its size: a large object attracts more attention than a small one. In the second scenario, the number of individual members of the large set will have the opposite effect on the mental representation. In this scenario, a set with fewer members will be more salient because there are fewer members competing for attention. 
In the following, we use the label LARGE for a QE with a REFSET that is larger than the COMPSET, and the label SMALL for a QE with a REFSET that is smaller than the COMPSET. In Fig. 1, this means that for LARGE QES (both positive, such as almost all, and negative, such as not exactly all), the REFSET, the striped area, is larger than the COMPSET, the shaded area. For SMALL QES (both positive, such as a few, and negative, such as almost no), the REFSET, the striped area, is smaller than the COMPSET, the shaded area.
For the first scenario, where the large set is seen as more salient, we formulated the following hypotheses for both positive and negative QEs:
Ai REFSET reading should be judged as better following LARGE QEs than following SMALL QEs.
Aii COMPSET reading should be judged as better following SMALL QEs than following LARGE QEs.
For the second scenario, where the number of members in the large set causes difficulties, we formulated hypotheses in the opposite direction:
Bi REFSET reading should be judged as better following SMALL QEs than following LARGE QEs.
Bii COMPSET reading should be judged as better following LARGE QEs than following SMALL QEs.

Material
In a pre-test, the size of seventeen QES (eight positive and nine negative 3 ) was rated by 596 participants. 4 Out of the seventeen QES, we selected six that pick out quantities (i.e. REFSETs) smaller than 20% of the given total amount, and six QES that pick out quantities larger than 75%, see Table 1.
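The selection criteria can be sketched as a simple classification rule. The function name `size_label` and the example estimates are hypothetical; only the two thresholds (below 20% and above 75% of the given total) come from the text.

```python
def size_label(estimate, total=100):
    """Classify a QE by the share of the total that its REFSET is judged to cover.

    Thresholds follow the selection criteria in the text:
    below 20% -> SMALL, above 75% -> LARGE, anything else -> not selected.
    """
    share = estimate / total
    if share < 0.20:
        return "SMALL"
    if share > 0.75:
        return "LARGE"
    return None


# Hypothetical mean estimates (out of 100), not the pre-test results.
for qe, estimate in [("qe_a", 8), ("qe_b", 90), ("qe_c", 50)]:
    print(qe, size_label(estimate))
```

A QE whose estimate falls between the two thresholds returns `None`, mirroring the fact that only twelve of the seventeen pre-tested QEs were selected.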
Based on these selected QEs, 128 experimental items were constructed. Each QE occurred 42 or 43 times. In addition to a QE in the subject position, the items contained an adjective that was coherent with either a REFSET or a COMPSET reading, as shown below in (3). Out of the 128 items, four received very low ratings in all conditions, indicating that something other than QE size affected the judgements of these items, and they were excluded from the statistical analyses.
The items were manipulated in a 2 × 2 × 2 design: Polarity (POSITIVE (POS) vs NEGATIVE (NEG) QE) × Size of REFSET (LARGE vs SMALL) × Set reading (REFSET (REF) vs COMPSET (COMP) reading, induced by an adjective), as in (3). 5 The sentences were distributed across eight lists in a Latin square design, such that all lists included an equal number of each manipulation but never more than one sentence from the same item. Each list contained 81 filler items, making the total number of sentences each participant read 209.
(3) a. Condition: POS-LARGE-REF
Det stora flertalet barn somnade direkt och att de var så trötta förvånade föräldrarna.
'The great majority of children fell asleep immediately and that they were so tired confused the parents.'
b. Condition: POS-LARGE-COMP
Det stora flertalet barn somnade direkt och att de var så uppspelta förvånade föräldrarna.
'The great majority of children fell asleep immediately and that they were so excited confused the parents.'
c. Condition: POS-SMALL-REF
Några enstaka barn somnade direkt och att de var så trötta förvånade föräldrarna.
'A small number of children fell asleep immediately and that they were so tired confused the parents.'
d. Condition: POS-SMALL-COMP
Några enstaka barn somnade direkt och att de var så uppspelta förvånade föräldrarna.
'A small number of children fell asleep immediately and that they were so excited confused the parents.'
e. Condition: NEG-LARGE-REF
Inte riktigt alla barn somnade direkt och att de var så trötta förvånade föräldrarna.
'Not quite all children fell asleep immediately and that they were so tired confused the parents.'
f. Condition: NEG-LARGE-COMP
Inte riktigt alla barn somnade direkt och att de var så uppspelta förvånade föräldrarna.
'Not quite all children fell asleep immediately and that they were so excited confused the parents.'
g. Condition: NEG-SMALL-REF
Nästan inga barn somnade direkt och att de var så trötta förvånade föräldrarna.
'Almost no children fell asleep immediately and that they were so tired confused the parents.'
h. Condition: NEG-SMALL-COMP
Nästan inga barn somnade direkt och att de var så uppspelta förvånade föräldrarna.
'Almost no children fell asleep immediately and that they were so excited confused the parents.'
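The Latin square distribution of the 128 items over eight lists can be sketched as follows. The rotation scheme (list k shows item i in condition (i + k) mod 8) and the function name are assumptions made for illustration; the text states only that a Latin square design was used.

```python
from collections import Counter
from itertools import product

# The eight cells of the 2 x 2 x 2 design, labelled as in (3).
CONDITIONS = ["-".join(cell) for cell in
              product(("POS", "NEG"), ("LARGE", "SMALL"), ("REF", "COMP"))]


def latin_square_lists(n_items, conditions):
    """Rotate conditions over items so that list k shows item i in
    condition (i + k) mod len(conditions)."""
    n = len(conditions)
    return [[(item, conditions[(item + k) % n]) for item in range(n_items)]
            for k in range(n)]


lists = latin_square_lists(128, CONDITIONS)
per_condition = Counter(cond for _, cond in lists[0])
print(len(lists), len(lists[0]), per_condition["POS-LARGE-REF"])
```

Under this rotation every list contains each item exactly once and 16 items per condition, and every item appears in all eight conditions across the lists. Adding the 81 fillers gives the 209 sentences per participant mentioned in the text.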

Method
The experiment was run in computer labs at Lund University and Linnaeus University. The sentences, which were implemented in Google Forms, were presented one at a time on a computer screen. The participants were asked to rate each sentence on a Likert scale from 1 (very unnatural/strange) to 7 (completely natural). Written instructions and practice sentences were given at the beginning of the experiment session.
3 The positive QEs are all monotone increasing and give rise to affirmations at the clause level; for instance, in co-ordination they use the pattern "och … också" ('and … too') rather than "och inte … heller" ('and neither'). The negative QEs are all monotone decreasing and give rise to denials at the clause level (for instance, they use the reverse pattern for co-ordination from that of affirmations) (see Klima, 1964; Moxey et al., 2001, among others).
4 The participants were asked to state the number they thought that a particular QE corresponded to out of a given maximum in a specific context. The context was the same in all cases and the participants gave the number for only one QE each. The task was to answer the question: There were 100 students in the auditorium. QE of them had been there before. How many do you think had been there before? (Give your answer in numbers), where QE stands for one of the tested QEs (for details, see .
5 Note that the English versions are not literal translations.

Statistical analysis
We followed the recommendations in Juzek (2015) and Häussler and Juzek (2017), treating the Likert scores as continuous data and using z-scores for the statistical analysis. The analysis was carried out in R (R Core Team, 2018) with linear mixed effects models (Baayen, Davidson, & Bates, 2008), using the packages lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017) and emmeans (Lenth, 2018). The fixed effects in the models were Size (LARGE and SMALL QE), Polarity of the QE (POS and NEG) and Set reference (REFSET and COMPSET) and their interaction. Random intercepts and random slopes for participants and items were included to the maximal extent permitted by the data.
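The z-score transformation can be illustrated with a short sketch. It assumes that ratings are standardised within each participant using the sample standard deviation; the text does not state the exact procedure, and the ratings below are hypothetical. The sketch is in Python for illustration only, whereas the analysis itself was run in R.

```python
from statistics import mean, stdev


def z_scores_by_participant(ratings):
    """Map {participant: [ratings]} to {participant: [z-scores]},
    standardising each participant against their own mean and SD."""
    result = {}
    for participant, values in ratings.items():
        m, sd = mean(values), stdev(values)
        result[participant] = [(v - m) / sd for v in values]
    return result


# Hypothetical ratings on the 1-7 scale, not data from the experiment.
ratings = {"p1": [1, 4, 7], "p2": [5, 6, 7]}
z = z_scores_by_participant(ratings)
print(z["p1"])
print(z["p2"])
```

The two hypothetical participants use the scale very differently, yet their z-scores coincide, which is the point of the transformation: it removes individual differences in how the scale is used. A participant who gives the same rating throughout would have an SD of zero and would need separate handling.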

Results
As seen in Fig. 2a, there were differences in all four conditions with positive QES. For the positive QES with REFSET continuations (the default condition), ratings for LARGE QES (REFSET larger than COMPSET) were higher than for SMALL QES (REFSET smaller than COMPSET). This difference corresponds to the contrast between sentences (3a) and (3c). When positive QES were followed by COMPSET continuations (the non-default condition), ratings for SMALL QES (COMPSET > REFSET) were higher than for LARGE QES. This difference corresponds to the contrast between sentences (3d) and (3b). The statistical analysis (see Table 2) shows that these differences in rating between LARGE and SMALL positive QES are significant in both the REFSET and the COMPSET conditions.
The negative QES also showed differences in all conditions, as seen in Fig. 2b. For the negative QEs with COMPSET continuations (the default condition), ratings for SMALL QES (REFSET smaller than COMPSET) were higher than for LARGE QES (REFSET larger than COMPSET). This difference corresponds to the contrast between sentences (3h) and (3f). As shown in Table 2, this difference is significant. When negative QES were followed by REFSET continuations (the non-default condition), ratings for LARGE QES were slightly higher than for SMALL QES, but the pairwise comparisons in Table 2 show that the difference is not significant. This difference corresponds to the contrast between sentences (3e) and (3g).

Discussion of experiment 1
As pointed out above, it has been shown that with contextual manipulations the attentional focus can in some cases fall on the COMPSET even for positive QEs (Moxey, 2006; Moxey & Filik, 2010; Moxey et al., 2001, 2009). This suggests that the COMPSET can be accessible, at least in some cases, for positive QEs. In this experiment, we investigated whether the focus patterns of QEs are affected by the relative size of the REFSET and the COMPSET. We found that set size indeed has an effect on speaker judgements of quantified statements. The lack of previous research regarding the effects of relative set size on attentional focus made this an exploratory experiment. We formulated two opposite hypotheses regarding the direction of the effect of size, and the results are compatible with the one on which the largest set is more salient, repeated here:
Ai REFSET reference should be judged as better following LARGE QEs than following SMALL QEs.
Aii COMPSET reference should be judged as better following SMALL QEs than following LARGE QEs.
For positive QEs in both set readings, the relative set size affected the ratings. The ratings were significantly higher when the set that the adjective focussed was also the largest set. We interpret these findings as an indication that the COMPSET is taken into account also in the case of positive QEs, even though it is not available for anaphoric reference. More precisely, the reason that SMALL QEs were rated lower than LARGE ones in the REFSET condition could be that the relatively larger COMPSET interferes in this case, and if a larger COMPSET makes COMPSET readings better, it also seems that this set is considered in some sense when the sentences are rated.
For negative QES, there was a significant difference in ratings between SMALL and LARGE QES only in the default COMPSET condition. As with positive QES, QES picking out the larger set (SMALL QES for COMPSET readings) were rated as significantly better than QES picking out the smaller set (LARGE QES for COMPSET readings). Interestingly, this difference was approximately double the size of that for positive QES. In the non-default REFSET condition, on the other hand, the difference in ratings was in the expected direction but was not large enough to reach significance. Contrary to our expectations thus, size seems to have mattered more in the default condition for negative QES than it did for positive QES, but less in the non-default condition. It is not immediately obvious to us why size would play such an important role in the default condition, but not in the non-default condition. One possibility is that the effects are largely driven by the ratings of LARGE negative QES. These received overall lower ratings than all other QEs, 6 enhancing the difference between LARGE and SMALL in the COMPSET condition and mitigating the differences in the REFSET condition. We think there are some possible explanations for the low ratings of these QES that are not related to size. All LARGE negative QES contain the overt negation inte 'not', which might in itself lead to lower ratings. In addition, these QES are stylistically marked to some degree and might therefore differ in use from the other QES. All of these things are possible reasons for the low ratings (see Schütze, 1996). Given that the LARGE negative QES were rated lower overall, we think the results for negative QEs should be interpreted with caution. 7 Thus, QE size has an effect on speaker judgements, although this effect is not strong enough to turn non-default focus readings into preferred ones: COMPSET readings with positive QES, and REFSET readings with negative QES were still judged as overall bad. 
Size manipulations, however, modulate these preferences. This is in fact in line with previous studies investigating the role of expectations and desires on QE focus readings (Moxey, 2006;Moxey et al., 2009;Moxey & Filik, 2010;Upadhyay et al., 2019). In those studies too, the manipulations modulated the patterns but did not reverse them. Moxey and Filik (2010) report, for instance, that the number of COMPSET continuations increased for positive QES when an explicit desire for 'all' was coupled with an actual outcome of 'a few', indicating a large shortfall, compared to when the actual outcome was 'nearly all', indicating a small shortfall. REFSET continuations were however by far more common in both cases. For the reverse case with negative QES, a large surplus increased the number of REFSET continuations considerably compared to cases with a small surplus but again did not reverse the pattern.
One possibility is that these differences in rating that we find in Experiment 1 are a reflection of what happens in processing. That is, it might be the case that size manipulations increase the salience of a non-default set to such extent that it causes disruptions in the processing of the sentences. To test this, we designed two ERP experiments where we investigated the processing of LARGE and SMALL positive and negative QES, Experiments 2 and 3, respectively.

Experiment 2: The processing of positive QEs
In order to investigate the effect of relative set size in the processing of positive QEs, we measured the ERP effects on the particular word that explicitly points to either the REFSET or the COMPSET (the adjective trötta/uppspelta 'tired/excited'), as in (4), illustrating only the LARGE condition, not the SMALL.
(4) a. Condition: POS-LARGE-REF
Det stora flertalet barn somnade direkt och att de var så trötta förvånade föräldrarna.
'The great majority of children fell asleep immediately and that they were so tired confused the parents.'
b. Condition: POS-LARGE-COMP
Det stora flertalet barn somnade direkt och att de var så uppspelta förvånade föräldrarna.
'The great majority of children fell asleep immediately and that they were so excited confused the parents.'
When listeners encounter anaphors, memory representations of possible antecedents in the discourse representation are activated, and as the discourse proceeds, antecedents are integrated into the unfolding discourse (Barkley, Kluender, & Kutas, 2015; van Berkum, Brown, & Hagoort, 1999). The process of activation links a concept with two linguistic forms, the form of the antecedent and the form of the anaphor (for example Mary and she) (see Coopmans & Nieuwland, 2020, and references therein). We assume that the pronoun is interpreted as referring to the default set at this point since there is nothing in the context that would make the non-default set more salient. 8 Since the REFSET is the default set for anaphoric reference for positive QEs, it should be even easier to establish reference to this set in the condition where the REFSET is the larger set compared to the condition where the COMPSET is the larger set. In the condition with a larger REFSET, both the relative set size and the default focus work together in making this set the most salient one.
Whether the set that is actually targeted in the sentence is coherent with the default REFSET reading is not clear until the adjective (trötta/uppspelta 'tired/excited' in (4)) is presented. By the time the adjective is processed, the antecedent of the pronoun should be integrated in the discourse model, and as a consequence we expected to find ERP effects on the adjective in the non-default contexts relative to the default contexts. Two components have been argued to signal difficulties of discourse integration: the N400 and the P600 components.
The N400-component is an enhanced negativity, peaking at 400 ms after word onset. In addition to being indicative of lexicosemantic processing (see Kutas & Federmeier, 2011, for a review), this component is also sensitive to how well a word fits with the discourse model under construction (Filik & Leuthold, 2008;Nieuwland et al., 2019;Nieuwland & van Berkum, 2006b; van Berkum et al., 1999;van Berkum, Zwitserlood, Hagoort, & Brown, 2003). The P600-component is a slightly later ERP component of the opposite polarity. It has been shown to indicate, among other things, a new referent in the discourse (Burkhardt, 2006). Importantly, it also signals a word's plausibility in a discourse context, irrespective of its semantic relation to the discourse (Kim & Osterhout, 2005), and is an indication of a more demanding inference process of updating the discourse model (Brouwer, Crocker, Venhuizen, & Hoeks, 2017;Brouwer, Fitz, & Hoeks, 2012;Burkhardt, 2007;Heinat & Klingvall, 2020), or of an attempt to revise an initial parse or interpretation (Roll, Horne, & Lindgren, 2007;Van Petten & Luka, 2012).
In the context of positive QEs, it is clear that REFSET reference is always the default reading, but Experiment 1 indicated that the COMPSET can gain salience in the condition where it is larger than the REFSET. In terms of ERP effects, we therefore predicted the following: For adjectives picking out the REFSET, we expected an N400 and/or a P600 effect for SMALL QEs relative to LARGE QEs. In this condition, the adjective is coherent with the default set reading and it should be easier to integrate the adjective when the REFSET is the larger set. In the COMPSET condition, we expected an N400 and/or a P600 effect for LARGE QEs relative to SMALL QEs. In this condition the adjective is not coherent with the default set reading and it should be easier to integrate when the COMPSET is the larger set (prediction i). Finally, given that the REFSET is the default set for positive QEs, irrespective of their size, we also expected a contrast between REFSET and COMPSET for both LARGE and SMALL QEs. More specifically, we expected to see an N400 and/or a P600 effect on the adjectives that are incoherent in the discourse, i.e. adjectives that are congruent with the COMPSET reading, in comparison to adjectives that are congruent with the REFSET reading (prediction ii).
Predictions: If both the REFSET and the COMPSET are competing for focus in the processing of statements with positive QEs, there should be:
i N400/P600 effects at the adjective; more specifically, we expected effects for SMALL relative to LARGE in the REFSET condition, and for LARGE relative to SMALL in the COMPSET condition.
ii a general effect on the adjective in both size conditions, such that the COMPSET should show N400/P600 effects compared to the REFSET.

Participants
Thirty-eight students from Lund University, who were all native speakers of Swedish, participated in the study in exchange for cinema vouchers. They were all right handed and had no diagnosed psychiatric or neurological disorders. One participant completed only half of the experiment due to technical problems and as a consequence the data from that participant have been excluded from the data analysis.

Materials
The materials consisted of 160 experimental items, 124 of these from Experiment 1 and the remaining 36 from Heinat and Klingvall (2020), modified in the same way as for Experiment 1. All of them featured only positive QEs and were manipulated along two dimensions: QE Size (LARGE, SMALL) x Set reading (REFSET, COMPSET) of the adjective (Det stora flertalet/Några enstaka barn somnade direkt och att de var så trötta/uppspelta förvånade föräldrarna. 'The great majority of/A small number of children fell asleep immediately and that they were so tired/excited surprised the parents.' See (3a)-(3d) above). The sentences were randomly distributed across four lists in a Latin square design, such that each participant saw an equal number of all conditions but only one sentence from each item. There were also 202 unrelated filler sentences included, making the total number of sentences on each list 362.

Procedure
At the very beginning of the session, the participants were given a consent form to complete, which also informed them that all data would be anonymized. Once the electrodes had been applied, the participants were placed in front of a computer screen in the lab. On the first slide, instructions were given. These were followed by 10 practice sentences and time for questions. Only after that did the actual experiment start. The stimuli were presented using PsychoPy (Peirce et al., 2019). Each trial began with a cross-hair displayed centrally on the screen for 800 ms, followed by a blank screen for 200 ms, after which the sentences were presented word by word for 300 ms each, with a 200 ms blank screen interval between them. A message indicating progress appeared on the screen after half of the sentences had been presented, and a second message after two thirds of the sentences. To keep the participants on task, one fourth of the sentences were followed by a yes/no question about the contents of the sentence. No participant was excluded due to low scores on the questions (correct answers: mean 91%, range 82%-96%). The experiment lasted about 80 min, varying slightly depending on the length of the participants' pauses and the time they took to answer the questions.
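The trial timing can be worked through in a short sketch that computes when each word appears relative to trial start. The constants are the durations given above; the function name `word_onsets` is hypothetical, and the sentence is the Swedish POS-LARGE-REF example used in the materials.

```python
FIXATION_MS = 800   # cross-hair
BLANK_MS = 200      # blank screen after the cross-hair
WORD_MS = 300       # each word on screen
GAP_MS = 200        # blank interval between words


def word_onsets(sentence):
    """Return (word, onset in ms from trial start) for each word."""
    start = FIXATION_MS + BLANK_MS
    step = WORD_MS + GAP_MS
    return [(word, start + i * step) for i, word in enumerate(sentence.split())]


sentence = ("Det stora flertalet barn somnade direkt och att de var så "
            "trötta förvånade föräldrarna.")
onsets = dict(word_onsets(sentence))
print(onsets["trötta"])  # onset of the critical adjective
```

With these durations the first word appears 1000 ms into the trial and each following word 500 ms later, so the critical adjective, as the twelfth word, is presented well into the trial.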
The EEG was filtered (0.01-30 Hz band-width filter) and corrected for ocular artefacts using independent component analysis in EEGLab (Delorme & Makeig, 2004). Data was then segmented into epochs that started 200 ms before critical word onset, and lasted until 1000 ms after adjective onset (trötta/uppspelta, 'tired/excited' in (3)). All epochs were baseline-corrected and then automati-

Data analysis
Following Luck and Gaspelin (2017), we restricted the analysis to pre-defined regions and time windows relevant for our predictions (see also Nieuwland, 2016). Since we were looking for effects that we suspected were very small, we used the plots from Heinat and Klingvall (2020) and by means of visual inspection refined the region (see Luck & Gaspelin, 2017). The new region was determined before visual inspection and analysis of the data in the present study. The resulting Posterior region consists of CP3, CPZ, CP4, P3, PZ, P4, O1, OZ and O2 (TP7/8 have been removed from this region compared to Heinat & Klingvall, 2020).
package for R (Kuznetsova et al., 2017;R Core Team, 2018). The fixed effects in the models were Size (LARGE and SMALL QE) and Set (REFSET and COMPSET) and their interaction. Random intercepts and random slopes for participants and items were included as maximally as permitted by the data. All models reported are the maximal converging models after model comparison.
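As an illustration of how a single value per trial can be derived for such models, the sketch below averages amplitudes over the Posterior region and a time window. It assumes that the dependent measure is mean amplitude, that window boundaries are inclusive, and that epochs are stored as one list of samples per channel; none of these details is confirmed by the text, and the epoch values are hypothetical.

```python
POSTERIOR = ["CP3", "CPZ", "CP4", "P3", "PZ", "P4", "O1", "OZ", "O2"]


def window_mean(epoch, times_ms, channels, start_ms, end_ms):
    """Mean amplitude over the given channels and time window.

    epoch maps channel name -> list of amplitudes, one per entry in times_ms.
    """
    idx = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    values = [epoch[ch][i] for ch in channels for i in idx]
    return sum(values) / len(values)


# Hypothetical epoch: 100 ms steps from -200 to 1000 ms, flat 1.0 µV
# except for 3.0 µV from 600 ms onwards. Not data from the study.
times_ms = list(range(-200, 1001, 100))
epoch = {ch: [3.0 if t >= 600 else 1.0 for t in times_ms] for ch in POSTERIOR}

print(window_mean(epoch, times_ms, POSTERIOR, 300, 500))  # N400 window
print(window_mean(epoch, times_ms, POSTERIOR, 600, 900))  # P600 window
```

The two calls use the 300-500 ms and 600-900 ms windows reported in the results; in the hypothetical epoch only the later window contains the raised amplitude.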

Results
In the posterior region at the adjective (e.g. trötta/uppspelta 'tired/excited'), there were significant differences between LARGE QEs and SMALL QEs in the COMPSET condition in the timespan 600-900 ms after critical word onset, see Table 3 and Fig. 3. The top half of Table 3 shows that the contrast between the intercept, that is LARGE QE - COMPSET, and SMALL QE is a significant negativity. This means that in the COMPSET condition, LARGE QEs (COMPSET < REFSET) showed a positivity in relation to SMALL QEs (COMPSET > REFSET). There was also a significant negativity in the condition with LARGE QE - COMPSET (i.e. the intercept) relative to REFSET. In other words, for LARGE QEs, the adjectives that focus the COMPSET showed a positivity in relation to the adjectives that focus the REFSET. In order to see the results for the other contrasts, the model was re-levelled with SMALL QE - REFSET as intercept, but as seen in the lower part of the top half of Table 3, there were no significant differences. The results from the 300-500 ms timespan in the posterior region (the N400) are shown in the lower part of Table 3. There was no significant effect in this timespan.
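The logic of reading coefficients against an intercept, and of re-levelling, can be sketched with treatment coding over cell means. This is a simplified illustration that ignores the random effects of the actual models; the function name and the cell means are hypothetical and chosen only to mirror the direction of the reported pattern.

```python
def treatment_contrasts(cell_means, ref_size, ref_set):
    """Simple effects relative to a reference cell under treatment coding.

    cell_means maps (size, set) -> mean amplitude. Returns the intercept
    and the two simple-effect coefficients for the chosen reference cell.
    """
    other_size = "SMALL" if ref_size == "LARGE" else "LARGE"
    other_set = "REF" if ref_set == "COMP" else "COMP"
    intercept = cell_means[(ref_size, ref_set)]
    return {
        "intercept": intercept,
        f"size_{other_size}": cell_means[(other_size, ref_set)] - intercept,
        f"set_{other_set}": cell_means[(ref_size, other_set)] - intercept,
    }


# Hypothetical cell means in µV, chosen only to mirror the reported pattern.
means = {("LARGE", "COMP"): 2.0, ("SMALL", "COMP"): 1.0,
         ("LARGE", "REF"): 1.0, ("SMALL", "REF"): 1.0}

print(treatment_contrasts(means, "LARGE", "COMP"))
print(treatment_contrasts(means, "SMALL", "REF"))
```

With LARGE QE - COMPSET as the reference cell, both coefficients are negative, which is why a negative estimate corresponds to the intercept condition being more positive. Re-levelling to SMALL QE - REFSET exposes the two remaining simple contrasts, which are zero in this toy example.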

Discussion of experiment 2
In this experiment, we predicted that if both the REFSET and COMPSET are competing for focus in the processing of positive QEs, there should be an effect of set size at the adjective. We expected effects in both the coherent default REFSET reading and the incoherent COMPSET reading, and indeed found effects in the COMPSET reading. In the COMPSET condition, LARGE QES gave rise to a P600 effect on the adjective in comparison to SMALL QES.
A possible explanation for an effect of set size only in the COMPSET condition is that this is the non-default condition. The P600 is often taken to be an indication of an update or reorganization of the discourse model. Burkhardt (2007) sees the P600 as indicating an increased cost for discourse memory. She found a P600 for words that were semantically coherent, but not inferable from the context (the noun phrase the pistol elicited a P600 in the context A student was killed … the pistol was found … versus A student was shot … the pistol was found …). In such cases, the information in the mental model of the discourse must be restructured and this causes an increased load on discourse memory, visible as a P600 effect, according to Burkhardt. In our study, the adjectives indicating REFSET or COMPSET were always semantically coherent from a lexical point of view (the semantic content of the clauses with the QES was identical, which is probably why there were no significant N400 effects), but incoherent from a discourse perspective depending on what group had the referential focus. When the incongruent adjective is encountered (the COMPSET condition), the discourse model needs reorganizing and this is when the size of the COMPSET matters. When the COMPSET is large, it is more salient than when it is small, and thereby also easier to accommodate into the discourse model and less taxing on discourse memory (cf. Burkhardt, 2007).
We also expected to see effects for the non-default COMPSET condition relative to the default REFSET condition for both LARGE and SMALL QES. For LARGE QES, there was indeed such a difference in the expected direction in the P600 timespan, with COMPSET being more positive than REFSET. We think that these results can be accounted for in line with the discussion in the previous paragraph. In the condition with LARGE QES and an adjective picking out the REFSET, all factors enhance processing: the focussed set is the largest set, the REFSET, and the adjective is coherent with the discourse model. When the adjective picks out the COMPSET, on the other hand, all factors make processing more difficult: since the COMPSET is not the default set, the discourse model needs restructuring, and since the COMPSET is the smaller set, it is less salient and causes an increased cost for discourse memory.
If all factors conspire to make the updating process more difficult in the condition with LARGE QES, as described above, the opposite holds for SMALL QES. As the results show, with SMALL QES it is easier to restructure the discourse model in the non-default condition than it is for LARGE QES, because for SMALL QES the COMPSET is bigger and more salient. The difference in ERPs between REFSET and COMPSET reference is therefore not significant in the SMALL condition.
Other results supporting such an account come from a self-paced reading study conducted by Sanford et al. (1996). Sanford and colleagues investigated reading times for sentences very similar to the ones in the present study. A QE (positive or negative) was followed by a sentence containing a phrase targeting either the REFSET or the COMPSET (such as their presence or their absence). As expected, the results from the experiment confirmed that positive QES favor the REFSET for anaphoric reference and negative QES the COMPSET. However, what is more interesting from the perspective of our study is that the two positive QES used by Sanford et al. (1996) differed in size: a few is a SMALL QE, and many is a LARGE QE. While the individual QES were not compared in the statistics, the data presented indicate that QE size mattered to some degree. In the default REFSET condition there were no real differences in reading times between the QES, but in the non-default COMPSET condition, the LARGE QE many had markedly longer reading times than the SMALL QE a few. So the same situation obtains in their experiment: a large COMPSET (that is, a SMALL QE) facilitates the restructuring of information in the discourse model.
Taken together, we interpret the results from Experiment 2 to mean that QE set size matters in processing only when the information in the discourse model has to be restructured. The larger the COMPSET is, the easier it is to accommodate in the discourse model. We take this to mean that the COMPSET is indeed part of the discourse model, but that the REFSET and the COMPSET are not competing for discourse focus to the extent that there is referential ambiguity.

Experiment 3: The processing of negative QEs
From previous studies it is clear that negative QES differ from positive ones in that both the REFSET and the COMPSET are possible antecedents for anaphoric reference without any contextual manipulations, even though the COMPSET is the favored set (Filik et al., 2011;Paterson, Sanford, Moxey, & Dawydiak, 1998;Sanford et al., 1996). The rationale for Experiment 3 was to investigate what effects QE size has when both sets are indeed possible antecedents. In Experiment 2 we saw that QE size plays a role primarily in the strongly incoherent condition for positive LARGE QES. In light of this, our expectations for Experiment 3 were as follows.
As in Experiment 2, we expected to see effects of size on the adjective in the non-default condition, which is the REFSET condition for negative QES. In this condition, processing should be facilitated if the default COMPSET is small and the REFSET is large. Unlike for positive QEs, we did not expect to see significant overall differences between REFSET and COMPSET readings at the disambiguating adjective within each of the size conditions (i.e. for LARGE and SMALL, respectively), since both sets are available to some extent. No significant differences between REFSET and COMPSET for negative QES were found in Heinat and Klingvall (2020).
In sum, the prediction for Experiment 3 was the following. Prediction: If size affects anaphoric resolution and if size plays a role for restructuring the discourse model, there should be:
i. P600 (/N400) effects at the adjective; more specifically, we expected the adjective following SMALL QES to show a positivity in the P600 time span (and/or a negativity in the N400 time span) relative to adjectives following LARGE QES in the REFSET condition.

Method & material
Procedure, EEG recording, processing and data analysis were identical to what has been described for Experiment 2 above. No participant had more than 20% of the epochs removed from analysis (remaining epochs per condition: Adjective

Participants
Thirty-seven students from Lund University, who were all native speakers of Swedish, participated in the study in exchange for cinema vouchers. They were all right handed and had no diagnosed psychiatric or neurological disorders.

Materials
As in Experiment 2, there were 160 experimental items taken from the same sources, manipulated in the same way (Size x Set reference), but all featuring a negative QE (Nästan inga/Inte riktigt alla barn somnade direkt och att de var så trötta/uppspelta förvånade föräldrarna. 'Almost no/Not quite all children fell asleep immediately and that they were so tired/excited surprised the parents.' See (3e)-(3h) above). The sentences were distributed across four lists in a Latin square design, such that each participant saw an equal number of all conditions. 222 filler sentences were also included, making the total number of sentences read by each participant 382. No participant was excluded due to low scores on the comprehension questions. The mean for correct answers was 90.3%, and the range 79%-98%.
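The Latin square distribution of items across lists can be sketched as follows. The item count, the four conditions and the number of lists are taken from the description above; the rotation scheme itself is an assumption made for illustration, since the exact assignment procedure is not reported, and the function name is hypothetical.

```python
from collections import Counter

# The four Size x Set conditions and the item and list counts described above.
CONDITIONS = [
    ("LARGE", "REFSET"),
    ("LARGE", "COMPSET"),
    ("SMALL", "REFSET"),
    ("SMALL", "COMPSET"),
]
N_ITEMS = 160
N_LISTS = 4


def latin_square_lists(n_items, conditions, n_lists):
    """Assign each item to one condition per list by rotating the conditions.

    The rotation is an assumed scheme, not the procedure reported in the paper.
    """
    lists = []
    for list_index in range(n_lists):
        assignment = {}
        for item in range(n_items):
            # The same item appears in a different condition on each list.
            assignment[item] = conditions[(item + list_index) % len(conditions)]
        lists.append(assignment)
    return lists


lists = latin_square_lists(N_ITEMS, CONDITIONS, N_LISTS)

# Number of items per condition on each list (one list per participant).
per_list_counts = [Counter(assignment.values()) for assignment in lists]
for list_index, counts in enumerate(per_list_counts, start=1):
    print(f"List {list_index}:", dict(counts))
```

Under this scheme every list contains the same number of items in each condition, and every item occurs in each condition on exactly one list, so no participant sees the same item twice.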

Results
In the posterior region, there was a marginal effect in the P600 timespan at the adjective (e.g. trötta/uppspelta 'tired/excited') for SMALL relative to LARGE in the REFSET condition, as shown in Table 4 and Fig. 4. That is, SMALL QES were more positive than LARGE ones. The opposite pattern was found in the COMPSET condition; LARGE QES were more positive than SMALL QES. However, this effect did not reach significance. There was no effect in the N400 timespan. However, as pointed out by Delogu et al. (2019), P600 effects to semantic manipulations vary in their latencies, from 600 ms to above 1000 ms after the onset of the critical word (see Delogu et al., 2019 and references therein). We therefore followed their procedure and divided the P600 time window into three 200 ms time windows with 100 ms onset intervals: 600-800 ms, 700-900 ms and 800-1000 ms, to see where the effect was the most pronounced (Delogu et al., 2019). As we can see in Table 5, there is a significant effect of size in the non-default condition where the adjective focusses the REFSET in the 800-1000 ms timespan (see footnote 9).
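The division of the P600 window into overlapping sub-windows can be sketched as below. The window boundaries, width and onset interval are those given above; the 2 ms sampling interval and the ramp-shaped waveform are assumptions made purely for illustration and do not reflect the recorded data, and both function names are hypothetical.

```python
def sliding_windows(start_ms, end_ms, width_ms, step_ms):
    """Return (onset, offset) pairs for overlapping windows within [start_ms, end_ms]."""
    windows = []
    onset = start_ms
    while onset + width_ms <= end_ms:
        windows.append((onset, onset + width_ms))
        onset += step_ms
    return windows


def mean_amplitude(samples, window, sampling_interval_ms):
    """Average the samples whose time stamps fall in [onset, offset)."""
    onset, offset = window
    selected = [
        amplitude
        for index, amplitude in enumerate(samples)
        if onset <= index * sampling_interval_ms < offset
    ]
    return sum(selected) / len(selected)


# Three 200 ms windows with 100 ms onset intervals, as described in the text.
windows = sliding_windows(600, 1000, 200, 100)

# Hypothetical epoch: one sample every 2 ms (assumed), rising linearly over time.
SAMPLING_INTERVAL_MS = 2
epoch = [float(index) for index in range(600)]  # covers 0 to 1198 ms

window_means = {
    window: mean_amplitude(epoch, window, SAMPLING_INTERVAL_MS) for window in windows
}
for window, value in window_means.items():
    print(window, value)
```

Because adjacent windows share half of their samples, the three window means are not independent of one another, which is worth bearing in mind when comparing effects across them.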

Discussion of experiment 3
Based on the findings in Experiment 2, we expected an effect of size in the non-default REFSET condition on the adjective and indeed found a P600 effect there, although the effect appeared later for negative QES compared to positive ones. We interpret this effect in line with Experiment 2: when there is an incongruency in the sentence, as with a REFSET reading with negative QES, the discourse model needs restructuring and this gives rise to a P600 effect. We think that the effect was later for negative QES because REFSET focus is not completely ruled out for them. In the same vein, in their eye-tracking study, Paterson et al. (1998) found only very late effects of set interpretations for negative QES, compared to positive QES, and these effects were only present in regions later than the disambiguating region (see also Filik et al. (2011) for differences in the processing of positive and negative QES measured in ERPs).

General discussion
When interpreting quantified statements, we consider the relation between the set of entities for which some relevant property holds, the REFSET, and the set of entities for which the property does not hold, the COMPSET. With negative QES, such as few and not all, it is clear that both of these sets are also accessible at the discourse level since both can serve as antecedents to anaphoric pronouns (although the COMPSET is the default set). With positive QES, such as a few and almost all, in contrast, it is much less clear that both sets are accessible (e.g. Moxey & Sanford, 1987; Sanford et al., 1996). It has been shown, however, that with contextual manipulations the attentional focus can in some cases fall on the COMPSET even for positive QES (Moxey, 2006; Moxey & Filik, 2010; Moxey et al., 2001, 2009). This suggests that the COMPSET is accessible to some degree, after all, for positive QES. In this paper we set out to investigate if the attentional focus can also be affected by inherent QE properties, other than polarity. In three experiments we looked at this issue, by manipulating the size of the QES, i.e. investigating the focus properties of QES whose REFSET is large (and COMPSET is small) and QES whose REFSET is small (and COMPSET is large). Research in psychology has shown that perceived object size affects attentional focus in general (Collegio et al., 2019). Our experiments suggest that set size also matters for the attentional discourse focus in some particular cases when it comes to language.
Across the three experiments, the same general patterns obtained: large sets showed a benefit over small sets. The extent to which this was manifested in the experiments differed though.
In the semantic plausibility study, Experiment 1, large sets received higher ratings than small sets in both the REFSET and COMPSET conditions for positive QES, and in the COMPSET condition for negative QES. This was also the case in the REFSET condition for negative QES, but not significantly so. These findings are interesting in at least two ways: firstly, the fact that it is larger sets rather than smaller sets that enhance ratings could be interpreted to mean that larger sets receive more attentional focus; secondly, the fact that ratings for positive QES in the non-default COMPSET condition were affected by set size indicates that properties of the COMPSET are indeed taken into account also for positive QES when sentences are rated. Although the effects of set size manipulation were not large enough to turn otherwise non-default readings (COMPSET for positive QES, REFSET for negative QES) into preferred ones, they did modulate the ratings. This is in fact in line with previous studies investigating the role of expectations and desires on QE focus readings (Ingram & Ferguson, 2018; Moxey, 2006; Moxey et al., 2009; Moxey & Filik, 2010; Upadhyay et al., 2019). In those studies too, the manipulations modulated the patterns but did not reverse them.
Footnote 9: Relevelling shows that there is no significant effect of size in the COMPSET condition.
If both expectations and set size can play a role for the discourse structure of QES, we might ask if these are related, or even the same thing. Recall the ideas argued for by Moxey and colleagues (e.g. Moxey, 2006; Moxey et al., 2001, 2009) that COMPSET focus is more likely to appear when there is a shortfall between an expected (whether implicit or explicit) quantity and a stated quantity. Almost all negative QES give rise to such shortfall readings by themselves: they all imply that a larger quantity was expected. For these QES, size will conspire to this effect. SMALL negative QES (in our study: få 'few', inte många 'not many', nästan inga 'almost no') have a potentially big shortfall, while all the LARGE negative QES (inte riktigt alla 'not quite all', inte exakt alla 'not exactly all', inte precis alla 'not precisely all') can only have a small shortfall. This is the pattern we see in the plausibility study: COMPSET being rated as better with SMALL QES (= big COMPSET) than with LARGE ones. For positive QES, it is more difficult to see how implicit expectations and a potential shortfall would play a role. While the SMALL positive QES (några 'some', några enstaka 'a small number of', några få 'a few') all have a big COMPSET, it is not clear that they imply that a larger quantity was expected. In the same way, LARGE positive QES (nästan alla 'almost all', i stort sett alla 'virtually all', det stora flertalet 'the great majority') do not in themselves seem to imply any expectations of quantities. Effects of size for positive QES can therefore not as readily be explained in terms of implied expectations and shortfalls.
Table note: Signif. codes: '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1. Model = lmer(Ampl ~ Size * Set + (1 + Size | Participant) + (1 + Size | Item)).
On the other hand, the effects of size show us that the COMPSET is cognitively active also for positive QES, and this is presumably also why it can receive the attentional focus in those cases where the context helps bring it out. The online experiment on the processing of positive QES, Experiment 2, partly confirmed the findings in the offline experiment. In the non-default COMPSET condition, large sets were processed with more ease than small sets, small COMPSETS showing P600 effects relative to large ones at the adjective. In addition, for QES with a large REFSET (condition LARGE), there was a processing cost at the adjective, when these were coherent with a COMPSET reading rather than the default REFSET reading, again manifested as a P600 effect. For positive QES, the REFSET is always the default reading, so the discourse model has to be restructured in case the adjective is not coherent with this reading. This restructuring is what causes the P600 effect (cf. Brouwer et al., 2012, 2017; Burkhardt, 2007). Restructuring is less costly when the non-default COMPSET is large than when it is small, as the large set receives more attentional focus in the discourse model. To our knowledge, these are the first results showing that the COMPSET is available in the processing of sentences with positive QES, without expectations set up by a context.
For negative QES, Experiment 3, the patterns were similar to those for positive QES in the non-default condition, but the effect was slightly later (see footnote 10). It is interesting that the effects of size are found in the non-default condition for both positive and negative QES. The type of effect found, a P600, indicates that at the point when the adjective that is incoherent with the default reading is encountered, the discourse model has to be restructured. It is in this context that size plays a role: the non-default set receives more attentional focus when it is large than when it is small, and large sets therefore cause fewer disruptions in the processing than small sets do. For positive QES, the non-default COMPSET is non-default to a much higher degree than the REFSET is for negative QES and the effects are thus earlier for these than for negative ones. The default set, for both positive and negative QES, has so much attentional focus in the discourse model that the size of this set is irrelevant when no restructuring of the model is necessary, i.e. in the default conditions. We think that is why there are no effects of size in these.
The manipulation of size resulted in more differences in offline measures than in online measures. In the offline experiment, size modulated the ratings of both the default and non-default readings, but in the online experiments, there were effects of size only in the non-default conditions. It seems likely that ratings in this case only partly reflect the online processing as measured in the ERP experiments. They may also reflect effects that appear much later in the processing, at stages later than we have examined.
Finally, the results in the two ERP experiments are interesting from a more general neurolinguistic perspective. They have direct bearing on the ongoing debate about the functional interpretation of the two ERP components we investigated, the N400 and the P600. The fact that the lexical manipulation of the adjective gives rise to only a P600 effect, but no N400 effect for both positive and negative QES in our experiments can be accounted for in a view that sees the N400 as an indicator of lexical retrieval, but not semantic/discourse integration, and the P600 as an indicator of semantic/discourse integration (see e.g. Brouwer et al., 2012, 2017; Delogu et al., 2019, and references therein). In particular in the case of positive QES, the results from Exp. 1 show that the adjectives focussing the COMPSET are rated very low, i.e. they are not given by the context in comparison to the adjectives focussing the REFSET. If the N400 indicates the predictability and integration of word meaning into the discourse context, we would expect to see an N400 effect on the adjective in the non-default conditions, COMPSET for positive QES and REFSET for negative QES. However, we do not find an N400 effect. This result is fully compatible with an approach that sees the N400 as an indicator of access/retrieval, but not contextual integration (Brouwer et al., 2012; Kutas & Federmeier, 2011; Lau, Phillips, & Poeppel, 2008; van Berkum, 2009). It should be pointed out, though, that the experiments were not set up to address this question specifically. We therefore refrain from drawing far-reaching conclusions from a null result regarding the N400, and just note that the results are compatible with such an approach, but that the experiments lack a manipulation directly addressing the question of lexical prediction and discourse integration.
To conclude, this paper confirms the picture from previous studies that negative QES have the possibility of focusing either the COMPSET or the REFSET. The results from Experiments 1 and 3 show that set size does play a role for the processing and acceptability of the sentences with negative QES. In addition, this study shows that positive QES focus the REFSET very early on in the processing, making anomalies immediately detectable. More important, however, is the new finding that when the anomaly is detected, the results from both Experiments 1 and 2 indicate that the COMPSET is part of the discourse model even for positive QES. The larger the COMPSET is, the easier the information in the discourse model can be updated. Crucially, the effects we see from the COMPSET with positive QES are not induced by contextual manipulations such as an explicit or implied shortfall, but by an inherent property of QES, size. It may be the case that it in fact is the availability of the COMPSET in the discourse model that makes contextual manipulation of positive QES possible at all. In order to gain a more complete picture of how quantified expressions are integrated in discourse models, we think this interaction between inherent qualities of QES, on the one hand, and context and the expectations it induces in language users, on the other hand, must be explored in more detail.
Footnote 10: It should be noted that in their investigation of REFSET and COMPSET focus, Filik et al. (2011) found that the ERP effect of reference to the sets was different for positive QES compared to negative QES.