Differences between Spanish monolingual and Spanish-English bilingual children in their calculation of entailment-based scalar implicatures

Kristen Syrett1, Anne Lingwall2, Silvia Perez-Cortes3, Jennifer Austin4, Liliana Sánchez2, Hannah Baker5, Christina Germak5 and Anthony Arias-Amaya5
1 Department of Linguistics, Rutgers University, New Brunswick, NJ, US
2 Department of Spanish and Portuguese, Rutgers University, New Brunswick, NJ, US
3 Department of World Languages and Cultures, Rutgers University, Camden, NJ, US
4 Department of Spanish and Portuguese Studies, Rutgers University, Newark, NJ, US
5 Rutgers University, US


Introduction and background
It is by now well known that in addition to the meaning generated by semantics, there is an extra layer of meaning provided by pragmatics that arises in the process of language usage. The challenge arising from this combination of meanings is that a child acquiring language must not only learn the meaning of words and propositions, but also understand what they signal when a speaker utters them in a discourse context. For example, uttering, Allison is allergic to peanuts, could simply be interpreted as a true statement if the predicate holds of Allison, but in the context of a mother referring to her daughter in a preschool setting, this utterance could be taken as an indication that the teacher needs to take extra steps to ensure that the child is entering a nut-free school environment and doesn't have a severe allergic reaction as a result of peanuts being brought into school.
The fact that many utterances signal more than the strict logical (semantic) meaning is most notably associated with the calculation of conversational implicatures, which arise from guidelines governing how speakers use language to communicate. One model of such guidelines comes from Grice (published posthumously in 1989, and discussed by Horn 1972), who proposed the following principle.
(1) Cooperative Principle (Grice 1989)
Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.
Unless there is reason to think otherwise, speakers are usually taken to abide by the Cooperative Principle. Grice further proposed that speakers therefore also adhere to four related maxims (quantity, quality, relation, manner), although these have been reformulated over the years. Despite substantive and presentational differences among researchers, one idea is usually adopted: that speakers should be as informative as required, make the strongest claim possible, and should therefore not make a weaker statement rather than a stronger alternative, unless there is a good reason for doing so (Grice 1957; Harnish 1976; Levinson 1983; Horn 2004). The meaning that is derived from this principle is generally referred to as a "quantity implicature", or more specifically as a "scalar implicature" (SI). Horn (1972) proposed that certain quantificational terms may be ordered along an entailment-based scale in order of strength. For example, might and must are alternatives on a scale of probability or modality; think, believe, and know are alternatives on a scale of knowledge or belief; and some, many, and all are alternatives on a scale of quantity. Importantly, in each case, the strongest term (must, know, all) semantically entails the weaker ones, and any assertion with a weaker term has the potential to be compatible with a stronger statement. However, the extra layer of meaning provided by pragmatics carries the implication that if a speaker delivers an assertion containing the weaker term, then s/he does not know that a stronger term holds, or knows that it does not hold. Thus, if comedian Jerry Seinfeld asserts that he has 47 Porsches, he invites the hearer to calculate the scalar implicature that he does not, alas, have 50.
This interplay between the semantic and pragmatic meaning has proven a hotbed for child language acquisition research in the last few decades, especially where scalar implicatures are concerned. (This is to say nothing of the intense focus on SIs in adult psycholinguistic studies, which we do not review here.) Researchers around the globe have investigated whether children are able to calculate such implicatures (Noveck 2001), and if so, whether their ability to do so depends on the lexical items featured in the experiment (Papafragou & Musolino 2003; Hurewitz et al. 2006; Pouscoulous et al. 2007), the nature of the scalar implicature being targeted (Papafragou & Tantalou 2004; Barner et al. 2011; Stiller et al. 2015), the amount of contextual and pragmatic support for implicature calculation (Papafragou & Musolino 2003; Guasti et al. 2005), the type of judgment children are permitted to provide and the type of task administered (Miller et al. 2005; Katsos & Bishop 2011; Foppolo et al. 2012; Syrett et al. in press), whether children are given evidence for the importance of contrasting scalar alternatives within an experimental session (Vargas-Tokuda, Gutiérrez-Rexach & Grinstead 2008; Foppolo et al. 2012; Skordos & Papafragou 2016; Syrett et al. in press), and whether implicature calculation is delayed relative to the generation of semantic material (Huang & Snedeker 2009).
While the vast majority of child language acquisition experiments on SIs have investigated how monolingual children approach scalar implicatures, a few have extended the focus to bilingual children (Siegal, Matsuo & Pond 2007; Siegal, Iozzi & Surian 2009). Although these studies report a slight advantage for bilinguals over monolinguals in certain areas of pragmatic reasoning, which is attributed to the enhanced executive function abilities conferred by bilingualism (Bialystok et al. 2009), the documented chance-level pattern of bilinguals in SI calculation in these tasks and the lack of a multilingual advantage for pragmatic abilities found in other research (Antoniou et al. 2013) leave open the question of whether and how exactly monolinguals and bilinguals differ in the way they approach SIs. At the same time, whether or not bilingualism confers any advantage, there is also the possibility of finding cross-linguistic influence at the interface between syntax and pragmatics (Hulk and Müller 2000; Müller and Hulk 2001; Serratrice, Sorace & Paoli 2004; Serratrice 2007). This influence may result in a different response pattern for bilinguals than for monolinguals, given the specific lexical items targeted. Since previous studies did not compare two languages that have contrasting lexical entries for the words under investigation, the question of how the bilingual child fares when juggling competing lexical entries across (and perhaps also within) languages is also left open. We find just such a circumstance with Spanish-English bilinguals, and take this opportunity to investigate how the bilingual child interprets the diverging Spanish quantifiers algunos and unos, contrasted with each other and with some in English.
In this paper, we report two experiments investigating how Spanish monolingual and Spanish-English bilingual children age 4-5 approach sentences that should trigger SIs. We anchor these responses against those from adult fluent Spanish heritage speakers living in the same geographic region. 1 The two tasks we present provide participants with little contextual support relative to other tasks that have been shown to boost pragmatic reasoning and contrast among interpretations of scalemates. 2 As a consequence of removing such support, we place both monolingual and bilingual children at a disadvantage for calculating SIs. We therefore predict that we will observe in all child participants suppressed SI calculation and relatively high acceptance rates for 'some, if not all' interpretations that reflect the semantics without a pragmatically-induced upper bound, and in adults a high rate of SI calculation and relatively lower acceptance for the 'some, if not all' interpretation. However, given the challenges presented by the overlap of lexical items expressing the meaning of 'some' in Spanish and English, we further predict that the bilingual children will be at a disadvantage relative to the monolingual children, as they are in the process of sorting out the distinct lexical entries for these items within and between the two languages.
The rest of the paper proceeds as follows. In section 2, we present background on Spanish indefinites unos and algunos, in comparison to English some, along with evidence that Spanish monolinguals can compute implicatures with algunos in certain tasks. In section 3, we present methodological background common to both experiments. We present our two experiments in sections 4 and 5. Finally, we conclude and discuss directions for future research in section 6.

Background: Spanish unos and algunos
English some is compatible with both a 'some, but not all' implicature in which there is an upper bound, and a 'some, if not all' existential interpretation, in which there is no upper bound (Horn 2008). The linguistic and extralinguistic contexts in which an utterance with some appears, as well as the prosodic realization of some, can favor one reading or the other. For example, in (2), the implication is that some, but perhaps not all, archaeologists engage in this kind of work, while others do not.
(2) Today, some archaeologists work with linguists and poets to preserve the once-lost Mayan language. 3
The fact that this 'not all' upper bound can be canceled as a pragmatic implicature, even with a partitive, is illustrated in (3).
(3) American Airlines has stopped some of (if not all of) its on-board duty free shopping. 4
Other times, SIs appear not to be relevant (or not available), and only the existential interpretation is available, as in both occurrences of some in (4).
(4) Supermarkets are overwhelming and intimidating. You're in a rush and you pop into a supermarket for some basics, and you end up spending $200 (and making some bad shopping decisions along the way). 5
In contrast to English, Spanish has two separate lexical entries for 'some': unos and algunos. Like English some, both are weak indefinites, carrying existential quantificational force (Gutiérrez-Rexach 2001; Martí 2008; Fabrégas 2010). The two terms have overlapping distribution, which highlights their similarity with some, as shown in (5) and (6).
However, these two terms diverge from each other in that algunos requires context dependence and reference to a salient set in the discourse, whereas unos does not, as shown in the following example from Martí (2009, also her (7)). Imagine a context in which A and B are university mathematicians, and have not been thinking or talking about children for a long time, and A comes running up to B saying, ¿Sabes qué? 'You know what?' The utterance in (a) is licensed, while the one in (b) is not, since algunos requires that a set of children be salient in the discourse context, which it is not.
The consequence of this difference for SIs is that algunos, by virtue of the way it is linked to a salient set in the discourse, carries the 'some, but not all' implicature. Although it is part of the lexical entry, it is still cancelable. Thus, algunos and unos are distinguished from each other, but algunos also differs from some in its tight connection to this SI. (See Syrett et al. in press for more in-depth discussion of the differences between unos, algunos, and some, and for discussion of the different interpretations of some in English.)
Monolingual children age 4-6 appear to be sensitive to both the semantics/pragmatics of algunos and the way in which it differs from unos. In act-out tasks with directions containing algunos, they appear to consistently assign a 'some, but not all' upper limit (Miller et al. 2005), and in some judgment tasks, they appear to calculate the SI for algunos but not for unos, and suspend SI calculation when the lexical trigger appears in a downward entailing environment (a linguistic environment that dissolves SIs) (Vargas-Tokuda, Gutiérrez-Rexach & Grinstead 2008). Although there are no previous results directly speaking to this fact, we might predict that monolingual children, who have it in their capacity to calculate SIs with algunos in certain circumstances, would perhaps fail to do so in tasks that offered much less contextual and methodological support.

Participant background information
All bilingual child participants were recruited from two Spanish-English bilingual preschools in central New Jersey (USA). The children were simultaneous bilinguals, who had been exposed to Spanish from birth and English between birth and 36 months of age, and were exposed to both English and Spanish in school at the time of data collection. All monolingual children were native Spanish speakers who were living in a Spanish-speaking environment (Spain or Peru, depending on the group). Each child was tested individually in a quiet room, separate from the child's class. Each child's parent provided consent for him/her to participate, with an additional layer of consent to record the experimental session. Video recordings were used for later transcription and verification of the child's responses.
Twenty adults (18-34 years, mean: 22.5 years) participated; one was excluded for missing more than half of the control items on two of the experiments. All adult participants were undergraduates at a public university in Northern New Jersey who had been recruited through the participant pool of their Psychology courses. All were bilingual heritage speakers of Spanish who had been exposed to Spanish since birth. They were proficient in both languages as verified by self-report (Spanish and English) and proficiency exam (Spanish). Most of the bilinguals spoke Caribbean dialects of Spanish (from the Dominican Republic or Puerto Rico), but participants also spoke dialects from Peru, Ecuador, Spain, Argentina, Colombia, El Salvador, and Mexico. 6 These participants began learning English when they were between zero and three years old (n = 5), four to seven years old (n = 10), or eight to fifteen years old (n = 5).

Proficiency measure for bilingual children
To determine inclusion of the bilingual children, we used a Spanish proficiency exam adapted from the Bilingual English Spanish Assessment (BESA) exam (Peña et al. 2014), which included a forced-choice picture task measuring children's knowledge of number and gender agreement, as well as their understanding of unos and todos. For example, we showed the children two pictures side by side, one depicting a single bear sleeping in a cave and the other with several sleeping bears, and then asked, ¿Dónde duermen unos osos? ('Where are some bears sleeping'), or a display with four bikes, three of which were being ridden by a girl, and asked if all of the girls were riding bikes. We also elicited names of the lexical items used in our tasks by presenting children with pictures of the entities (animals and inanimate objects, such as an apple, horses, a shoe, cows, etc.) to test their familiarity with those words in Spanish, and their production of gender and plural marking. We used the responses in this proficiency task and performance in the individual tasks as a means to exclude from the data any children who demonstrated lack of mastery with grammatical gender and number (below or at chance level), and did not know the names of the lexical items used in the experiments (either not at all, or not in Spanish). Those children who only responded in English were not included in the participant count.

Background
Experiment 1 was inspired by Noveck's (2001) sentence evaluation task (his Experiment 3), which was based on Smith (1980). This work was also replicated and extended by Guasti et al. (2005) with Italian speakers (their Experiment 1). Noveck (2001) implemented an important change to Smith's original design, transforming her questions into declaratives, which expressed a proposition that could be assigned a truth value. Noveck then presented French children age 8-10 and adults with a series of sentences expressing possible facts about the world, asking participants to judge them as True or False. All of the sentences had the same form ([Some X / All X] verb Y), as in (8).
(8) [Some/All] giraffes have long necks.
Among these sentences were sentences that were True, but infelicitous (as in the 'some' version of the sentence above, French certains in his task), those that were True (as in the 'all' version), and those that were False (e.g., All chairs tell time). The prediction was that participants would judge a "True, but infelicitous" sentence to be False if they calculated the 'some, but not all' SI (since all giraffes have long necks, and saying that 'some' do is underinformative), but True if they found some to be (semantically) compatible with all (tous), interpreted as 'some and possibly all'. (See Pouscoulous et al. 2007, however, for evidence of increased SI calculation with quelques relative to certains, both 'some' in French.) In this previous study, children diverged from the adults on the underinformative (True, but infelicitous) items. While adults were split between accepting and rejecting these statements, averaging 41% overall agreement (a pattern highlighted in follow-up analysis and experimentation by Guasti et al. 2005 in Italian), 26 of the 31 children agreed with four or more of the five "True, but infelicitous" statements, thereby indicating that they had not found 'some' to be underinformative (and therefore had not calculated the implicature). The replicability of this task and the clear baseline it provides vis-à-vis the split between children and adults therefore serve as a means to compare the performance of both the monolingual and the bilingual children to an adult group of heritage speakers and to each other.

Materials and procedure
Noveck ran his task with children age eight to ten, and presented them with five exemplars of six categories of sentences, totaling 30 trials. Guasti et al. ran their task with seven-year-olds, and also included 30 items. Given the younger age of our target bilingual child population, we instituted five changes to the task. First, we scaled back the number of items to 18 (while still balancing sentence and quantifier type) so that the length of the experimental session was conducive to testing this age range. The session lasted approximately 15-20 minutes. Second, we made sure to include items that drew upon world knowledge with which we were fairly confident three- to five-year-olds would be familiar.
Third, for children (but not for adults) the sentences were delivered by a puppet, played by a trained experimenter. The premise was that the puppet was learning, and sometimes made silly or incorrect statements. The child's job was to assess the puppet's statements by telling him whether the utterance sounded 'good/okay' (suena bien) or 'bad/weird' (suena raro). After responding to the puppet, the child was occasionally asked to offer a justification. Adults completed an automated version of the study presented via Superlab experimental software (Cedrus Corporation).
Fourth, for child participants, we began the experimental session with a brief training session, so that the children could get used to interacting with the puppet (Señor Ratón, or Mr. Mouse), and also see how to correct his 'silly' statements. We used the informal form of address (tú) because we expected that preschoolers would have more experience with this register than the formal usted. This training session was similar to the one appearing before Experiment 2 in Papafragou & Musolino (2003) and before Experiments 2 and 3 in Guasti et al. (2005), and included supporting images with some items. The purpose of the items was to have children judge sentences that were False (9), and to evaluate sentences that were (strictly speaking) True, but infelicitous (10). Sample items are included below.
Note that the training session was the only part of the experimental session that involved support from visual stimuli. The purpose of these stimuli was to help the children become acclimated to the task and feel comfortable about responding. After the puppet's reply, the experimenter would ask the child, ¿Suena bien o suena raro? ¿Hay otra manera de decirlo? 'Is that good or bad? Is there another way to say it?'.
¡Yo sé! ¡Es una cosa peluda con cuatro patas!
I know is a thing furry with four leg:pl
'I know! It's a furry thing with four legs!'
Finally, we followed Guasti et al. (2005), and presented the sentences as one complete list, rather than as two separate lists based on the two quantifiers ('some', 'all'), as Noveck had done. Guasti et al. report that there was no effect of this manipulation. We further pseudorandomized all of the items as one set to reduce the possibility of children perseverating with their judgments across quantifier sets, a manipulation that may actually have mattered for adults.
Participants were presented with 18 sentences. These included six sentences corresponding to each of the three quantifiers under investigation: algunos, unos, and todos. Within each set of six sentences, there were two tokens of each of the following types of sentences: True, felicitous (algunos, unos, todos); True, infelicitous (algunos and unos) or False (todos); and False, bizarre (algunos, unos, todos). The design was therefore a 3 (participant group, between) x 3 (quantifier type, within) x 3 (sentence type, within). Examples of each are presented in Table 1. The full set of stimuli is presented in Appendix 1.
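The crossing of quantifier and sentence type described above can be enumerated in a few lines. The following is a minimal sketch in Python, not the authors' materials: the names `QUANTIFIERS`, `sentence_types_for`, and `build_item_list` are hypothetical, and the actual sentences are those listed in Appendix 1.

```python
# Hypothetical sketch of the within-participant item structure:
# 3 quantifiers x 3 sentence types x 2 tokens = 18 sentences.
QUANTIFIERS = ["algunos", "unos", "todos"]
TOKENS_PER_CELL = 2


def sentence_types_for(quantifier):
    """Return the three sentence types used for a given quantifier.

    Per the text, the middle type is "True, infelicitous" for algunos
    and unos, but plain "False" for todos.
    """
    middle = "False" if quantifier == "todos" else "True, infelicitous"
    return ["True, felicitous", middle, "False, bizarre"]


def build_item_list():
    """Enumerate one placeholder record per sentence in the design."""
    items = []
    for quantifier in QUANTIFIERS:
        for sentence_type in sentence_types_for(quantifier):
            for token in range(1, TOKENS_PER_CELL + 1):
                items.append(
                    {
                        "quantifier": quantifier,
                        "sentence_type": sentence_type,
                        "token": token,
                    }
                )
    return items


design = build_item_list()
```

Counting the records in `design` gives the 18 sentences, six per quantifier. Pseudorandomizing the list, as described for the procedure, would be a separate step.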

Results
The results are presented in Table 2, with standard error in parentheses to the right of each percentage. Following both Noveck (2001) and Guasti et al. (2005), we treated the dependent measure as the percentage of 'logically correct' responses (i.e., acceptances whenever the sentence was true). Wilcoxon tests were run on planned comparisons of the target items within each age group. As participants were overwhelmingly successful in rejecting both types of False/bizarre todos sentences, and accepting those that were True (with the exception of an outlier, which we discuss in the earlier footnote), we focus our attention on the target algunos and unos items.
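To make the dependent measure concrete, the sketch below scores a response as 'logically correct' when a true sentence is accepted (bien) or a false one is rejected (raro), and then summarizes a set of scores as a percentage with a standard error. This is an illustration only: the function names are hypothetical, and the standard-error formula (sample standard deviation divided by the square root of n) is an assumption, since the text does not state how the reported standard errors were computed.

```python
import math


def logically_correct(response, sentence_is_true):
    """True if 'bien' was given to a true sentence or 'raro' to a false one."""
    accepted = response == "bien"
    return accepted == sentence_is_true


def percent_and_se(scores):
    """Return (percentage, standard error in percentage points).

    `scores` is a list of 0/1 values or per-participant proportions.
    Assumption: SE = sample SD / sqrt(n); requires at least two scores.
    """
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return 100 * mean, 100 * math.sqrt(variance / n)
```

Note that, under this scoring, accepting a "True, but infelicitous" item counts as logically correct, so a high percentage on those items indicates that the SI was not calculated.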
There was also no significant difference between any of the groups' acceptance rates of the unos and algunos "True, felicitous" statements (Z = -0.462, p = 0.644; Z = -0.546, p = 0.585; Z = -0.000, p = 1.000). While neither group of children exhibited a difference in acceptance rate between the unos and algunos "True, infelicitous" statements (monolingual children: Z = -0.905, p = 0.37; bilingual children: Z = -0.714, p = 0.48), adults were unexpectedly more likely to accept the infelicitous algunos statements than the infelicitous unos statements (Z = -2.121, p < 0.04). 8
7 We present and analyze here results for only one of the "True and felicitous" items for todos, because participants responded quite differently to the two items in this category: Todas las mariposas tienen alas ('All butterflies have wings') (monolingual children: 95.0%; bilingual children: 84.2%; adults: 100% bien) and Todos los gatos tienen bigotes ('All cats have whiskers') (monolingual children: 35.0%; bilingual children: 36.8%; adults: 63.2% bien). We realized only after reviewing the responses that bigotes not only means 'whiskers' but also 'mustache'. It is quite possible that participants understood this sentence as All cats have a mustache, which is, of course, false. We did not encounter similar issues with the "True and felicitous" algunos and unos sentences, and therefore present and analyze both trials for these.
Using a Mann-Whitney U test to compare the differences in acceptance rates between the three groups (monolingual children, bilingual children, and adults), we find that there are no significant differences in their acceptance of the algunos infelicitous statements (monolingual, bilingual children: U = 226.5, p = 0.29; monolingual children, adults: U = 175.5, p = 0.62; bilingual children, adults: U = 198.5, p = 0.14) or of the unos infelicitous statements (monolingual, bilingual children: U = 220.5, p = 0.20; monolingual children, adults: U = 149.0, p = 0.17; bilingual children, adults: U = 245.5, p = 0.78).
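For readers unfamiliar with the statistic, the sketch below shows how a Mann-Whitney U value can be obtained by direct pair counting, with tied pairs contributing one half. It is a plain-Python illustration under stated assumptions, not the analysis code used for the study: the function name and the example data are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples by pair counting.

    Each (a, b) pair adds 1 to U_a if a > b and 0.5 if a == b.
    U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical per-participant acceptance proportions (two items each).
example_group_a = [1.0, 1.0, 0.5]
example_group_b = [0.5, 0.0, 1.0]
u_a, u_b = mann_whitney_u(example_group_a, example_group_b)
```

Because each participant's acceptance rate can take only a few values, ties are frequent, which is one way half-integer U values arise.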

Discussion
As Noveck noted about his own experiment, "This task is relatively difficult. Participants are required to evaluate quantified statements while drawing on working memory. However, one finds children operating rather competently and in line with the prediction: pragmatic interpretations become evident subsequent to logical interpretations" (Noveck 2001: 182). Our results are entirely consistent with this assessment. The task itself is challenging, because children are not asked to respond to a context presented for them visually, but rather to call upon their independently-gathered knowledge of the world to evaluate the truth value of the proposition expressed. That this was challenging can be seen in the percentage of children's bien responses to the todos statements.
The participants in our task (perhaps surprisingly, for adults) did not show evidence of calculating the SI for algunos. Rather than judge sentences such as (11) as raro the majority of the time, they instead accepted them as bien. In fact, 14 of the 19 adults accepted the algunos statements 100% of the time. 9
(11) Algunos perros tienen ojos.
some.pl dog.pl have:they eye.pl
'Some dogs have eyes.'
For the bilingual children (but not the monolingual children) there was a correlation between age and proportion of bien responses (r = .546, R² = .298, p < .002). Interestingly, the older the child in the bilingual group, the more likely s/he was to respond affirmatively to the "True, but infelicitous" statement (an observation that is perhaps related to footnote 9). See Figure 1.
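The relation between the two reported values can be checked directly, since for a simple correlation R² is the square of r. The sketch below is illustrative only: `pearson_r` is a generic textbook implementation with a hypothetical name, and the study's per-child ages and response proportions are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Consistency check on the reported statistics: 0.546 squared is 0.298116.
r_reported = 0.546
r_squared = round(r_reported ** 2, 3)
```

Here `r_squared` matches the reported R² of .298, so roughly 30% of the variance in bien responses is associated with age in the bilingual group.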
That a raro response was in their repertoire is evident from their responses to the False and False/bizarre cases. For these sentences, the children were happy to inform the puppet that he was wrong or was being silly. Indeed, acceptance for (12) was 15.8%, and acceptance for (13) was 0%.
8 As a reviewer points out, any potential crosslinguistic influence on the bilingual children's performance should not be isolated to one quantifier. Indeed, the performance by the bilingual children appears to reflect a suppressed percentage of bien responses relative to monolinguals, with performance (especially with algunos and unos) hovering closer to chance level.
9 It is possible that since algunos needs to be anchored in a salient discourse set, adults readily drew upon their knowledge of the world, and easily considered the possibility that there are exceptions to this statement. In this case, a statement such as Algunos perros tienen ojos. Otros pueden haberlos perdido ('Some dogs have eyes. Others might have lost them') is perfectly acceptable. Thus, a high rate of acceptance does not necessarily signal a failure to compute an SI.
(12) Todas las vacas son café.
all the cow.pl are brown
'All cows are brown.'
(13) Unos carros toman leche.
some car.pl drink:they milk
'Some cars drink milk.'
Like the older French children in Noveck (2001) and Italian children in Guasti et al. (2005), the Spanish monolingual and the Spanish-English bilingual children in this task exhibited two familiar patterns. First, they distinguished between True and False statements, and most clearly between True statements and the False, bizarre statements that were incongruent with world knowledge. Second, they failed to calculate a 'some, but not all' SI with either unos or algunos. Acceptance rates for sentences containing these "True, infelicitous" items were high. This result held in spite of the training participants received prior to the task.
It may be surprising that the percentage of acceptance for the plain-vanilla "True, felicitous" items was not higher, but we think this can be attributed to the fact that these young children's judgments were not anchored in a visual display, but rather depended on an abstract representation of the world. Thus, in a task that called upon participants to recruit their real-world knowledge to render a judgment on the truth and felicity of sentences, these children floundered. However, their performance was not shaken enough to yield chance-level acceptance across the board; that they distinguished between True and False propositions within the task, but did not calculate SIs, demonstrates that they recognized a difference between the sentence types. The second task provides participants with a visual display, against which they can evaluate similar statements. We predict that such a methodological change will benefit children, but that in such a case, we may see a difference between the monolingual and bilingual children surface.
Figure 1: Correlation between proportion of bien response to "True but infelicitous" items and child's age (in months) for Experiment 1.

Participants
Thirty-six bilingual children (16 boys, 20 girls age 3;4 to 5;5, mean age: 4;4), 34 monolingual children from Spain (14 boys, 20 girls age 4;5 to 5;4, mean age: 5;0), 22 monolingual children from Peru (7 boys, 15 girls age 3;3 to 4;9, mean age: 4;1), and 19 adults participated. 10 Two additional bilingual children were excluded (behavioral or attention issues n = 1, language proficiency reasons n = 1). The monolingual children from Peru participated in the same study as the bilingual children; the monolingual children from Spain participated in an abbreviated version of this task as part of the data collection for an unrelated study.

Materials and Procedure
The purpose of this experiment was to give children an explicit choice between scenes that did or did not render a target sentence True, and within the True scenes for algunos, to pit a scene representing the pragmatically derived, implicature-based interpretation against the semantically derived one. To achieve this goal, we constructed a forced-choice picture selection task, similar to that seen in some visual world paradigms and other related offline tasks (see, e.g., Hurewitz et al. 2006; Huang & Snedeker 2009). In this task, children were given the choice between four scenes, presented in four quadrants on a computer screen, as illustrated in Figure 2, for the sample sentence in (14).
(14) Muéstrame dónde algunos caballos tienen zanahorias.
show:imp:me where some.masc.pl horse.masc.pl have:they carrot.fem.pl
'Show me where some (of the) horses have carrots.'
There were three versions of such sentence types, involving each of the target quantifiers (algunos, unos, or todos). There were four scenes presented in the quadrants. One (top right) illustrated the "True, felicitous" reading for algunos: some, but not all, of the horses have carrots. This was the target quadrant, and we refer to it as the "subset". One quadrant (bottom right) illustrated the "True, but infelicitous" reading for algunos: all of the horses have carrots. This was the competitor quadrant, and we refer to it as the "whole set". (Note that for todos, the target and competitor were therefore reversed, but based on truth/falsity rather than felicity.) The other two quadrants were distractors that rendered the sentence false. One (top left) showed a group of similar horses, none of which has carrots. 10 The last (bottom left) showed a similar group of horses, some, but not all, of which have something (a birthday cake) that is not carrots. The quadrants were designed in this way for all test items to ensure that children were attending not just to the indefinite/quantifier, but also to the head noun in subject position (here, caballos) and the object in the predicate (here, zanahorias). That is, we wanted to ensure that they listened to the entire sentence, and chose the scene based on the entirety of the sentence content. Participants were presented with the sentence aurally while a blank scene was displayed. The scene with the four choices was then displayed. The puppet then repeated the sentence, with the experimenter calling the participant's attention to each of the four choices.
10 There is no attested dialectal or SES difference between the two monolingual groups that should affect their suitability as control groups, or their comparison with each other.
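The logic of the four quadrants for the sample item can be summarized in a small data structure. What follows is a hedged sketch rather than the experiment's actual software: the dictionary layout, the function names, and the counts in the bottom-left scene are assumptions made for illustration (the text specifies only that some, but not all, horses there hold a non-target object), and applying the same target/competitor labels to unos as to algunos is likewise an assumption.

```python
# Hypothetical encoding of the sample display for (14): three horses per scene.
SAMPLE_SCENES = {
    "top_left": {"total": 3, "with_target": 0, "with_other": 0},
    "top_right": {"total": 3, "with_target": 2, "with_other": 0},
    "bottom_right": {"total": 3, "with_target": 3, "with_other": 0},
    "bottom_left": {"total": 3, "with_target": 0, "with_other": 2},
}


def scene_type(scene):
    """Classify a scene as 'subset', 'whole set', or 'distractor'."""
    if scene["with_target"] == 0:
        return "distractor"  # no horse has the named object: sentence is false
    if scene["with_target"] == scene["total"]:
        return "whole set"
    return "subset"


def scene_role(quantifier, scene):
    """Return 'target', 'competitor', or 'distractor' for a quantifier.

    For todos the whole set is the target and the subset the competitor;
    for algunos (and, by assumption, unos) the roles are reversed.
    """
    kind = scene_type(scene)
    if kind == "distractor":
        return "distractor"
    if quantifier == "todos":
        return "target" if kind == "whole set" else "competitor"
    return "target" if kind == "subset" else "competitor"


roles_for_algunos = {
    quadrant: scene_role("algunos", scene)
    for quadrant, scene in SAMPLE_SCENES.items()
}
```

Since the position of the target scene was pseudorandomized across items, the quadrant names here describe the sample display only.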
Participants saw 12 items each: nine test items (three each for algunos, unos, and todos) and three fillers. 11 The position of the target scenes was pseudorandomized within the experimental session. Fillers were structured similarly. A full set of verbal stimuli is presented in Appendix 2. The experimental session lasted less than 10 minutes. Note crucially that in this experiment, unlike in Experiment 1, the quantifiers have the potential to link back to a salient set of objects relevant to the discourse.
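The session structure described above can be sketched in code. This is only an illustration, not the authors' materials script: the function name `build_session`, the scene labels, and the use of a plain seeded shuffle are assumptions, since the text does not specify the constraints used for pseudorandomization.

```python
import random

# Quadrant and scene labels are hypothetical names for the four screen
# positions and the four scene types described in the text.
QUADRANTS = ["top_left", "top_right", "bottom_left", "bottom_right"]
SCENES = ["subset", "whole_set", "none", "other_object"]


def build_session(seed=0):
    """Return 12 trials: three per quantifier plus three fillers.

    Assumption: item order and scene positions are shuffled with a
    seeded generator; the actual pseudorandomization scheme is not
    given in the text.
    """
    rng = random.Random(seed)
    items = [q for q in ("algunos", "unos", "todos") for _ in range(3)]
    items += ["filler"] * 3
    rng.shuffle(items)

    trials = []
    for item in items:
        layout = SCENES[:]
        rng.shuffle(layout)  # assign each scene type to a quadrant
        trials.append({"item": item, "layout": dict(zip(QUADRANTS, layout))})
    return trials


if __name__ == "__main__":
    for trial in build_session(seed=1):
        print(trial["item"], trial["layout"])
```

Seeding the generator keeps a given session reproducible, which would make it straightforward to check afterwards where the "subset" and "whole set" scenes appeared on each trial.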

Results
The results are presented in Table 3 (standard error in parentheses). Since participants rarely selected any of the distractor quadrants (reflecting the fact that they attended to the content of the sentence, in particular the subject and the predicate), we provide only the percentage of responses in which participants selected the "subset" (e.g., the scene where two of the three horses had carrots) or the "whole set" (e.g., the scene where all three horses had carrots) for each test item type.
We begin with the todos items. The adults and the monolingual children were significantly more likely to select the "whole set" scene than the "subset" scene, as expected (monolingual children, Spain: W = -581, z = -4.96, p < .0001; monolingual children, Peru: W = -199, z = -3.22, p = .001; adults: W = 3.81, p < .0001). Surprisingly, however, the bilingual children were no more likely to select one of these two scenes over the other (W = -154, z = -1.37, p(two-tailed) = .17, p(one-tailed) = .09). We return to this finding below.
The adults exhibited no difference in their choice of the "subset" and "whole set" items for unos (W = 62, z = 1.24, p = .23). However, the bilingual children were more likely to select the "whole set" than the "subset" for unos (W = -282, z = -2.63, p < .01), and the monolingual children from Peru trended in this direction (W = -102, z = -1.65, p = .099). 12 This finding is interesting against the backdrop of the algunos responses, which were distributed evenly between the "subset" and "whole set" scenes. It is possible that we are witnessing these children on the cusp of distinguishing between algunos and unos, and having their responses for algunos eventually pulled more toward the "subset" scene, although these results only invite speculation.
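The W and z values reported above have the form of a Wilcoxon signed-rank comparison. The following is a minimal sketch of how such statistics can be computed, under several assumptions: the per-participant difference scores are hypothetical (they are not the study's data), the sign convention ("subset" count minus "whole set" count) is a guess, and no continuity correction is applied, so it need not reproduce the authors' exact procedure.

```python
import math


def signed_rank(diffs):
    """Return (W, z) for paired difference scores.

    W is the sum of signed ranks of the non-zero differences, with tied
    absolute values given their average rank. z uses the normal
    approximation W / sqrt(sum of squared ranks), without a continuity
    correction (an assumption of this sketch).
    """
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)

    # Assign average ranks to tied absolute values.
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2  # mean of ranks i+1..j
        i = j

    w = sum(math.copysign(ranks[abs(d)], d) for d in nonzero)
    sigma = math.sqrt(sum(ranks[abs(d)] ** 2 for d in nonzero))
    z = w / sigma if sigma else 0.0
    return w, z


if __name__ == "__main__":
    # Hypothetical scores: "subset" selections minus "whole set"
    # selections for each of seven imaginary participants.
    example = [3, 3, 1, -1, 2, 0, 3]
    print(signed_rank(example))
```

A negative W under this convention would indicate a preference for the "whole set" scene, and a W near zero would indicate no preference between the two scenes.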
Recall that the bilingual children were the only group that was no more likely to select the "whole set" scene for todos than the "subset" scene. We found this result quite unexpected, and therefore chose to probe it more closely. As noted above, children chose mainly between the "subset" and "whole set" for these sentences, and did not choose the distractors. Thus, their choices were not randomly distributed among all four cells, but were rather concentrated on the two competitors. The children therefore knew that the animals in the pictures had to have some amount of the target property (e.g., having carrots). They were also aware that they had to select one of the four quadrants, since they had the same training on the task and had the task administered in the same way as the monolinguals. In addition, their individual selections were not split between cells.
We fed the results into a regression analysis, which revealed a positive correlation between age and selection of the "whole set": the older the child, the more likely the child was to select the "whole set" (r = .441, R² = .194, p(two-tailed) = .007). These results are presented in Figure 3.
Dividing the children into those 48 months (4 years) and younger (n = 15) and those older than 48 months (n = 21) highlights this contrast: 42% (younger) versus 62% (older) selected the "whole set". The corresponding correlation with age is only marginally significant for unos (r = .317, R² = .100, p = .06), and is not significant for algunos (r = .015, R² = .0002, p = .94). Thus, while the bilingual children deviated from their monolingual counterparts with this particular item, the results indicate that their performance is not haphazard, and improves with age.
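For a simple regression with one predictor, R² is the square of the Pearson correlation r, so the reported pairs of values can be checked against each other. The sketch below does this for the three quantifiers and also shows how r could be computed from paired observations. The helper names and the toy age/proportion values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared_matches(r, r2, tol=0.001):
    """True if r squared agrees with the reported R-squared within tol."""
    return abs(r * r - r2) < tol


# (r, R²) pairs as reported in the text for the bilingual children.
REPORTED = {
    "todos": (0.441, 0.194),
    "unos": (0.317, 0.100),
    "algunos": (0.015, 0.0002),
}

if __name__ == "__main__":
    for quantifier, (r, r2) in REPORTED.items():
        print(quantifier, r_squared_matches(r, r2))

    # Hypothetical data: age in months and proportion of "whole set"
    # selections for five imaginary children.
    ages = [38, 44, 50, 56, 62]
    whole_set = [0.33, 0.33, 0.67, 0.67, 1.0]
    print(round(pearson_r(ages, whole_set), 3))
```

All three reported pairs are internally consistent under this check, which is what one would expect if each analysis regressed "whole set" selection on age alone.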

Discussion
Faced with a choice between four different scenes as potential matches for a target statement containing algunos, unos, or todos, participants in this experiment were able to narrow down the choices to just the "subset" and "whole set" scenes, demonstrating that they attended to the denotation of the subject NP and the predicate. The adults and the monolingual children consistently selected the "whole set" scene for todos, but only the adults consistently selected the "subset" scene for algunos, although the monolingual children seemed pulled in this direction. The bilingual children, however, appeared to have difficulty with this task, remaining at chance between the target and competitor for both algunos and unos. As in Experiment 1, there was not a clear difference between algunos and unos for children, although the results did trend in the expected direction for both the bilingual children and the monolingual Spanish-speaking children from Peru.
Although the findings for todos for the bilingual children were surprising, they are not entirely inconsistent with other studies in which preschoolers were asked to search for a scene matching an all sentence. For example, three-year-olds in Hurewitz et al.'s (2006) study were given a choice between four scenes involving an alligator, presented in four different quadrants: (a) an alligator, (b) an alligator near a plate with one cookie on it, (c) an alligator holding two cookies near a plate with two cookies on it, and (d) an alligator holding four cookies near an empty plate. When asked to select the scene where The alligator took all of the cookies, children correctly chose (d) close to 80% of the time. However, 14 out of the 17 documented incorrect responses involved selection of (b) (n = 5) or (c) (n = 9). Did these three-year-olds not understand all? That is possible, although it seems a bit unlikely.
It seems more likely that the cognitive demands of the task led them astray, or they interpreted the visual stimuli differently. It is therefore possible that the bilingual children in our task were in the same position. The older children in our task experienced a higher rate of success, because they were able to exercise a greater degree of control when filtering out distractors and competitors: a cognitive, and not a linguistic, advantage that they had over the younger children.

Conclusions
In this paper, we presented two experiments investigating Spanish-English bilingual children's ability to calculate scalar implicatures (SIs). The performance of this group of children was compared with that of Spanish monolinguals and of adult Spanish heritage speakers from the same geographic region. Our tasks were intentionally rather slim on the contextual support offered and as such, may have succeeded in placing children at a computational disadvantage by not offering them contextual and methodological support for SI calculation.
In both experiments, adults patterned in the expected direction, correctly assigning todos universal quantification. However, they only robustly calculated the SI for algunos in Experiment 2. (Although see footnote 11 for one possible explanation.) A follow-up on Experiment 1 in English reveals that blocking the quantificational items within the experiment drastically affects the likelihood that an SI will be calculated: adults who are presented with the all items before the some items are more likely to reject "True, infelicitous" some items. (Noveck 2001 and Guasti et al. 2005 had also blocked their items.) The monolingual children in our tasks differentiated algunos from todos, but not from unos.
We thus did not replicate findings by Vargas-Tokuda, Gutiérrez-Rexach & Grinstead (2008), where monolingual Spanish children displayed such knowledge. However, their task was preceded by dedicated training on todos, and every single scenario had the same template: all or all but one character jumped over something. Thus, the pre-experiment quantifier training paired with the repetitiveness of the task could have facilitated differentiation of the interpretation of the target lexical items based on contrast (Clark 1987) or mutual exclusivity (Markman & Wachtel 1988) alone. In addition, our tasks were intentionally stripped of the contextual support other tasks might provide.
In related research with the same Spanish-English bilingual population further investigating their ability to calculate implicatures, both in Spanish and in English, we have obtained complementary findings. Even with a context-rich truth value judgment task, the bilingual children and the monolinguals are on par, but bilinguals also appear to struggle with the interpretation of todos. However, there is evidence that this performance is tied to the interpretation of specific lexical items, and does not reflect a general inability on the children's part to calculate implicatures. With items that involve particularized conversational implicatures, bilingual children seem to assign an upper bound without difficulty (Austin et al. 2015).
Taken together with these other findings, the current experiments reveal that the challenge for the developing bilingual child is comparable to that of the monolingual, where pragmatic implicatures are concerned. However, the challenge faced by bilingual children may at times be greater, as they work to successfully distinguish specific lexical entries from each other, based not only on semantic representations, but also on the force of these terms in a conversational context. Happily, these difficulties are overcome as proficiency increases, as documented by the results of the adult heritage speakers, who are bilingual themselves.