How children attend to events before speaking: crosslinguistic evidence from the motion domain

How do children talk about the dynamic world around them? In this eyetracking study, we demonstrate language-specific patterns in the way 3- and 4-year-old speakers of English and Greek inspect motion events prior to speaking and describe such events in their native language. Across age and language groups, children were more likely to mention manners of motion than paths, but English-speaking children were more likely to provide manner information than Greek-speaking children were. Comparison of eyegaze patterns from the linguistic (description) task to eyegaze patterns observed during a nonlinguistic (memory) task with a different group of English- and Greek-speaking 3- and 4-year-olds revealed effects of language background on event inspection. These effects suggest that by the age of 3 years, children exhibit sensitivities to language-specific patterns of motion event encoding that influence the way they gather information from the visual world during the process of language production.
ANN BUNGER


Introduction
During language production, we select information from our conceptual representations and encode it in linguistic strings. When we talk about events that are happening in the world around us, we usually mention only a subset of the information that is available to us. The process of selecting which information to include may be guided by the structure of our conceptual representations, by our communicative goals, and by the language that we speak (Levelt 1989; see also Papafragou & Grigoroglou 2019, for a recent review).
When describing an event in which a person moves from one location to another, we tend to mention aspects of the event that are central to making sense of the motion, for example, where the person is going (into the cafe, across the pool, up Mount Everest) and how they are moving (walking, swimming, climbing), and to omit other information (e.g., what the person is wearing). Path (where) and Manner (how) information correspond to conceptual components of motion events that are understood from a very early age. Evidence for the conceptual basicness of these components comes from the fact that some of the earliest words that children tend to use describe paths ("up") and manners ("dance") of motion (Fenson et al. 1994). Furthermore, the same meanings are encoded by the early gestures produced by deaf home signers who have not been exposed to a conventional language model (Zheng & Goldin-Meadow 2002). Additionally, there is experimental evidence that young children can discriminate both paths and manners of motion in nonlinguistic tasks by the time they are 14 months old (Pulverman et al. 2003; Pruden et al. 2004; Pulverman & Golinkoff 2004; Pulverman et al. 2006; Pruden et al. 2008; Pulverman et al. 2008; Göksun et al. 2017). In one study, when 14- to 17-month-old children learning different languages were habituated to an animated character that moved with a particular manner (e.g., rotating along a horizontal axis) along a particular path (e.g., over a stationary shape), their responses to subsequent stimuli indicated that they were sensitive to both the manner in which the character moved and the path that was followed (Pulverman et al. 2008).
Despite these commonalities, the way motion event information is encoded in language is subject to robust language-specific factors. According to an influential proposal, most languages tend to fall into one of two typologically distinct classes that can be distinguished by where information about the core conceptual component of Path is encoded (e.g., Talmy 1975; 1985). Speakers of so-called satellite-framed languages, like English and German, tend to describe motion events by encoding information about manner of motion in the main verb of a sentence and path information in mostly non-verb elements (satellites; cf. Slobin & Hoiting 1994; Slobin 1996a). Consider the English sentence in (1): the verb "sailed" provides information about the manner of motion and the prepositional phrase "into the harbor" provides information about the path. In contrast, speakers of so-called verb-framed languages, like Modern Greek and French, often describe motion events by encoding information about the path of motion in the main verb and manner information (if it is included) in satellites, especially when describing motion events that involve boundary crossing (Aske 1989; Slobin & Hoiting 1994; Papafragou, Massey & Gleitman 2003; Hickmann & Hendriks 2006; Selimis & Katis 2010; Özçalişkan 2013; Georgakopoulos, Hörtl, & Sioupi 2019). The Greek sentence in (2) provides an example of this pattern: the verb "bike" ('entered') provides information about the path and the prepositional phrase "me to skafos tu" ('with his boat') provides information about how the man got there.
(1) A man sailed into the harbor.
(2) A.nom human.nom entered in-the.acc harbor.acc with the.acc boat.acc his.
'A person entered the harbor with his boat'
As several commentators have noted, the verb-framed vs. satellite-framed distinction is not an absolute dichotomy but allows for degrees of convergence on a single pattern depending on lexical, morphosyntactic and even pragmatic aspects of event encoding (Skopeteas 2008; Beavers et al. 2010; among others). Within the class of verb-framed languages, there is considerable variation in how frequently path and manner information is encoded during production and how this information is distributed across the sentence (e.g., Slobin 2004; Soroli & Verkerk 2017). Similarly, within individual languages in the typology, there is considerable variation in attested patterns of motion encoding: for instance, Greek sometimes exhibits mixed preferences by encoding manner information in the verb and/or packaging path information in prepositional phrases or particles (Talmy 2000; Selimis & Katis 2010; Soroli 2012). Nevertheless, crosslinguistic differences in motion event encoding predicted by the verb-framed vs. satellite-framed divide have been documented extensively in adults (e.g., Talmy 1975; 1985; Aske 1989; Talmy 1991; Slobin & Hoiting 1994; Slobin 1996a; Naigles et al. 1998; Papafragou et al. 2006; Özçalişkan 2013), and are known to emerge early during development. Papafragou and Selimis (2010b) demonstrated that 5-year-old Greek- and English-speaking children have already begun to prioritize manner and path elements in motion event descriptions like adult speakers of their target language, with English-speaking children more likely than Greek-speaking children to describe motion events with verbs that encode manner of motion and Greek-speaking children more likely than English-speaking children to use path verbs (cf. Papafragou et al. 2002; 2006; Papafragou & Selimis 2010a for a similar finding in older children).
Even earlier effects of language environment on motion event description have been demonstrated in experimental studies comparing 3-year-old speakers of the verb-framed languages Turkish (Özçalişkan & Slobin 2000; Allen et al. 2007; Özyürek et al. 2008), French (Hickmann & Hendriks 2006; Hickmann et al. 2009; Hickmann et al. 2018) and German (Hickmann et al. 2018) to age-matched English speakers, as well as in studies comparing the spontaneous speech of 2-year-old Korean- and English-speaking children (Choi & Bowerman 1991). Crosslinguistic differences in the encoding of event components have been documented in the verb learning patterns of speakers as young as 3 years of age (Maguire et al. 2010; Skordos & Papafragou 2014). Across many (though not all; Slobin 2004; Soroli & Verkerk 2017) of these studies, speakers of satellite-framed languages were overall more likely than speakers of verb-framed languages to mention manners of motion: in one study (Papafragou et al. 2006), Greek speakers added manner of motion modifiers when the manners were novel or unexpected (cf. A man went up the stairs running) but English speakers encoded manners of motion in the verb regardless of typicality.
These language-specific biases in event description are reflected in systematic differences in the way adult speakers of typologically different languages select motion information to talk about during speech planning. Tracking speaker eyegaze during event viewing provides a window onto this process of information gathering as it unfolds over time (e.g., Griffin & Bock 2000; Bock et al. 2004; Griffin 2004; Meyer 2004; Papafragou et al. 2008; Trueswell & Papafragou 2010). As speakers inspect static or dynamic events with the intention to describe them, their patterns of eyegaze reveal the visual elements that they inspect, as well as the time course along which event information is gathered. These eyegaze patterns can, in turn, be linked to the content and form of the event descriptions that speakers eventually produce, providing insight into the mapping between conceptual and linguistic event representations. Critically, patterns of event inspection observed during the planning stages of language production (message planning, lexical selection, grammatical encoding) differ from those observed when people are engaged in nonlinguistic tasks.
In general, it is known that when adults view an event while planning to talk about what they see, they direct their attention very quickly to components of the scene that they plan to talk about, usually in the order that they plan to mention them (Griffin & Bock 2000). Papafragou and colleagues (2008) demonstrated, moreover, that while adult speakers of English and Greek were engaged in the process of describing motion events, they exhibited language-specific differences in event inspection that reflected differences in motion event description in these languages. Specifically, they found that when describing bounded motion events (i.e., motion that involves a goal, as in Figure 1), English speakers were more likely than Greek speakers to use manner verbs, and the opposite held for path verbs. Consistent with these linguistic choices, when planning their event descriptions, adult speakers of these two languages directed their attention very early to event components that they planned to encode in the verb of their sentence: English speakers to event elements that provided information about the manner of motion (i.e., vehicles, instruments) and Greek speakers to elements that defined the path (i.e., Ground objects). Crucially, these crosslinguistic differences in event inspection only surfaced when participants recruited linguistic resources to accomplish a task: when they were presented with a free-viewing task that did not require the use of language, adult speakers of English and Greek did not show the same crosslinguistic differences in eyegaze patterns (see also Trueswell & Papafragou 2010). Thus, language-specific differences in the way information is gathered from motion events are driven by the process of "thinking for speaking" (Slobin 1996b; 2006), and not by fundamental differences in nonlinguistic cognition between the two language groups (see also MacDonald 2013; Norcliffe et al. 2015; Skordos et al. 2020).
The way experience with a particular language affects production and attention patterns in children is not as well understood. Bunger and colleagues (2012) demonstrated that, by the time they are 4 years old, English-speaking children exhibit the same fundamental link between attention allocation and linguistic output that has been observed in adults. They found that not only do English-speaking children of this age tend to mention manners of motion more often than paths when they describe motion events, but like English-speaking adults, they also tend to direct more attention to the manners of motion events while they plan those event descriptions. It is as yet unknown, however, whether children of this age will show the same crosslinguistic differences in their attention to motion events during the planning stages of language production that adults do. Recall that crosslinguistic differences in the encoding of event components have been documented in speakers as young as 3 years of age. However, very little is known about the extent to which these differences are reflected in attention patterns as young speakers plan those utterances. The study of online attention patterns during language planning provides insight into the process that speakers are going through when selecting information from the visual world to talk about. By investigating similarities and differences in the way preschool-aged speakers of different languages prepare event descriptions before verbalizing them, we begin to tease apart behaviors during language planning that are shared from those that are specific to the acquisition of a particular language.
In the current study, we ask whether children exhibit language-specific differences in attention allocation during speech planning, and if so, whether those differences are linked to what they actually say about the events. This is one of the first studies to combine eyegaze measures with crosslinguistic event description in preschool-aged children. By investigating how early in development crosslinguistic differences in event description and attention begin to emerge, we add to the growing body of knowledge about the kinds of linguistic differences that children are sensitive to. Moreover, we begin to fill gaps in our understanding of how developmental and crosslinguistic differences influence the way information is selected during the process of language production.
Specifically, we ask whether children learning English and Greek exhibit systematic crosslinguistic patterns of attention as they plan descriptions of motion events. We look for evidence of language-specific biases 1) in the information that 3- and 4-year-old speakers of English and Greek provide when they describe dynamic motion events and 2) in their patterns of event inspection as they plan those descriptions. Following the research summarized above, we expect English- and Greek-speaking children to differ in their tendency to provide information about manners (typically prioritized in English descriptions of motion events) and paths (typically prioritized in Greek descriptions of motion events) of motion. By comparing eyegaze patterns across language groups in conjunction with production of event descriptions, we fill a gap in the understanding of the way children in these age and language groups gather information about motion events in real time. Specifically, we aim to investigate whether the way they direct their attention during speech planning is linked to the information they provide about a motion event (manner vs. path), as has been demonstrated for adult speakers of these languages. To the extent that preschool-aged speakers of the two languages differ in their tendency to mention manner and/or path information in their event descriptions, we also expect them to allocate their attention differently while preparing those descriptions.
Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1210
In addition, we compare eyegaze patterns during motion event description by children in these age and language groups to their eyegaze while viewing the same motion events in a nonlinguistic (memory) task.
Here, we expect to find that, as for adult speakers of these languages, children exhibit different patterns of attention allocation to motion events when they are engaged in the process of language production versus when they are viewing the events in preparation for a memory task. As mentioned previously, adult speakers of these languages show similar patterns of attention to motion events when viewing them in preparation for a memory task (e.g., Papafragou et al. 2008; Trueswell & Papafragou 2010). This experiment will allow us to investigate whether patterns of attention to motion event components during nonlinguistic tasks also converge for 3- and 4-year-old speakers of these languages.
Greek-speaking children were recruited through public (n = 5) and private (n = 35) preschools in and around Ioannina, Greece. Children had no parent-reported history of visual, cognitive, or language impairments. Data from an additional 19 children were excluded from the analysis for the following reasons: unwillingness to cooperate (n = 4), experimenter error or equipment failure (n = 6), failure to calibrate (n = 1), production of linguistic data that were not compatible with our coding rubric (n = 3; see "Coding of event descriptions" for more information), or significant trackloss during stimulus viewing (n = 5; see "Analysis of eye movement data" for trackloss criteria). Sample size was determined on the basis of previous eye tracking studies of motion descriptions in adults (e.g., Papafragou et al. 2008; Trueswell & Papafragou 2010).

Apparatus
Stimulus presentation and data collection were carried out using either a Tobii 1750 (77 children) or a Tobii T60 (2 children) remote eyetracking system (we used two systems because of a switch in lab equipment). The T60 is an updated version of the 1750 system: both systems track binocular eyegaze using optics embedded in a 17-in TFT flat panel monitor with a display size of 33.5 (width) × 26.8 (height) cm (31.2 deg × 25.1 deg visual angle at viewing distance of 60 cm). Both systems were set to a screen resolution of 1024 × 768. In our Tobii 1750 setup, two laptop computers running the Windows XP operating system controlled the eyetracking system: one computer displayed stimuli on the 1750 monitor (via the ClearView software from Tobii Technology); the other collected data from the eyetracker at a consistent 50 Hz sampling rate (via the TET-server software from Tobii Technology). The T60 uses an embedded server to collect data at a consistent 60 Hz sampling rate. In our T60 setup, we used a laptop computer running the Windows 7 operating system to control the display of stimuli (via the Tobii Studio software from Tobii Technology). To increase timing accuracy, all laptops in both systems were disconnected from the internet. To reconcile differences in sampling frequencies across the two systems, eyegaze data were analyzed as proportions of looking to various regions of interest during 1-s windows of the test period.
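The visual angles reported for the display can be checked with the standard formula for the angle subtended by an extent centered on the line of sight, θ = 2·atan(s / 2d). The following is a minimal sketch, not the authors' procedure; the function name is hypothetical, and the formula assumes the child views the screen straight on from exactly 60 cm.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle, in degrees, subtended by an extent of size_cm
    centered on the line of sight at distance_cm (assumes head-on viewing)."""
    return math.degrees(2.0 * math.atan((size_cm / 2.0) / distance_cm))


# Viewing distance and physical sizes as reported in the text.
VIEWING_DISTANCE_CM = 60.0

display_w_deg = visual_angle_deg(33.5, VIEWING_DISTANCE_CM)  # monitor width
display_h_deg = visual_angle_deg(26.8, VIEWING_DISTANCE_CM)  # monitor height
video_w_deg = visual_angle_deg(23.6, VIEWING_DISTANCE_CM)    # stimulus video width
video_h_deg = visual_angle_deg(16.7, VIEWING_DISTANCE_CM)    # stimulus video height

if __name__ == "__main__":
    print(f"display: {display_w_deg:.2f} x {display_h_deg:.2f} deg")
    print(f"video:   {video_w_deg:.2f} x {video_h_deg:.2f} deg")
```

Under these assumptions the computed values agree with the reported display and stimulus-video angles to within about 0.1 deg; small deviations are expected because children sat only approximately 60 cm from the screen.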

Materials
Stimuli consisted of short (9-s) videos that were created by animating clip-art images. Twelve target event videos depicted motion events in which a human or animal agent used an instrument or vehicle to move toward a stationary object (see Figure 1 for a sample, and Appendix A for the full list). To assess familiarity with these instruments or vehicles in English and Greek speakers, we asked 10 adult speakers of each language to indicate their own familiarity with each item on a 5-point scale (1 = not very familiar, 5 = very familiar). There was no difference in the average familiarity of the vehicles between language groups (average familiarity score of 4.7 for both).
The 4-year-old English-speakers in this study are the same children described in Bunger, Trueswell and Papafragou (2012). In this paper, we take a different approach to the analysis of their eye movements and event descriptions.
Because our goal was to assess attention to motion event components, Manners and Paths of motion in target events were represented by distinct objects in the scene. A simple, contextually appropriate background was also created for each video (e.g., a body of water). All Manners of motion were associated with the instrument or vehicle used by the agent (e.g., boat, ice skates, airplane), and clipart images were constructed so that the instrument was spatially separated from the torso and face of the agent, allowing looks to these two components to be distinguished in the analysis of eyegaze data. (We did not use spontaneous motion events such as walking or jumping, where the Manner of motion region cannot be reliably separated from the agent region.) All Paths involved movement of the agent toward a goal object (e.g., island, fishing hut, cave) that determined the Path endpoint for each event. Trajectories of agent motion were never marked by visual paths like winding roads or wake trailing a boat. Goal paths were chosen for all events because they are known to be more salient than source paths in both conceptualization and description of motion events (e.g., Regier 1996; Lakusta & Landau 2005). The specific representations of Manner and Path information in these events represent a limited set of what may be conceptually or linguistically encoded as manner or path: many motion events do not include instruments, and paths typically include more than the goal of a trajectory. These choices were made to create clear regions of interest for the eyetracking analysis that included only visible items in the stimuli within any given frame and made no additional assumptions about how the viewers conceptualized each visible item.
Twelve filler event videos depicted animate agents and inanimate objects involved in events that did not include specific endpoints (e.g., flying a kite; see Appendix A for a full list). The animation in all videos lasted for 3 s, and then the final frame of the event remained visible on the screen for an additional 6 s. When the animation ended (at 3 s), participants heard a beep; aside from this beep, all videos were silent. Clipart animations were first created in Microsoft PowerPoint and then modified and exported as Audio Video Interleave (avi) files using Apple's Final Cut Pro software. When presented on the screen of either Tobii system, stimulus videos were 23.6 (width) × 16.7 (height) cm (22.2 × 15.9 deg visual angle at a viewing distance of 60 cm).

Procedure and experimental design
All children were tested in their preschools by a native speaker of their own language. During the experiment, children sat unconstrained in a car seat firmly attached to a stationary chair placed approximately 60 cm from the eyetracker screen. The experimenter adjusted the angle of the screen for each child to obtain robust views of both eyes that were centered in the tracker's field of view. Calibration was carried out using Tobii's default 5-point calibration scheme. If the calibration was incomplete (data for fewer than 4 points were captured) or was judged by the experimenter to be otherwise unacceptable, the calibration routine was repeated, with adjustments made to the position of the child or the eyetracker, as necessary. As mentioned previously, one child who failed to calibrate was excluded from the analysis.
After the calibration routine, participants were given instructions for their task. There were two experimental tasks; participants were assigned to tasks at random, based on a rotation through an experimenter-generated list. Half of the participants in each age and language group were assigned to a Linguistic task, and the other half were assigned to a Nonlinguistic task. Instructions were presented to children in their native language. In both of the tasks, participants were informed that they would be viewing "cartoons" with "people and animals doing things." 2 Children in the Linguistic task were asked to tell the experimenter "what happened in the cartoon" as soon as they heard the beep that signaled the end of the animation.
2 We chose these general instructions because "doing something" can refer to either a manner ("sailing") or a path ("entering"). See Papafragou and Selimis (2010a) for evidence that people's interpretation of this phrase in a different, more ambiguous (categorization) task can take on either path or manner nuances.
Participants in the Nonlinguistic task were asked to "watch the cartoons very carefully" because the experimenter would be asking them "some questions about them later." 3 Participants in each task viewed the same progression of stimuli presented in a fixed semirandom order: 4-year-olds viewed the entire set of 24 items (12 targets and 12 fillers), and, because pilot testing suggested that the full set of stimuli was too long for them, 3-year-olds were presented with a subset of 16 of these items (8 targets and 8 fillers). A recentering animation in which colorful objects (e.g., stars and smiley faces) flew around the screen was shown between all stimulus items. This animation allowed the experimenters to recapture the gaze of inattentive preschoolers while at the same time avoiding directing their attention to any particular location on the screen. Participants in the Linguistic task provided their event descriptions aloud, and these sessions were audio-recorded. Participants in the Nonlinguistic task were discouraged from engaging in linguistic encoding of the events: children in this condition who began to give descriptions were reminded to "watch quietly."

Data coding and analysis

Coding of event descriptions
Descriptions of stimulus events collected from participants in the Linguistic condition were transcribed and coded by native speakers of the language under consideration. Event descriptions were not available for 11 of the 392 Linguistic trials: 1 trial was skipped due to experimenter error, and 10 trials did not elicit intelligible event descriptions. These trials were excluded from all analyses. For the remaining trials, descriptions of target items were assessed for mention of the Manners and Paths of motion depicted in the event. Words or phrases that referred to instruments (e.g., "boat") or the agent's manner of motion (e.g., "sailing," "floating," "driving") were coded as Manner mentions, and those that referred to either the path endpoint (e.g., "island," "beach"), the agent's trajectory of motion (e.g., "went to"), or the relationship between the agent and the path endpoint (e.g., "reached") were coded as Path mentions. In addition, to ensure that we were coding motion event components rather than just information about objects, all utterances included in the dataset mentioned motion and/or boundary crossing. For example, an utterance like (3a) would be coded as including both Manner and Path information, whereas (3b) includes only Manner information and (3c) includes only Path information.
(3) a. He went to an island in a sailboat.
b. He was sailing the boat.
c. He went to an island.
Event descriptions that did not include information about either the Manner or the Path of the associated target event were coded as "Neither." Moreover, because we were interested in children's mention of event components rather than the instrument and goal objects that depicted them, we excluded from the analysis 59 event descriptions that consisted of no more than labels for instruments ("this is a ship") or goals ("house"): 15 of 71 items from English-speaking 3-year-olds, 27 of 75 items from Greek-speaking 3-year-olds, and 17 of 119 items from Greek-speaking 4-year-olds. Across languages, 69% of these labels referred to vehicles and 7% to goal objects; the remaining 24% referred to agents or to background elements. As mentioned in "Participants," in addition to these individual trials, three children (all Greek-speaking 4-year-olds) were excluded from the analysis for producing a majority of event descriptions of this type.
3 These questions about the cartoons were posed during a memory task that is not described in this paper because it is not relevant to the questions under investigation. Every child participated in the memory task after he or she had completed one of the Linguistic or Nonlinguistic tasks described in the text. Children in the Nonlinguistic task were informed in advance about the "memory game" to motivate them to pay attention to the stimuli when these were first presented and were asked during the task to indicate whether each of a new set of dynamic events was the "same" as or "different" from the events they had seen before. Based on prior evidence (e.g., Hagen & Kingsley 1968; Reese 1975; Hitch et al. 1991; Flavell et al. 1996; Palmer 2000; Kahn & Snedeker 2010), we thought it was unlikely that young children would use language strategically to encode stimuli in a memory task. There were no significant differences across language groups and task for memory accuracy for either event component (all p values > 0.10).
Analysis of eye movement data
Eye movement data were analyzed to assess the effects of language background, age, and task on encoding of motion event components. Data samples from target trials (50 per second from the Tobii 1750, 60 per second from the Tobii T60) were time-locked to the onset of the video, and analyses were performed on raw eyegaze coordinates from each sample. Trackloss was determined separately for each eye by Tobii's eyetracking software (ClearView for the 1750, Tobii Studio for the T60). Our data set includes samples for which the system is certain that it has recorded the correct coordinates for at least one eye (i.e., samples with a validity score of 0 or 1 on a scale from 0 to 4). Missing data (samples with validity >1) were counted as trackloss for a given eye. For samples with available data from both eyes, we used an average of the gaze coordinates from the two eyes. Trials with global trackloss of >30% were excluded from the analysis (n = 22 from the Linguistic task, n = 20 from the Nonlinguistic task). Four-year-old participants with more than four excluded target trials (n = 2) and three-year-old participants with more than three excluded target trials (n = 3) were replaced in the design.
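The sample-level and trial-level criteria just described can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis script: the sample layout and all function names are hypothetical, and "global trackloss" is assumed here to mean the proportion of samples in a trial for which neither eye yielded valid data.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical sample layout: (left_xy, left_validity, right_xy, right_validity)
Sample = Tuple[Point, int, Point, int]

MAX_VALID_CODE = 1     # validity codes 0 or 1 count as usable (scale 0 to 4)
MAX_TRACKLOSS = 0.30   # trials with more than 30% trackloss are excluded


def combine_eyes(sample: Sample) -> Optional[Point]:
    """Return one gaze point per sample: the mean of both eyes when both are
    valid, the single valid eye otherwise, or None if neither eye is valid."""
    (lx, ly), left_validity, (rx, ry), right_validity = sample
    left_ok = left_validity <= MAX_VALID_CODE
    right_ok = right_validity <= MAX_VALID_CODE
    if left_ok and right_ok:
        return ((lx + rx) / 2.0, (ly + ry) / 2.0)
    if left_ok:
        return (lx, ly)
    if right_ok:
        return (rx, ry)
    return None


def trackloss_proportion(samples: Sequence[Sample]) -> float:
    """Proportion of samples with no usable eye (assumed definition of
    'global trackloss'); an empty trial counts as fully lost."""
    if not samples:
        return 1.0
    lost = sum(1 for s in samples if combine_eyes(s) is None)
    return lost / len(samples)


def keep_trial(samples: Sequence[Sample]) -> bool:
    """A trial is retained unless its trackloss exceeds the 30% threshold."""
    return trackloss_proportion(samples) <= MAX_TRACKLOSS
```

Because the threshold is expressed as a proportion of samples, the same rule applies unchanged to data recorded at 50 Hz and at 60 Hz.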
To assess attention to motion event information in our stimuli, two dynamic spatial scoring regions were defined for each target video: a Manner region, which included the instrument used by the agent as the means of motion (e.g., the sailboat), and a Path endpoint region, which included the stationary path endpoint (e.g., the island). The Manner region never included the head or torso of the agent; these visual elements were included in an additional Agent scoring region that is not reported here. Trajectories were omitted from the Path region because they were never visible in our stimulus events, and previous work in our labs has demonstrated that although viewers of motion events like these do make anticipatory eye movements that project an agent's trajectory toward a visible path endpoint, they rarely fixate empty regions of space (Papafragou et al. 2008). On average, Manner regions subtended 6.90 (width) × 2.67 (height) deg visual angle, and Path regions subtended 9.22 × 8.98 deg visual angle. The size of each region for each stimulus is given in Appendix B (cf. also Bunger et al. 2012).
During the animation in our motion event videos, the instrument moved across the screen toward the path endpoint. To keep track of looks to this dynamic event component, an automated data analysis procedure was used to update the coordinates of the Manner region in the eyetracking analysis file as the event unfolded. Manner and Path regions were first defined by hand based on the position of instruments and path endpoints in the first frame of each target video. The Manner region was then repositioned by hand for each frame of the video, and the coordinates of this region in each successive frame were recorded to a file. The size of the Manner region remained constant across frames, as did the size and position of the Path region. For the analysis, an eyegaze sample was defined as being within a region of interest if its coordinates fell within the region as defined for the corresponding video frame. As instruments moved toward path endpoints near the end of events, Manner regions sometimes partially occluded Path regions. Overlap of this sort was resolved by assigning gaze to the Manner region, a step that follows directly from our choice to code looks only to items that were visible in the stimuli in a given frame. Eyegaze data are reported as the proportion of samples (averaged across subjects) for looks within these predefined regions of interest (out of all looking), averaged into blocks of 1 second. Any looks within a region were included in the analysis, regardless of duration.
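The scoring procedure above (a per-frame Manner region, a fixed Path region, overlap resolved in favor of Manner, and proportions of looking computed in 1-s blocks) can be sketched for a single trial as follows. The rectangle representation, the frame rate parameter, and all names are assumptions made for illustration; the source does not specify the file format or the video frame rate.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical region format: (left, top, right, bottom) in screen pixels.
Rect = Tuple[float, float, float, float]
# Hypothetical gaze sample: (time in seconds from video onset, x, y).
Gaze = Tuple[float, float, float]


def contains(rect: Rect, x: float, y: float) -> bool:
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def classify(x: float, y: float, manner: Rect, path: Rect) -> str:
    """Label one gaze point; Manner is checked first so that any overlap
    with the Path region is assigned to Manner."""
    if contains(manner, x, y):
        return "manner"
    if contains(path, x, y):
        return "path"
    return "other"


def proportions_by_second(
    samples: Sequence[Gaze],
    manner_by_frame: List[Rect],
    path: Rect,
    fps: float,
) -> Dict[int, Dict[str, float]]:
    """Proportion of valid samples in each region, per 1-s block.
    After the animation ends, the last frame's Manner region is reused,
    mirroring the held final frame of the video."""
    counts: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"manner": 0, "path": 0, "other": 0}
    )
    last_frame = len(manner_by_frame) - 1
    for t, x, y in samples:
        frame = min(int(t * fps), last_frame)
        label = classify(x, y, manner_by_frame[frame], path)
        counts[int(t)][label] += 1
    result: Dict[int, Dict[str, float]] = {}
    for second in sorted(counts):
        total = sum(counts[second].values())
        result[second] = {k: v / total for k, v in counts[second].items()}
    return result
```

Expressing looks as proportions within each 1-s block, rather than as raw sample counts, is what makes data from trackers with different sampling rates comparable.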

Statistical analyses
Multilevel mixed logit modeling with crossed random variables for Subjects and Items was used to assess the reliability of trends observed in the data (cf. Baayen et al. 2008; Barr et al. 2013). Eyegaze data (proportions of samples whose coordinates match those of a given region of interest) in statistical analyses were elogit-transformed following Barr (2008). Best fitting lmer models for each analysis were chosen through stepwise comparisons of log likelihood values. Fixed factors (Language, Age, Task, Motion Component, as appropriate) were included as random slopes in Item effects structures when they did not perfectly correlate with the intercept. All p values reported for factors within analyses are vs. an empty model with no fixed effects.
Table 1 provides information about the proportion of utterances in which the preschoolers in this study mentioned the Manner or Path of our target motion events (or neither), regardless of the syntactic position in which those event components were encoded. 4 Across age groups, English-speaking children were more likely to mention either motion event component than Greek-speaking children were, and across language groups, older children were more likely to mention either motion event component than younger children were. Across language and age groups, children were more likely to provide information about Manners of motion than about Paths of motion. These trends were confirmed by multi-level modeling of categorical values at the trial level for mention of motion information (0, 1), with Language (English, Greek), Age (3yo, 4yo), and Motion Component (Manner, Path) as first-level fixed factors. The best fitting model (p < 0.001; Table 2) includes main effects of Language, Age, and Motion Component, as well as an interaction between Language and Motion Component.
The significant interaction between Language and Motion Component reflects the fact that English-speaking children were significantly more likely than Greek-speaking children to mention Manners (p < 0.001), but the two groups were equally likely to mention Paths (p = 0.86). 5 As Table 3 shows, even though children sometimes combined Manner with Path information in their target event descriptions, Manner information across languages and age groups mostly appeared in the absence of Path (see (5a) for an example from English and (5b) for an example from Greek). 6 Beyond this broad pattern, English-speaking children were almost twice as likely
4 Proportions for each group in Table 1 do not add up to 1 because some utterances included information about both event components, and some included information about neither. See Table 3 for a breakdown of the data that includes these details.
5 Across age and language groups, children were more likely to encode Manner information in verbs like "sailing" rather than in subject position (e.g., "man with a boat") or in post-verbal positions (e.g., "in a boat").
6 Table 1 presents gross information about the proportion of event descriptions that include either Manner or Path information. In Table 3, this information has been reorganized to communicate the full semantic content of the descriptions. So, for example, the values in the "Manner Only" and "Both" columns in Table 3 add up to the values in the "Manner" column in Table 1 (and likewise for the Path columns).
Overall, the language-specific pattern of omissions we observe in young learners of English and Greek is consistent with trends in adult production in the two languages: adult speakers of English tend to mention Manners of motion more often than adult speakers of Greek (Papafragou et al. 2002; 2006, among others). Unlike those prior cross-linguistic reports, however, Greek-speaking children also mentioned Manners more often than Paths. This finding is reminiscent of studies pointing out that Greek presents some variation in how frequently Manner is expressed in motion descriptions (e.g., Soroli 2012). Notice that the stimuli used in this experiment included vehicle-defined Manners of motion (e.g., steering a boat, driving a car, flying a plane). The movement of these vehicles was the only motion that occurred in the animated stimuli, and many of these vehicles were interesting in themselves (parachutes, hot air balloons, sailboats, ice skates). In the next section we present eyegaze patterns that suggest that children found these vehicles particularly engaging. It is likely that this interest in the vehicles (or their dynamic motion) that defined Manners of motion in our stimuli led children in both language groups to talk about them. Critically, the English bias to talk about Manners of motion went beyond this baseline interest in features of our stimuli: despite an overall preference across language groups to talk about Manners, English-speaking children mentioned Manners of motion even more often than Greek-speaking children did.
In summary, we found both similarities and differences in the way preschool-aged speakers of English and Greek described our motion events. Not surprisingly, older children in both language groups tended to provide more motion information than younger children did (see Henriks 2006 and Hickmann et al. 2018 for similar developmental findings in English-, German-, and French-speaking children). Both English- and Greek-speaking children mentioned Manner information more often than Path information (cf. also Soroli 2012), a fact that may have been due to features of our dynamic stimuli. Consistent with adult trends in production, however, we found that across age groups, English-speaking children were more likely to mention Manners of motion than were Greek-speaking children. This finding indicates that, by the time they are 3 years old, children learning English and Greek are sensitive to the way adult speakers of their own language describe motion events and have already started to follow these patterns in their own language use. This conclusion is consistent with prior work on how children describe motion (e.g., Özçalişkan & Slobin 2000; Papafragou et al. 2002; Allen et al. 2007; Özyürek et al. 2008; Papafragou & Selimis 2010a) while recognizing some variation in how cross-linguistic differences are manifested (cf. also Selimis & Katis 2010; Soroli & Verkerk 2017).

Eye movements
Given these similarities and differences in the way English- and Greek-speaking preschoolers describe motion events, we next ask whether children of this age gather information from the visual world during speech planning in similar ways or in language-specific ways. We have collapsed across age groups in our assessment of these eyegaze patterns because, although older children tended to say more about target events than younger children did, the kind of motion information they were providing did not change with development (i.e., everyone tended to talk more about Manners).

Attention to Manner
In a first analysis, we look at patterns of attention to Manner in our motion events for trials on which children mentioned or did not mention this specific component. We focus on Manner since mention of this component was the locus of a strong cross-linguistic difference (see Table 1). To probe for eyegaze patterns that are specific to the process of language production, we compare attention to motion event components by participants who completed the Linguistic task to those who completed the Nonlinguistic task. We will discuss these data with respect to two research questions: First, do we see differences in the way children direct their attention when engaged in linguistic and nonlinguistic tasks? And second, do English- and Greek-speaking children show the same patterns of attention across these tasks?
Figure 2 depicts the attention that English- (Figure 2A) and Greek- (Figure 2B) speaking preschoolers directed to the Manner elements of our motion events. Because we are interested in the way speakers gather information in preparation for speaking, these graphs depict just 5 s of the total 9 s viewing period, including the 3 s before children were signaled to speak (by the beep) and 2 s after this signal.
We used multilevel mixed elogit modeling as described above to compare patterns of attention to Manner elements in our stimuli across tasks and language groups. Elogit-transformed proportions of looks to Manner regions were modeled separately within five 1-s windows beginning at stimulus onset, and Language (English, Greek) and Task (Nonlinguistic, Linguistic) were entered as first-level fixed factors. We will refer to these analysis windows by their start and end times: thus, we analyzed data for the following five 1-s blocks of time: the 0-1 s window, the 1-2 s window, the 2-3 s window, the 3-4 s window, and the 4-5 s window. Data for the Linguistic task were assessed separately for trials on which Manner had (Table 4) and had not (Table 5) been mentioned.
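As an illustration of the windowed transformation described above, the following sketch bins gaze samples into five 1-s windows and applies the empirical logit. It assumes the standard formulation in Barr (2008), with 0.5 added to both the hit and miss counts; the function names and the input format are hypothetical and are not taken from the original analysis.

```python
import math


def elogit(y: int, n: int) -> float:
    """Empirical logit for y in-region samples out of n total samples.

    Assumes the usual 0.5 correction, which keeps the value finite
    when y == 0 or y == n.
    """
    return math.log((y + 0.5) / (n - y + 0.5))


def elogit_weight(y: int, n: int) -> float:
    # Approximate variance of the empirical logit; its reciprocal can be
    # used as a weight in a weighted regression.
    return 1.0 / (y + 0.5) + 1.0 / (n - y + 0.5)


def elogit_by_window(hits, n_windows=5, width_s=1.0):
    """Empirical logit of looks to one region in consecutive time windows.

    hits: iterable of (time_s, in_region) pairs, with time measured from
    stimulus onset. Samples outside the analysed windows are ignored.
    """
    y = [0] * n_windows
    n = [0] * n_windows
    for t, in_region in hits:
        w = int(t // width_s)
        if 0 <= w < n_windows:
            n[w] += 1
            y[w] += 1 if in_region else 0
    return [elogit(y[w], n[w]) for w in range(n_windows)]
```

A proportion of exactly 0.5 maps to 0, and proportions above or below 0.5 map to positive or negative values symmetrically.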
When assessing attention to Manner regions for trials in the Linguistic task on which participants did mention Manner information (Table 4), we found effects of both Task and Language. A significant effect of Task was found for the 1-2 s analysis window (p < 0.001), such that during this time period children in the Linguistic task who went on to mention Manner information directed more attention to Manner regions than children in the Nonlinguistic task, regardless of language background. A significant interaction between Task and Language was found for the 2-3 s analysis window (p < 0.05): In this analysis window, only English-speaking children who went on to mention Manners showed a significant increase in attention to Manners in the Linguistic task vs. the Nonlinguistic task (p < 0.05). Additionally, Greek-speaking children in the Nonlinguistic task directed more attention to Manners during this window compared to English-speaking children (p < 0.05). Finally, a significant effect of Language was found for the 3-4 s analysis window (p < 0.01), such that Greek-speaking children directed significantly more attention to Manner regions than English-speaking children did. No effects of Task or Language, or interactions between them, were found for the other two analysis windows. 7
When assessing attention to Manner regions for trials in the Linguistic task on which participants did not mention Manner information (Table 5), a significant effect of Language was found for the 2-3 s, 3-4 s, and 4-5 s analysis windows (all p < 0.05). In all cases, Greek-speaking children were directing more attention to Manner regions than were English-speaking children. No effects of or interactions with Task were found in these windows, and no effects of Task or Language, or interactions between them, were found for the other two analysis windows.
7 Although visual examination of the eyegaze data presented in Figure 2 may suggest that there should be task effects for Greek-speaking children in the 2-3 s analysis window, this pattern does not reach statistical significance in the elogit-transformed data on which the analyses were performed, perhaps because variance in the 2-3 s window is greater than that in the 1-2 s window.
To return to the questions we set out at the beginning of this section, these data do show differences in the way children direct their attention to dynamic motion events when engaged in linguistic and nonlinguistic tasks, with both similarities and differences across language groups. First, we found that by the time they are 3 years old, children, like adults, have begun to direct their eyegaze during the process of language production in ways that are linked to what they are planning to talk about. Specifically, our results demonstrate that when they were planning event descriptions that included Manner information, preschool-aged speakers of both languages devoted more attention to Manners of motion in the visual world than they did in a Nonlinguistic task. This increase in attention to Manners while planning event descriptions that included Manner information is consistent with a strategy in which children directed more attention to event components that they were planning to talk about. Critically, the increase in attention to Manners that we observed began within the second second of event viewing, i.e., as children were planning their event descriptions and before they had actually begun to produce them. There was no equivalent increase in attention to Manner information for trials on which children in the Linguistic task did not mention Manners in their motion event descriptions.
Moreover, we found that this increase in attention to Manner regions in the Linguistic task was less consistent for Greek-speaking children than it was for English-speaking children. This finding may be due to the fact that Greek-speaking children were already directing a considerable amount of attention to Manner regions, as demonstrated by their high level of attention to Manner regions even in the Nonlinguistic task. That is, if the interest that Greek-speaking children exhibited in Manner regions was already near ceiling, the process of planning to talk about those Manners might not have been able to boost attention to them beyond the baseline preference. If this is true, then it leaves open the possibility that Greek-speaking children in the Linguistic task were led to mention Manner information more than in prior reports (e.g., Papafragou et al. 2003, 2006) because those event elements were salient to them. We return to this finding in the General Discussion.

Attention to Manner over Path
In our first analysis, we focused on whether Manner was mentioned or omitted in event descriptions. In our second analysis, we pursue a more specific link between utterance content and attention allocation: we ask whether the relative attention allocated to Manner over Path within the Linguistic task changed depending on whether children encoded Manner exclusively or not. For this analysis, we compare trials on which children offered only Manner information (by far the most prevalent option in both languages, and more prevalent in English than in Greek) to trials for which they offered combinations of Manner and Path information (see Table 3). As in our previous analysis, we expect to see differences in the way children gather information from our motion events that are consistent with differences in the linguistic encoding biases in each language.
Figure 3 depicts the way English- (Figure 3A) and Greek- (Figure 3B) speaking preschoolers directed their attention to motion event components in our events, split by the context in which Manner information was given (i.e., alone, or in conjunction with Path information). As before, the graphs depict just the 3 s before children were signaled to speak (by the beep) and 2 s after this signal. We used multilevel mixed elogit modeling as described above to compare patterns of attention to motion event components in our stimuli across language groups and types of event descriptions. Difference scores were calculated for each trial in the Linguistic task on which Manner information had been mentioned, whether alone or in conjunction with Path information, by subtracting elogit-transformed proportions of looks to Path regions from elogit-transformed proportions of looks to Manner regions in five 1-s windows beginning at stimulus onset.
As in our description of Figure 2, we will refer to these analysis windows by their start and end times: thus, we analyzed data for the following five 1-s blocks of time: the 0-1 s window, the 1-2 s window, the 2-3 s window, the 3-4 s window, and the 4-5 s window. Difference scores were modeled separately within each 1-s analysis window, with Language (English, Greek) and Motion Information (Manner, Manner+Path) entered as first-level fixed factors.
Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1210
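The difference score used in this analysis can be written out explicitly. The expression below is a sketch that assumes the standard empirical logit of Barr (2008) with a 0.5 correction; the symbols D, y_M, y_P, and n are introduced here for illustration only and do not appear in the original analysis.

```latex
\[
D \;=\; \mathrm{elogit}(\mathrm{Manner}) - \mathrm{elogit}(\mathrm{Path})
  \;=\; \ln\frac{y_M + 0.5}{\,n - y_M + 0.5\,} \;-\; \ln\frac{y_P + 0.5}{\,n - y_P + 0.5\,}
\]
```

Here y_M and y_P are the numbers of gaze samples falling in the Manner and Path regions within a given 1-s window on a given trial, and n is the total number of gaze samples in that window. Under these assumptions, D > 0 indicates a relative preference for the Manner region and D < 0 a relative preference for the Path region.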
For this analysis, we found effects of both Language and Motion Information on attention to motion event components (Table 6). A significant effect of Language was found for the 1-2 s analysis window (p < 0.05): Greek-speaking children demonstrated a stronger preference for Manner regions over Path regions than English-speaking children did, regardless of the type of event description they were preparing. Additionally, a significant interaction between Language and Motion Information was found for the 2-3 s analysis window (p < 0.01). In this analysis window, when English-speaking children mentioned only Manner information, they showed a preference to look at Manners that was significantly greater than that of Greek-speaking children who mentioned only Manners (p < 0.01). In addition, when English-speaking children mentioned both Manner and Path information, their preference for Manner regions was significantly lower than that shown by English-speaking children who mentioned only Manner information (p < 0.05); this pattern did not hold for Greek-speaking children. No effects of Motion Information or Language, or interactions between them, were found for the analysis windows not described.
This pattern of results reveals significant similarities in the relative attention that children paid to Manner and Path information in our motion events as they planned event descriptions but also two language-specific differences. In early stages of event apprehension and sentence planning (second analysis window), Greek-speaking children demonstrated a preference to look at Manner regions that exceeded that shown by English-speaking children regardless of whether learners planned to mention Manner alone or a combination of Manner and Path information.
As mentioned previously, this overall preference for Manners in Greek learners may be related to apparent interest in the dynamic vehicles depicted in our stimuli. In later stages of event apprehension and sentence planning (third analysis window), English-speaking children who mentioned only Manner information allocated more attention to Manner regions compared to Greek-speaking children who mentioned Manners exclusively. Furthermore, English-speaking children were more likely to shift their attention toward Path regions when planning to mention both Manner and Path compared to cases in which only Manner was mentioned, unlike their Greek-speaking peers who overall attended primarily to Manner. This pattern shows a tighter coupling between sentence content and attention allocation in English compared to Greek learners that reflects the very stable bias in motion encoding observed in adult English speakers. The presence of a more diffuse pattern in Greek learners is the result of an overall bias to attend to Manner (perhaps also coupled with some flexibility in motion lexicalization preferences; Selimis & Katis 2010; Soroli & Verkerk 2017).

General discussion
In this study, we used a combination of linguistic and online methods to investigate the way preschool-aged speakers of English and Greek inspect and describe dynamic motion events.
Our goal in examining the way children of this age describe motion events was to determine how early children begin to exhibit the kind of language-specific biases in the encoding of motion event information that have been previously reported for adult speakers of these languages. Specifically, we asked whether preschoolers' tendency to mention Manner and Path information when describing motion events mirrors that of adult speakers of their language. Additionally, we used eyetracking to carry out a novel investigation of the way "thinking for speaking" operates across young speakers of different languages. More specifically, we asked whether preschool-aged speakers of English and Greek, like adult speakers of these languages (e.g., Papafragou et al. 2008), exhibit language-specific patterns of event inspection when they are involved in the process of selecting motion information to talk about.
Our assessment of children's event descriptions confirms that by the time they are 3 years old, children learning English and Greek are already beginning to show differences in the way they prioritize motion event information in event descriptions. Specifically, we saw that, consistent with adult patterns, English-speaking preschoolers mentioned the Manners of our target motion events more often than Greek-speaking preschoolers did. Additionally, we found that older children tended to provide more information about our motion events than younger children did. These findings are consistent with crosslinguistic data on motion event encoding in very young children (e.g., Özçalişkan & Slobin 2000; Papafragou et al. 2002; Özyürek et al. 2008; Papafragou & Selimis 2010a), and are reminiscent of recent crosslinguistic work on the description and inspection of complex causative events (Bunger et al. 2016). We also found that children from both language groups were more likely to provide Manner information about our motion events than Path information. This finding points to the fact that motion-typological patterns apply with some flexibility within individual languages, and is consistent with evidence that Greek speakers sometimes use satellite-framed motion encoding (Soroli & Verkerk 2017; cf. also Selimis & Katis 2010). As mentioned previously, we think that this crosslinguistic preference to talk about Manners had to do with properties of our stimuli that included interesting, dynamic, and visually salient means of transportation such as parachutes, sailboats, planes, and hot air balloons (see Appendix A). 8 The amount of attention that children directed to Manner regions of our stimuli supports this conclusion: across language groups, children devoted a considerable amount of attention to Manner regions even when they were not preparing descriptions of the events.
Additionally, we found that young speakers of English and Greek demonstrated subtle differences in the way they inspected motion events while preparing to talk about them. Specifically, English-speaking children demonstrated a high attentional preference for the Manner regions of our events while they were preparing event descriptions that included Manner information. When they were preparing event descriptions that also included Path information, they shifted this attentional preference in the direction of the Path regions. Greek-speaking children, on the other hand, did not demonstrate differences in their attention to motion regions as they planned different kinds of motion event descriptions: regardless of the motion information they planned to mention, they demonstrated a consistent preference to attend to the Manners of those events. We have suggested that both of these patterns are consistent with a sensitivity toward adultlike patterns of motion event description in each language. English speakers are more likely to gather Manner information from a motion event while planning to talk about it (most likely in a verb), and so when they also plan to mention Path information, their relative preference for Manner over Path decreases. Greek speakers, on the other hand, are less likely to mention Manner information (even though they sometimes present a more balanced typological pattern with respect to the description of motion events; Selimis & Katis 2010; Soroli 2012), and Greek-speaking children in this study did not show differences in their relative preference for Manner and Path elements as they planned event descriptions with these different kinds of information.
Finally, we found that young speakers of English and Greek, like adult speakers of their languages, direct their attention to motion events in different ways when they are preparing to talk about them compared to their attention patterns while inspecting the same events in a nonlinguistic task. Specifically, we found that children from both language groups increased their attention to the Manners of our motion events when they were planning sentences that included Manner information compared to the attention they paid to these regions when viewing them in preparation for a memory task. Previous work has demonstrated this pattern of thinking for speaking in 4-year-old speakers of English (Bunger et al. 2012); here we extend it not only to younger speakers but also to speakers of a language that is typologically different from English. Again, these findings suggest that the emerging linguistic biases that we saw in the preschoolers in this study have already begun to have online effects on the way they gather information from motion events.
It remains to be determined why Greek-speaking children showed a stronger preference for Manners in the nonlinguistic task than English-speaking children did. 9 One possibility is that this preference is related to the way Greek speakers tend to encode motion information in language. Papafragou and colleagues (2008) reported that, when they were asked to remember a motion event in a Nonlinguistic task, adult speakers towards the end of each trial directed more attention to motion components that were not encoded in verbs in their language: English speakers to Path endpoints and Greek speakers to Manners of motion. Later work (Trueswell & Papafragou 2010) suggested that this eyegaze pattern was due to a potential for covert linguistic encoding during the Nonlinguistic task: when adult speakers had access to linguistic processing resources, they (silently) encoded information in language to support memory. It is, therefore, conceivable that Greek-speaking children covertly encoded Manners using language in the current nonlinguistic task in preparation for the upcoming memory test. This possibility is not fully satisfying: as argued earlier, covert linguistic encoding is unlikely in children of this age because of limitations on the use of language to support memory before the age of

Conclusion
This study breaks new ground by demonstrating that children as young as 3 years of age exhibit crosslinguistic patterns of eyegaze during the process of language production that are consistent with Slobin's thinking for speaking hypothesis (Slobin 1996b; 2006). As reviewed in the Introduction, previous studies have demonstrated that adult speakers of English and Greek prioritize Manner and Path information differently when describing motion events, and that they also direct their attention to motion events in language-specific and task-specific ways. Here, we expand our understanding of the development of these cross-linguistic patterns by showing that 3- and 4-year-old speakers of these languages have already begun to prioritize information about motion events as adult speakers of their languages do when describing such events, that they exhibit language-specific patterns of event inspection as they plan those descriptions (by directing their attention by and large to things that they plan to talk about), and that they direct their attention to motion events differently depending on whether the task is linguistic (language production) or nonlinguistic (memory). Together these results enrich our knowledge of how sentence production proceeds in speakers of different languages (Levelt 1989) and illustrate the rich processes that allow children to transform their thoughts into utterances.