On the road to somewhere: Brain potentials reflect language effects on motion event perception

Recent studies have identified neural correlates of language effects on perception in static domains of experience such as colour and objects. The generalization of such effects to dynamic domains like motion events remains elusive. Here, we focus on grammatical differences between languages relevant for the description of motion events and their impact on visual scene perception. Two groups of native speakers of German or English were presented with animated videos featuring a dot travelling along a trajectory towards a geometrical shape (endpoint). English is a language with grammatical aspect in which attention is drawn to trajectory and endpoint of motion events equally. German, in contrast, is a non-aspect language which highlights endpoints. We tested the comparative perceptual saliency of trajectory and endpoint of motion events by presenting motion event animations (primes) followed by a picture symbolising the event (target): In 75% of trials, the animation was followed by a mismatching picture (both trajectory and endpoint were different); in 10% of trials, only the trajectory depicted in the picture matched the prime; in 10% of trials, only the endpoint matched the prime; and in 5% of trials both trajectory and endpoint were matching.


Introduction
Research studying language differences in the domain of motion events has relied mainly on behavioural measures obtained during verbal and non-verbal tasks. Various studies of language production, analyzing motion event descriptions elicited with pictures, animations, or storybooks, show cross-linguistic differences in information selection and organization. Variation in the lexical and grammatical concepts available for expressing motion influences the extent to which speakers attend to specific elements of motion events when construing utterances that describe them (cf. the 'thinking for speaking' hypothesis, Slobin, 1996). For example, speakers of languages that provide manner of motion verbs (e.g., English, 'to run/walk/stroll') will pay more attention to the aspects of the event that provide manner information, and they will encode this linguistically (e.g., Papafragou, Hulbert, & Trueswell, 2008). Speakers of languages that provide verbs expressing path or direction information (e.g., French, 'entrer dans' to enter, 'se diriger vers' to direct oneself towards) pay less attention to manner; the linguistic encoding of manner of motion in sentences is optional (cf. Talmy, 1985). This raises the question of the extent to which such language differences in event conceptualization may also show in non-verbal measurements, such as eye movements recorded during motion event scene perception, in the absence of explicit description instructions (Flecken, von Stutterheim, & Carroll, 2014; Papafragou et al., 2008), and in performance on various types of categorization, matching, and recognition tasks (e.g., Gennari, Sloman, Malt, & Fitch, 2002; Kersten et al., 2010; Papafragou, Massey, & Gleitman, 2002). Overall, results are mixed, with language effects occurring primarily in paradigms involving verbally-mediated working memory (see Athanasopoulos & Bylund, 2013; Papafragou & Selimis, 2010).
Recently, researchers studying relativity effects in the domains of colour perception and object categorization have begun to use neurophysiological measures to gain insights into effects of language on visual processing (Boutonnet, Dering, Vinas-Guasch, & Thierry, 2013; Liu et al., 2010; Mo, Xu, Kay, & Tan, 2011; Thierry, Athanasopoulos, Wiggett, Dering, & Kuipers, 2009). These studies provide the strongest support to date for the claim that conceptual distinctions expressed linguistically impact cognitive processing, at least in part automatically, and in the absence of explicit linguistic manipulations. Such language effects can be detected by event-related brain potentials known to index attention processing and stimulus evaluation. Here, we test whether previously attested behavioural differences in attention to and saliency of aspects of motion events can be characterised in a non-verbal task.
The cross-linguistic comparison is based on language differences relating to temporal properties of motion events. The type of events involves volitional motion of an entity (e.g., vehicle, person) along a trajectory (road, path), in the direction of potential endpoint-objects or locations (house, tunnel). Crucial for the analysis of information selection is the fact that, in the motion event stimuli, the endpoints are not reached by the entities in motion; they are thus optional for selection, and dependent on the viewpoint of the speaker. A growing body of literature studying linguistic encoding patterns shows how the presence or absence of grammatical aspect in a language affects the extent to which speakers pay attention to and describe endpoints of such motion events (von Stutterheim & Nüse, 2003; von Stutterheim, Andermann, Carroll, Flecken, & Schmiedtová, 2012): Speakers of non-aspect languages (that do not encode aspect systematically on the verb, such as Afrikaans, German or Swedish) show a linguistic bias towards action goals and motion event endpoints (Bylund, Athanasopoulos, & Oostendorp, 2013; Schmiedtová, von Stutterheim, & Carroll, 2011; e.g., 'a woman walks towards a building', when describing a stimulus showing a person walking in the direction of, but not reaching, a building). Speakers of aspect languages, i.e., languages that provide grammatical means to express aspectual contrasts (e.g., the progressive in English, perfective and imperfective aspect in Russian; cf. Comrie, 1976; Dahl, 2000), tend not to mention endpoints of motion events when they are not explicitly focused and not unambiguously represented as part of the event ('a woman is walking [along the road/past the church, etc.]', for the same stimulus). Speakers of these languages thus take a different perspective on the same event: Aspect-language users (e.g., speakers of English) focus on the inner temporal contours of the situation and are thus more sensitive to the specific ongoing phase of the event depicted, thereby taking an 'inside' view of a situation ('a person is walking'). Non-aspect language users (e.g., speakers of German) typically take a holistic perspective (Athanasopoulos & Bylund, 2013; von Stutterheim & Nüse, 2003) and refer more frequently to the endpoint of a motion event, when boundary-crossing is optional and dependent on the perspective taken by the viewer ('a person walks towards X'). The empirical evidence covers a wide variety of languages and shows the same interrelation between aspect and event endpoints in linguistic encoding, tested for the non-aspect languages IsiXhosa, Afrikaans, Dutch, German, Swedish, and the aspect languages Arabic, English, Russian, Spanish (Bylund & Athanasopoulos, 2014; Bylund et al., 2013; von Stutterheim et al., 2012).
To tap into conceptualization processes and attention-directing mechanisms more directly, recent studies have included measurements of co-verbal processing, such as the registration of eye movements (fixations on endpoints in motion scenes, design cf. Papafragou et al., 2008), in comparison to fixation patterns registered in non-verbal tasks (Flecken et al., 2014). Findings show a higher degree of attention allocated to event endpoints by non-aspect language users (German) compared to speakers of English or Arabic, in verbal as well as non-verbal tasks. Other recent studies have investigated non-verbal motion event categorization preferences using a triads-matching task. Speakers of non-aspect languages (Swedish) were more prone to rely on endpoints as the main criterion for event categorization (Athanasopoulos & Bylund, 2013). Interestingly, in a number of follow-up studies on bi- and multilinguals, this endpoint categorization criterion was modulated by the specific degree of exposure to the aspect language (English) (see Bylund & Athanasopoulos, 2014; Bylund et al., 2013). Crucially, all the above-mentioned studies on grammatical aspect used naturalistic, dynamic motion event stimuli (live-recorded video clips), which arguably are easily amenable to implicit verbalization, as attested by the fact that the cross-linguistic differences were abolished under a verbal interference condition (Athanasopoulos & Bylund, 2013). At this point, we thus cannot make definitive claims regarding the online involvement of language, i.e., the use of verbal strategies when task conditions allow for it, vs. the existence of different representations of events in speakers of different aspect systems, as would be assumed in strong relativistic views (Lucy, 1992; Lucy, 1997; overviews in Gentner & Goldin-Meadow, 2003; Gumperz & Levinson, 1996).
We take the aspectual contrast between German and English as our test case for the linguistic relativity hypothesis, and extend it to the neurophysiological domain. In line with recent studies of language-perception interactions using ERPs (Athanasopoulos, Dering, Wiggett, Kuipers, & Thierry, 2010; Boutonnet et al., 2013; Thierry et al., 2009), we used a visual oddball paradigm to investigate language effects on perceptual and attentional processing of motion events. To our knowledge, this is the first study investigating language effects on perception beyond the word-level and rooted in a core grammatical category using ERPs.
In Experiment 1, we engaged native speakers of German and English in an event matching task, in which they saw animated schematic motion events followed by target pictures. On each trial, they were asked to pay attention to both the trajectory and the endpoint of the animation (e.g., a dot moving along a straight trajectory towards a square) and then match these features of the animation to those depicted in the target picture. Prime-target associations defined four conditions: Fully incompatible in terms of trajectory and endpoint (mismatch), fully matched for trajectory and endpoint (full match), matched for endpoint but not trajectory (endpoint match), and matched for trajectory but not endpoint (trajectory match). The frequency of the conditions was manipulated in order to elicit a P3 ERP response in the full match (5% of trials), endpoint match (10%) and trajectory match (10%) conditions. The P3 component is known to reflect attentional processing, stimulus evaluation, and target detection (see Polich, 2007 for a review). Therefore, we expected to see largest P3 amplitudes in the full match condition (response trials), and P3 modulations in the endpoint and trajectory match conditions, since they were presented with a low local frequency and tempted participants to respond. Differences in P3 amplitude between the two partial match conditions would index the participants' attentional bias towards the specific motion element. In the German group, we anticipated a larger P3 in the endpoint than the trajectory match condition, indicating a higher degree of attention devoted to this feature. On the other hand, English speakers should devote similar levels of attention to endpoint and trajectory, leading to a P3 of similar amplitude in the two conditions.
In Experiment 2, native English and native German participants performed a similar non-verbal motion matching task, without an oddball manipulation (behavioural variant of the ERP paradigm). This time, the picture preceded presentation of the animation, and participants were instructed to indicate on each trial whether or not the animation fully matched the preceding picture. We analyzed the number of responses as well as reaction times. Experiment 2 was conducted to test whether potential attentional biases reflected in ERPs can be measured behaviourally during a higher order similarity judgement task. Performance on motion matching and similarity judgement tasks has shown that the involvement of verbal working memory may well be the cause of language effects (Athanasopoulos & Bylund, 2013). A comparison of results in the two tasks, given their specific conditions, will be informative with respect to the mechanism underlying language effects in non-verbal paradigms. This is especially interesting in light of current questions that relate to whether the language system exerts online (i.e., in the form of online verbal encoding processes which are at play also during non-verbal tasks) or structural (i.e., in the sense of representational differences in speakers of different languages) feedback to the perceptual system during processing. In both experiments we used schematic stimuli (basic shapes, repeated numerous times) that show abstract motion, in response to which participants were asked to make prompt judgements. In contrast to previous paradigms using real-world motion video clips, here, online linguistic encoding would be challenging, given the speed of presentation, the absence of naturalistic, volitional motion, and the schematic nature of the stimuli. If language effects are not obtained in the behavioural variant of the non-verbal ERP paradigm, we can with more certainty state that any differences found in ERPs between the two groups of speakers concern purely nonverbal motion event cognition.
If German participants are biased towards motion event endpoints during visual processing, greater P3 amplitudes should be elicited by endpoint match than mismatch pictures, and crucially, than trajectory match pictures. If such attentional biases also affect overt judgements of the same schematic motion stimuli in a behavioural task, reaction times as well as false alarm rates could be affected: German participants could be slower on endpoint match trials, and they may be tempted to judge those as full match trials (resulting in false alarms). If language effects emerge in both experiments, this would imply that, for schematic motion stimuli, verbal event encoding processes are active and affect attention processing, stimulus evaluation, and overt comparison and similarity judgement processes. If, however, language effects are only obtained in the matching task, but are not reflected in the P3 component of ERPs, there are still reasons to assume that online verbal encoding processes take place: results would imply that, in principle, the stimuli allow for verbal encoding regardless of their non-verbal nature. However, the implicit construal of a full-fledged event description may require more time than the time captured by the ERPs recorded here, which may thus capture processing stages prior to sentence formulation. These stages could encompass 'purely' visual and/or attentional processing, potentially followed by lexical retrieval processes related to individual objects or elements of motion. Effects of the actual linguistic construal of events are then reflected in matching behaviour. If, as a third option, language effects are only obtained in the ERPs, but not in the behavioural variant of the paradigm, the cause of the effect is less likely to be verbal encoding of the stimuli. Instead, this would provide stronger grounds to assume that language drives an attentional bias in the visual domain in an entirely non-verbal context, implying that language can affect certain non-verbal levels of representation.

Participants
Twenty native speakers of English (students at Bangor University, UK), and twenty native speakers of German (currently studying at Radboud University, the Netherlands) took part in our study. All participants were right-handed and between the ages of 19 and 25. English participants were tested in Wales (13 female; M age 23, range 19-25), and German participants were tested in the Netherlands (11 female; M age 21, range 18-25).
Language background was evaluated by means of a questionnaire. The native speakers of English reported basic to intermediate knowledge of a second language which was never German. The native speakers of German were intermediate to advanced speakers of Dutch, and all subjects reported some knowledge of English. Proficiency in Dutch was estimated at B2 level as rated by the Common European Framework of Reference for languages (Council of Europe, 2001), which is the level required for German and other foreign students to be able to study at Radboud University; they had all taken an intensive Dutch course before the start of the first semester of Psychology, instructed through the medium of Dutch. Knowledge of Dutch is not considered a confound given that Dutch largely overlaps typologically with German and does not encode progressive aspect grammatically on the verb for motion events (Behrens, Flecken, & Carroll, 2013). Knowledge of English in German participants was assessed using two standardized proficiency tests (Oxford Quick Placement Test, 2001; DIALANG, Alderson & Huhta, 2005); the average score on the QPT was 43.92 (out of 60, corresponding to B2 level), and the level captured by DIALANG was similar (17 participants: B2, 3 participants: C1). None of the participants reported daily or frequent exposure to English; at the time of testing their dominant language was German with Dutch their second, late-learned language. All participants received course credits or cash for participation in the experiment, which lasted 1-1.5 h.

Materials and procedure
In each experimental session, participants watched 492 prime-target pairs. Primes were one-second animated video clips depicting basic motion events, in which a dot moved along a trajectory towards a geometrical shape. The dot never reached the endpoint shape during the video play time. There were four different animations (see example in Fig. 1). In all cases, the target pictures provided a symbolic representation of the animated primes.
The experiment conformed to a classic oddball paradigm in which prime-target pairs belonged to four conditions: Only 5% of trials were full matches (24 trials). Seventy-five percent (372 trials) were full mismatch trials in which neither the trajectory nor the endpoint depicted in the target picture matched the preceding animated prime. In the remaining two conditions either trajectory or endpoint of the animated sequence matched the target picture and these conditions represented 10% of trials each (see Fig. 2). Target pictures were presented on a white background on a 19-in. CRT monitor. Participants were seated about 1 m from the screen and stimuli subtended approximately 8° of visual angle.
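As a rough illustration of the viewing geometry, the physical extent implied by a visual angle of about 8° at a distance of about 1 m can be estimated with the standard relation size = 2 · d · tan(θ/2). The function name below is hypothetical, and the result is only as precise as the approximate values reported above.

```python
import math


def stimulus_size_cm(visual_angle_deg: float, distance_cm: float) -> float:
    """Physical extent that subtends the given visual angle at the given distance."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)


# Approximate values from the text: 8 degrees at roughly 1 m (100 cm).
size_cm = stimulus_size_cm(8.0, 100.0)
print(f"Approximate stimulus extent: {size_cm:.1f} cm")
```

Under these assumptions the stimuli would have spanned roughly 14 cm on the screen.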
The animated prime was displayed for 1000 ms, followed by a blank screen for 200 ms and then the target picture for 600 ms. During the 800 ms inter-trial interval the screen was blank. Experimental conditions were presented in a pseudorandomized order such that rare trial types or deviants (full match and partial matches) were never presented in immediate succession and were separated by 3-5 frequent trial types or standards (full mismatch). Participants were instructed to press a button in the full match condition (press the button only if the picture exactly matches the preceding animation). Participants were also informed that the picture would look smaller than the video and only symbolise movement. The proportion of correct button presses was high in both groups and not different between groups (approximately 90%).
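The ordering constraint described above can be sketched as follows. This is an illustrative reconstruction, not the Presentation script used in the study: the function name, the seed, and the three-standard lead-in are assumptions, and the count of 48 trials per partial-match condition is inferred from the reported totals ((492 - 24 - 372) / 2) rather than stated in the text.

```python
import random


def make_oddball_sequence(n_standard=372, n_full=24, n_endpoint=48,
                          n_trajectory=48, min_gap=3, max_gap=5, seed=0):
    """Return a trial list in which deviants are separated by min_gap..max_gap standards."""
    rng = random.Random(seed)
    deviants = (["full_match"] * n_full
                + ["endpoint_match"] * n_endpoint
                + ["trajectory_match"] * n_trajectory)
    rng.shuffle(deviants)

    n_gaps = len(deviants) - 1
    gaps = [min_gap] * n_gaps              # every gap starts at the minimum
    spare = n_standard - min_gap * n_gaps  # standards left to distribute
    if spare < 0:
        raise ValueError("too few standards for the requested spacing")

    lead = min(spare, min_gap)             # assumed lead-in before the first deviant
    spare -= lead

    open_gaps = list(range(n_gaps))
    while spare > 0 and open_gaps:         # top up random gaps, never beyond max_gap
        i = rng.choice(open_gaps)
        gaps[i] += 1
        spare -= 1
        if gaps[i] == max_gap:
            open_gaps.remove(i)

    sequence = ["mismatch"] * lead
    for k, deviant in enumerate(deviants):
        sequence.append(deviant)
        if k < n_gaps:
            sequence.extend(["mismatch"] * gaps[k])
    sequence.extend(["mismatch"] * spare)  # any remainder goes after the last deviant
    return sequence


sequence = make_oddball_sequence()
trial_ms = 1000 + 200 + 600 + 800          # prime + blank + target + inter-trial interval
print(len(sequence), "trials,", len(sequence) * trial_ms / 60000, "minutes of stimulation")
```

With the reported counts, the spacing constraint is tight: 119 gaps of at least three standards already use 357 of the 372 mismatch trials, so most gaps must sit at the minimum of three.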
Both groups of participants were instructed in their native language by the same experimenter, a highly proficient speaker of German and English, assisted by a native speaker assistant in each case. Stimuli were run using the same Presentation (Neurobehavioral Systems™) script, ensuring similar task conditions and the activation of a German or English language context in our participants (for effects of language context on attention to motion elements, see Lai, Rodriguez, & Narasimhan, 2014).

EEG recording and analysis
For adequate sampling purposes, electrophysiological data were recorded in two different laboratories, ensuring that English participants were as close to monolingual as possible and ensuring homogeneity of the German group. English participants were tested in Bangor, Wales (Bangor University) and data acquisition was implemented with Neuroscan 4.4™. EEG was recorded in reference to Cz. German participants were tested in Nijmegen, the Netherlands (Donders Centre for Cognition). Given the local technical procedures and system parameters, the data were recorded using BrainVision Recorder 1.1™ in reference to the left mastoid. In both laboratories, sampling was set to 1 kHz and data were recorded from 32 electrodes placed according to the 10-20 convention. Impedances were kept below 10 kΩ. EEG activity was filtered offline with a bandpass zero phase-shift filter (0.1 Hz, 12 dB/oct to 30 Hz, 48 dB/oct). Eye blinks were mathematically corrected based on the procedure advocated by Gratton, Coles and Donchin (1983). Automatic artefact rejection discarded all epochs with an activity exceeding ±75 μV. Individual ERPs were computed from epochs ranging from −100 to 1000 ms after stimulus onset and baseline corrected in reference to 100 ms of prestimulus activity. Individual averages were re-referenced to the average of the left and right mastoid sites to make the two datasets fully comparable.
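Three of the steps described above (baseline correction, amplitude-based artefact rejection, and re-referencing to the mastoid average) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Neuroscan or BrainVision pipeline: the function names are hypothetical, each epoch is assumed to be a list of 1100 samples in μV (one per millisecond from −100 to 1000 ms), and filtering and blink correction are omitted.

```python
def baseline_correct(epoch_uv, n_baseline=100):
    """Subtract the mean of the first n_baseline samples (the prestimulus interval)."""
    baseline = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch_uv]


def is_artefact(epoch_uv, limit_uv=75.0):
    """True if any sample in the epoch exceeds +/- limit_uv."""
    return any(abs(v) > limit_uv for v in epoch_uv)


def rereference_to_mastoids(channel_uv, left_mastoid_uv, right_mastoid_uv):
    """Re-express a channel relative to the mean of the two mastoid signals.

    For data recorded against the left mastoid, the left-mastoid signal is zero
    by construction, so this reduces to subtracting half the right mastoid.
    """
    return [c - 0.5 * (l + r)
            for c, l, r in zip(channel_uv, left_mastoid_uv, right_mastoid_uv)]


# Toy epoch: constant 1 uV before stimulus onset, 3 uV afterwards.
epoch = [1.0] * 100 + [3.0] * 1000
corrected = baseline_correct(epoch)
print(corrected[0], corrected[-1], is_artefact(corrected))
```

Because re-referencing is a linear operation, applying it to individual averages, as described in the text, gives the same result as applying it to every epoch before averaging.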
P3 analysis was based on individual ERPs elicited in the four conditions. The P3 was maximal over central-parietal scalp areas in both groups and studied at electrodes Cz, CP1/CP3, CP2/CP4, P3, Pz, P4, reflecting a typical P300 scalp topography (alternative electrode sites reflect differences between electro-caps used in Wales and the Netherlands). Visual inspection of the maximum P3 effect for the full match response condition in relation to the frequent mismatch condition showed a slightly earlier peak in the English dataset (around 520 ms post stimulus onset) compared to the German dataset (around 610 ms). To ensure that general task processing and attention devotion (related to spotting full match trials) was similar in each group, a condition analysis was performed on the time window of the maximum P3 effect for the response condition in each group. Furthermore, because of our interest in the positive peaks post 300 ms for the two critical conditions (endpoint match and trajectory match), an additional time window was analyzed: To incorporate for each group the peaks for the full match condition, but especially also the peaks for the two critical conditions (endpoint match and trajectory match), a further condition by group analysis of the P3 component was performed on a broader time window covering 350 to 700 ms after stimulus onset.
Mean amplitudes during the time span of the maximum P3 effect for the full match condition compiled from 6 electrode sites (Cz, CP1/CP3, CP2/CP4, P3, Pz, P4) were analyzed using a repeated measures ANOVA with condition (full match, endpoint match, trajectory match, mismatch) as a factor for each group separately. Mean P3 amplitudes between 350 and 700 ms averaged across the same 6 electrodes were subjected to a mixed 4-by-2 ANOVA, with condition (full mismatch, endpoint match, trajectory match, full match) as within-subjects factor and language (German, English) as between-subjects factor. In addition, millisecond-by-millisecond paired samples t-tests comparing mean amplitudes in the endpoint match and trajectory match conditions were computed for the entire segment, to determine more precisely the exact time windows of differences between critical conditions in each group.
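The two amplitude measures described above can be sketched as follows. This is a minimal illustration, assuming each ERP is a list of μV values sampled once per millisecond starting at −100 ms; the function names are hypothetical, windows are treated as half-open (start inclusive, end exclusive), and no significance threshold or correction for multiple comparisons is implied.

```python
import math

EPOCH_START_MS = -100  # first sample of each ERP; 1 kHz sampling gives one sample per ms


def mean_amplitude(erp_uv, start_ms, end_ms):
    """Mean amplitude over the window [start_ms, end_ms) relative to stimulus onset."""
    window = erp_uv[start_ms - EPOCH_START_MS:end_ms - EPOCH_START_MS]
    return sum(window) / len(window)


def paired_t(a, b):
    """Paired-samples t statistic for two equally long lists (df = n - 1).

    Raises ZeroDivisionError if all pairwise differences are identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


def pointwise_paired_t(cond_a, cond_b):
    """t statistic at every sample; cond_x[s][t] is subject s at sample t."""
    n_samples = len(cond_a[0])
    return [paired_t([subject[t] for subject in cond_a],
                     [subject[t] for subject in cond_b])
            for t in range(n_samples)]


# Toy data: three subjects, two samples per condition.
endpoint = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]]
trajectory = [[1.0, 1.5], [2.0, 2.0], [3.0, 2.5]]
print(pointwise_paired_t(endpoint, trajectory))
```

In use, `mean_amplitude` would be applied per participant and condition to the average across the six electrode sites, and the resulting values passed to the ANOVA.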
In the condition analysis in the time spans around the maximum P3 peak for the full match condition in each group, we expected significantly larger P3 amplitudes for the full match condition, compared to all other conditions, reflecting a similarly high degree of attention devoted to the task of detecting full matches between animation and picture in German and English participants. This analysis is important in ensuring similar patterns with respect to task-related attention in our participants. We were also interested to find out whether there were any differences between critical conditions during this window, even though the time windows do not necessarily cover the P3 peaks for the critical conditions.
In the 350-700 ms time window covering P3 peaks also for the two critical conditions we expected a significant main effect of condition, again given the oddball manipulation (full match condition being the most infrequent and the response condition, expected to elicit the largest P3 amplitude), but also a language-by-condition interaction, indicative of differential responses to the trajectory and endpoint match conditions in the two groups.

ERP results: P3
Fig. 3 below plots P3 amplitudes, subtracting the frequent (mismatch) condition from the rare response condition (full match) in each group, visualizing the P3 effect for the response condition. In the English dataset, the peak occurs around 520 ms and the analysis was performed on a time window covering 100 ms preceding and following this peak (420-620 ms). In the German dataset, the peak appears to be later (around 610 ms) and the analysis was performed on the time window covering 510-710 ms.
For the English data, a repeated-measures ANOVA showed a significant condition main effect, F(1.954, 37.133) = 25.211, p < .001, ηp² = 0.570, with Bonferroni corrected posthoc tests showing that the P3 elicited in the response condition (full match) was significantly larger than that elicited in all other conditions (full match vs. mismatch: p < .001; vs. trajectory match: p < .05; vs. endpoint match: p < .05). The P3 for the two critical conditions in this window were also significantly more positive than the P3 for the mismatch condition (both comparisons p < .05). There was no difference between the trajectory match and endpoint match conditions (p = 1.00, n.s.). In the German dataset, a similar pattern was found (F(1.775, 33.731) = 23.842, p < .001, ηp² = 0.557): the full match condition elicited most positive P3 amplitudes (full match vs. full mismatch p < .001; vs. trajectory match p < .05; vs. endpoint match p < .05), the two critical conditions were in between the mismatch and full match conditions, with no significant difference between the two (p = 1.00, n.s.).
Fig. 4 below plots P3 amplitudes for all conditions for the entire segment. P3 amplitude analyses in the 350 to 700 ms time window rendered a main effect of condition (F(1.901, 72.228) = 41.946, p < .001, ηp² = 0.525) and a main effect of group (F(1, 38) = 17.991, p < .001, ηp² = 0.321), such that the P3 for the full match condition was larger than that for the three other conditions, and the German participants showed more positive P3 amplitudes overall. Importantly, the condition by group interaction was significant (F(1.901, 72.228) = 3.057, p < .05, ηp² = 0.074). To explore the interaction further, a separate condition analysis was conducted for each group in the 350-700 ms time window. In the German group, there was a condition main effect (F(1.851, 35.165) = 23.209, p < .001, ηp² = 0.550): Posthoc tests showed that the full match condition elicited more positive P3 amplitudes than the other three conditions (all comparisons p < .001). P3 amplitudes for the endpoint match condition were more positive than the mismatch condition (p < .05), but there was no difference between the trajectory match and the mismatch condition (p = .073, n.s.). Crucially, the endpoint match condition differed from the trajectory match condition (p < .05). In the English group, we also found a main effect of condition (F(1.900, 36.093) = 20.701, p < .001, ηp² = 0.521). The P3 for the full match condition was significantly more positive than the full mismatch P3, as well as the P3 elicited by the two critical conditions (all comparisons p < .001). The mismatch condition furthermore differed from the two critical conditions (all comparisons p < .05). Critically, the endpoint match and trajectory match conditions were not significantly different from one another (p = .766, n.s.).
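The effect sizes reported above can be cross-checked against the F ratios and degrees of freedom using the standard identity for partial eta squared, ηp² = (F · df_effect) / (F · df_effect + df_error). The helper below is a hypothetical convenience function; because the reported degrees of freedom are themselves rounded, agreement is expected only to within about ±0.001.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported effect size) as given in the text
reported = [
    ("English, peak window: condition", 25.211, 1.954, 37.133, 0.570),
    ("German, peak window: condition", 23.842, 1.775, 33.731, 0.557),
    ("350-700 ms: condition", 41.946, 1.901, 72.228, 0.525),
    ("350-700 ms: group", 17.991, 1.0, 38.0, 0.321),
    ("350-700 ms: condition x group", 3.057, 1.901, 72.228, 0.074),
]

for label, f_value, df1, df2, eta in reported:
    recomputed = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: reported {eta:.3f}, recomputed {recomputed:.3f}")
```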
Millisecond-by-millisecond t-tests support and specify the analyses on average amplitudes for the 350-700 ms time window: Whereas in the English group only a small time window in which the endpoint match condition displays higher amplitudes than the trajectory match condition could be identified (350-396 ms), this pattern is significant throughout most of the analyzed P3 time range for the German participants, and even for a short time window (from 278 to 302 ms) before the actual P3 peak. In English participants, endpoint match trials thus drew more attention than trajectory match trials only for a brief interval early in the P3 range, whereas the German participants differentiated between the two critical conditions throughout the entire time span of stimulus evaluation.
In addition, average amplitudes in the P1-N1 range were analyzed in a similar fashion (in line with Boutonnet et al., 2013; Thierry et al., 2009), yielding neither main effects nor interactions.

Participants
Native English-speaking participants were students at Reading University (UK) and tested there (N = 15). German participants were again recruited and recorded at Radboud University, Nijmegen, the Netherlands (N = 19; both groups balanced for gender). Participants in this experiment had similar educational and socio-economic backgrounds to the participants tested in Experiment 1.

Materials and procedure
The same materials were used as in Experiment 1. This time, participants watched sequences of a still picture implying a motion event (600 ms), followed by a short motion event animation (1000 ms) with a blank screen (200 ms) in between. The inter-trial interval was an 800 ms blank screen. Each animation was preceded by each picture (thus giving 16 combinations in total), and each combination was repeated 10 times, rendering 160 trials in total. There were 40 trials per condition (full match, mismatch, endpoint match, trajectory match). Experimental conditions were presented in a fully randomized order and run using a PsychoPy script (Peirce, 2007).
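The trial counts above follow from crossing every picture with every animation. The sketch below reproduces them under one explicit assumption that is not stated in the text: that the four motion events are formed by crossing two trajectories with two endpoints. The labels and function names are hypothetical, and this is not the PsychoPy script used in the study.

```python
import random
from collections import Counter
from itertools import product

# ASSUMPTION: four events = 2 trajectories x 2 endpoints (labels are placeholders).
TRAJECTORIES = ("trajectory_A", "trajectory_B")
ENDPOINTS = ("endpoint_A", "endpoint_B")


def classify(picture, animation):
    """Label a picture-animation pair by which features they share."""
    same_trajectory = picture[0] == animation[0]
    same_endpoint = picture[1] == animation[1]
    if same_trajectory and same_endpoint:
        return "full_match"
    if same_endpoint:
        return "endpoint_match"
    if same_trajectory:
        return "trajectory_match"
    return "mismatch"


def build_trials(repetitions=10, seed=0):
    """All 16 picture-animation combinations, repeated and fully shuffled."""
    events = list(product(TRAJECTORIES, ENDPOINTS))
    trials = [(picture, animation, classify(picture, animation))
              for picture, animation in product(events, events)
              for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
print(len(trials), Counter(condition for _, _, condition in trials))
```

Under this assumption each condition arises from exactly 4 of the 16 combinations, which is consistent with the reported 40 trials per condition.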
Participants were instructed to press one of two buttons on each trial to indicate whether the animation fully matched the preceding picture, yes or no. The order of picture and animation was reversed when compared to Experiment 1, giving participants sufficient time to respond by pressing a button. Reaction times and response types ('yes' or 'no') were registered.

Results
We analyzed accuracy, overall reaction times, as well as reaction times on correct 'no' responses to (full and partial) mismatch trials, and false alarm rates, with mixed ANOVAs of group by condition.
Table 1 below lists the average proportion of correct responses for each condition in each group. A mixed ANOVA of group by condition rendered a significant effect of condition (F(1.130, 36.160) = 5.711, p < .05, ηp² = 0.151), no main effect of group (F(1, 32) = 2.703, p = .110, n.s.) and no interaction between the two factors (F(1.130, 36.160) = .981, p = .339, n.s.). Overall, there was a higher correct performance for the mismatch condition than the full match condition (p < .001, all p-values Bonferroni corrected), no difference between the mismatch and the endpoint match conditions (p = .605, n.s.), and a marginal difference between the mismatch and trajectory match conditions, with fewer correct responses in the latter condition (p = .054). There was no difference between the critical conditions (p = .140, n.s.).
Table 2 below gives mean reaction times for all responses.
Table 3 shows mean reaction times for correct 'no' responses to motion mismatch trials.
In the German group, there was a significant condition effect (F(1.433, 25.796) = 14.327, p < .001, ηp² = 0.443). Posthoc comparisons showed that participants were faster to reject mismatch trials than endpoint match (p < .001) and trajectory match trials (p < .05). There was no difference between the endpoint and trajectory match conditions (p = .196, n.s.). In the English group, there was a non-significant trend for an effect of condition (F(2, 28) = 2.814, p = .077, ηp² = 0.167). The average proportions of false alarms for the endpoint match, trajectory match and mismatch conditions are listed below (Table 4). Note that 'yes' responses on the full match trials are the only correct 'yes' responses; positive responses registered for the other conditions are thus false alarms.
A mixed ANOVA on the number of false alarm button presses revealed a significant condition effect (F(1.057, 33.812) = 7.227, p < .05, ηp² = 0.184) and a non-significant trend for a group effect (F(1, 32) = 2.987, p = .094, ηp² = 0.085). The interaction of condition by group did not reach significance (F(2, 64) = 1.036, p = .320, n.s.). Overall, there were more false alarms for endpoint and trajectory match trials than mismatch trials (mismatch vs. endpoint match: p < .05, mismatch vs. trajectory match: p < .05). There was a non-significant trend for more false alarms in German participants than in English participants, for the trajectory match condition.
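As a consistency check on the effect sizes printed next to each F statistic above, partial eta squared can be recovered from F and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity to the values reported in this section; the function name `partial_eta_squared` and the row labels are illustrative choices, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, and df_error equals SS_error / MS_error,
    # so the ratio below reduces to SS_effect / (SS_effect + SS_error).
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# (label, F, df_effect, df_error) as reported in the text
reported = [
    ("accuracy, condition", 5.711, 1.130, 36.160),
    ("correct-rejection RT, German group, condition", 14.327, 1.433, 25.796),
    ("correct-rejection RT, English group, condition", 2.814, 2, 28),
    ("false alarms, condition", 7.227, 1.057, 33.812),
    ("false alarms, group", 2.987, 1, 32),
]

for label, f_value, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

The recomputed values agree with the reported effect sizes to three decimals (0.151, 0.443, 0.167, 0.184 and 0.085).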
The behavioural results show no significant group differences or interactions of group with condition with respect to overall performance on the task, as reflected in accuracy rates and speed of processing. This shows that both groups perform roughly the same when making overt judgements related to the degree of match between motion pictures and animations: Performance was fastest and most accurate on the mismatch condition, and worse on the trajectory match condition. In German participants, reaction times on rejection (motion mismatch) trials were slower for the two partial match conditions than for the complete mismatch condition.

Discussion
English and German differ in a domain of grammar relevant for motion event encoding: In English, an aspect language, temporal contours and phases of events are highlighted, whereas in German, a non-aspect language, endpoints of motion events are focused in event encoding. We investigated whether these grammatical differences affect visual attention allocation to elements of motion events in a non-verbal context, using brain and behavioural measures.
The visual oddball paradigm elicited typical P3 brain responses to the rare response condition (full match) in both groups. The maximum difference between the P3 for the full match and the mismatch (frequent control) conditions, i.e., the P3 effect for the response condition, occurred slightly earlier in the English dataset. Overall, the P3 peak in the English data seems steeper and more short-lived than the P3 peak in the German dataset, which lasts longer and is more positive in general. Findings from the behavioural task show that English participants were not overall faster in performing the task of detecting full match trials (bearing in mind that a different participant sample was tested); however, we did find German participants to be slower at rejecting partial mismatch (in endpoint or trajectory) trials. These results could tentatively indicate a generally lower ability to inhibit responses to distractor trials in German participants. This could, at least in part, explain the slightly differing shapes of the P3 waves. Nevertheless, the important finding here is that during the task-relevant P3 peak (full match peak time window) all conditions pattern similarly in each group. This shows that task processing and the attention devoted to the execution of the task were very similar in German and English participants, a necessary precondition for our interpretation of differences related to the two critical conditions.
The next set of analyses aimed at unravelling potential language effects in relation to the critical conditions: In the 350-700 ms time window, which covers the P3 peak for the task-relevant condition (full match) as well as the P3 peaks for the critical endpoint match and trajectory match conditions, we find that, in German participants, the endpoint match condition elicited a significantly more positive P3 than the trajectory match condition. This indicates that in this group, the endpoint was processed with more attention, and perceived as more relevant or salient than the trajectory when matching animations with pictures. In addition, we found that the differentiation between the endpoint and trajectory match conditions already started before the analyzed P3 time window in German participants.
In English participants, there were no sustained differences in P3 amplitude between endpoint and trajectory match conditions, suggesting that there was no attentional bias and that both elements were similarly attended to. By comparing each oddball condition to the mismatch (control) condition, we can evaluate actual P3 condition effects. P3 effects were obtained for the full match and endpoint match conditions in both groups, but a P3 effect for the trajectory match condition was found only in English participants. The P3 effect obtained for the full match condition again underlines equal task processing and attention, as this is the only condition which is relevant for the task and the condition which participants were explicitly instructed to attend to. The P3 effect for the endpoint match condition in both groups may reflect a potential language-independent bias towards motion event endpoints, which has been reported previously in verbal as well as non-verbal task paradigms on motion event processing (see e.g., Slobin, 2006; Zacks & Tversky, 2001). However, the attentional bias towards endpoints over trajectories of motion lasts for a longer interval in German participants. English participants show an endpoint-over-trajectory bias for a brief interval early during the P3 window analyzed. In addition, we also find a P3 effect for the trajectory match condition in English, meaning that this effect is present for both partial match conditions, underlining that no differentiation is made at the level of attention processing between the two elements of motion events in this group. The different patterns within each group are thus crucial in interpreting these findings as a language-derived effect on attention processing.
These P3 findings suggest that individuals transfer grammatically driven ways of speaking about event scenes to non-verbal visual processing. Importantly, this language effect concerns complex grammatical structures which are relevant for perspective-taking and sentence structure, rather than basic categories such as colour, related to terminological differences between languages, and which have previously been shown to affect perception (Thierry et al., 2009). Here we show that such language patterns affect online visual attention allocation. Importantly, there were no differences in participants' judgements of the similarity of the same abstract depictions of motion, and no differences in the speed with which judgements were made, providing evidence for a conception of the stimuli as non-verbal. This makes the potential online construal of sentences describing the events (verbal encoding strategies) unlikely. We argue that performance on an overt stimulus matching task without concurrent linguistic interference can be prone to verbalization strategies and reliance on verbal working memory to aid the matching of pictures to preceding animations, as previously proposed (Athanasopoulos & Bylund, 2013; Trueswell & Papafragou, 2010); the present null result on the behavioural task thus indicates that the abstract schematic stimuli are non-verbal, i.e., they are not amenable to verbalization strategies related to event encoding. This is hypothetically caused by a lower real-life value of the animations, a necessary concession when recording ERPs to avoid signal contamination by eye movements. The processing time normally required for the conceptualization and formulation of full-fledged event descriptions supports our assumption regarding the absence of verbal event encoding strategies (cf. Griffin & Bock, 2000).
Taken together, these results have implications for existing theoretical accounts of the mechanisms underlying language effects on perceptual and attentional processes, which assume online feedback from the language system (the 'label-feedback hypothesis', cf. Lupyan, 2012, theorizing online modulation of perception caused by linguistic labels), and accounts that assume structurally different representations in participants of different cultural and linguistic backgrounds (a strong relativistic view, see Gumperz & Levinson, 1996; Lucy, 1992). We hypothesize that the specific demands of the non-verbal ERP paradigm, requiring rapid inspection and explicit evaluation of the animation sequences, might enhance the likelihood of involvement of automatized routines of visual processing. These routines draw on motion representations, focusing more attention on endpoints in German participants, arguably because endpoint match trials represent more compatible or typical trials, i.e., they share typical values on dimensions learned to be important when speaking about motion. In this interpretation, the biasing role of language in attention processing would be durable rather than transient (which is not to say that it cannot be overridden given specific task conditions; this is something for future research to explore): Linguistically-entrained processing routines are recruited in an automatic fashion when task demands allow for it or require it; they are based on structurally different (non-verbal) motion representations (in line with views put forward for grammar specifically, Lucy, 1992; Lucy, 1997).
There is a possibility that the activation of language-dependent motion representations resulted in the retrieval of linguistic forms (labels) encoding specific motion elements, i.e., the endpoint- and/or trajectory-shapes, in German but also English participants. The low degree of variability in shape-types, as well as the one second time window of animation viewing, would leave room for lexical retrieval of labels (e.g., 'square', 'hexagon'; 'straight', 'curved'). Such a verbal strategy could have been employed either automatically, or strategically, to aid task fulfillment: Memorizing labels for the endpoints in the animations, for example, would allow for efficient and fast comparison with the objects in the pictures and indeed, effects of labels on perception and categorization of objects have been reported previously (Lupyan & Casasanto, 2014; Lupyan & Ward, 2013). Also, studies in the field of perception show that lexical information can be retrieved very rapidly and can affect perceptual processing (review in Herrmann, Fründ, & Lenz, 2010). The observation that this process may take place specifically in relation to endpoints in German participants, but that English participants make no difference between endpoints and trajectories is thus crucial. Indeed, if the participants resorted to labelling trajectory and/or endpoint, such implicit verbalisation would still be oriented by the grammatical characteristics of the native language. An interesting follow-up study could make use of motion stimuli consisting of unusual shapes, which cannot readily be labelled in the specific languages tested, thus blocking the fast retrieval of labels from the mental lexicon; this would allow a clearer view on the potential role of label feedback. Another line of follow-up research could aim at conducting the same experiment, this time using realistic motion event stimuli (pictures of events in the real world), carefully controlling for potential EEG artefacts. Nevertheless, the present absence of language effects in the behavioural variant of the task is strongly in support of the idea that our ERP results involve non-verbal motion event cognition, with no room for online verbal encoding of the motion event. Moreover, our neurophysiological evidence related to attention processing does not cover the time span nor the polarity of components typically associated with semantic processing (N400 time window, cf. Kutas & Federmeier, 2011).

Conclusions
All in all, we report electrophysiological evidence that participants attended to motion non-verbally in a way that reflects their habitual encoding of motion events for verbalization, thus providing evidence for language modulation of non-verbal perceptual processes, and extending the case made for objects and labels (for colour and object terminology, Boutonnet et al., 2013; Thierry et al., 2009) to grammatical aspect and sentence-level information, i.e., the structural properties of language. Viewing an object has been shown to immediately activate a linguistic label; here, the viewing of a motion event rapidly activated a representation which involved goal-orientation in German ('where to?') and which appeared to be less holistic and more 'immediate' in English ('what's going on now?'). In general, the present study is in line with work showing how semantic knowledge can be accessed fairly rapidly during time spans associated with attention processing (see, for effects of semantic coherence on scene perception reflected in evoked gamma band responses, Oppermann, Hassler, Jescheniak, & Gruber, 2012; review in Herrmann et al., 2010), especially when the system is biased towards specific knowledge types, given priors and prior expectations (see for perspectives from neuroimaging, e.g., Kok, Brouwer, van Gerven, & de Lange, 2013; Rahnev, Lau, & de Lange, 2011). Functional imaging research (Kok et al., 2013), for example, showed that participants' prior knowledge regarding the characteristics of an upcoming stimulus resulted, upon viewing, in the immediate integration of this prior information with the available sensory input in the visual cortex of the brain, rather than in other areas that are associated with perceptual decision-making, activated later down the processing stream. The fact that we find a highly specific attentional bias (i.e., only for endpoints in the German speakers) suggests a role for a 'long-term' prior, i.e., a long-term memory representation for the domain of events associated with and entrained by language. Our evidence suggests that such long-term prior knowledge may also be integrated fairly rapidly into the perceptual processing stream. For the first time, we show relativity effects related to complex linguistic knowledge, in the domain of grammar and its perspectivizing function for motion events, a well-researched area for the investigation of cross-linguistic differences.

Fig. 1. Example of animated prime and target picture used in the experiment (trial depicts a match condition, response required). Cartoon provides a representation of the animated sequence.

Fig. 3. P3 difference waves, obtained by subtracting amplitudes in the mismatch condition from those in the full match condition (linear derivation of electrodes Cz, CP1/CP3, CP2/CP4, P3, Pz, P4) in the English and the German groups.
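The computation behind the difference waves in Fig. 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming that the 'linear derivation' is an unweighted mean over the listed electrodes; the function name `difference_wave`, the data layout, and the toy amplitudes are hypothetical and are not taken from the recordings.

```python
from statistics import mean


def difference_wave(full_match, mismatch):
    """Pool electrodes, then subtract mismatch from full match at each time point.

    Each argument maps an electrode name to a list of amplitudes (one per sample).
    """
    def pooled(erp):
        # Unweighted mean across electrodes at each time point (assumed derivation).
        return [mean(samples) for samples in zip(*erp.values())]

    return [fm - mm for fm, mm in zip(pooled(full_match), pooled(mismatch))]


# Toy amplitudes in microvolts: two electrodes, three time points (hypothetical values).
full_match = {"Cz": [1.0, 4.0, 6.0], "Pz": [3.0, 6.0, 8.0]}
mismatch = {"Cz": [1.0, 2.0, 3.0], "Pz": [1.0, 2.0, 3.0]}

print(difference_wave(full_match, mismatch))  # [1.0, 3.0, 4.0]
```

Positive values in the resulting wave indicate time points at which the full match condition is more positive than the mismatch control, which is how the P3 effect is read off the figure.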

Fig. 4. Top panel (German); Bottom panel (English): (Top) ERP peak of the P3 component (linear derivation of electrodes Cz, CP1/CP3, CP2/CP4, P3, Pz, P4) in the English and the German groups. (Bottom) p-values of millisecond-by-millisecond t-tests, comparing amplitudes of trajectory match and endpoint match conditions for the entire segment (with the current degrees of freedom, a t-statistic of 2.093 was considered statistically reliable at p = .05).
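The sample-by-sample comparison described in the Fig. 4 caption can be illustrated with a short Python sketch. It assumes a paired-samples t-test within each group, and it uses the fact that 2.093 is the two-tailed .05 critical value of t for 19 degrees of freedom, so the toy data below use 20 paired observations. The function names and the synthetic amplitudes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

T_CRITICAL = 2.093  # threshold quoted in the caption


def paired_t(cond_a, cond_b):
    """Paired-samples t statistic for one time point (one value per participant)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def reliable_samples(endpoint, trajectory):
    """Return indices of time points where |t| exceeds the critical value.

    Both arguments are lists of time points, each holding per-participant amplitudes.
    """
    return [
        i
        for i, (e, t) in enumerate(zip(endpoint, trajectory))
        if abs(paired_t(e, t)) > T_CRITICAL
    ]


# Synthetic data: 20 paired observations give 19 degrees of freedom.
participants = range(20)
jitter = [0.5 if p % 2 else -0.5 for p in participants]
trajectory = [
    [0.1 * p for p in participants],  # time point 0
    [0.1 * p for p in participants],  # time point 1
]
endpoint = [
    [base + j for base, j in zip(trajectory[0], jitter)],        # no mean difference
    [base + 1.0 + j for base, j in zip(trajectory[1], jitter)],  # 1 microvolt mean difference
]

print(reliable_samples(endpoint, trajectory))  # [1]
```

Only the second time point, where the endpoint condition is shifted by a constant amount relative to the trajectory condition, passes the threshold. Note that this sketch applies no correction for the number of time points tested.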

Table 1
Accuracy (average proportion of correct responses for each condition, SD in brackets).

Table 3
Mean reaction times for correct motion mismatch trials ('no' responses for mismatch, endpoint match and trajectory match conditions) (ms), mean (SD).

Table 4
False alarm rates (average proportion of yes responses for mismatch and partial match conditions, SD in brackets).