Concurrent use of animacy and event-knowledge during comprehension: Evidence from event-related potentials

In two ERP experiments, we investigated whether readers prioritize animacy over real-world event-knowledge during sentence comprehension. We used the paradigm of Paczynski and Kuperberg (2012)


Introduction
People can process different forms of semantic information during sentence comprehension in an incremental and sometimes predictive fashion (e.g., Altmann and Mirković, 2009). There is evidence that one particular form of semantic information, namely animacy, plays an important role in incremental processing. ERP results suggest that comprehenders rapidly assess the animacy of incoming arguments and its compatibility with selectional restrictions imposed by preceding verbs (Nieuwland et al., 2013; Nieuwland and Van Berkum, 2005; Paczynski and Kuperberg, 2012; Szewczyk and Schriefers, 2013). However, it is an open question whether or not animacy selection restrictions are prioritized, that is, whether they are processed prior to and independently of other forms of semantic information, such as general real-world event-knowledge that allows people to evaluate sentence plausibility (e.g., De Swart and Van Bergen, 2019; McRae and Matsuki, 2009; Milburn et al., 2016; Nieuwland and Van Berkum, 2006; Paczynski and Kuperberg, 2012).
To investigate this question, we used the paradigm developed by Paczynski and Kuperberg (2012; henceforth, PK12), who investigated whether animacy-selection restrictions are prioritized over real-world event-knowledge. Their approach was to test whether a word's semantic relatedness to sentence context has a different impact on how people process violations of animacy compared to violations of real-world event-knowledge. Participants read passive sentences with post-verbal arguments that rendered the sentences plausible or implausible (Table 1), either while evaluating sentence plausibility, as in PK12 (Experiment 1), or without doing so (Experiment 2). In the following sections, we briefly review relevant research on semantic aspects of sentence comprehension and then outline our predictions for the current study.
Research in psycholinguistics has shed light on how semantic information is used during sentence comprehension (e.g., Marslen-Wilson et al., 1988; Marslen-Wilson and Tyler, 1980; Pickering and Traxler, 1998). It has been suggested, for example, that sentence comprehension occurs in an incremental fashion, in that the meaning of each new word is integrated into the unfolding meaning of the sentence as soon as it is encountered (e.g., Marslen-Wilson, 1973). Incremental processing enables comprehenders to rapidly realize when a word is semantically incompatible with prior context, a situation that leads to a processing cost relative to compatible words. Previous research has tried to elucidate whether certain types of semantic incompatibility between a word and its sentence context incur greater processing costs, or are more rapidly detected than others (e.g., Marslen-Wilson et al., 1988; Milburn et al., 2016; Staub et al., 2007; Warren and McConnell, 2007). Here, we consider the role of animacy constraints and event-knowledge and how semantic relatedness interacts with both forms of semantic information. Our introduction focuses on studies that used Event-Related Potentials (ERPs), the technique employed by PK12 and by us.

Effects of animacy
Most ERP research on animacy processing has used the N400 (Kutas and Hillyard, 1980), an ERP component strongly associated with semantic retrieval and integration processes (Kutas and Federmeier, 2011; Van Berkum, 2009). For example, words that meet the animacy requirements or preferences imposed by the preceding context elicit smaller N400 amplitudes than words that do not, suggesting a processing advantage for words that meet animacy restrictions or preferences (e.g., Nieuwland et al., 2013; Nieuwland and Van Berkum, 2005; Paczynski and Kuperberg, 2012). Sentence-initial animate nouns elicit smaller N400s than appropriately matched inanimate nouns (Weckerly and Kutas, 1999), a finding that is consistent with the tendency of English speakers to start sentences with animate entities (e.g., Branigan et al., 2008; see also Clark, 1965), and which therefore leads to a preference for sentence-initial animate nouns in listeners and readers. Similarly, Paczynski and Kuperberg (2011, Experiment 1) reported reduced N400s for inanimate patients (e.g., At the homestead the farmer plowed the meadow …) compared to equally plausible animate patients (e.g., At the homestead the farmer penalized the labourer …), consistent with the preference for patient roles to be filled by inanimate entities rather than animate entities.
In a study on Polish sentence comprehension, Szewczyk and Schriefers (2011) explored whether animacy violations are processed differently from other semantic violations. In Polish, masculine nouns are marked for animacy through inflectional morphology. They compared ERPs elicited by words that violated both animacy and semantic constraints of the context ('knitted an employee') or by words that violated only semantic constraints ('knitted a medicine'), versus congruent control words ('knitted a scarf'). The authors reported similar N400 effects to both types of violation, but larger P600 effects for the animacy violations. They concluded that animacy information is processed differently from other forms of semantic information.
In a later study, Szewczyk and Schriefers (2013) concluded that people anticipate the animacy of upcoming words in Polish. They used discourse contexts that were more predictive of either an animate or an inanimate direct object noun in the story-final sentence. In this sentence, pre-nominal adjectives (which in Polish are marked for animacy) that matched the presumed animacy of the upcoming direct object noun elicited smaller N400s than adjectives that did not match the presumed animacy. This is suggestive of animacy prediction, with the results being similar to those in studies investigating other forms of prediction such as gender (e.g., Fleur et al., 2020; Nieuwland, Arkhipova & Rodríguez-Gómez, 2020a; Van Berkum et al., 2005; Wicha et al., 2004; for a review, see Pickering and Gambi, 2018). However, because animacy information is available through the adjective marking, these findings may merely reflect the incremental use of animacy information as it becomes available, rather than actual pre-activation of animacy features.
These findings highlight the prominent role of animacy information in sentence processing. However, the extent to which animacy is prioritized over other forms of semantic information remains unclear, particularly when it is difficult to disentangle whether these effects could reflect processing of (im)plausibility (see also Milburn et al., 2016). Here, like PK12, we address this issue by investigating whether semantic relatedness has the same impact on how people process violations of animacy and those of real-world event-knowledge.

Table 1
Summary of the sentence types and sample sentence, adapted from Paczynski and Kuperberg (2012).
Sample sentence: The prescription for the mental disorder was written by the … on paper.
Each entry gives the sentence type, the description of the critical word (CW), and the example CW.
(1) Control. Semantically related to the sentence context, meets expectations about the most likely agent to perform the action described by the verb (real-world knowledge), rendering the sentence plausible. Example: psychiatrist
Violation sentences:
(2) Animate-Related. Semantically related, meets the animacy-selection restriction of the verb, but violates real-world knowledge expectations, thus rendering the sentence implausible. Example: schizophrenic
(3) Animate-Unrelated. Semantically unrelated, meets the animacy-selection restriction of the verb, but violates real-world knowledge expectations, thus rendering the sentence implausible. Example: guard
(4) Inanimate-Related. Semantically related, violates the animacy-selection restriction of the verb, thus rendering the sentence not only implausible but also impossible. Example: pill
(5) Inanimate-Unrelated. Semantically unrelated, violates the animacy-selection restriction of the verb, thus rendering the sentence not only implausible but also impossible. Example: fence
Note: We use these condition names because they refer to clear properties of the critical words and they are shorter and simpler than the names used in PK12. Our conditions (1)-(5) correspond to PK12's conditions (1) Control, (2) Related Real-World Knowledge Violations, (3) Unrelated Real-World Knowledge Violations, (4) Related Animacy Selection Restriction Violations, and (5) Unrelated Animacy Selection Restriction Violations.

Semantic relatedness
Words that are semantically appropriate in their context are facilitated during sentence processing (for a review see Schumacher, 2012; for behavioral evidence see e.g., Ehrlich and Rayner, 1981; Schwanenflugel and Shoben, 1985). In ERP studies, implausible words that are related to the context elicit smaller N400s than unrelated, equally implausible words (Kutas and Federmeier, 2011). An early demonstration of this 'related anomaly' effect was reported by Federmeier and Kutas (1999). Their participants read a context that presumably led to an expectation for a certain word (e.g., At the dinner party, I wondered why my mother wasn't eating her soup. Then I noticed that she didn't have a …) followed by the predictable word (spoon), an implausible word from the same semantic category (knife) that shared semantic features with the predictable word, or an implausible word from a different semantic category (bowl). Although both implausible words elicited a larger N400 effect relative to the predictable word, the N400 effect was reduced for the words that shared more features with the predictable word, particularly in highly constrained contexts that led to strong expectations for a particular word. Metusalem et al. (2012, Experiment 1) showed that such effects do not hinge on semantic category information but can arise from general, event-based relatedness. They presented participants with passages such as A huge blizzard ripped through town last night. My kids ended up getting the day off from school. They spent the whole day outside building a big … which then continued with a highly predicted word (snowman), an implausible word that was related to the described event (jacket), or an implausible word that was unrelated to the described event (towel). Again, event-related anomalous nouns elicited reduced N400s compared to unrelated anomalous nouns (see also Nieuwland, 2015).
These studies show that semantic processes initiated by an incoming word are affected by the semantic relationship between that word and the context (or words in the context; for discussion see Otten and Van Berkum, 2008). Such processing costs therefore do not always reflect the ease of integration of a word's meaning into the context (in which case those costs would presumably merely reflect plausibility, i.e., whether a word renders a sentence thus far plausible; see Nieuwland et al., 2020b). The semantic processing costs associated with a word can also depend on the extent to which its semantic features (e.g., animacy) are pre-activated by the event described in the context (see also Ito et al., 2016). The related-anomaly paradigm, i.e., comparing processing costs of anomalous words that have or do not have a strong semantic relationship with the context, therefore offered a suitable approach for PK12 to investigate whether animacy information is prioritized over real-world event-knowledge. They investigated whether the related-anomaly effect differs for violations of animacy and of event-knowledge.

Evidence for priority of animacy information reported by Paczynski and Kuperberg (2012)
As we outline below, we adopted the paradigm used by PK12 to investigate whether animacy information is prioritized over real-world knowledge information. PK12 examined whether semantic relatedness facilitates the semantic processing of animacy violations in the same way as real-world event-knowledge violations. To do this they recorded ERPs while participants read sentences such as (1)-(5) in Table 1. The authors reported three main findings. Most crucially, they found N400 attenuation for related words compared to unrelated words, but only when words met the animacy requirements of the preceding verb (2 compared to 3, see Table 1), not for inanimate words that did not meet the animacy requirements of that verb (4 compared to 5). Secondly, they found no N400 effect for animate-related words compared to control words, despite the differences in plausibility between these conditions. Thirdly, they observed enhanced P600s for inanimate words (4 and 5) compared to control words (1) and animate words (i.e., a P600 effect of animacy), which they took as evidence that animacy violations incurred continued processing problems because they render a sentence impossible, not just improbable.
Based on these results, PK12 argued that animacy information is used before semantic relatedness comes into play, and that relatedness facilitates further processing only if the animacy constraint is met. We call this the animacy-first hypothesis, in which animacy is used as an early filter on further processing. PK12 considered two accounts for their findings. In the first, prediction-based account, readers use animacy restrictions along with context-information to pre-activate a subset of animate verb-argument candidates. The argument is that animacy-based predictions are supposedly easier than predictions from event-knowledge because animacy is a binary variable, and therefore have an earlier impact on processing. On this account, inanimate arguments were not pre-activated and therefore no attenuation by semantic relatedness occurred when these arguments were encountered (4 compared to 5).
In their second, integration-based account, animacy is prioritized over real-world event-knowledge during semantic integration. Our interpretation of this account is that comprehenders construct a compositional interpretation of a verb and an argument only if the verb's animacy requirements are compatible with the animacy of the argument. They therefore combine a verb that requires an animate agent with an animate agent, but not with an inanimate agent. If the agent is animate, they are then affected by the extent to which the actual agent is semantically related to plausible agents. In comprehending the sentences in Table 1, comprehenders would first determine whether the agent was animate or not. If not (i.e., sentences 4 and 5), they would not integrate the agent into the unfolding semantic representation for the sentence, and so relatedness would have no effect (thus explaining the lack of an N400 attenuation for 4 compared to 5). If it was animate (i.e., in sentences 1-3), they would integrate the agent into the sentential representation, and so relatedness would have an effect (thus explaining the attenuation for 2 compared to 3).
PK12's results therefore appear to support a functional distinction between violations of animacy selection restrictions (inanimate versus animate words) and those of real-world event-knowledge (unrelated versus related words). In addition, their results suggest that animacy information is used either to generate expectations about upcoming nouns, in a more constraining way than real-world knowledge, or as an initial selection restriction filter during sentence processing.
At face value, however, PK12's results appear inconsistent with other related anomaly effects. For example, in Nieuwland and Van Berkum (2005, 2006), animacy violations did not elicit an enhanced N400 compared to correct words when the animacy violations were strongly related to the context. Such results are consistent with a fully incremental, constraint-based account (e.g., MacDonald et al., 1994; Marslen-Wilson et al., 1988; Trueswell et al., 1994; for a review, see MacDonald and Seidenberg, 2006). In such an account, animacy violations are more disruptive simply because they are less plausible than violations of real-world event-knowledge, but animacy information and real-world event-knowledge become available together. Event-relatedness therefore has an impact regardless of animacy. For example, the sentence "The prescription for the mental disorder was written by the …" facilitates processing of related animate nouns such as 'doctor' or 'patient', but may also facilitate related inanimate nouns such as 'pill' or 'medicine', because their concepts are implicitly part of the described event (e.g., Ferretti et al., 2007; Gerrig and McKoon, 1998; Gerrig & O'Brien, 2005; Metusalem et al., 2012; Nieuwland, 2015). Also relevant is that, in the study by Metusalem et al. (2012), about a third of the related anomalies did not match the animacy constraints of the context. Animacy was not analyzed, so we do not know the contributions from different items. Nevertheless, the fact that they observed a related anomaly effect at all could be telling, because a substantial portion of animacy violation items would have made it rather hard to observe a related anomaly effect if those items did not contribute to the effect.
We also wish to highlight one important result of PK12 that appears to have been overlooked by the authors and that is inconsistent with their conclusions. The observed interaction between animacy and relatedness was entirely driven by differences between the related words (see Figs. 2 and 3 in PK12). In their follow-up tests after observing an interaction effect, PK12 only examined the effect of relatedness for animate and inanimate words separately, but they did not test the effect of animacy separately for related and unrelated words. If they had, they would have found that there was no animacy effect for unrelated words (the bar graphs show roughly identical estimates for animate-unrelated and inanimate-unrelated words). This result directly contradicts their conclusion that animacy selection restriction violations evoked an N400 effect. More generally, this result is inconsistent with a prioritized role of animacy, because under an animacy-first account animacy must have an initial effect on processing regardless of relatedness.
In sum, PK12's conclusions are inconsistent both with their own data and with other studies in the literature. Because no other studies to date have addressed the same issues as PK12, and because it remains unclear whether animacy and real-world event-knowledge are indeed processed independently (see also Milburn et al., 2016), we conducted a replication of PK12. If animacy information is indeed prioritized, as suggested by PK12 in their animacy-first hypothesis, our study might strengthen the evidence for a functional distinction between violations of animacy selection restrictions and violations of real-world event-knowledge. But on a more interactive, constraint-based view of incremental semantic processing (e.g., Filik and Leuthold, 2008; Marslen-Wilson et al., 1988; Matsuki et al., 2011; Metusalem et al., 2012; Nieuwland and Van Berkum, 2005, 2006), our study should find facilitation of related words regardless of animacy.

Predictions for the current study
We adopted the experimental paradigm of PK12 with some changes (see method section) to carry out a replication of their study. This allowed us to assess the interplay between the processing of different forms of semantic information (see Table 1) during sentence comprehension. Our predictions therefore focused on the N400 component elicited by the critical words in the five conditions. According to the animacy-first hypothesis (suggested by PK12), we should observe an absent or smaller N400 effect of semantic relatedness for words that do not meet the animacy-restriction of the verb (inanimate agents) compared to those that do (animate agents). But according to the constraint-based hypothesis, comprehenders make use of all different sources of semantic information in a rapid and incremental fashion, irrespective of the type of information (Marslen-Wilson et al., 1988; see also Milburn et al., 2016). In that case, we expect to see the same effect of relatedness for animate and inanimate agents.
Although the focus of the present study is on the N400 component, we will also report the ERP effects in a later time window to examine possible post-N400 positive ERP effects (P600). PK12 and Szewczyk and Schriefers (2011) reported P600 effects for animacy violations but not for other violations, and concluded that these P600 effects reflected an attempt by readers to deal with a proposition that is semantically impossible (not just implausible). If that conclusion is correct, we expect to replicate the P600 effects for the inanimate conditions irrespective of semantic relatedness. 1

Participants
Forty-eight 2 right-handed native speakers of English with normal or corrected-to-normal vision were paid to take part. All participants gave informed consent. After excluding participants who had too many artifacts (see results section for information on rates of discarded trials), the final sample consisted of 40 participants (27 women, mean age 23.4, SD = 4.8).

Materials
The stimuli consisted of English passive sentences. Each sentence was composed of an introductory context that introduced the patient followed by a verb requiring an animate agent noun (henceforth referred to as the critical word or CW), and two subsequent words. In the control condition (1), the CWs were animate agents that were plausible and semantically related to the preceding context. In the animate-related condition (2), the CWs were implausible, semantically related, animate agents. In the animate-unrelated condition (3), the CWs were implausible, semantically unrelated, animate agents. In the inanimate-related condition (4), the CWs were implausible, semantically related, inanimate agents. In the inanimate-unrelated condition (5), the CWs were implausible, semantically unrelated, inanimate agents. As the verb always required an animate agent, the inanimate conditions but not the animate conditions involved selection-restriction violations.
We developed our stimuli from the 120 items from PK12 together with 61 novel items. We created the novel items in order to have a large enough item set to present at least 30 trials per condition without presenting the same context sentence twice (as occurred in PK12). We reasoned that presenting each context sentence twice could cause participants to process each second context sentence and critical word in relation to the first encounter of that context sentence and critical word, possibly leading to strategic expectancies about the animacy of the upcoming CW.
We adapted the original materials and wrote the novel materials in the following ways. First, we rephrased some of the original sentences and changed spelling and words that are used only in American English to British English. Second, to quantify the degree of semantic relatedness between each critical word and its preceding context, we followed the procedure used by PK12 to calculate Semantic Similarity Values (SSV) using Latent Semantic Analysis (LSA; http://lsa.colorado.edu; Landauer, 1998; Landauer et al., 1998). The SSV of each sentence was obtained by averaging the LSA values between the critical word and the content words in its preceding context (http://psych.paczynski.net/lsa.html), but we imposed the further restriction that, for each item, neither of the unrelated conditions had a higher LSA value than either the control or the related conditions. Third, two words followed the critical noun, whereas PK12 used a varied number of sentence-final words. Fourth, we wrote sentences such that the implausible critical words could not be plausible when interpreted figuratively or could not easily form part of a plausible compound noun, and there was no sentence completion that would render the sentence plausible. All sentences, truncated after the critical word of each condition, were rated for plausibility by 6 native speakers of English, who did not take part in the EEG study, using a 1-7 plausibility Likert scale. Because we planned to carry out an additional study with the same materials on native-Spanish speakers, we also obtained plausibility judgements from 6 native-Spanish speakers to fit both groups' ratings, which were also used to create our materials (see below). Participants were shown all five critical words following each item and rated each critical word for plausibility. To balance the number of plausible sentences containing animate and inanimate nouns, we also included 30 passive filler sentences, with three implausible animate critical words and two plausible inanimate critical words per sentence. The 1-7 scale for each critical word contained a question mark as an additional option, which participants were asked to circle if they did not know the critical word. Each rating participant saw one of two different lists, which only differed in the order of appearance of the same items. Based on these plausibility pre-ratings, we either removed or rephrased (and then re-tested for plausibility) any sentence that contained words unknown to the non-native speakers or that did not match our plausibility expectations in either group, and thus selected a final set of 155 sentences (Table 2).
1 Like PK12, we also computed ERPs for sentence-final words to investigate the downstream processing consequences of semantic anomalies. We report those findings as supplementary materials and offer a brief description in footnote 7.
2 A previous version of this manuscript had a sample size of 22 participants, of which only 20 were ultimately included in the analysis. To increase statistical power and the number of observations, we tested an additional number of participants. The pre-registration of sample increase can be found at https://osf.io/se3pc and results from the initial sample in Version 1 of this paper at https://psyarxiv.com/2qbmp/.
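The Semantic Similarity Value described in the Materials can be illustrated with a minimal sketch. It assumes that each LSA value is a cosine between two word vectors and that the vectors are already available; the function names, the toy two-dimensional vectors, and the dictionary keys are hypothetical and are not taken from PK12's tool or from our scripts.

```python
from math import sqrt
from statistics import fmean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed LSA measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def semantic_similarity_value(cw_vector, context_vectors):
    """Average similarity between the critical word and each context content word."""
    return fmean([cosine(cw_vector, v) for v in context_vectors])


def meets_relatedness_restriction(ssv):
    """True if neither unrelated condition exceeds the control or related conditions.

    `ssv` maps condition names (hypothetical keys) to the SSV of one item.
    """
    floor = min(ssv["control"], ssv["animate-related"], ssv["inanimate-related"])
    ceiling = max(ssv["animate-unrelated"], ssv["inanimate-unrelated"])
    return ceiling <= floor


# Toy usage with made-up vectors: one context word identical to the CW, one orthogonal.
example_ssv = semantic_similarity_value([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

An item would pass the additional restriction only if `meets_relatedness_restriction` returned True for its five SSVs.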
In the total set of items for the ERP experiment, 137 animate and 137 inanimate critical words appeared in two different violation conditions of different items (related in one item, unrelated in another item), but none of the critical words appeared in the same condition of different items. We used counterbalanced stimulus lists so that, in a given list, words in the control condition only appeared once, and at most 10 words in the violation conditions appeared twice in the experiment. We added 90 plausible sentence fillers (adapted from PK12) using verbs that were compatible with either animate or inanimate agents. Sixty fillers had inanimate agents and thirty had animate agents.
For the plausibility ratings analysis, a one-way ANOVA with all five sentence types showed a main effect of sentence type (F(4, 770) = 1637.4, p < .001). Follow-up pairwise comparisons with Bonferroni correction showed that sentences in the control condition were rated higher than all 4 violation conditions (all ps < .001). 3 A 2-way ANOVA with animacy and relatedness as within-subject factors showed that inanimate nouns were rated as more implausible than animate nouns (F(1, 154) = 190.6, p < .001), and unrelated nouns were rated as more implausible than related nouns (F(1, 154) = 54.0, p < .001). An interaction between animacy and relatedness (F(1, 154) = 5.5, p = .020) followed up with simple effects revealed that animate-related words were rated as more plausible than animate-unrelated words (F(1, 154) = 31.7, p < .001), and inanimate-related words were rated as more plausible than inanimate-unrelated words (F(1, 154) = 30.8, p < .001), but that the difference was greatest for the animate nouns.

Procedure
We constructed five lists, each containing one version of each item and a similar number of items per condition. The order of the 155 experimental sentences and 90 fillers was pseudorandomized so that there were never more than two trials in a row involving the same condition and never more than four experimental or control trials in a row, while keeping a fixed trial order per list.
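The ordering constraints described above can be sketched as a constructive shuffle with restarts. This is an illustration rather than the procedure we actually used: it assumes that the limit of two applies to any repeated condition label (fillers included) and that the limit of four applies to runs of non-filler trials. The function names and the "filler" label are hypothetical.

```python
import random

FILLER = "filler"  # hypothetical label for filler trials


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def pseudorandomize(trials, max_same=2, max_exp_run=4, seed=0, max_restarts=10_000):
    """Order (item_id, condition) tuples under the two run-length constraints."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        pool = list(trials)
        order = []
        same_run = 0  # current run of identical condition labels
        exp_run = 0   # current run of non-filler trials
        while pool:
            last = order[-1][1] if order else None
            allowed = [
                i for i, (_, cond) in enumerate(pool)
                if not (cond == last and same_run >= max_same)
                and not (cond != FILLER and exp_run >= max_exp_run)
            ]
            if not allowed:
                break  # dead end: discard this attempt and start over
            trial = pool.pop(rng.choice(allowed))
            same_run = same_run + 1 if trial[1] == last else 1
            exp_run = exp_run + 1 if trial[1] != FILLER else 0
            order.append(trial)
        if not pool:
            return order
    raise RuntimeError("no valid order found within max_restarts")
```

Using a fixed seed per list would reproduce the same trial order for every participant assigned to that list, in line with the fixed order per list described above.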
Participants were randomly assigned to one of the five lists. They sat in front of the computer monitor in a dimly lit room. Sentences were presented word-by-word in black font on a white-background screen and the experiment was run using E-Prime 2.0®. Each trial started with a fixation point on the center of the screen for 450 ms, followed by a 200 ms blank screen. Following the procedure of PK12, each word appeared for 450 ms in the center of the screen followed by a 100 ms blank screen. At the end of the sentence there was a 750 ms blank screen and a question mark appeared on the center of the screen. Participants had been instructed to indicate as quickly as possible whether the sentence was plausible or implausible using one of two buttons on a control pad. The next trial started after participants had given their answer. The experiment began with 7 practice trials and was split into 6 equal-length blocks with optional short breaks between blocks. The experiment lasted about 45 min, together with about 30 min of EEG preparation time. Before the EEG recording, participants completed a language background questionnaire.
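The word-by-word timing can be made concrete with a small sketch that computes the onset of each word relative to the start of the trial. The durations come from the procedure above; the function name and the whitespace-based word split are illustrative assumptions, and the sketch deliberately stops at the last word because the exact hand-over to the sentence-final blank screen is not modelled here.

```python
FIXATION_MS = 450             # fixation point
POST_FIXATION_BLANK_MS = 200  # blank screen after fixation
WORD_MS = 450                 # each word on screen
INTER_WORD_BLANK_MS = 100     # blank screen after each word


def word_onsets(sentence):
    """Return (word, onset in ms from trial start) for each word of a sentence."""
    first_onset = FIXATION_MS + POST_FIXATION_BLANK_MS
    soa = WORD_MS + INTER_WORD_BLANK_MS  # stimulus onset asynchrony between words
    return [(word, first_onset + i * soa) for i, word in enumerate(sentence.split())]


onsets = word_onsets(
    "The prescription for the mental disorder was written by the psychiatrist on paper."
)
# With these durations the critical word (11th word) starts 6150 ms into the trial.
```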

Electroencephalogram (EEG) recording and data processing
The EEG was recorded at a sampling rate of 512 Hz using a BioSemi ActiveTwo system (http://www.biosemi.com) with 64 EEG electrodes in an international 10-20 electrode configuration, two additional mastoid electrodes and four EOG electrodes (left and right horizontal canthus, and above/below the right eye), referenced to the common mode sense (CMS; active electrode) and grounded to a passive electrode. The EEG was re-referenced offline to the average of the left and right mastoid electrodes and filtered (0.05-20 Hz band-pass filter 4 plus 50 Hz notch filter). Data were segmented into epochs that started 100 ms before CW onset and lasted until 1100 ms after CW onset. All epochs were corrected for ocular artifacts (Gratton and Coles correction; Gratton, Coles and Donchin, 1983), baseline-corrected using the 100 ms pre-CW time window, and then automatically screened for artifacts (allowed amplitude: minimal = −75 μV, maximal = 75 μV) before being entered into condition-averages per participant.
Note. Mean values (Standard Deviations in parentheses). CW = Critical word. CW length = number of letters. CW Frequency given as log-transformed word frequency per million words. Pre-ratings plausibility scale: 1-7.
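The segmentation, baseline correction and amplitude screening described for the EEG data can be sketched for a single channel as follows. This is a simplified illustration, not our processing pipeline: re-referencing, filtering and ocular correction are omitted, the signal is a plain list of microvolt values, and all function names are hypothetical.

```python
from statistics import fmean

SAMPLING_RATE_HZ = 512        # sampling rate of the recording
PRE_MS, POST_MS = 100, 1100   # epoch limits relative to critical-word onset
LIMIT_UV = 75.0               # allowed absolute amplitude after baseline correction


def ms_to_samples(ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in ms to the nearest whole number of samples."""
    return round(ms * rate_hz / 1000)


def extract_epoch(signal, onset_sample):
    """Cut one epoch around a critical-word onset from a single-channel signal."""
    start = onset_sample - ms_to_samples(PRE_MS)
    stop = onset_sample + ms_to_samples(POST_MS)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    return signal[start:stop]


def baseline_correct(epoch):
    """Subtract the mean of the pre-onset interval from every sample."""
    baseline = fmean(epoch[:ms_to_samples(PRE_MS)])
    return [value - baseline for value in epoch]


def is_clean(epoch, limit_uv=LIMIT_UV):
    """True if every sample lies within the allowed amplitude range."""
    return all(-limit_uv <= value <= limit_uv for value in epoch)
```

Only epochs for which `is_clean` holds would enter the per-participant condition averages.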

ERP statistical analysis
We selected the same time windows for N400, P600, and sentence-final effects as PK12. We compared average ERP amplitude within each window per condition and clustered groups of electrodes into Regions-of-Interest (ROIs). These ROIs were organized into four groups (see Nieuwland, 2014), comparable to the 4-column approach reported by PK12: Lateral (LAF/RAF, LLFC/RLFC, LLCP/RLCP, LPO/RPO), Medial (LMFC/RMFC, LMCP/RMCP), Midline (MAF/MFC/MCP/MPO), and additional crossline (LLC/LMC/RMC/RLC) (Fig. 1). To simplify the presentation of our results, we only report analyses for the medial ROIs here. These medial ROIs are most important for observing the N400 and P600 components of interest, and correspond to the electrode regions where the effects observed by PK12 were maximal. Like PK12, we report two sets of analyses. The first set compares each violation condition directly to the control condition, through pairwise comparisons between each violation condition and the control condition using repeated measures Analyses of Variance (ANOVA) with sentence type as a 2-level factor (Sentence Type: control condition, violation condition) and the factors 2 (Hemisphere: left, right) by 2 (Anteriority: Frontal-Central, Central-Parietal) to test for scalp distribution effects.
The second set tests for interactions between animacy and relatedness in the four violation conditions, using 2 (Animacy: animate, inanimate) by 2 (Relatedness: related, unrelated) ANOVAs along with the same distribution factors as used in the first set. Where appropriate, Greenhouse-Geisser corrections (Greenhouse and Geisser, 1959) were applied to the p-values, and the original F values are reported here. Only statistical results with p < .1 are reported. Interactions were followed up using simple effects analyses. We resolved interactions only when the involved factors were not part of higher-order significant interactions, which were resolved step-wise. Sentence-final effects are reported in supplementary materials (see also footnote 7).
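The logic of the animacy by relatedness interaction test can be illustrated with a minimal sketch. In a 2 by 2 fully within-participant design, the interaction F with (1, n − 1) degrees of freedom equals the squared one-sample t statistic computed on each participant's difference of differences. The sketch ignores the scalp distribution factors, uses made-up amplitudes, and its function name is hypothetical; it is not our analysis script.

```python
from math import sqrt
from statistics import fmean, stdev


def interaction_contrast(animate_related, animate_unrelated,
                         inanimate_related, inanimate_unrelated):
    """Return (mean difference of differences, t, F) across participants.

    Each argument holds one mean amplitude per participant, in the same order.
    The relatedness effect is unrelated minus related, computed per animacy level.
    """
    n = len(animate_related)
    if not (n == len(animate_unrelated) == len(inanimate_related)
            == len(inanimate_unrelated)):
        raise ValueError("all conditions need one value per participant")
    diffs = [
        (au - ar) - (iu - ir)
        for ar, au, ir, iu in zip(animate_related, animate_unrelated,
                                  inanimate_related, inanimate_unrelated)
    ]
    mean_diff = fmean(diffs)
    # Note: this fails with a division by zero if all differences are identical.
    t = mean_diff / (stdev(diffs) / sqrt(n))
    return mean_diff, t, t * t  # F has (1, n - 1) degrees of freedom


# Made-up amplitudes for three participants, purely for illustration.
mean_diff, t_value, f_value = interaction_contrast(
    [0.0, 0.0, 0.0], [2.0, 3.0, 4.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
)
```

With 40 participants this contrast would have (1, 39) degrees of freedom. An animacy-first pattern would show up as a reliably non-zero difference of differences, whereas the constraint-based view predicts a value near zero.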

Behavioral results
2.2.1.1. Accuracy. The accuracy of the plausibility judgments is shown in Table 3. There was an overall effect of sentence type (F(4,156) = 31.1, p < .001). Post-hoc pairwise comparisons with Bonferroni correction showed that animate-related sentences (2) were rated as less implausible than the remaining three violation conditions (3-5) (all ps < .001), whereas the inanimate-unrelated sentences (5) were rated as more implausible than the three other violation conditions (2-4) (all ps < .001). There were no differences in plausibility judgments between the …
Note. SD given in parentheses.
4 Upon reviewer request, we report an additional analysis with a 0.1-20 Hz filter setting in Appendix I.
… Post-hoc pairwise comparisons with Bonferroni correction showed that participants responded significantly slower to animate-related sentences (2) than to inanimate-unrelated sentences (5) (p < .001) and inanimate-related sentences (4) (p = .003). Responses to the inanimate-unrelated sentences (5) were significantly faster than to the other violation conditions (3-4) (all ps ≤ .004). The animate-unrelated sentences (3) did not significantly differ from the animate-related (2) (p = .422) or inanimate-related (4) …
… Table 5). With regard to distributional effects, additional sentence type by anteriority interactions were found for the comparisons between control sentences and both animate conditions (Table 5).

Summary of N400 results.
Control words elicited significantly smaller N400s than all violation sentences. In the 2 × 2 analyses, animate nouns elicited smaller N400s than inanimate nouns, and likewise, semantically related nouns elicited smaller N400s than unrelated nouns. Importantly, no reliable interaction between animacy and semantic relatedness was found.
… (Table 6) followed up with animacy by relatedness analyses on each quadrant only confirmed main effects of animacy and relatedness at all quadrants but no animacy by relatedness interactions at any quadrant.
… μV, SD = 3.68; animate-related M = 6.29 μV, SD = 3.77, M diff = −1.47 μV, ηp² = 0.114). A sentence by anteriority interaction for inanimate-related sentences did not show any effects of sentence type at either anterior (F < 1) or posterior regions (F(1,39) = 1.5, p = .226, ηp² = 0.037). A three-way sentence by hemisphere by anteriority interaction was observed for the animate-unrelated and inanimate-unrelated sentences ( …
… Table 6 shows, we found no main effects of animacy (Animates M = 5.76 μV, SD …

Summary of P600 results.
In the pairwise comparisons, we observed that the animate-related condition elicited more positive ERPs than the control condition. In addition, the inanimate-unrelated condition elicited more positive ERPs than the control condition at the posterior ROIs. This result differs from that of PK12, who observed P600s to both animacy selection restriction violations compared to the control condition.
In the animacy by relatedness interaction analyses, we observed larger positivities to related than unrelated words at anterior regions. Likewise, we observed more positive ERPs to animate than inanimate words at anterior but not at posterior regions, in contrast to PK12, who observed larger P600s for animacy selection restriction than real-world knowledge violations with a posterior distribution.

Bayes factor analysis.
To quantify the evidence for the null hypothesis that animacy and relatedness did not show an interaction pattern on the N400, we performed a JZS Bayes factor repeated measures ANOVA (JASP Team, 2017; see also Morey and Rouder, 2015; Rouder et al., 2012) with default prior scales, using the data from Experiment 1 with the same factors as in the previous analyses. This analysis revealed that a model with main effects was preferred to the model with the interaction by a Bayes factor of 4.5. The data therefore provide substantial evidence (Jeffreys, 1961) against the hypothesis that animacy and relatedness interact in the N400 window.
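To make the size of this Bayes factor concrete, the following sketch converts a Bayes factor into its reciprocal and into a posterior model probability, and attaches a verbal label. The category thresholds (3, 10, 30, 100) are a commonly used simplification of the scheme in Jeffreys (1961) and are an assumption here, as is the choice of equal prior odds; the function names are hypothetical.

```python
def invert_bayes_factor(bf):
    """Bayes factor for the opposite model comparison (e.g., BF10 from BF01)."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    return 1.0 / bf


def posterior_probability(bf, prior_odds=1.0):
    """Posterior probability of the favored model, given prior odds (assumed 1:1)."""
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def evidence_label(bf):
    """Verbal label using simplified, assumed thresholds (3, 10, 30, 100)."""
    if bf < 1:
        return "favors the other model"
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "substantial"
    if bf < 30:
        return "strong"
    if bf < 100:
        return "very strong"
    return "decisive"


# Bayes factor reported above: main-effects model over the interaction model.
bf_main_vs_interaction = 4.5
print(evidence_label(bf_main_vs_interaction))
print(round(invert_bayes_factor(bf_main_vs_interaction), 3))
print(round(posterior_probability(bf_main_vs_interaction), 3))
```

Under equal prior odds, a Bayes factor of 4.5 corresponds to a posterior probability of 4.5 / 5.5 for the main-effects model relative to the interaction model.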

Discussion of experiment 1
This experiment investigated whether animacy information is prioritized or privileged during sentence comprehension, as suggested by PK12 in their animacy-first hypothesis, or whether comprehenders use animacy and other sources of semantic information concurrently, as predicted by a constraint-based hypothesis.
Pairwise comparisons showed that all violation conditions (animate-related, animate-unrelated, inanimate-related, and inanimate-unrelated) elicited enhanced N400s compared to the control condition. In addition, participants showed effects of animacy and of relatedness, evidenced by smaller N400s elicited by animate nouns compared to inanimate nouns and by smaller N400s elicited by related nouns compared to unrelated nouns. Importantly, and in contrast with PK12, our N400 results did not reveal an interaction between animacy and semantic relatedness. This suggests that participants recruited semantic information (both animacy and semantic relatedness) during sentence comprehension in an incremental way, that is, without giving animacy information priority. Unlike PK12, we did observe a statistically significant N400 effect for animate-related words compared to control words. We offer a fuller discussion of these findings in the General Discussion, along with a discussion of the differences between our study and PK12 that could be relevant to understanding the discrepant findings.
The main finding in the post-N400 time window was that participants showed more positive ERPs for related words than unrelated words at anterior regions. Furthermore, unlike PK12, who observed P600 effects for animacy violations, our pattern went in the opposite direction, namely larger positivities for animate words compared to inanimate words with an anterior distribution. Due to the anterior distribution of our positive ERP effects, they could be qualified as a frontal post-N400 positivity (PNP) or anterior positivity instead. Therefore, we do not think that the observed post-N400 modulations relate to the detection of propositional impossibility (in contrast to PK12). Instead, these effects may arise from the task demands in the experiment, as the P600 deflection across conditions roughly patterned with the condition difficulty that was evident from the behavioral responses, with greater P600 deflections for conditions with slower responses and lower accuracy. This raises the question of to what extent the ERP findings result from the processes of interest (i.e., the processing of different forms of semantic knowledge associated with the experimental manipulations) or from the meta-linguistic, decision-related processes induced by the task.
To address the potential effects of task demands on the observed ERP results and the potential induced strategies during sentence evaluation, we performed a further experiment (Experiment 2). As we will outline below, recent ERP research on language comprehension has shown that task demands can influence the results in both qualitative and quantitative ways, for example, with presence or absence of ERP effects depending on the task participants are asked to perform (e.g., Chwilla et al., 1995; Kolk et al., 2003; Nieuwland, 2014; cf. Nieuwland, 2015, 2016). In Experiment 2, we therefore addressed the potential impact of task demands by repeating Experiment 1 without the plausibility judgment task.

Experiment 2
In many ERP studies on language comprehension, participants are asked to perform a meta-linguistic judgment task such as judging sentence acceptability (for reviews, see e.g., Bornkessel-Schlesewsky and Schlesewsky, 2008; Kuperberg, 2007). This approach has several benefits. It enables researchers to avoid analysis of sentences that are not interpreted in the intended way (e.g., sentences in an implausible condition that are considered to be plausible). It also enables researchers to make a direct comparison between online ERP effects and the 'final interpretation' of a sentence as reflected in an overt evaluation (e.g., Nieuwland, 2016). Also, a judgment task might enhance engagement in or attention to the linguistic materials, which could boost observed effects compared to experiments without such a task.
However, the use of sentence acceptability judgment tasks also has some disadvantages. Asking participants to evaluate a sentence may alter how they understand the sentence. For example, the task may cause participants to pay more attention to specific parts of information contained in a sentence (e.g., animacy). In this respect, the task itself may introduce additional decision-based effects or strategies and thus limit the generalizability of the results. Moreover, a particular problem arises when decision-related processes, such as those involved in task-based evaluation or response preparation, elicit ERP effects (e.g., decision-based P300 component) that spatially and/or temporally overlap with the ERP effects elicited by the experimental manipulation (for discussion, see Nieuwland, 2019; Roehm et al., 2007). The ERPs may then not reflect the effects of interest.
One way to understand the contributions of the task is to run the same experiment without a metalinguistic task (e.g., Kolk et al., 2003; Nieuwland, 2014, 2015, 2016; Roehm et al., 2007). Importantly, results of previous studies using this approach suggest that the N400 and the P600 ERP components can be affected by a judgment task. A well-known phenomenon is the depth-of-processing effect on the N400 observed in word-level experiments, in which the same linguistic materials elicit larger N400 effects if the task taps into their meaning rather than superficial aspects such as whether they contain a certain letter (e.g., Brown and Hagoort, 1993; West and Holcomb, 2000). Likewise, sentence plausibility judgments may boost N400 effects as they require participants to process the meaning of the sentences more deeply, although there are several studies where no clear boost is observed (Kolk et al., 2003; Nieuwland, 2015, 2016). The N400 effects that we observed for animacy and relatedness in Experiment 1 may thus be reduced in an experiment without the plausibility judgment task. P600 effects are often enhanced by a judgment task, such as a grammaticality judgment task (e.g., Osterhout and Mobley, 1995). While syntactic anomalies typically elicit P600 effects with or without a judgment task, this may not be the case, or not consistently so, for semantic anomalies that elicit P600 effects. A few ERP studies reported that P600 effects elicited by semantic anomalies are not merely boosted by an acceptability/plausibility judgment task but are absent without such a task. For instance, Kolk et al. (2003) showed a biphasic N400-P600 effect in Dutch sentences with selection restriction violations (e.g., The tree that in the park played/stood …) with an acceptability judgment task, but only an N400 and no P600 without that task. Similarly, Schacht et al. (2014) showed absence of P600s and only an anterior negativity to Spanish semantically-violated sentences (e.g., El sentimiento peludo/profundo emociona; English: The hairy/deep feeling moves) when using a word probe task, whereas another study reported an N400-P600 effect on the same stimuli using acceptability judgments (Martin-Loeches, Nigbur, Casado, Hohlfeld and Sommer, 2006). This pattern of results led the authors to conclude that the semantic processes associated with P600 effects are not automatic, but task- or attention-driven (see also Kuperberg, 2007; Bornkessel-Schlesewsky and Schlesewsky, 2008).
At present it is unclear whether boosting of P600 effects by a task means that the linguistic processes reflected in the P600 are enhanced or that there is spatio-temporal overlap with decision-based positive components such as the P300 (see also Osterhout, 1999; Sassenhagen and Bornkessel-Schlesewsky, 2015; Sassenhagen et al., 2014). Regardless, based on previous results we expected that repeating Experiment 1 without the judgment task would lead to a reduction or disappearance of the observed P600 effects. In order to reduce any potential task-induced effects, we removed the sentence plausibility evaluation task and instead had participants read for comprehension.
In Experiment 2, participants did not perform an acceptability judgment task, but they answered occasional comprehension questions that were independent of the manipulations of interest. These questions were included to maintain participants' attention to the sentences. We hypothesized that we would observe smaller N400 effects than we observed in Experiment 1, and smaller or even absent P600 effects. An important caveat, however, is that in contrast to Experiment 1, in which participants evaluated each sentence for plausibility, trials in this experiment were not selected based on whether a participant found the sentence plausible or implausible, diluting the basic effects of plausibility. Therefore, smaller effects are to be expected simply because the analysis includes implausible sentences which, had there been a task, participants might have judged to be acceptable (and vice versa for plausible sentences).

Participants
Twenty-one native speakers of English who had normal or corrected-to-normal vision took part in this study after signing informed consent. Participants received monetary compensation for their participation and they did not take part in the study described in Experiment 1. After removing participants who had too many artifacts or incomplete data (see results section, for details), the final sample consisted of 17 participants (12 women, mean age 21.9, SD = 5.2).

Materials and procedure
The sentence stimuli and experimental procedures were identical to those in Experiment 1, with the important exception of the task instruction. Instead of evaluating sentence plausibility, participants answered 80 yes/no comprehension questions. These questions were only added to ensure attention throughout the experiment, and probed knowledge about the sentences that was independent of sentence plausibility (e.g., On the lake, the boat race was won by the (1) rower, (2) sunbather, (3) sheriff, (4) paddle, (5) revolver …. Question: Was the race at the lake?). Of these questions, 50 were for experimental items and 30 for fillers, with an even distribution of yes/no correct answers. Each trial began immediately after participants pressed any key on the keypad on seeing the fixation point at the beginning of each sentence.

Electroencephalogram (EEG) recording and data processing
The EEG recording parameters and data processing procedures were identical to those in Experiment 1. Based on the cut-off of 16 artifact-free CW epochs per condition, we excluded data from 3 participants due to excessive artifacts, and data from 1 participant due to incomplete data.
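The exclusion rule can be sketched as follows. The sketch assumes that a participant is retained only if every condition has at least 16 artifact-free critical-word epochs; whether the cut-off is inclusive is an assumption, and the participant identifiers, epoch counts, and function name are hypothetical.

```python
MIN_EPOCHS = 16  # cut-off per condition, assumed to be inclusive


def excluded_participants(epoch_counts, min_epochs=MIN_EPOCHS):
    """Return the IDs of participants with too few artifact-free epochs
    in at least one condition."""
    return sorted(
        pid
        for pid, counts in epoch_counts.items()
        if min(counts.values()) < min_epochs
    )


# Hypothetical artifact-free epoch counts per participant and condition.
epoch_counts = {
    "p01": {"control": 22, "animate-related": 20, "animate-unrelated": 21,
            "inanimate-related": 19, "inanimate-unrelated": 23},
    "p02": {"control": 18, "animate-related": 15, "animate-unrelated": 17,
            "inanimate-related": 20, "inanimate-unrelated": 16},
    "p03": {"control": 16, "animate-related": 16, "animate-unrelated": 16,
            "inanimate-related": 16, "inanimate-unrelated": 16},
}

print(excluded_participants(epoch_counts))
```

In this made-up example only "p02" is excluded, because one of its conditions falls below the cut-off.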

ERP statistical analysis
Electrode-clustering and statistical analysis were identical to those reported in Experiment 1. Analyses were performed in 2 time windows: 300-500 ms and 700-900 ms relative to onset of the critical word. Additional analyses in the 300-500 ms window relative to onset of the sentence-final word are reported in supplementary materials (see also footnote 6). As in Experiment 1, the first section compares each violation condition directly to the control condition, through pairwise comparisons, and the second section tests the effects of animacy and relatedness in the four violation conditions.
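The dependent measure in these analyses is the mean amplitude within each time window. A minimal sketch of that computation is given below, assuming a single averaged waveform stored as a list of samples; the sampling rate, the epoch start time, the inclusive window boundaries, and the synthetic waveform are all assumptions made for illustration.

```python
def window_mean(samples, srate_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Mean amplitude of `samples` between two latencies (inclusive),
    where samples[0] corresponds to `epoch_start_ms`."""
    first = round((win_start_ms - epoch_start_ms) * srate_hz / 1000)
    last = round((win_end_ms - epoch_start_ms) * srate_hz / 1000)
    if first < 0 or last >= len(samples) or first > last:
        raise ValueError("Time window falls outside the epoch.")
    segment = samples[first:last + 1]
    return sum(segment) / len(segment)


# Synthetic waveform: 100 Hz sampling, epoch from -100 ms to 1000 ms,
# with each sample's value equal to its index (for easy checking).
srate_hz = 100
epoch_start_ms = -100
samples = [float(i) for i in range(111)]

n400_mean = window_mean(samples, srate_hz, epoch_start_ms, 300, 500)
p600_mean = window_mean(samples, srate_hz, epoch_start_ms, 700, 900)
print(n400_mean, p600_mean)
```

In practice, this value would be computed per participant, condition, and ROI before entering the ANOVAs.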

Behavioral results
Mean accuracy percentage of condition-congruent responses to the yes/no comprehension questions was 96 (SD = 3.8) for Experimental items, and 95 (SD = 4.7) for Filler items.

ERP results
Visual inspection of the data indicates that the control condition elicited smaller N400s in the 300-500 ms time window than implausible conditions (see Fig. 3). These N400 effects were widely distributed and visible at most channels. For most conditions, the observed N400 effects extended, at least visually, beyond 500 ms and even into the later 700-900 ms analysis window (see footnote 6).
… (Table 7). An interaction between sentence type and hemisphere for inanimate-related words showed smaller N400s for control words than inanimate-related words at both hemispheres, but the largest difference was observed at the right hemisphere (Left: … Table 8).

N400 results
No main effect of relatedness or animacy by relatedness interactions were found.

Summary of N400 results (see footnote 6).
Results at this time window showed smaller N400s to the control condition compared to the inanimate-related and inanimate-unrelated conditions. In the animacy by relatedness analyses, animate nouns elicited smaller N400s than inanimate nouns, but no effects of relatedness or animacy by relatedness interactions were obtained.

Fig. 3.
ERPs elicited by critical words in Experiment 2 (no plausibility judgment task). Topographical voltage maps show the difference with the control condition (i.e., anomaly minus control).

P600 results.
In the pairwise comparisons, no sentence effects were observed. No effects of or interactions between animacy and relatedness were observed either (Table 8).

Discussion of experiment 2
We repeated Experiment 1 with only one change in the procedure, namely that participants did not evaluate sentence plausibility, but rather, answered occasional yes/no comprehension questions. We observed reliable effects of animacy, but not of relatedness. Inanimate nouns elicited N400 effects compared to the control nouns and compared to animate nouns, whereas N400s for the animate nouns did not differ from the control nouns. The N400 effects of animacy extended beyond the 300-500 ms time window. We did not observe reliable effects of semantic relatedness or interactions between animacy and relatedness.

General discussion
In two ERP experiments, we investigated whether animacy information is prioritized over real-world event-knowledge during sentence comprehension. We used the paradigm employed by Paczynski and Kuperberg (2012, PK12), who concluded that animacy information is prioritized (i.e., used as a filter on further semantic processing), such that retrieval of lexical-semantic information proceeds differently for words that do or do not violate animacy selection restrictions. Participants read passive sentences in English, with plausible (control) agents (e.g., The prescription for the mental disorder was written by the psychiatrist), or implausible agents that were either animate or inanimate and either semantically related or unrelated to the sentential context (schizophrenic/guard/pill/fence). We also examined the role of task demands by performing the same experiment twice, once with the plausibility judgment task used by PK12 (Experiment 1) and once without (Experiment 2).
Before discussing our own results, we briefly summarize the three main results from PK12 to remind the reader of their evidence in support of the animacy-first account. (1) The crucial finding was an interaction wherein related words elicited smaller N400s than unrelated words when these words were animate, but not when they were inanimate. (2) Animate-related words did not elicit a reliable N400 effect compared to control words. (3) Inanimate words elicited a P600 effect compared to control words, whereas animate words did not.
In our own two experiments, we found a very different pattern of results from PK12 on each of these three issues, and our results yielded no such evidence for the priority of animacy. We will discuss our results in comparison to the three core findings of PK12 listed above.

N400 results
Most importantly, in neither experiment did we observe an interaction between animacy and semantic relatedness on N400 activity. In Experiment 1, we observed two main effects on the N400: Animate nouns elicited smaller N400s than inanimate nouns, and related nouns elicited smaller N400s than unrelated nouns. The effect of animacy was stronger than that of relatedness. In Experiment 2, we observed only the main effect of animacy. These results are not compatible with the animacy-first account proposed by PK12, but lend support to an interactive, constraint-based account in which animacy influences processing at the same stage as real-world knowledge (MacDonald et al., 1994;Marslen-Wilson et al., 1988;Trueswell et al., 1994). In addition, these patterns could reflect the facilitation of semantic retrieval, or easier integration of the animate nouns and related nouns compared to inanimate nouns and unrelated nouns. These results are consistent with those of other studies that reported reduced N400s for words that match the sentence context in terms of animacy (e.g., Nieuwland and Van Berkum, 2006;Szewczyk and Schriefers, 2013), or for words that are semantically related to the discourse context compared to unrelated words (e.g., Ito et al., 2016;Ito et al., 2017;Metusalem et al., 2012;Nieuwland, 2015).
One way to interpret our results is in terms of semantic prediction (e.g., Federmeier and Kutas, 1999; Ito et al., 2016). Coarse-grained semantic predictions work through the activation of semantic information that is associated with the described event, which facilitates semantic retrieval processes elicited by the noun (e.g., Van Petten and Luka, 2012). Semantic prediction is thought to underlie the 'related-anomaly effect', the finding that implausible words that are related to the described event or the predicted word lead to smaller N400s than unrelated words (e.g., Ito …). The selection restrictions available in each sentence context in our experiment could lead to pre-activation of animacy, whereas the described event as a whole may pre-activate general, event-related information. For example, after reading "the prescription was written by the", people may expect the next upcoming word to be animate (e.g., Szewczyk and Schriefers, 2013). At the same time, activation of semantically related concepts arises from event-knowledge, for example the activation of 'pill' due to knowledge about prescription medicine, even if this word constitutes an anomalous continuation at that point in the sentence due to its animacy. Thus, both sources of information may influence knowledge activation simultaneously, and ultimately have the same facilitatory effect on the semantic retrieval processes elicited by the noun. An alternative explanation is that the results reflect ease of integration (e.g., Brown and Hagoort, 1993; for discussion of ease of integration accounts, see Ito et al., 2016; Nieuwland et al., 2020b; Pickering and Gambi, 2018). The facilitation of animate words and semantically related words, reflected in the reduced N400s, could occur primarily because these words are more plausible sentence continuations.
Our data are compatible with this account, because the pattern of N400 amplitude per condition (inanimate-unrelated > inanimate-related > animate-unrelated > animate-related > plausible control) follows the pattern of the plausibility judgments (see Tables 2 and 3), with larger N400s found for increasingly implausible conditions (albeit only when participants performed the plausibility judgment task in Experiment 1). However, the observed relationship between N400 activity and plausibility does not appear to be a linear function, because big differences in plausibility between the control words and the animate-related words elicited only small N400 differences, while much smaller differences in plausibility (e.g., between the inanimate-unrelated words and the animate-related words) elicited larger N400 differences.
It is of course possible that the semantic prediction account and the integration account are both partly right (for discussion, see Van Berkum, 2009; Baggio and Hagoort, 2011; Lau et al., 2016; Nieuwland et al., 2020b). There are various demonstrations of semantic prediction that are hard to explain in terms of integration (e.g., Federmeier and Kutas, 1999; Ito et al., 2016, but see also Pickering and Gambi, 2018), and demonstrations of integration that are not straightforwardly explained in terms of prediction alone (e.g., Fleur et al., 2020; Nieuwland et al., 2020b; Wang et al., 2011). But prediction and integration are not mutually exclusive, and the effects of semantic prediction and those of integration may both impact N400 activity in a cascading manner (e.g., Nieuwland et al., 2020b). Crucially, regardless of whether our results reflect effects of prediction and/or integration, we did not obtain evidence that animacy information is used as a filter or impacts processing before other real-world knowledge does.
Animacy did generate larger effects than other factors: animacy violations had a larger effect on N400s than real-world knowledge violations and animacy had a larger effect on N400s than semantic relatedness. Animacy violations elicited strong effects in both experiments, whereas real-world knowledge violations elicited a relatively small effect in Experiment 1 and no clear effect in Experiment 2. At this point we cannot be certain why animacy had the strongest effect, and we note that other studies have not found such effects (Szewczyk and Schriefers, 2011;Zhang et al., 2012). It could matter that animacy violations in our study were also more implausible violations than real-world event-knowledge violations, and that the impact of animacy on plausibility was greater than that of relatedness. In this regard, animacy violations could be considered 'special' compared to other real-world knowledge violations (for discussion, see also Hung and Schumacher, 2014;Szewczyk and Schriefers, 2011). However, it is also possible that our participants simply paid extra attention to animacy, it being a salient, binary feature that occurred frequently in the experiment. The crucial point, however, is that a larger effect size in and of itself is not evidence for priority of animacy information.
Finally, we provide a note on the specific N400 effect of animate-related words compared to control words. This effect was smaller compared to effects of other violations but nevertheless statistically significant. We can only speculate why PK12 observed little evidence for such an effect. Many studies have reported no N400 effect for related-implausible or related-anomalous words (Nieuwland and Van Berkum, 2005; Van Herten, Chwilla and Kolk, 2006; for reviews, see Bornkessel-Schlesewsky and Schlesewsky, 2008; Brouwer et al., 2012; Kuperberg, 2007), but in all these cases these words elicited a P600 effect instead (the semantic P600). PK12 did not observe differential ERP activity between control words and animate-related words until the sentence-final word, and this appears to be a rather unique finding within the broader literature. Perhaps their participants simply did not consistently notice anything problematic about the animate-related words until later in the sentence (see footnote 7). This may also have been the case in our Experiment 2, perhaps because participants paid less attention to sentence meaning when they did not have to judge the plausibility of each sentence. We should also highlight that participants in PK12 always saw two conditions of each trial but never saw the control sentence first. This could have caused them to anticipate an animate and related plausible word when reading the same sentence context for the second time. This may have made it more likely that they temporarily mistook the animate-related word for a plausible word. For understanding the PK12 results, this potential confound inherent to the experimental design is relevant because the reported interaction between animacy and relatedness was for a large part driven by the reduced N400 to the animate-related condition. In our experiment, participants did not see two conditions of each item, and therefore could not have formed such a strategy based on the repetition of a sentence context.

P600 results
The third relevant pattern in our results was that, in Experiment 1, animacy violations did not elicit a P600 effect, but instead we observed enhanced anterior positivities for related compared to unrelated words and for animate compared to inanimate words. These results therefore also do not replicate the PK12 findings.
Based on their findings and previous reports (e.g., Paczynski and Kuperberg, 2011), PK12 argued that their animacy P600 effect constituted a 'semantic P600' effect (e.g., Kim and Osterhout, 2005), thought to reflect continued analysis or reanalysis that is triggered by a conflict between an expected representation and the detection of an impossible proposition. However, they could not rule out the possibility that the observed effects were due to the plausibility judgment task. While some studies found semantic P600 effects in absence of such a task (Van De Meerendonk et al., 2010;Nieuwland and Van Berkum, 2005), other studies found that selection restriction anomalies involving animacy only elicited P600 effects when participants performed a plausibility judgment task, but not when they only passively read the sentences (Kolk et al., 2003;Schacht et al., 2014).
Our own results suggest that the task played a role in the observed patterns. The P600 effects in Experiment 1 roughly patterned with task difficulty. The animate-related condition elicited the largest P600 and was also the most difficult, as evident from the response accuracy and response time measurements. This condition may be more difficult precisely because these words are both animate and related to the context. Moreover, in Experiment 2, where there was no plausibility judgment task, we did not observe any P600 patterns, and, if anything, animacy violations elicited extended N400 activity. In this experiment, the ERP waveforms across all conditions were very different, namely less positive going, from those obtained in Experiment 1.
7 Like PK12, we analyzed the ERP responses to sentence-final words; those results are available online (https://osf.io/se3pc). PK12 reported that all violations elicited an enhanced N400 relative to the control condition, with no differences between the violation conditions. From this they argued that all violation conditions ultimately evoked similar downstream processing consequences. Our own results for Experiment 1 showed extended N400-like activity for all violation conditions compared to control (all ps < .001). In Experiment 2, only inanimate-unrelated words elicited significantly extended N400-like activity compared to control words (p = .029). Sentence-final figures provided in our online supplementary materials also show the different observed patterns with and without task in our two experiments.
We think that the P600 effects in our Experiment 1, and possibly those in PK12, reflect the concomitant ERP effects of the task (i.e., decision-based effects). We note that sensitivity to task-demands of the P600 effects does not necessarily contradict the interpretation of PK12 that the P600 activity reflected detection of propositional impossibility. In their study, participants may have considered inanimate words as especially relevant to the task, despite the balanced design with filler sentences so that inanimate nouns were encountered equally often in plausible and implausible sentences. However, we think that these differences between studies and experiments serve as a warning about always or only using judgment tasks in such studies. Judgment tasks indeed sometimes boost engagement with the linguistic materials, yet they can change the comprehension processes of interest, and they can elicit ERP components that would otherwise not be elicited, potentially masking N400 activity, because of spatio-temporal component overlap.
Additional analyses comparing the patterns observed in Experiment 1 and 2 by the addition of the between-subject 2-level factor experiment type (task, no task) can be found in our supplementary materials (https://osf.io/se3pc). Here we provide a brief discussion of the most salient patterns. The N400 effects of animacy did not differ when participants performed an acceptability judgment task from when they did not. However, the task did affect semantic relatedness, because we observed an effect of semantic relatedness in Experiment 1 but not in Experiment 2. One possible conclusion is that participants are sensitive to semantic relatedness only when performing an acceptability judgment task. However, the effect of relatedness in Experiment 1 was not as strong as the effect of animacy, which would make it harder to detect without the task, because the task allowed us to select only the trials where participants' responses matched the expected plausibility ratings of the experimental conditions. We know from various other studies that semantically related words do lead to reduced N400 effects in the absence of such a task, although these studies typically present constrained sentences that lead to an expectation of a certain word (e.g., Metusalem et al., 2012; Nieuwland, 2016; Ito et al., 2016). However, it is also entirely possible that a semantic relatedness effect in this experiment was not detected because of low statistical power, given the small sample size in Experiment 2.
Another salient difference between the experiments was in the post-N400 time window. While all conditions in Experiment 1 elicited strongly positive-going ERPs in this window, the N400 effects in Experiment 2 extended beyond the typical N400 time window, leading to a sustained negativity and absence of clear post-N400 positive deflections. In Experiment 2, we thus did not observe the anterior positivities for related compared to unrelated conditions and for animate compared to inanimate conditions seen in Experiment 1. Based on such results, we conclude that the post-N400 activity in Experiment 1 was likely a result of the acceptability task (see also Kolk et al., 2003).

No evidence for priority of animacy during sentence comprehension
Our results only partially replicate the findings observed by PK12. In the N400 window, we found strong effects of animacy in Experiments 1 and 2, and a relatively small effect of relatedness in Experiment 1. But we did not observe the interaction pattern between animacy and relatedness found by PK12. Although the lack of an interaction effect is a null result, we note that this pattern arose from observing effects where PK12 did not: a small relatedness effect for inanimate words, and an animacy effect for unrelated words. Moreover, we also observed an N400 difference between animate-related words and control words where PK12 did not.
At the P600 window, in Experiment 1 only we found enhanced anterior positivities for animates compared to inanimates and for related compared to unrelated words, but not a posterior P600 animacy effect. We used the exact same design as PK12 and used their materials, but we adapted and expanded the original set of materials, and our participants never saw two conditions of the same item, unlike in PK12. Because we tested twice the number of participants as PK12, the number of observations for analysis was roughly the same (but our results are more likely to generalize to new participant observations because we modeled more subject-variance). PK12 used 120 experimental sentence contexts twice in two different conditions, rendering the experiment very long, and making participants tired, leading to potential repetition effects, and perhaps inducing experiment-specific strategies. We wanted to reduce the length of the experiment, and avoid within-participant repetition of sentence contexts to minimize potential processing strategies. Whether and how these differences explain the discrepancy between the observed results remains an open question. Perhaps more importantly, it remains to be seen whether the results of PK12 and/or those reported here are replicable (to encourage replication, we made all our materials and presentation scripts available online on https://osf.io/se3pc), but we note that we observed the same pattern of results in our first twenty participants and the twenty additional participants we tested (for information, see footnote 2).
In conclusion, in contrast to PK12, our results support an incremental, constraint-based account of semantic processing (MacDonald et al., 1994;Marslen-Wilson et al., 1988). At least at the point where the meaning of a word is retrieved, animacy and real-world event-knowledge appear to be used concurrently, in a similarly incremental fashion.

Author note
All data, stimulus materials and data analysis files were available during peer review and remain publicly available on the Open Science Framework (https://osf.io/se3pc), in accordance with the Peer Review Openness Initiative (http://opennessinitiative.org; Morey et al., 2016).

Reanalyzed data results
Upon reviewer request, we re-analyzed our data using a 0.1-20 Hz filter, correcting channels using topographic interpolation only, and using a more focused statistical approach. Our reanalysis therefore focuses on the medial ROI. Furthermore, because both PK12 and our own results suggest that the observed N400 effects are broadly distributed within that ROI, we excluded distributional factors from the N400 analysis and examined activity averaged over all channels in the medial ROI. For the P600 time window, because both PK12 and our own results suggest that the observed effects depend on anteriority, we included the anteriority factor for these analyses.
These re-analyses yielded results highly similar to those of the original analyses described in the main text. The only difference was that in these new analyses, but not the original analyses, the enhanced N400 for animate-related words compared to control words in Experiment 2 was statistically significant, as we had also observed in Experiment 1. The average numbers of computed trials per condition used in the new analyses were also similar to those in the original analyses.
(Table A1).
Animacy by relatedness interaction analyses. A repeated measures ANOVA with factors animacy (animate, inanimate) and relatedness (related, unrelated) showed main effects of animacy and relatedness (Table A2) such that animate words elicited smaller N400s (M = 1.85 μV, SD = 2.94) than inanimate words (M = −0.14 μV, SD = 2.89, Mdiff = 1.99 μV, ηp² = 0.517) and likewise, related words elicited smaller N400s (M = 1.30 μV, SD = 3.00) than unrelated words (M = 0.41 μV, SD = 2.78, Mdiff = 0.89 μV, ηp² = 0.199). The interaction between animacy and relatedness was not significant (F < 1) (Table A2).
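As an illustration of how a 2 × 2 fully within-participant ANOVA of this kind can be computed, the following minimal Python sketch uses the fact that each 1-df within-participant effect is equivalent to a one-sample test on per-participant contrast scores. It is not the analysis script used for the reported results; the function names, the condition keys (`AR`, `AU`, `IR`, `IU`) and the example amplitudes are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, variance

def contrast_f(scores):
    """F(1, n-1) for a 1-df within-participant contrast.

    Equivalent to the squared one-sample t statistic on the
    per-participant contrast scores: n * mean^2 / sample variance.
    """
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores)

def anova_2x2_within(data):
    """data: one dict per participant with mean amplitudes (in microvolts)
    for the four cells. Keys are hypothetical labels:
    AR = animate-related, AU = animate-unrelated,
    IR = inanimate-related, IU = inanimate-unrelated."""
    animacy = [(d["AR"] + d["AU"] - d["IR"] - d["IU"]) / 2 for d in data]
    relatedness = [(d["AR"] + d["IR"] - d["AU"] - d["IU"]) / 2 for d in data]
    interaction = [(d["AR"] - d["AU"]) - (d["IR"] - d["IU"]) for d in data]
    return {
        "animacy": contrast_f(animacy),
        "relatedness": contrast_f(relatedness),
        "interaction": contrast_f(interaction),
    }

# Made-up amplitudes for four hypothetical participants (illustration only).
example = [
    {"AR": 2.0, "AU": 1.0, "IR": 0.0, "IU": -1.0},
    {"AR": 3.0, "AU": 1.0, "IR": 1.0, "IU": 0.0},
    {"AR": 1.0, "AU": 0.0, "IR": -1.0, "IU": -1.0},
    {"AR": 2.0, "AU": 2.0, "IR": 0.0, "IU": 0.0},
]
result = anova_2x2_within(example)
```

Each F value has 1 and n − 1 degrees of freedom, where n is the number of participants. The sketch assumes complete data for every participant and non-constant contrast scores.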

P600 Results
Pairwise (control vs. violation type) analyses. A series of repeated measures General Linear Models (GLMs; ANOVAs) comparing the control condition with each violation type and including the anteriority factor (anterior, posterior) showed that animate-related words elicited more positive ERPs (M = 6.56 μV, SD = 3.62) than the control condition (M = 4.96 μV, SD = 3.15), as indicated by a main effect of sentence type (Mdiff = −1.60 μV, ηp² = 0.163). The interaction between sentence type and anteriority for this comparison was only marginal (Table A1). For the comparison between the inanimate-unrelated and the control conditions, the main effect of sentence type was not significant, but the sentence type by anteriority interaction was significant (Table A1). Follow-up analyses of this interaction showed that inanimate-unrelated words elicited more positive ERPs than the control condition at the posterior region (control posterior: M = 4.87 μV, SD = 3.44, inanimate-unrelated posterior: M = 6.71 μV, SD = 3.17, Mdiff = −1.83 μV, F(1, 39) = 12.6, p = .001, ηp² = 0.244), while the difference between the two sentence types did not reach statistical significance at the anterior region (control anterior: M = 5.04 μV, SD = 3.23, inanimate-unrelated anterior: M = 4.23 μV, SD = 2.81, Mdiff = 0.80 μV, F(1, 39) = 2.9, p = .096, ηp² = 0.069). Finally, neither the main effect of sentence type nor the interaction between sentence type and anteriority was significant for the animate-unrelated and inanimate-related comparisons to the control condition (Table A1).
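The partial eta squared values reported for the follow-up comparisons can be recovered from the F statistics and their degrees of freedom using the standard relation ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below checks this for the two follow-up tests above; the function name is ours and is not taken from the analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Follow-up tests of the sentence type by anteriority interaction, F(1, 39).
posterior = partial_eta_squared(12.6, 1, 39)  # inanimate-unrelated vs. control, posterior
anterior = partial_eta_squared(2.9, 1, 39)    # inanimate-unrelated vs. control, anterior

print(round(posterior, 3), round(anterior, 3))  # prints: 0.244 0.069
```

Both values agree with the reported effect sizes to three decimal places.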
Summary of Experiment 1. At the N400 time-window, control words elicited significantly smaller N400s than words in all violation conditions. In the animacy by relatedness analysis, animate words elicited smaller N400s than inanimate words, and related words elicited smaller N400s than unrelated words. The interaction between animacy and relatedness was not significant.
Results from the P600 time-window showed more positive ERPs overall for animate-related than control words and more positive ERPs for inanimate-unrelated than control words at the posterior region. The animacy by relatedness interaction analysis showed more positive ERPs for animate than inanimate words at anterior regions and more positive ERPs for related than unrelated words also at anterior regions.
(Table A3).
Animacy by relatedness interaction analyses. The significant main effect of animacy showed that animate words elicited smaller N400s (M = −0.02 μV, SD = 2.31) than inanimate words (M = −1.22 μV, SD = 2.69, Mdiff = 1.20 μV, ηp² = 0.334). Neither the main effect of relatedness nor the interaction between animacy and relatedness was significant (Fs < 1) (Table A4).

P600 Results
Pairwise (control vs. violation type) analyses. None of the comparisons between control words and violation words was statistically significant (ps ≥ .201), and none of the interactions between sentence type and anteriority was significant either (ps ≥ .191) (Table A3).
Animacy by relatedness interaction analyses. No significant main effects or interactions were found in this analysis (ps ≥ .150) (Table A4).

Summary of Experiment 2
At the N400 time-window, in the pairwise comparisons, control words elicited smaller N400s than animate-related, inanimate-related and inanimate-unrelated words. In the animacy by relatedness interaction analyses, animate words elicited smaller N400s than inanimate words.
Results from the P600 time-window did not yield any statistically significant differences between the control and the violation conditions. No significant main effects or interactions were found in the animacy by relatedness analysis at this time-window.