Lexical cues and discourse integration: An ERP study of the N400 and P600 components

In a sentence reading ERP study in Swedish we investigated the roles of the N400 and P600 components. By manipulating ease of lexical retrieval and discourse integration of the critical words in four conditions (contextually primed/non-primed and degree of contextual fit), we explored these components from a sentence processing perspective. The results indicate that the N400 indexes lexical retrieval and access of stored conceptual knowledge, whereas the P600 component indexes pragmatic processes, such as integration of a word into the discourse context, or the information structural status of the word. The results support single-stream models of sentence processing, in which lexical retrieval and integration do not take place in parallel, as they do in multi-stream models. © 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Sentence processing is to a large extent about identifying and integrating words into context. The context can be very local or can expand much further, ranging from only the preceding word to the whole discourse. Two ERP components, the temporally earlier N400 and the later P600, are thought to index different aspects of these processes. What precise aspects of the processing these components signal is debated, however. The N400 component has alternatively been taken to reflect lexical retrieval, lexical/discourse integration, or both of these, and the P600 has been taken to index either syntactic repair/integration or discourse integration more generally, or both. The functional interpretation of these components is crucial when we build models of how language is processed. One of the aspects in which processing models differ is in how many processing streams they posit. Multi-stream models typically make use of two processing streams: a semantic stream and an algorithmic/syntactic stream. Single-stream models, on the other hand, assume only one stream in which a word's lexical meaning is retrieved from long-term memory and integrated syntactically and semantically into the (discourse) context as partly overlapping processes (for an overview, see Aurnhammer, Delogu, Schulz, Brouwer, & Crocker, 2021). In both types of model the N400 and the P600 components play crucial roles. The aim of the present study was to investigate which of these two approaches can account for N400 and P600 effects when the ease of lexical retrieval and discourse integration of critical words is manipulated. Before going into details about the manipulations, we will give a description of the two ERP components and how they relate to each other in the two processing approaches.

The N400 and P600 components
Delogu, Brouwer, and Crocker (2019) identify three major functional accounts of the N400 component. On the access/retrieval account (e.g., Brouwer, Crocker, Venhuizen, & Hoeks, 2017; Delogu, Brouwer, & Crocker, 2021; Delogu et al., 2019; Kutas & Federmeier, 2011; Lau, Phillips, & Poeppel, 2008), the N400 amplitude is an index of the effort required to access the conceptual knowledge of the word form in long-term memory. This effort is influenced by lexical and contextual cues. The more cues are given, the more reduced the N400 amplitude is.
On the integration account (e.g., Brown & Hagoort, 2000; Hagoort, Hald, Bastiaansen, & Petersson, 2004), the N400 is an index of the effort required to incorporate the meaning of the eliciting word into the preceding context. On this approach the component is an indication of how difficult the utterance interpretation is to update (see Delogu et al., 2019, 2).
Finally, on the hybrid account (for example Baggio, 2018; Baggio & Hagoort, 2011; Neufeld et al., 2016; Nieuwland et al., 2020), the N400 is an index of the effort involved in both retrieval of word meaning from long-term memory and integration into the preceding context.

Single- and multi-stream models
Single-stream models adhere to the access/retrieval account of the N400. As mentioned above, several factors contribute to a word's N400 amplitude and both bottom-up and top-down information has an effect. However, the results in Aurnhammer et al. (2021) indicate that lexical association of the preceding context is more important than the listener's expectation of an upcoming word when it comes to decreasing the N400 amplitude. That is, the more lexically related a word is to its preceding context, the easier it is to access it in long-term memory. The P600 is an index of the difficulty of integrating the word into the discourse context. Importantly, in the single-stream models, every word in a discourse context will have a P600 component, just as it has an N400 component. The P600 amplitude is thus modulated by ease of integration (the easier a word is to integrate, the smaller the amplitude), and Brouwer et al. (2012) suggest that the P600 might perhaps not index syntactic processing per se. They propose that it might be the integration difficulties resulting from syntactic complexities (e.g., garden paths) and violations (e.g., agreement errors) that lead to P600 effects rather than the syntactic structures themselves.
Multi-stream models adhere to either the integration or the hybrid account of the N400 (see Brouwer et al., 2012, for a review of different processing models). The important difference between single-stream and multi-stream models is that the N400 is seen as (partly) indexing contextual integration of a word in the latter, i.e., a semantic processing stream arrives at some kind of semantic interpretation independently of a syntactic representation of the same linguistic input. The role of the P600 component varies between different multi-stream approaches. The component may index a clash of two interpretations, a semantic and a syntactic (Kim & Osterhout, 2005; Van De Meerendonk et al., 2009). Some models see the P600 as an index of the difficulty of updating the discourse model, but only if the semantic representation differs substantially from a predicted representation, and a P600 effect is the result of passing a threshold in how much discrepancy there can be between an incoming word and the predicted word (Kuperberg et al., 2020). In addition, some multi-stream models see the P600 as a component with different distributions, one anterior and one posterior, depending on the type of context in which the clash arises (Kuperberg et al., 2020; see also DeLong & Kutas, 2020).
To tease apart what the N400 and the P600 components index, Delogu et al. (2019) set up an experiment where they manipulated the context so that the critical word was either contextually primed or not, and either plausible or implausible in the context. Contextual priming should matter for lexical retrieval, while plausibility should play a role for integration.¹ The critical noun zebra is contextually primed and plausible (baseline condition) in the example: Frauke entered the zoo. A little later she took a photo of a zebra …² The same noun is contextually primed but implausible in the example Frauke left the zoo. A little later she took a photo of a zebra … and contextually non-primed and implausible in the example F. entered the stadium. A little later she took a photo of a zebra … In comparison to the baseline condition (contextually primed and plausible), the contextually non-primed implausible condition gave rise to an N400 effect, but no P600 effect, while the contextually primed implausible condition gave rise to a P600 effect, but no N400 effect. Delogu et al. interpreted these results to mean that the N400 component reflects lexical retrieval processes but not semantic integration since there was an N400 effect only in the condition where the critical word was contextually non-primed. With regard to the P600 effect, they took the results to indicate that the P600 component reflects semantic integration in addition to syntactic reanalysis/reprocessing. The unexpected lack of a P600 effect in the non-primed implausible condition was thought to be the result of component overlap: the N400 effect cancelled out the P600 effect. Since the sentences containing the critical word in this condition were not totally implausible in themselves (although they were of course less expected than in the plausible condition), any P600 effects might have been small and may therefore indeed have been cancelled out by the N400 effect. In a follow-up study, their non-primed, implausible condition indeed showed a P600 effect when the primed, implausible condition from the previous study was replaced by a non-primed, plausible condition, providing a better baseline for the plausibility manipulation (Delogu et al., 2021).
¹ Delogu et al. (2019) use the term EVENT RELATED, i.e., a word related to the event described in the clause. We use CONTEXTUAL PRIMING to avoid confusion with the method used, event-related potentials.

The present study
The aim of the present study on Swedish was also to tease apart the functional interpretations of the N400 and P600 components but we used a somewhat different design from that in Delogu et al. (2019). First, the context leading up to the critical word was kept identical across conditions in our study. By manipulating only the critical word, we controlled for the context and its potential connotations (in Delogu et al.'s example, in contrast, left the zoo leaves room for participants to infer an indefinite number of locations unknown to the experimenters). Second, we included words that violate thematic role selection. This was done to test whether we could replicate the anterior P600 effect that has been found in some studies, especially when a predicate's thematic role assignment is violated (DeLong & Kutas, 2020; Kuperberg et al., 2020). If the P600 component is indeed an index of contextual integration, as in the single-stream model, it is reasonable to assume that it should have a similar distribution irrespective of the difficulty of integration, rather than being anterior in some cases. We used sentences where the critical word was either contextually primed or non-primed and either fitted the context or not. Contextual fit is not a binary notion, but is gradable in nature. We used it to describe how well a word fits in the sentential context, as will be described below. As in Delogu et al. (2019), the idea was that if the critical word is contextually primed, it should be easier to retrieve than if it is non-primed, and if it fits in the context, it should be easier to integrate than if it does not fit in the context.
(1) a. A few of the boys went to the park. They had a good time.
b. Not many of the boys went to the park. They stayed at home instead.
With a positive quantifier, like a few in (1a), the invoked set is that of the boys who went to the park, the REFSET, while with a negative quantifier, like not many in (1b), the invoked set is that of the boys who didn't go to the park, the COMPSET (examples from Sanford & Moxey, 2004). Notably, both the REFSET and the COMPSET are lexically primed by the same context since both sets are determined by the relation between the subject (x number of the boys) and the predicate (go to the park). This means that the REFSET and the COMPSET readings have the same contextual priming.
Quantified noun phrases are known to show some odd behaviour in processing that should be taken into account when using them in the experimental materials. Previous studies on the processing of quantifiers suggest that positive and negative quantified noun phrases are processed differently. In a series of experiments Urbach, DeLong, and Kutas (2015; 2010) showed that positive and negative quantifiers gave rise to the same N400 effect on the word worms in contrast to the word crops, in sentences such as Few/Many farmers grow crops/worms …. For negative quantifiers, this N400 effect was unexpected since the implausible word is worms rather than crops with the negative quantifier few and the N400 effect should be reversed compared to positive quantifiers.³ Positive quantifiers thus seem to be fully integrated earlier than negative quantifiers, even though, as Urbach et al. (2015) point out, both types are fully integrated at the end of the interpretive processes, possibly no later than at the end of the clause. It is also a well-known fact that quantifiers interact with each other in the same clause and can give rise to so-called scope ambiguities (May, 1977; Szabolcsi, 1997) as in sentences like Every girl climbed a tree. Depending on which quantified noun phrase takes wide scope, different interpretations arise: if the universal quantifier every takes wide scope, the interpretation is that every girl is such that she climbed a tree and there could be a separate tree for each girl, but when the existential quantifier (the indefinite noun phrase) takes wide scope, the interpretation is that there is one tree such that every girl climbed it and there is just a single tree relevant in the discourse. Results from an ERP study by Dwivedi, Phillips, Einagel, and Baum (2010) suggest that scope ambiguities are underspecified representations, and not resolved until disambiguating information is provided.
In order to avoid any delayed integration of the quantified noun phrases, we used only positive quantifiers as in (1a). To avoid scope ambiguities, the sentences contained an intransitive verb and no other quantifying elements, such as indefinite noun phrases, or adverbs like often or rarely. In addition, the critical word appeared in the clause following the quantifier. An example item from the material is given below in (2). All sentences in the item thus had the same beginning and wrap-up and differed only in what adjective appeared in the second clause (in boldface in a-d below).
(2) Nästan alla häcklöparna snavade i semifinalen och att de var så
Almost all the hurdlers fell in the semifinal and that they were so
a. klumpiga var förvånande. REFSET
We manipulated the predicative adjective in the final clause (a-d, above) on two parameters: whether the adjective was contextually primed, i.e., if it was related to the event described in the first part of the sentence, and to what degree it fitted in the context. In (2a) and (2b), the adjective targeted the REFSET and the COMPSET, respectively. As described above, the two sets are given by the relation between the subject and the predicate. The REFSET and the COMPSET were thus identical in terms of contextual PRIMING. Given that positive quantifiers like nästan alla ('almost all') put focus on the REFSET, the pronoun they will be interpreted as referring to the REFSET in all conditions. Thus, only the adjective clumsy, which is compatible with this reading, fitted contextually, while skilful, which is compatible with a COMPSET interpretation, did not fit contextually. We called these conditions REFSET and COMPSET, respectively. The third condition had an adjective (skinny in 2c) that was not immediately connected to the context, i.e., it was contextually NON-PRIMED. However, the property of the adjective in this condition could easily be ascribed to the subject (i.e., the hurdlers), but only as a general property unrelated to the context. We called this condition UNRELATED. The final condition used an adjective that was contextually NON-PRIMED and in addition was incompatible with an animate subject, i.e., a thematic role violation. We called this condition ANOMALOUS. The conditions related to the two parameters are shown in Table 1.
The two approaches to sentence processing make different predictions regarding the conditions in Table 1. The REFSET condition is both contextually primed and plausible and therefore the kind of sentence that is most likely to occur in discourse. We treat this as the baseline condition. The possible N400 and P600 effects discussed below for the other conditions are in relation to this baseline.
In the COMPSET condition the critical word is contextually primed, but implausible since the pronoun they takes the REFSET as antecedent, rather than the COMPSET. In a single-stream model, the predicted result is that there should be no N400 effect, since the critical words in both baseline and COMPSET are identically primed by the context. However, since the critical word in this condition is not plausible with the REFSET as antecedent to the pronoun, it should be difficult to integrate into the discourse context, and therefore a P600 effect is predicted.
Multi-stream models predict an N400 effect in the COMPSET condition since the N400 is an index of integration, on the integration and hybrid accounts of the N400. Regardless of their account of P600 effects, they predict the absence of such a P600 effect. In an approach where the P600 is an index of syntactic complexity or reanalysis (Kim & Osterhout, 2005), no P600 is predicted, since any adjective in the post-copular position is syntactically well-formed and not more complex than the baseline. In approaches where the P600 is an indication of a mismatch between a semantic stream (visible in an N400 effect) and a syntactic stream, the prediction is also that there should be no P600 effect since the heuristic semantic stream arrives at an implausible interpretation, and the algorithmic syntactic stream arrives at the same interpretation, hence there is no clash between the streams and no P600 effect should arise. In short, an N400 effect in the COMPSET condition is unexpected under a single-stream approach,⁴ and a P600 effect without an N400 effect is unexpected under a multi-stream approach.
In the UNRELATED condition, single-stream processing models predict that there will be both an N400 and a P600 effect, since the critical word is neither contextually primed nor related to the event described in the sentence, i.e., does not fit the context. The adjective should thus be more difficult to retrieve (N400) and integrate (P600) than in the baseline condition. Some studies have found a frontal P600 for this type of condition, as in DeLong and Kutas (2020). Importantly, single-stream models predict a posterior, not an anterior P600 effect.
Multi-stream models make the same predictions in the UNRELATED condition as in the COMPSET condition. The adjective is again unexpected, causing an N400 effect, but the sentence is syntactically well-formed, so there is no need for a reanalysis causing a P600, nor is there a clash between the streams since the algorithmic stream once again should reach the same interpretation as the heuristic semantic stream. Note also that the approach in Kuperberg et al. (2020), which is a three-stream model, does not predict a P600 effect in this condition since the discrepancy between the critical word and the expectation from the context is not large enough.⁵ In short, for the UNRELATED condition, an N400 effect is expected on both single- and multi-stream approaches, while a P600 effect is expected only on the single-stream approach.
For the ANOMALOUS condition, both single-stream and multi-stream approaches predict N400 and P600 effects, though for different reasons. In the single-stream model, the ANOMALOUS […] (Kuperberg, 2007).
Although the effects of this condition cannot be used to separate between the two processing approaches, the condition is different from the other three in that it involves a thematic role violation in addition to being unrelated to the event described (as in the UNRELATED condition). If the P600 effect in this condition is not posterior, this would be an indication that the P600 is not just about integration into context as suggested by Brouwer et al. (2012).
The procedure for determining the critical words in the categories is described in detail in the Materials part of the Methods section. No part of the study procedures or analysis plans was preregistered prior to the research being conducted.

Methods
We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

Participants
Based on effects found in previous studies (Heinat & Klingvall, 2020; Klingvall & Heinat, 2022a), our aim was to have at least thirty-five participants. Forty students from Lund University (age 19–40, mean 23, 24 female), who were all self-reported native speakers of Swedish, participated in the study in exchange for cinema vouchers. They were all right-handed and had no diagnosed psychiatric or neurological disorders.

Materials
The four conditions in (2), repeated below as (3), have been described in detail above.
(3) Nästan alla häcklöparna snavade i semifinalen och att de var så CW
Almost all the hurdlers fell in the semifinal and that they were so CW
a. klumpiga var förvånande. REFSET
The materials consisted of 160 experimental items of four sentences each, as in (3). Each sentence frame had a quantifying phrase as subject (nästan alla X 'almost all X', de flesta X 'most X', i stort sett alla X 'virtually all X') and a predicative adjective (bold-marked in the example item), manipulated with regard to contextual priming and contextual fit, in the second clause.
To verify that the adjectives had the intended properties, a total of 180 items were rated on a seven-grade Likert scale (1 = unnatural sentence, 7 = completely natural sentence) by 29 participants, none of whom participated in the EEG experiment (see DeLong & Kutas, 2020, for a similar method). In addition to the four experimental conditions, we included two additional conditions with a negative quantifier as subject and the REFSET and COMPSET adjectives (e.g., Not many of the hurdlers fell in the semifinal and that they were so clumsy/skilful …). This was to make sure that the REFSET and COMPSET adjectives were plausible with the intended set. From the rated items, 160 items were selected. To be included, REFSET adjectives had to receive a mean higher than 4 with a positive QE (mean 6.4, SD .64), and a lower mean with a negative QE in the same item (mean 3.7, SD .72), and COMPSET adjectives had to receive a mean score lower than 3 with a positive QE (mean 1.83, SD .52) and a higher score with a negative QE in that item (mean 4.05, SD .77). Thus we made sure that the adjectives in the REFSET and COMPSET conditions targeted the intended set.
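The inclusion criteria described above amount to a simple per-item filter. The following is a minimal sketch of that logic, not the authors' actual procedure: the class name, field names and function name are hypothetical, and it assumes one mean rating per adjective type and quantifier polarity for each item.

```python
from dataclasses import dataclass


@dataclass
class ItemRatings:
    """Mean naturalness ratings (1-7 scale) for one item; field names are hypothetical."""
    refset_pos: float   # REFSET adjective with a positive quantifier
    refset_neg: float   # REFSET adjective with a negative quantifier
    compset_pos: float  # COMPSET adjective with a positive quantifier
    compset_neg: float  # COMPSET adjective with a negative quantifier


def passes_norming(r: ItemRatings) -> bool:
    """Apply the thresholds stated in the text to a single item."""
    # REFSET: mean above 4 with a positive quantifier, lower with a negative one.
    refset_ok = r.refset_pos > 4 and r.refset_neg < r.refset_pos
    # COMPSET: mean below 3 with a positive quantifier, higher with a negative one.
    compset_ok = r.compset_pos < 3 and r.compset_neg > r.compset_pos
    return refset_ok and compset_ok
```

An item whose four means equal the reported condition means (6.4, 3.7, 1.83, 4.05) would pass this filter.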
The sentences in the UNRELATED condition received surprisingly low ratings (mean 2.65, SD 1.05), most likely because they were unrelated to the event described in the first clause of the sentence. In a follow-up rating study (14 participants, none of whom participated in the EEG experiment), we therefore tested the plausibility of the subject and the predicative adjective on their own (Almost all the hurdlers were skinny). The UNRELATED adjectives included in the EEG experiment had a mean of 6.3, SD .54 in this frame, showing that they were indeed compatible with the subject. The ANOMALOUS adjectives received low ratings both in the full sentence frame (mean 1.7, SD .71) and in the simple frame (mean 2.0, SD .64).
Word frequency and word expectancy have been shown to affect the size of the N400, such that higher word frequency and cloze probability attenuate the N400 (see Kutas & Federmeier, 2011). Due to their potential effect on the N400, we included these two measures for the adjectives in the statistical model. Word frequencies were obtained from corpus data.⁶ To assess word expectancy, we conducted a cloze test where we gathered approximately 28 responses (mean 27.5, range 23–39) for each of the 160 sentence frames. The cloze test was conducted online in Google Forms. The sentences were divided into eight lists with twenty sentences in each list. The participants were instructed to complete the sentences as in Almost all the hurdlers fell in the semifinal and that they were so …. None of the participants in the cloze test took part in any of the norming studies or the EEG experiment. The results from the cloze test and the frequency of the critical words are shown in Table 2. The mean cloze probability of the REFSET is half of the mean cloze probability for low constraining contexts in Kuperberg et al. (2020) (.2%).⁷ Kuperberg et al. (2020) found different P600 effects in high and low constraining contexts.
⁶ The frequencies are taken from Svenska språkbanken, size 12.53G tokens.
⁷ The mean cloze probability of the REFSET is due to only a few sentences. Only 6 sentences had a cloze probability over 40% and no single sentence reached the mean cloze probability of a high constraining context in Kuperberg et al. (2020) (71%).
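For readers unfamiliar with the measure, cloze probability is the proportion of respondents who complete a sentence frame with a given word. The sketch below illustrates that computation under stated assumptions: the function name is hypothetical, responses are compared after trimming whitespace and lowercasing (the source does not describe how responses were normalised), and the example responses in the usage note are invented placeholders, not data from the study.

```python
from collections import Counter


def cloze_probability(responses, target):
    """Share of non-empty responses that match the target word.

    Matching ignores case and surrounding whitespace, which is an
    assumption of this sketch rather than a documented scoring rule.
    """
    normalised = [r.strip().lower() for r in responses if r.strip()]
    if not normalised:
        return 0.0
    counts = Counter(normalised)
    return counts[target.strip().lower()] / len(normalised)
```

With the made-up responses `["clumsy", "Clumsy ", "tired", "nervous"]` and the target `"clumsy"`, two of four responses match, so the function returns 0.5.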
The cloze probabilities in Table 2 show that the sentence frames in the items are low constraining contexts.
The four sentences in each of the 160 items were distributed across 4 lists in a Latin square design for the EEG experiment, such that each list contained an equal number of sentences from all conditions (40) but only one sentence from each item. A further 142 unrelated filler sentences were also included, making the total number of sentences on each list 302. The filler sentences contained non-quantified definite subjects (e.g., the boys), negatively quantified subjects (e.g., few boys), anticipatory subjects and bare nouns. The majority of the filler sentences were non-anomalous (e.g., Farbröderna gnällde under mötet och att de var så retliga märktes direkt. 'The old-men whined during the meeting and one could immediately tell that they were petulant.' and Det var roligt att gå kursen i idéhistoria och att lära sig mer om ideologier. 'It was fun to take the course in History of Ideas and to learn more about ideologies.'), or they had an agreement mismatch between the subject and a predicative adjective (as in Sparris är underbart säger Anna 'Asparagus.COMMON is wonderful.NEUTER says Anna').
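The Latin square distribution can be illustrated with a short sketch. This is one common way of rotating conditions across lists and is an assumption here, since the source does not specify the exact rotation used; the function name is hypothetical, and fillers and randomisation of presentation order are left out.

```python
def latin_square_lists(n_items, conditions):
    """Rotate conditions over items so that each list gets every item once.

    Returns one list per condition count; each entry is (item_index, condition).
    """
    n_lists = len(conditions)
    lists = [[] for _ in range(n_lists)]
    for item in range(n_items):
        for list_index in range(n_lists):
            # Shift the condition by the list index so that the four versions
            # of an item end up on four different lists.
            condition = conditions[(item + list_index) % n_lists]
            lists[list_index].append((item, condition))
    return lists


CONDITIONS = ["REFSET", "COMPSET", "UNRELATED", "ANOMALOUS"]
LISTS = latin_square_lists(160, CONDITIONS)
```

With 160 items and four conditions, each list holds 160 experimental sentences, 40 per condition, which together with the 142 fillers gives the 302 sentences per list reported above.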

Procedure
At the beginning of the session, the participants were given a consent form to complete. Once the electrodes had been applied, the participants were placed in front of a computer screen in the lab. The first slide contained instructions. These were followed by 8 practice sentences during which the participants could ask questions if anything was unclear. Only after that did the actual experiment start. The stimuli were presented using PsychoPy (Peirce et al., 2019) and each trial began with a cross-hair displayed centrally on the screen for 500 ms, followed by a blank screen for 200 ms after which the sentences were presented word-by-word for 300 ms each, with a 200 ms blank screen interval between them. The next sentence was presented after the participant pressed a key.
To keep the participants on task, one fourth of the sentences were followed by a yes/no question about the contents of the sentence. Since it has been found that explicit judgement tasks can elicit positive potentials at critical words even when the task is post-sentential (Kuperberg, 2007; Nieuwland, 2014; Osterhout & Mobley, 1995; Roehm, Bornkessel-Schlesewsky, Rösler, & Schlesewsky, 2007),⁸ all questions were directed to the meaning of the sentences and never targeted the critical word. The sentences in (3), for example, were followed by the question: Did some of the hurdlers fall? No participant was excluded due to low scores on the questions (correct answers: mean 87%, range 77%–98%). The experiment lasted about 60 min (mean time = 60 min, range 49–80 min).
The EEG was filtered (0.1–100 Hz band-pass filter) and corrected for ocular artefacts using independent component analysis in EEGLab (Delorme & Makeig, 2004). Data was then segmented into epochs that started 200 ms before critical word onset, and lasted until 1500 ms after adjective onset. All epochs were baseline-corrected (200 ms baseline) and then automatically screened for artefacts (minimal/maximal allowed amplitude = −75/75 µV, and vertical and horizontal eye movements) using ERPLAB (Lopez-Calderon & Luck, 2014). No participant had more than 30% of the epochs removed from any of the four conditions (mean number of epochs remaining per condition = 37.2 (93%), SD 2.3).
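The baseline correction and amplitude screening steps can be illustrated on a single channel. The sketch below is a simplified stand-in, not the EEGLab/ERPLAB routines used in the study: the function names are hypothetical, an epoch is assumed to be a plain list of voltages in µV with matching sample times in ms, and the eye-movement screening is not modelled.

```python
def baseline_correct(epoch_uv, times_ms, baseline=(-200, 0)):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    start, end = baseline
    pre_stimulus = [v for v, t in zip(epoch_uv, times_ms) if start <= t < end]
    baseline_mean = sum(pre_stimulus) / len(pre_stimulus)
    return [v - baseline_mean for v in epoch_uv]


def exceeds_threshold(epoch_uv, limit_uv=75.0):
    """Flag an epoch if any sample falls outside -limit_uv..+limit_uv."""
    return any(abs(v) > limit_uv for v in epoch_uv)


# Toy epoch: five samples from -200 ms to 200 ms (values are illustrative only).
times = [-200, -100, 0, 100, 200]
corrected = baseline_correct([2.0, 4.0, 10.0, 80.0, 5.0], times)
rejected = exceeds_threshold(corrected)
```

In this toy case the baseline mean is 3.0 µV, so the corrected epoch is `[-1.0, 1.0, 7.0, 77.0, 2.0]` and the epoch is flagged because one sample exceeds 75 µV.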

Data analysis
In line with the recommendations in Kretzschmar and Alday (to appear) and Luck and Gaspelin (2017), we restricted the analysis to pre-defined regions and time windows. The posterior region for the N400 effect consists of CPz, CP3/4, Pz, P3/4, P7/8, Oz, O1/2, with a time window from 300 to 500 ms after critical word onset (see Šoškić, Jovanović, Styles, Kappenman, & Ković, 2022).
From previous studies on P600 effects related to discourse integration, it seems that the effect is more central and prolonged compared to P600 effects induced by syntactic violations (Delogu et al., 2019, 2021; Klingvall & Heinat, 2022a). In Delogu et al. (2019) a significant P600 effect did not occur until after 800 ms after critical word onset, and DeLong and Kutas (2020) used two time windows for P600 effects: 600–900 and 900–1200 ms after critical word onset. Based on these studies, and the plots from Klingvall and Heinat (2022a), where the P600 effect clearly continues longer than 1000 ms after word onset, the time window for the P600 effect in the present study was between 800 and 1100 ms after critical word onset.
The distribution of the P600 appears to be quite widespread but the effect is often largest in the posterior region of the scalp (Leckey & Federmeier, 2020). As mentioned above, some approaches to sentence processing, e.g., Kuperberg et al. (2020) and DeLong and Kutas (2020), divide the P600 into one anterior and one posterior P600, indexing slightly different things. Since the P600 is generally considered to be an effect in the posterior region, we will distinguish the two distributions by calling the frontal P600 the anterior P600. Kuperberg et al. (2020) and DeLong and Kutas (2020) used different electrode systems and based on their ROIs we defined an anterior region of interest: Fp1, Fp2, F7, F3, Fz, F4 and F8. We followed Kuperberg et al. (2020) and used a time window between 600 and 1000 ms after critical word onset for the anterior P600. The region of interest for the P600 was identical to the region in Klingvall and Heinat (2022a) and thus consisted of CPz, CP3/4, Pz, P3/4, Oz, O1/2. For the statistical analyses we performed linear mixed-effects model analyses (Baayen, Davidson, & Bates, 2008; Kretzschmar & Alday, to appear), using the lmerTest package for R (Kuznetsova, Brockhoff, & Christensen, 2017; R Core Team, 2023). Log frequency and cloze probability were first included as scaled continuous predictors in the models. Only log frequency showed a significant interaction with another predictor but was still excluded in the final models; for discussion see Appendix A. The grand average ERPs of the three ROIs and time windows specified above are presented in the summaries of the converging models that show significantly better model fit than less complex models. The models used are shown under each table.
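The three regions of interest and time windows described above can be written down as a small lookup table, together with a helper that averages voltage over a region and window. This is a sketch under stated assumptions, not the analysis code of the study: the names `ROIS` and `mean_amplitude` are hypothetical, window endpoints are treated as inclusive, and an epoch is assumed to be a dictionary from channel name to a list of µV samples.

```python
# Channels and windows (ms after critical word onset) as listed in the text.
ROIS = {
    "N400": {
        "channels": ["CPz", "CP3", "CP4", "Pz", "P3", "P4", "P7", "P8", "Oz", "O1", "O2"],
        "window": (300, 500),
    },
    "P600": {
        "channels": ["CPz", "CP3", "CP4", "Pz", "P3", "P4", "Oz", "O1", "O2"],
        "window": (800, 1100),
    },
    "anterior P600": {
        "channels": ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8"],
        "window": (600, 1000),
    },
}


def mean_amplitude(epoch, times_ms, channels, window):
    """Average voltage over the given channels and time window (inclusive)."""
    start, end = window
    values = [
        v
        for channel in channels
        for v, t in zip(epoch[channel], times_ms)
        if start <= t <= end
    ]
    return sum(values) / len(values)
```

One such value per trial, region and window would then be the kind of dependent measure that a mixed-effects model with condition as predictor could be fitted to.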

Results
Grand average ERPs of the four conditions from selected electrodes along the midline are shown in Fig. 1.
Cortex 178 (2024) 91–103

N400
The results from the time span and ROI for the N400 component are shown in Table 3. The baseline, the REFSET, is the intercept in the model. The greatest difference is between the REFSET and the ANOMALOUS conditions. This difference is significant. The UNRELATED condition is also significantly different from the REFSET, but this difference is less than half of that between the REFSET and ANOMALOUS conditions. The COMPSET condition is practically the same as the intercept.

P600
The results from the fitted model on the P600 time window and ROI are shown in Table 4. Compared to the REFSET condition there is a significant P600 effect in the other three conditions. Again, the largest effect is in the ANOMALOUS condition, and the effects in the COMPSET and UNRELATED conditions are of the same magnitude.

Anterior P600
The results from the fitted model on the anterior P600 time window and ROI are shown in Table 5. Compared to the REFSET condition there is no significant effect in any of the other conditions. The non-significant difference between the REFSET and the ANOMALOUS conditions is also in the wrong direction: the ANOMALOUS condition is more negative rather than more positive, contrary to what an anterior P600 would show.

Discussion
This study aimed to investigate the functional interpretation of the N400 and P600 components, and in particular to provide data to distinguish between single- and multi-stream processing models, but also between different accounts of the N400 component: on the one hand the access/retrieval account, and on the other hand the integration and the hybrid accounts.
The experimental material consisted of sentences in which the critical word was a predicative adjective in the second of two main clauses. The adjective was manipulated for contextual priming and degree of contextual fit. The adjective counted as primed if it was associated with the event described in the verbal predicate in the first clause, and it fitted the context if the subject could plausibly be ascribed the property denoted by the adjective. This resulted in four different experimental conditions: REFSET (primed and plausible), which is also the baseline to which the other conditions were compared, COMPSET (primed and implausible), UNRELATED (non-primed because it is unrelated to the event described but still a plausible property of the subject), and ANOMALOUS (non-primed and anomalous due to an animacy violation of thematic roles). We found that both contextual priming and contextual fit had an effect on processing in terms of N400 and P600 effects. More precisely, the conditions where the adjective was not associated with the event described by the predicate in the first clause, i.e., the non-primed UNRELATED and ANOMALOUS conditions, gave rise to N400 effects relative to the baseline, the REFSET condition. The effect in the ANOMALOUS condition was more than double the size of that in the UNRELATED condition. All three conditions showed a P600 effect relative to the baseline. The effect was again largest in the ANOMALOUS condition, and the effects in the COMPSET and UNRELATED conditions were virtually the same. There were no significant effects in the anterior P600 time window and region.
We take the N400 results to support the single-stream approach to sentence processing and the access/retrieval interpretation of the N400 component. The single-stream approach is the only approach that predicts the absence of an N400 effect in the COMPSET condition. The integration and hybrid accounts predict an N400 effect in the COMPSET condition, since the adjective is contextually primed but not plausible and therefore hard to integrate. The absence of an N400 effect in this condition replicates previous results in processing studies on REFSET/COMPSET readings, in which set-reading mismatches give rise to P600 but not N400 effects (Heinat & Klingvall, 2020; Klingvall & Heinat, 2022a).
The P600 findings support an interpretation of this component as indexing discourse integration more generally, rather than an effect of conflicting interpretations of a syntactic and a semantic stream. In a multi-stream approach, the P600 effect in the COMPSET condition is difficult to account for, since the sentences are syntactically well-formed and there is no clash in processing streams that would cause a P600 effect. Even though the adjective in the UNRELATED condition was contextually non-primed by the event described in the first clause, it was compatible with the subject. According to Kuperberg et al. (2020), a condition like this should not require an update of the discourse model, i.e., the integration of the UNRELATED word should not be difficult enough to cause a P600 effect. As in the COMPSET condition, there are no conflicting interpretations of the different processing streams, so other multi-stream models do not predict a P600 effect in this condition either. As described in section 2.2, the adjectives in the UNRELATED condition received low scores when they were normed in the sentential context used in the experimental material (although they were clearly well-formed as properties of the subjects, as the follow-up norm showed). Our interpretation of the P600 effect in this condition is that it is related to integration difficulties in the discourse model. The difficulty is caused by an information structural shift: the general property ascribed to the subject in the second clause steers away from the previous discourse topic, i.e., the event talked about in the first clause (see Klingvall & Heinat, 2022b). It seems likely that the required update of the discourse structure due to this information structural shift is what is reflected in the P600 effect in this condition (see Burkhardt, 2006, for the relation between information structure and the P600). The fact that Kuperberg et al. (2020) did not find any P600 effect but only an N400 effect in their LOW CONSTRAINT UNEXPECTED condition vs LOW CONSTRAINT EXPECTED condition might be due to the contextual fit of the critical words being better in their study so that the P600 was cancelled out by component overlap (see discussion in Brouwer et al., 2012).
Component overlap was also suggested as an explanation for the unexpected lack of a P600 effect for the CONTEXTUALLY NON-PRIMED IMPLAUSIBLE condition in Delogu et al. (2019): the N400 effect cancelled out the P600 effect. Our ANOMALOUS condition, in contrast, gave rise to a P600 effect in addition to an N400 effect, as expected under the access/retrieval account of the N400. In the present study, thus, the P600 effect was strong enough not to be cancelled out by the N400 (see Aurnhammer, Crocker, & Brouwer, 2023, for details of overlap of the N400 and P600 components). Most likely this difference in results is due to differences in the materials in the two studies. Unlike in Delogu et al. (2019), our ANOMALOUS condition involved thematic role violations.
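The logic of component overlap can be shown with simple arithmetic: if the scalp signal is the sum of a negative-going and a positive-going component, a negativity that persists into the later window reduces, or even removes, the positivity measured there. The sketch below is a toy demonstration under strong assumptions. Components are modelled as boxcars of equal and opposite amplitude, the amplitudes and latencies are invented, and only the 800 to 1100 ms measurement window is taken from the P600 analysis. It is not a model of the present data.

```python
# Toy demonstration of component overlap. Boxcar components, amplitudes and
# onsets are invented for illustration; only the 800-1100 ms measurement
# window is taken from the P600 analysis reported in the text.

def component(times_ms, start, end, amplitude):
    """Boxcar component: `amplitude` for start <= t < end, zero elsewhere."""
    return [amplitude if start <= t < end else 0.0 for t in times_ms]


def window_mean(wave, times_ms, start, end):
    """Mean voltage of `wave` over samples with start <= t < end."""
    values = [v for v, t in zip(wave, times_ms) if start <= t < end]
    return sum(values) / len(values)


times = list(range(0, 1200, 100))  # samples at 0, 100, ..., 1100 ms

positivity = component(times, 800, 1100, 2.0)           # late positive component
brief_negativity = component(times, 300, 500, -2.0)      # ends before the late window
sustained_negativity = component(times, 300, 1100, -2.0)  # persists into the late window

# The observed signal is the sample-by-sample sum of the underlying components.
observed_no_overlap = [p + n for p, n in zip(positivity, brief_negativity)]
observed_overlap = [p + n for p, n in zip(positivity, sustained_negativity)]

late_mean_no_overlap = window_mean(observed_no_overlap, times, 800, 1100)
late_mean_overlap = window_mean(observed_overlap, times, 800, 1100)
```

In the toy case the underlying positivity is identical in both signals, but it is only visible in the late window when the negativity has ended before that window begins.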

Concluding remarks
From a sentence processing perspective, the results from the study can be accounted for in a single-stream model, such as the Retrieval-Integration (RI) account (Brouwer et al., 2012, 2017).
The assumption in a single-stream model is that semantic information is not processed independently, as in multi-stream models (see Aurnhammer et al., 2023; Brouwer et al., 2012, for a review). In the RI account the N400 and P600 components index two different processes: lexical retrieval and discourse integration. Lexical retrieval is not only a bottom-up process, but it is also facilitated by lexical and contextual priming. This facilitation is seen in a reduced N400 effect. This is clearly the pattern in the present study: the non-primed ANOMALOUS adjective has the largest N400 effect, the UNRELATED adjective, which is non-primed but denotes a possible property of the subject, has a reduced N400, and the COMPSET adjective has no N400 effect at all since its contextual priming is exactly the same as for the REFSET condition, the baseline. The low ratings of all adjectives in context in the ANOMALOUS, COMPSET and UNRELATED conditions, as well as their low cloze probabilities, indicate that they are not plausible in context (see Materials section). The RI account predicts that they should all induce a P600 effect, which they do. In the UNRELATED condition we see that information structure too has an effect on the P600 component. This is expected since information structure is related to the information status that sentence constituents get from the context, not their grammatical status.
The results are not easily accounted for in multi-stream approaches. The COMPSET condition is problematic for multi-stream models since they all predict an N400 effect but no P600 effect, contrary to the results. As pointed out by Brouwer et al. (2012), biphasic N400/P600 effects that are found in syntactically well-formed sentences, as in our UNRELATED condition, are also difficult to account for in multi-stream models that assume that the N400 component indexes both lexical retrieval and discourse integration and the P600 indexes syntactic processing, e.g., structural reanalysis, or indicates a clash between the interpretations arrived at in the semantic heuristic stream and the algorithmic syntactic stream.
What needs to be further investigated is to what extent different cues facilitate the processing of words. Brouwer et al. (2012) claim that cues, such as top-down information, do not constrain the pattern of activation, but on the contrary, add to the activation of certain words. From the results in the present study it is not possible to tell whether the reduced N400 and P600 effects are due to constraining patterns, or higher activation. One step in the direction of answering this question would be to use critical words in the UNRELATED condition that are not contextually primed but plausible in the sentence context to a higher degree than was the case in the present study.
Another outstanding question is whether the P600 is only an indication of contextual integration, as suggested by Brouwer et al. (2012), or whether it is also an index of syntactic violations. As Brouwer et al. (2012) point out, many of the experiments that find 'syntactic' P600 effects have material that may also introduce difficulties in integration. For instance, if there is a number mismatch in subject-verb agreement, the parser will have to update the discourse model and add, or subtract, discourse participants depending on the order of information. To be able to tease apart a syntactic P600 from an integration P600, some morphological feature or syntactic structure that does not also involve integration difficulties needs to be used. We leave these questions for future studies.

Data accessibility statement
All stimulus materials, code and data associated with the study are available at the following repository: https://osf.io/c5wt6/.

Word frequency and cloze probability
Word frequency and word expectancy (cloze probability) have been shown to affect the N400 such that higher values for these features reduce the N400 (Kutas & Federmeier, 2011). We measured word expectancy through a cloze task and included the probabilities in the statistical analysis. The analysis showed, however, that cloze probability was not a significant predictor in either the N400 or the P600 time window. In addition, the models including cloze probability did not significantly improve model fit compared to the models without it (see for example Winter, 2019, for model comparisons), and cloze probability was thus excluded from the models. Word frequency showed a significant interaction in only one condition. Contrary to expectations, and to the results from previous studies, the effect of word frequency was negative in the non-significant interactions in the N400 time window, i.e., the higher the frequency of the adjective, the larger the N400 effect. The effects of frequency on the P600 component are not described in the literature and it is unclear what effects (if any) it should have for this component. However, it seems reasonable to assume that, if anything, higher frequency should facilitate integration, rather than hinder it. The interactions with frequency that we find in the P600 time window indeed indicate that high frequency words are integrated more easily, but the effect is significant only in the REFSET condition. It is not clear to us why the effects of frequency are not uniform across conditions, especially in the N400 time window where higher frequency of words is known to have a facilitating effect on retrieval. We interpret the interactions that we see as unreliable and spurious, and we therefore decided to exclude word frequency from the models used.
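One common way of comparing nested models of this kind is a likelihood-ratio test; whether this exact test was used here is an assumption on our part. The sketch below covers only the simplest case, in which the fuller model adds a single parameter (one degree of freedom), so that the p-value can be computed with the standard library alone. A predictor that also interacts with condition would add several parameters and require a chi-square distribution with more degrees of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math

# Hypothetical sketch of a likelihood-ratio test between two nested models
# that differ by exactly ONE parameter (a simplifying assumption).


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Return (chi-square statistic, p-value) for nested models with 1 df difference.

    loglik_reduced: maximised log-likelihood of the model without the predictor
    loglik_full: maximised log-likelihood of the model with the predictor
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented log-likelihoods: the extra predictor barely improves the fit,
# so the simpler model would be retained.
stat, p = likelihood_ratio_test_df1(loglik_reduced=-1520.4, loglik_full=-1520.1)
keep_predictor = p < 0.05
```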
The models that include frequency and cloze probability are shown below.

A.1 Frequency
Fig. 1 – Panel A: Grand average ERPs from selected electrodes (Cz, CPz and Pz) time-locked to the target adjective in the REFSET condition (black solid line), the ANOMALOUS condition (red long dashed line), the COMPSET condition (blue dashed line) and the UNRELATED condition (green dash-dotted line) (data low-pass filtered at 10 Hz for visual purposes only). Panel B: Topographic maps of the simple effects of the ANOMALOUS (first column), the COMPSET (second column) and the UNRELATED (third column) conditions minus the REFSET condition in time spans of 200 ms, from 200 ms to 1200 ms after the onset of the target adjective.

Table 1 – The conditions and the parameters.

Table 2 – Cloze probability and frequency of the adjectives.

Table 3 – Grand average ERPs on the adjective for the N400 ROI, time window 300–500 ms.

Table 4 – Grand average ERPs on the adjective for the P600 ROI, time window 800–1100 ms.

Table 5 – Grand average ERPs on the adjective for the anterior P600 ROI, time window 600–1000 ms.

Table A1 – Results for N400 effects on the adjective with frequency in the model.

Table A2 – Results for P600 effects on the adjective with frequency in the model.

Table A3 – Results for anterior P600 effects on the adjective with frequency in the model.

Table A4 – Results for N400 effects on the adjective with cloze probability in the model.

Table A5 – Results for P600 effects on the adjective with cloze probability in the model.

Table A6 – Results for anterior P600 effects on the adjective with cloze probability in the model.