The impact of response congruence on speech production: An event-related potentials study

A puzzling finding in the speech production literature is the facilitation induced by categorically related distractors in a superordinate-level naming task. In this case, the context is response congruent, because application of the task instruction to the context would lead to the correct response. This study investigates the time-course of response congruence effects in speech production using event-related potentials (ERPs). Participants overtly named target words that were overlaid on context pictures with either their superordinate category-level name or their associated function, while their response times (RTs) and ERPs were recorded. Behavioural results replicated the facilitating effect of response congruence. The ERP results showed that the N2 was larger for a response incongruent than for a response congruent context, and this effect correlated with the behavioural pattern of results. This key finding suggests that response incongruence is associated with a conflict-monitoring response which drives the behavioural effect. Further, N400 amplitude was not modulated by response congruence, suggesting that its effect is confined to the conceptualisation phase. Finally, P3 modulations mirrored those in RTs but, unlike the N2 effect, they did not correlate with RTs. This suggests that, although the facilitating effect of response congruence is confined to the conceptualisation phase of speech production, response incongruent representations may remain active during later processing stages, or that this late effect of response congruence reflects conflict resolution. Implications for models of speech production are discussed.


Introduction
Although speaking is one of the most essential human abilities, there is still debate about its underlying principles. The stages assumed in word production broadly consist of conceptualisation, lexical access, and articulation (Levelt, Roelofs, & Meyer, 1999), which have typically been investigated with single word production tasks in various paradigms. In particular, the picture-word interference (PWI) task, in which a speaker names a picture while ignoring a superimposed word, has been widely used for decades (MacLeod, 1991; Schriefers, Meyer, & Levelt, 1990). Other popular tasks include picture naming in semantically related vs. unrelated blocks (Belke & Meyer, 2005), or the cumulative semantic interference task in which semantically heterogeneous picture-naming blocks include an increasing number of items belonging to the same semantic categories (Howard, Nickels, Coltheart, & Cole-Virtue, 2006). The key finding from these studies is that a categorically related context, presented as a word, a picture, or by having named categorically related items in previous trials, hampers production of a picture name more than an unrelated context. By contrast, associatively or phonologically related contexts usually induce facilitation compared to an unrelated one (but see McDonagh,
E-mail address: jk28@stir.ac.uk.
The Swinging Lexical Network model (SLN; Rahman & Melinger, 2009, 2019) explains semantic interference and facilitation based on the number of lexical competitors and their level of activation. Thus, in line with earlier models of word production, lexical selection is assumed to be a competitive process (Roelofs, 1992; Starreveld & La Heij, 1996). An important feature of the SLN is that semantic facilitation, due to spreading of activation at the conceptual level, is distinguished from semantic interference due to competitive lexical (lemma) selection (Levelt, 1989). The relative strength of facilitation and interference at each level determines whether the speech production process is hampered or facilitated by the context.
In contrast to other models of word production, the SLN can account for context effects that appear when the task instruction differs from basic-level picture naming. For example, when the task is to name a picture with its superordinate category-level name (e.g., to name the picture of a flower with its superordinate category label "plant"), categorically related context words (e.g., "heather") induce facilitation instead of interference, as compared to unrelated context words (e.g., "stone"; Glaser & Dungelhoff, 1984; Kuipers, La Heij, & Costa, 2006). Kuipers et al. (2006) argued that, in this scenario, the context stimuli are response congruent, because application of the task instruction to the context stimulus leads to the same answer as application of the task to the target stimulus.
Within the SLN, semantic facilitation due to spreading of activation at the conceptual level would be accompanied by response congruence facilitation when the target and context converge on the to-be-verbalised target concept (Hantsch, Jescheniak, & Schriefers, 2009; Kuipers & La Heij, 2008, 2012; Kuipers et al., 2006). Kuipers and La Heij (2008) used superordinate-level naming and associated function naming tasks to show that response congruence can indeed outweigh interference at the lexical level. They argued that this response congruence effect may be due to the availability of multiple cues while one tries to retrieve the goal concept. Automatic application of the task to the target and a response congruent context stimulus would lead to activation of multiple correct retrieval cues. By contrast, a response incongruent context may activate wrong retrieval cues, which could hamper activation of the goal concept. If response congruence impacts the conceptualisation phase only, no additional assumptions are needed at other levels of processing to account for this facilitation effect (see La Heij, Starreveld, & Kuipers, 2007, for a discussion of this issue).
The current study aimed to test the assumption that response congruence impacts the conceptualisation phase only. If it is indeed confined to the conceptualisation phase, this would lend support to the SLN, or at least to some of its underlying principles. However, if response congruence impacts cognitive processes beyond the conceptualisation phase, the SLN would need adjusting, or perhaps another framework would be better suited to explain it. For example, if the conflict between response alternatives is not resolved until close to articulation, models that place lexical selection at a late stage in word production (e.g., the Response Exclusion Hypothesis; Mahon et al., 2007) may be better equipped to explain response congruence.
To investigate the time-course of response congruence, the experimental design used here was kept largely the same as that of Kuipers and La Heij (2008, 2012), who used superordinate category-level naming and associated function naming of target words that were printed on top of context pictures. Thus, the target-context configuration was opposite to the more commonly used picture-word interference task in which the picture is the target with the context word printed on top. The reason for this reversal of the typical picture-word stimulus compound leads back to Glaser and Dungelhoff (1984), who directly compared the standard picture-word task configuration with the reverse configuration in a superordinate-level naming task. Their results showed that context effects in these tasks were of the same polarity but stronger for word targets, while naming latencies in this condition were longer (by 112 ms). This overall naming time difference may be due to the fact that pictures can be categorised upon perception of category-specific features (e.g., fur), meaning that participants can perform the picture superordinate category-level naming task without fully identifying the target stimulus. Target words are not sensitive to this problem, which favours the use of word stimuli over pictures when the task is superordinate category-level naming. It should be noted that word superordinate category-level naming and word reading are two very different tasks. Word superordinate category-level naming is a conceptually driven task, whilst word reading is not. The latter can be achieved by grapheme-phoneme conversion and is therefore much faster than picture naming, with little sensitivity to context effects (Smith & Magee, 1980).
Within the timeline of speech production as detailed by Indefrey and Levelt (2004) and Indefrey (2011), conceptualisation takes place within the first 200 ms in basic-level picture naming. However, one can assume that more complex naming tasks, like superordinate category-level naming, require a longer conceptualisation stage. Given that the latter task results in longer average naming latencies than the former (around 100 ms longer; Glaser & Dungelhoff, 1984), it will be assumed that conceptualisation in the tasks used here takes place within the first 300 ms.
To obtain an estimate of the time-window of the effect of response congruence, Event-Related Potentials (ERPs) were recorded while participants performed word-naming tasks. ERPs are particularly useful in this respect because they can give insight into cognitive processes with high temporal resolution. Although speech distorts the EEG signal, the period before speech onset has been shown to allow for meaningful ERP analysis (up to about 600 ms in picture naming; Christoffels, Firk, & Schiller, 2007; Costa, Strijkers, Martin, & Thierry, 2009; Koester & Schiller, 2008; Porcaro, Medaglia, & Krott, 2015; Python, Fargier, & Laganaro, 2018). This time-range includes the P1, N1, P2, N2, P3, and N400 ERP components.
The most relevant ERP component here is the N2, because it spans the conceptual preparation phase (0-300 ms in the tasks used here; cf. Indefrey, 2011) and because it is sensitive to semantic context effects in the PWI task, as shown by Krott, Medaglia, and Porcaro (2019). These authors found greater N2 negativity for categorically related context words than for unrelated ones in a standard PWI task. Greater N2 negativity has also been observed for incongruent as compared to target-congruent distractor words in picture naming (Piai, Roelofs, & van der Meij, 2012; Xiao, Zhang, Jia, Zhang, & Luo, 2010).
The N2 is best known for its sensitivity to response conflict in a variety of cognitive psychology tasks, expressed as a greater negativity at frontal electrode sites when a stimulus elicits multiple competing response options (Nieuwenhuis, Yeung, van den Wildenberg, & Ridderinkhof, 2003; Yeung, Botvinick, & Cohen, 2004). Therefore, the N2 is commonly associated with monitoring/resolving response conflict. In picture or word naming tasks, a response incongruent context would activate an incorrect response alternative alongside the target response. Such a situation, in which there are multiple response alternatives, requires a conflict monitoring/resolving response, as in go/no-go tasks and many other paradigms in cognitive psychology. Indeed, conflict monitoring is a key aspect of several models of speech production (Gauvin & Hartsuiker, 2020; Levelt, 1989; Pickering & Garrod, 2013) and the N2 in speech production is assumed to reflect this (Nozari, Dell, & Schwartz, 2011). Therefore, a response incongruent context would be expected to induce greater N2 negativity than a response congruent context, in line with Piai et al. (2012) and Xiao et al. (2010).
The second ERP component relevant to this study is the N400, which indexes the ease with which target and context stimuli are semantically integrated (Kutas & Federmeier, 2011). The N400 time-window spans approximately 350-500 ms, which puts it after the conceptualisation phase, in the phase of word form encoding (275 ms up to 600 ms after stimulus onset; Indefrey, 2011). It therefore reflects semantic processes that take place after a concept has been selected for production. Given that the context stimuli used here are categorically related or unrelated to the target stimuli, the N400 is expected to be modulated by these conditions, with more negative N400 amplitude for unrelated pairs than for related ones in both tasks. If response congruence also modulates the N400, this would suggest that the effect of response congruence extends beyond the time-window of conceptualisation.
The final ERP component considered here is the late parietal P3 (also called P3b), which is typically associated with the onset of oddball stimuli (Donchin & Coles, 1988), reflecting the amount of neural resources committed to stimulus prioritisation (Polich, 2007). In the context of speech production, only a few reports mention P3 modulations, perhaps because this component appears close to response onset in basic-level naming tasks. For example, Costa et al. (2009) found that P3 amplitude increased with increasing semantic interference, and Von Grebmer zu Wolfsthurn, Pablos, and Schiller (2021) reported a P3 modulation by cognate status and gender congruence in a bilingual picture naming context. In the current study, the P3 was expected to reveal whether the effect of response congruence extends beyond the time-windows associated with conceptualisation and lexical selection, potentially up to response selection. However, given that P3 amplitude is generally lower for more difficult conditions with longer response times (Polich, 2007), semantic relatedness and response congruence may both influence P3 amplitude here. Investigation of the relation between these potential modulations and observed behaviour will clarify whether or not the cognitive processes reflected by the ERP modulations are the driving force behind the observed behaviour.
To summarise, this study aims to investigate the time-window of response congruence using two tasks: naming the superordinate category-level or the associated action/function of target words printed within picture distractors. The overall procedures were kept as similar as possible to those of Kuipers and La Heij (2008, 2012), with the exception that the current tasks were in English and that the participants' EEG was recorded. In the superordinate-level naming task, the target word "car" will be named as "vehicle" and in the function/action naming task as (to) "drive". As shown in these previous studies, responses in the categorisation task are expected to be quicker with response congruent distractors (e.g., a picture of a bus) than with a response incongruent context (a picture of a hammer). Importantly, categorically related distractors that are response congruent in the category-level naming task, but not in the function naming task (e.g., a picture of a sailboat with function response (to) "sail"), are expected to induce facilitation in the category-level naming task but not in the function naming task. ERPs were recorded to track the influence of context effects during the preparation phase before overt speech.

Participants
Given that previous studies on response congruence in speech production used 18 to 20 participants and observed a large effect size (between 0.71 and 0.84) for this effect (Kuipers & La Heij, 2008, 2012), it was decided that between 20 and 30 participants should provide sufficient statistical power to detect any effects of response congruence in ERPs, if present. Twenty-six undergraduate Psychology students from Bangor University were recruited for this study. They gave written informed consent before the experiment proper and received course credits or £10 for participation. This study was approved by Bangor University's ethics committee. Participants were native English speakers who had normal or corrected-to-normal vision and none reported having neurological problems. The data of 6 participants were excluded from data analysis due to excessive artefacts in the EEG data. The participants included in the analyses consisted of 11 females and 9 males, all right-handed but one ambidextrous, with a mean age of 19.6, SD 0.4, range 17-26 years.
Fig. 1. Example stimuli in the different conditions. Target words were printed on pictures that were either semantically related and response congruent in both tasks (C+F+ condition), semantically related, but only response congruent in the superordinate-level naming task (C+F- condition), or unrelated and response incongruent in both tasks (C-F- condition).

Materials
Twenty-five highly familiar English words (M familiarity = 553, SE = 2; Coltheart, 1981; e.g., target word "CAR" with category naming response "vehicle" and function naming response (to) "drive") were paired with a context picture sharing the word's semantic category (categorically related, C+) and its associated action or function (function related, F+), for example a picture of a bus (C+F+ condition; Fig. 1). The second condition consisted of a categorically related context picture whose associated action or function differed from that of the target, for example a picture of an airplane, which has the category naming response "vehicle" but function naming response "to fly" (C+F- condition). The third condition consisted of pictures that were categorically unrelated (C-) and had another associated action/function (F-), for example the picture of a goat with category naming response "animal" and associated action (to) "milk" (C-F- condition; Appendix 1). The C-F- condition was created by re-pairing the context pictures of the C+F+ and C+F- conditions, leading to 50 unrelated stimulus pairs. These were combined with the related pairs to create two stimulus lists of 75 stimuli each. The pictures were coloured images collected from online databases, and words were printed within a white text box placed over the picture to ensure legibility.

Procedure
Participants were seated in front of a 40" LED screen and fitted with a 32-electrode Quick cap (Neuromedicalsupplies.com), which has the electrodes arranged according to the 10-20 convention. Data were recorded at a rate of 1 kHz, referenced to the left mastoid, and band-pass filtered between 0.1 Hz and 200 Hz. Additional electrodes were placed on both temples and above and below one eye to monitor eye movements and blinks. At the start of each experimental block, participants were given a sheet with the target words and the expected responses (either the superordinate category-level word or the associated action), to practise the expected responses. Next, the participants' responses were verified with a practice series, during which participants named all words presented sequentially on a computer screen without a context picture. Errors were verbally corrected by the experimenter. Next, participants started with either the categorisation or the function naming task (counterbalanced across participants), which consisted of the presentation of both stimulus lists (also in counterbalanced order) in two experimental blocks of approximately 8 min each (150 trials in total per task, 50 trials per condition), separated by a short break. In each trial, a central fixation cross appeared for 800 ms, followed by the picture-word compound, which was presented for 2 s or until speech onset, which was measured with a voice key connected to a Serial Response Box (SR box; Psychology Software Tools; Fig. 2). The experimenter recorded correct responses, errors and speech disfluencies using the SR box.

Behavioural data
RTs longer than 1500 ms (2.4%) and shorter than 300 ms (due to mouth clicks; 4.8%), as well as speech disfluencies (1.5%), were excluded from the analysis. Error rates were too low (3.4% in total; Appendix 3) for meaningful analysis, which suggests participants had little difficulty producing the expected responses. The remaining RT data were analysed with linear mixed effects modelling using the lmerTest package in R (Kuznetsova, Brockhoff, & Christensen, 2017). The experimental conditions were contrast-coded for the analysis. The factor Task contrasted superordinate-level naming (coded 1) with function naming (coded −1). For the Relatedness conditions, two factors were created: Relatedness factor 1 contrasted C+F+ (coded 1) with C-F- (coded −1), and Relatedness factor 2 contrasted C+F- (coded 1) with C-F- (coded −1; Appendix 2). Factors were removed stepwise from the most complex model (random slopes and intercepts for all factors by participants and items) until convergence was achieved (Barr, Levy, Scheepers, & Tily, 2013). Post-hoc tests were performed with the lsmeans package (Lenth, 2016).
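The trimming and contrast-coding steps described above can be illustrated with a minimal Python sketch. The record fields (`rt`, `task`, `condition`, `correct`, `disfluent`) are hypothetical names chosen for illustration, and two details are assumptions rather than reported facts: that error trials were dropped from the RT analysis, and that the condition not involved in a given Relatedness contrast was coded 0. The actual analysis was run in R with lmerTest.

```python
RT_MIN_MS = 300    # shorter RTs were attributed to mouth clicks
RT_MAX_MS = 1500   # longer RTs were excluded

TASK_CODE = {"superordinate": 1, "function": -1}
# Each Relatedness contrast is against the unrelated C-F- baseline.
# Coding the remaining condition as 0 is an assumption of this sketch.
REL1_CODE = {"C+F+": 1, "C+F-": 0, "C-F-": -1}
REL2_CODE = {"C+F+": 0, "C+F-": 1, "C-F-": -1}


def prepare_trials(trials):
    """Drop excluded trials and attach contrast codes to the remaining ones."""
    kept = []
    for t in trials:
        # Assumption: errors are dropped along with disfluencies.
        if not t["correct"] or t["disfluent"]:
            continue
        if t["rt"] < RT_MIN_MS or t["rt"] > RT_MAX_MS:
            continue
        kept.append({**t,
                     "task_c": TASK_CODE[t["task"]],
                     "rel1_c": REL1_CODE[t["condition"]],
                     "rel2_c": REL2_CODE[t["condition"]]})
    return kept
```

The coded columns would then serve as fixed-effect predictors in the mixed model.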

EEG data
Continuous EEG data were zero-phase shift, low-pass filtered at 30 Hz, re-referenced to the average of the mastoid electrodes, epoched from −100 ms to 900 ms relative to stimulus onset, baseline corrected using the prestimulus interval, and artefact rejected with a ±70 μV threshold. Blinks were mathematically corrected using Neuroscan 4.5 (Compumedics.com). This resulted in 30.1 ± 0.6 accepted trials per condition on average per participant. All ERP modulations were analysed with an ANOVA with the within-subjects factors Relatedness (C+F+ vs. C+F- vs. C-F-) and Task (superordinate-level naming vs. function/action naming). Greenhouse-Geisser corrections are reported where applicable.
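As an illustration of the baseline correction and amplitude-based rejection steps, the following Python sketch operates on epochs stored as lists of per-channel sample lists. It assumes one sample per millisecond (1 kHz), so that the 100 ms prestimulus interval corresponds to the first 100 samples, and it assumes that rejection is applied after baseline correction to every channel. Function names are hypothetical; the actual processing was done in Neuroscan.

```python
from statistics import fmean

BASELINE_SAMPLES = 100   # -100 ms to 0 ms at 1 kHz (assumed one sample per ms)
REJECT_UV = 70.0         # absolute amplitude threshold in microvolts


def baseline_correct(channel, n_baseline=BASELINE_SAMPLES):
    """Subtract the mean of the prestimulus samples from one channel."""
    base = fmean(channel[:n_baseline])
    return [v - base for v in channel]


def accept_epoch(channels, threshold=REJECT_UV):
    """True if no sample on any channel exceeds +/- threshold."""
    return all(abs(v) <= threshold for ch in channels for v in ch)


def clean_epochs(epochs):
    """Baseline-correct every channel, then keep only artefact-free epochs."""
    kept = []
    for channels in epochs:
        corrected = [baseline_correct(ch) for ch in channels]
        if accept_epoch(corrected):
            kept.append(corrected)
    return kept
```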
The ERP component analysis was informed by visual inspection of the grand average ERP plot in which all experimental conditions are averaged (Kappenman & Luck, 2016), to verify whether the ERP components of interest appeared at the predicted time-windows and electrodes. This inspection revealed a frontal N2 between 200 ms and 260 ms, which is typical for this component (Yeung et al., 2004; Fig. 3). Therefore, mean N2 amplitude was analysed over electrodes F3, F4, and Fz in this time-window. The N400 was maximal from 370 ms to 450 ms at central to central-parietal electrodes (C3, C4, Cz, CP3, CP4, CPz; Kutas & Federmeier, 2011). Finally, the P3 was maximal over the time-window of 460-530 ms over central-parietal electrodes (P3, P4, Pz), which corresponds to the typical time-window and topography of the P3 (Polich, 2007).
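The mean-amplitude measure used for each component can be sketched as follows. The example assumes an averaged ERP stored as a dictionary from electrode label to a list of samples at 1 kHz, with the epoch starting at −100 ms, and it treats both window boundaries as inclusive; these storage and boundary conventions are assumptions of the sketch, not details reported above.

```python
from statistics import fmean

EPOCH_START_MS = -100   # first sample of each epoch, relative to stimulus onset


def mean_amplitude(erp, electrodes, start_ms, end_ms,
                   epoch_start_ms=EPOCH_START_MS):
    """Mean amplitude over a time-window (inclusive) and a set of electrodes.

    erp maps electrode label -> list of samples, one per ms.
    """
    lo = start_ms - epoch_start_ms
    hi = end_ms - epoch_start_ms + 1   # +1 makes the end point inclusive
    return fmean(fmean(erp[e][lo:hi]) for e in electrodes)
```

For the N2, for instance, the call would be `mean_amplitude(erp, ["F3", "Fz", "F4"], 200, 260)`.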
A correlation analysis was performed on the statistically significant modulations in behaviour and ERP components to assess which ERP modulation likely drives differences in behaviour. Finally, to gain insight into the timing of potential ERP modulations by response congruence, ms-by-ms paired-samples t-tests (Guthrie & Buchwald, 1991) on the average amplitude of an 11 ms sliding window (−5 ms to +5 ms around each ms) were conducted on the averages of frontal (F3, Fz, F4), central (C3, Cz, C4), and parietal (P3, Pz, P4) electrodes. Given that this method is sensitive to false positives when applied to EEG data (Piai, Dahlslätt, & Maris, 2015), a conservative threshold of 20 consecutive significant tests was adopted (Kuipers & Thierry, 2013). This threshold is more conservative than that of Costa et al. (2009), who used a threshold of 15 consecutive tests on their picture naming data, because the current data were recorded at a higher sample rate (1 kHz compared to 250 Hz), which increases the autocorrelation (to approximately 0.98; Kuipers & Thierry, 2013) and therefore also increases the chance that two consecutive tests are statistically significant. This analysis was used to track the timeline of the task effect in each condition to identify when response congruence effects appear. Of key importance is the task effect in the C+F- condition, because this condition changes from being response congruent in the categorisation task to response incongruent in the function naming task. If the task effect were significant at the same time in more than just the C+F- condition, this modulation would likely be due to aspects of the task rather than to response congruence.
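The logic of the moving-window analysis with a run-length criterion can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used in the study: significance is decided by comparing the paired t statistic against a fixed two-tailed critical value (2.093 for α = .05 with 19 degrees of freedom, matching 20 participants), windows at the epoch edges are truncated, and all function names are hypothetical.

```python
from math import copysign, sqrt
from statistics import fmean, stdev

HALF_WIDTH = 5    # +/- 5 samples around each time point (11-sample window)
MIN_RUN = 20      # number of consecutive significant tests required
T_CRIT = 2.093    # two-tailed critical t, alpha = .05, df = 19


def smooth(wave, half_width=HALF_WIDTH):
    """Sliding-window mean; windows are truncated at the epoch edges."""
    n = len(wave)
    return [fmean(wave[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def paired_t(a, b):
    """Paired-samples t statistic for two equal-length lists."""
    diffs = [x - y for x, y in zip(a, b)]
    mean, sd = fmean(diffs), stdev(diffs)
    if sd == 0:
        return 0.0 if mean == 0 else copysign(float("inf"), mean)
    return mean / (sd / sqrt(len(diffs)))


def significant_runs(cond_a, cond_b, t_crit=T_CRIT, min_run=MIN_RUN):
    """Return (start, end) sample indices of runs meeting the length criterion.

    cond_a and cond_b each hold one waveform (list of samples) per participant.
    """
    a = [smooth(w) for w in cond_a]
    b = [smooth(w) for w in cond_b]
    sig = [abs(paired_t([w[i] for w in a], [w[i] for w in b])) > t_crit
           for i in range(len(a[0]))]
    runs, start = [], None
    for i, flag in enumerate(sig + [False]):   # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs
```

Returned indices are sample positions within the epoch; with a 1 kHz recording and an epoch starting at −100 ms, subtracting 100 converts them to latencies in ms.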

Behavioural results
Due to non-convergence of the full model (random slopes and intercepts for all factors by participants and items), the random slopes for items were removed in the next most complex model. The final linear mixed effects model on RTs included the fixed factors Task and Relatedness, as well as their interactions, with random intercepts for items and random intercepts and slopes for Task by participants (Appendix 2). In this model, the factor Task, both Relatedness factors, and the interaction between Task and Relatedness factor 2 were statistically significant (Table 1; Fig. 4). This shows that superordinate-level naming (M = 1079 ms, SE = 17 ms) was performed more slowly than function naming (M = 1023 ms, SE = 20 ms) and that RTs in the C+F+ condition (M = 1014 ms, SE = 16 ms) and the C+F- condition (M = 1037 ms, SE = 16 ms) were both shorter than in the C-F- condition (M = 1084 ms, SE = 16 ms). The interaction of Task with Relatedness factor 2 showed that RTs in the C+F- condition (M = 1056 ms, SE = 17 ms) were shorter than in the C-F- condition (M = 1127 ms, SE = 17 ms; p < .0001) for the categorisation task, but not in the function naming task (a difference of 13 ms in the same direction; p = .7). Therefore, these results replicate previous findings that response congruence, when present, induces facilitation compared to conditions that are not response congruent.

N2 analysis
Mean N2 amplitude analysis revealed that there was neither a significant effect of Relatedness (F(1.4, 27.5) = 0.63, p = .49) nor one of Task (F(1, 19) = 2.56, p = .13). However, the interaction between Relatedness and Task was significant (F(2, 38) = 3.99, p = .02; Figs. 5 and 6). Post-hoc t-tests on the task difference in each of the conditions showed that only the C+F- condition differed between tasks, with a more negative N2 response in the function naming task (M = −1.6 μV, SE = 0.78) than in the superordinate-level naming task (M = 0.1 μV, SE = 0.8; t(19) = 3.4, p = .003; both other ps > .6). This important finding suggests that when a stimulus is response incongruent, it elicits a greater N2 conflict monitoring response than when this stimulus is response congruent.

ERP-RT correlation analyses
The correlation analyses between the significant ERP and RT modulations revealed that the magnitude of the N2 increase in the C+F- condition was negatively correlated with the response congruence effect observed in RTs (R = −0.46, p = .033), whereas the N400 and P3 amplitude modulations did not correlate with RTs. This shows that a greater N2 conflict monitoring response to response incongruent stimuli in the function naming task is associated with a smaller RT difference between tasks.
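For illustration, a correlation of this kind can be computed from per-participant difference scores as in the Python sketch below. The direction of the difference (function naming minus superordinate-level naming) is an assumption made for the example, as are the function names; the sign of the resulting coefficient depends on this choice.

```python
from math import sqrt
from statistics import fmean


def task_effect(function_task, superordinate_task):
    """Per-participant difference score (function minus superordinate)."""
    return [f - s for f, s in zip(function_task, superordinate_task)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Here `pearson_r` would be applied to the N2 task effect and the RT task effect in the C+F- condition, each computed with `task_effect` across the 20 participants.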

Moving window ERP analysis
The moving window analysis revealed that the task effect for the C+F- condition at frontal electrodes (F3, Fz, F4) appeared from 152 ms until 302 ms, then reappeared from 345 ms until 437 ms. Notably, the task effect for the C+F+ condition did not reach significance and the task effect for the C-F- condition was only significant from 349 ms to 400 ms (Fig. 9).
At central electrodes (C3, Cz, C4), the task effect for the C-F- condition just passed the length criterion in two sections: 182-204 ms and 227-248 ms. This effect was more robustly significant from 381 ms to 424 ms. The task effect did not reach significance at central electrodes for the other conditions.
At parietal electrodes (P3, Pz, P4), no task effect survived the length criterion. However, in the C+F- condition, the difference between 490 ms and 508 ms was just 2 ms short of the length criterion.

Discussion
This study aimed to establish the time-course of response congruence effects in word production tasks. We measured behavioural and electrophysiological brain responses in a variant of the PWI task in which the word is the target and the picture the context stimulus. This target-context configuration has been shown to be highly effective in eliciting response congruence effects in superordinate-level naming and associated function/action naming (Glaser & Dungelhoff, 1984; Glaser & Glaser, 1989; Kuipers et al., 2006; Kuipers & La Heij, 2008, 2012), and our behavioural results replicate these previous reports. Compared to unrelated distractor pictures (C-F-), categorically and function-related pictures (C+F+) induced facilitation in both tasks, whereas categorically but not function-related pictures (C+F-) only induced facilitation in the superordinate-level naming task. Therefore, a context picture only induced facilitation when the task made it response congruent. Given that the C+F- condition did not induce semantic facilitation in the function naming task, facilitation due to spreading of activation within semantic categories appears negligible compared to the response congruence facilitation (see Kuipers & La Heij, 2008, for a discussion of this issue).
The ERP results showed that N2 amplitude in the C+F- condition was more negative when response incongruent in the action naming task than when response congruent in the categorisation task. Importantly, this task effect was absent in the other conditions, in which response (in)congruence did not differ between tasks. Given that the N2 is associated with conflict monitoring in typical executive control tasks (Yeung et al., 2004) and speech production (Nozari & Pinet, 2020), this result suggests that the change from response congruent to response incongruent elicits a conflict monitoring response. Further, the correlation analyses revealed that a greater N2 amplitude by response incongruence was associated with relatively longer RTs. Therefore, the conflict monitoring response appears to drive the pattern of results observed in RTs. The interim conclusion of the N2 results is that, because the N2 falls within the conceptualisation stage (Indefrey, 2011), response congruence impacts this processing stage. Evaluation of the N400 and P3 modulations will clarify if response congruence impacts later processing stages as well.
The N400, whose time-window corresponds to the word form encoding stage, was only modulated by the semantic relationship between target and context. On average, semantically unrelated picture-word pairs elicited more negative N400 amplitude than categorically related ones in each task. This is in line with the general interpretation of the N400 that processing semantically related stimuli requires less neural effort than processing unrelated ones (Kutas & Federmeier, 2011). However, the C+F- condition did not differ in N400 amplitude from the C-F- condition, which suggests the semantic distance between the C+F- picture-word pairs may be greater than that of the C+F+ pairs. This finding is somewhat unexpected, but it replicates the findings of Rose, Aristei, Melinger, and Rahman (2019), who explicitly manipulated semantic distance of their picture-word stimuli in a standard version of the PWI task.
Although the exact cause of this increased N400 negativity of the C+F- condition may be difficult to establish, it could be due to a carry-over effect from the greater negativity of this condition at the N2, which generally corresponds to Rose et al.'s (2019) explanation of this pattern of results. Importantly, there was no effect of response congruence on the N400, suggesting this effect does not extend into the time-range of word form encoding. The semantic effect in N400 amplitude did not correlate with the semantic effect in RTs, which suggests that semantic integration of the picture-word stimuli proceeds independently of the ongoing word naming task. Thus, N400 amplitude measured in speech production tasks does not necessarily reflect or influence processes related to speech production. However, in a task in which decisions need to be made about the semantic relation between stimuli, N400 amplitude reflects a key component of the task, and its amplitude is in this case likely to be associated with RTs (as in, e.g., Kuipers, Jones, & Thierry, 2018; see also Kutas & Federmeier, 2011).
Modulation of P3 amplitude showed a pattern similar to that of RTs. Again, the interaction was driven by the C+F- condition, which differed from the C-F- condition in the superordinate-level naming task, but not in the function naming task. These P3 modulations did not correlate with the effects observed in RTs, which suggests they are not driving the pattern of results in RTs. Instead, it is likely that the P3 amplitude modulation reflects response initiation processes, with the greatest amplitudes for the fastest responses, as is typical in oddball tasks (Polich, 2007) and picture naming tasks (Costa et al., 2009; Von Grebmer zu Wolfsthurn, Pablos, & Schiller, 2021). In such a scenario, the size of the N2 modulation determines RTs for a large part, whereas P3 amplitude reflects response initiation. This late modulation could potentially reflect another conflict monitoring step, which will be evaluated next.
The moving average analysis on the task effect in each condition revealed an early frontal modulation starting at 152 ms after stimulus onset, which falls within the time-frame assumed for conceptualisation (Indefrey, 2011). That this task difference was only observed in the C+F- condition strongly suggests that response congruence starts to impact speech preparation early during conceptualisation. This influence extends to 302 ms, which is not surprising given that conceptualisation of the superordinate level or the associated function likely takes much longer than conceptualisation of the basic-level representation. As mentioned in the introduction, basic-level naming usually takes about 600 ms, whereas RTs in the tasks used here were around 1 s. Some of this increased naming time may be due to longer lexical selection processes driven by word frequency and age of acquisition (Pérez, 2007), but there is no reason to assume that phonological processes would take much longer in the tasks used here. Therefore, most of the increased naming times (relative to basic-level naming) can be attributed to the conceptualisation phase.
In addition to the early impact on conceptualisation, the task effect in the C+F- condition reappeared at frontal electrode sites at a later interval (from 345 ms until 437 ms). Some may argue that this shows that response congruence impacted processing in this time window. However, besides the task effect in the C+F- condition, there was also a task effect in the C-F- condition at the same time, albeit shorter-lived. Therefore, this effect is unlikely to be the result of response congruence. It is also unlikely to be a general task effect because it was absent in the C+F+ condition. Therefore, further investigation of this effect is needed to identify its origin.
Finally, there was a marginally significant task effect at parietal electrodes at a late interval (490-508 ms). This interval falls within the P3 time-range and could suggest that response congruence also impacts late response selection mechanisms. The P3 differed between tasks for the C+F- condition, and the effect in the moving average analysis appears to reflect that. It is possible that this task difference reflects a secondary conflict monitoring step; however, it only occurs at parietal electrodes, whereas activation of the ACC is associated with frontal electrodes. Instead, it is possible that this effect reflects the moment at which the response incongruence conflict is resolved, which is explicitly included in the Hierarchical Conflict Model (Gauvin & Hartsuiker, 2020).
Importantly, no additional frontal modulations were observed that could solely be attributed to response congruence. However, it is conceivable that, despite response congruence having its main impact during the conceptualisation phase, response incongruent concepts activate a potential response alternative that remains active at least until response initiation. Such a scenario would be in line with cascading (Dell, 1986) or interactive (Baese-Berk & Goldrick, 2009; De Zubicaray, Wilson, McMahon, & Muthiah, 2001) speech production theories, and/or those that locate selection at the response level (Mahon et al., 2007). Active context representations may be disregarded for production in the conceptualisation phase, but some may argue that this does not imply that they are completely inhibited at each level of representation. If these context representations remain active and continue to be processed, conflict monitoring and resolution may be needed at several or all processing stages (Nozari et al., 2011; Gauvin & Hartsuiker, 2020). Although the current data do not speak to this issue, different stimuli and/or tasks may elicit a later conflict monitoring event when, for example, the conflict is phonological in nature. Such late, or perhaps multiple, conflict monitoring events, as suggested by Gauvin and Hartsuiker (2020), could potentially be characterised by late or multiple instances of frontal negativity. A problem with such an interpretation is that only the N2 and Error Related Negativity (ERN) are associated with ACC activation (Yeung et al., 2004), which limits the time-windows during which ACC activation could be identified in ERPs. Therefore, such studies would also need to monitor
What are the implications of these findings for the standard picture-word interference task, in which related and unrelated context conditions are both response incongruent? A comparison of related and unrelated conditions with an identity condition would provide insight into whether the former conditions elicit a conflict monitoring response. Behavioural data show that, compared to incongruent words, a picture's name written on a picture reduces naming times (Glaser & Düngelhoff, 1984; Piai et al., 2012), and induces less frontal negativity within the N2 time range (Piai et al., 2012). Thus, non-identical distractors in standard picture naming are response incongruent and appear to trigger an N2 conflict monitoring response. In the standard PWI task, this response incongruence will not be expressed in naming times, because related and unrelated picture names are equally response incongruent. Any context effects on naming times will be driven by the relative strength of any semantic facilitation, lexical interference and/or phonological facilitation, as characterised by the SLN (Rahman & Melinger, 2019).
Although the conflict monitoring interpretation of the N2 leans on a substantial number of studies, the current dataset does not show direct involvement of the ACC when context stimuli are response incongruent. Neuroimaging of the ACC in this context may be needed to rule out other potential interpretations of this effect. Clearly, models that include conflict monitoring in speech production (and perception; Nozari et al., 2011; Gauvin & Hartsuiker, 2020) are ideally suited to account for the interpretation of response incongruence as a conflict monitoring event. However, the current dataset does not allow adjudication between the SLN, which is well suited to explain response congruence, and interactive models that include conflict monitoring and resolution, because neither can fully account for response congruence. Future studies may address other types of target-distractor relationships (e.g., phonologically or associatively related ones) to do so.
To conclude, the results reported here show that response congruence has an early impact on speech production that aligns with the conceptualisation phase, and that a response incongruent context appears to elicit a conflict-monitoring response during this phase. The data are not conclusive about whether response incongruent representations remain active during later processing stages, potentially leading to further conflict monitoring, or whether the later influence of response congruence reflects conflict resolution. Models of speech production may need to be adjusted to fully explain the impact of response congruence and incongruence on word production.

Fig. 4. Reaction times in the different experimental conditions. Violin plots of RTs in the C+F+, C+F-, and C-F- conditions in the two tasks. Means are depicted by the solid horizontal black lines and the distribution of individual responses by the outline of the violin.

Table 1
Fixed effects estimates of the linear mixed effects model of RTs.