Are control processes domain-general? A replication of ‘To adapt or not to adapt? The question of domain-general cognitive control’ (Kan et al. 2013)

Conflict and conflict adaptation are well-studied phenomena in experimental psychology. Standard tasks investigating causes and outcomes of conflict during information processing include the Stroop, the Flanker and the Simon task. Interestingly, recent research efforts have moved toward investigating whether conflict in one task domain influences information processing in another task domain, typically referred to as cross-task conflict adaptation. These transfer effects are of central importance for theories about our cognitive architecture, as they are interpreted as pointing towards domain-general cognitive mechanisms. Given the importance of these cross-task transfer effects, the current paper aims to replicate one of the key findings. Specifically, Kan et al. (Kan et al. 2013 Cognition 129, 637–651) showed that reading syntactically ambiguous sentences results in processing adjustments in subsequent Stroop trials. This result is in line with the idea that conflict monitoring operates in a domain-general manner. The present paper presents two replication studies: (i) an exact replication, with the identical sentence-reading task intermixed with a stimulus-based Stroop task, and (ii) a conceptual replication, with the identical sentence-reading task intermixed with a response-based Stroop task. Power calculations were based on the original paper. Both experiments were pre-registered. Despite the experiments being closely designed according to the original study, there was no evidence supporting the hypothesis of cross-domain conflict adaptation.


The Editors assigned to your Stage 1 Replication submission ("Are control processes domain-general? A replication of "To adapt or not to adapt? The question of domain-general cognitive control" (Kan et al., 2013)") have now received comments from reviewers. We would like you to revise your paper in accordance with the referees' and editors' suggestions, which can be found below (not including confidential reports to the Editor). Please note this decision does not guarantee eventual acceptance.
Please submit a copy of your revised paper within three weeks (i.e. by the 01-Jan-2021). If deemed necessary by the Editors, your manuscript will be sent back to one or more of the original reviewers for assessment. If the original reviewers are not available we may invite new reviewers.
To revise your manuscript, log into http://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision. Revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you must respond to the comments made by the referees and upload a file "Response to Referees" in the "File Upload" step. Please use this to document how you have responded to the comments, and the adjustments you have made. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response.
Once again, thank you for submitting your manuscript to Royal Society Open Science and we look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch. Full author guidelines may be found at https://royalsocietypublishing.org/rsos/replication-studies#AuthorsGuidance.

Three specialist reviewers have now assessed the Stage 1 manuscript. Reviewers 1 and 2 are broadly positive, judging that the Stage 1 primary criteria are largely met, and recommend in-principle acceptance (IPA) following minor revision. Reviewer 3, however, judges that neither criterion is met and recommends rejection and a re-evaluation of the project.
This was an unusual review process because, by coincidence, RSOS is currently assessing a Replication submission of the same original study, co-authored by Reviewer 3 (signed by Balazs Aczel). The Aczel paper is currently under Stage 2 review following the completion of the work, and so I felt it would be sensible to invite the main author of that replication to assess your replication proposal. The reviewer has kindly made their Stage 2 manuscript available as a preprint (which was also flagged by Reviewer 2), and as you will see from his assessment, based upon the results of their (larger) replication attempt, the reviewer believes it is futile to reattempt it here.
From an editorial standpoint, whether the prior replication attempt succeeded or failed (or even whether it happened at all) is not relevant to the accept/reject decision regarding the current Stage 1 submission. The submissions are in that sense independent. However, I feel that Reviewer 3's assessment is particularly valuable in this case because knowledge of the prior replication outcomes may lead you to reassess whether your proposed replication is the best use of your resources. In addition, Reviewer 3 is also concerned about deviations from the original study methodology (primary criterion #1), and the validity/robustness of the proposed replication attempt (primary criterion #2). As the reviewer notes, many key details are missing from the manuscript and need to be clarified.
Provided all concerns that relate to the primary criteria are addressed, the current Stage 1 submission can achieve IPA following revision (again, regardless of the prior replication study). However, in light of the outcomes of this prior replication, you may decide to take Reviewer 3's advice and re-evaluate the current approach, perhaps even working with Reviewer 3 (which is possible; in the event that authors end up collaborating with a reviewer as part of a Registered Report or Replication submission, that reviewer is simply removed from the review process moving forward). I appreciate that this is an unusual and somewhat complicated peer review process and remain open to informal discussion with the authors concerning specific scenarios moving forward (feel free to email me directly at chambersc1@cardiff.ac.uk or via the RSOS email address).
Reviewer Comments to Author:

Reviewer: 1

Comments to the Author(s)
The author proposes a study to directly and conceptually replicate Experiment 1 of Kan et al. (2013). In my opinion this is a sound replication proposal and I am impressed by the detail the author provides, especially with regard to the online testing procedure.
I have only a few minor suggestions you might want to consider:
• In my opinion, details about the statistical model, the factors and levels, and the inference criteria are missing from the data analysis plan and also from the pre-registration.
• For me, the theoretical motivation for the conceptual replication (Experiment 2) is not entirely clear.
• Has the congruent/incongruent procedure been tested online? I am a huge fan of online testing, but from my experience I would advise running a short pilot to see if you can find the effect in the sentence-reading task.
• The original study does not have a big sample size; it might be possible that the original effect size is overestimated due to a lack of power in the original study. Maybe this is something to keep in the back of your mind?
• In my experience, you might encounter some technical difficulties during data collection as well. What do you plan to do with data sets where single trials are missing (e.g. due to an unstable internet connection)? What will you do with participants who start the experiment again because they might have accidentally closed the window?
• Maybe consider using Prolific instead of MTurk? MTurk is not made specifically for academic research, whereas Prolific is (Palan & Schitter, 2018; https://doi.org/10.1016/j.jbef.2017.). Of course, this is not at all required; I just have very good experience with Prolific and the way they try to ensure data quality.
• De Leeuw (2015) himself suggests using Chrome, as this is best tested in jsPsych; it also seems that Chrome (in combination with Windows) "adds the least random noise to RT measurement" (see Pronk et al. 2020; https://doi.org/10.3758/s13428-019-01321-2).
• The combination of data sets as described in the pre-registration might be interesting here as well.
• In the spirit of open science, and to make it easier to follow your analysis pipeline, you could also share the code and the analysis scripts together with the pre-registration, or at a later stage together with the data.

And some very minor points:
- On p. 7, line 20/21, it has to be Braem instead of Bream.
- On p. 10, line 22: do you mean "right" finger or ring finger? Because for left-handed people it would be the left finger?
Reviewer: 2

Comments to the Author(s)

Summary
The manuscript is a Stage 1 replication study of the paper 'To adapt or not to adapt? The question of domain-general cognitive control' (Kan et al., 2013). The author proposes to replicate Experiment 1 from Kan et al., as well as conducting a conceptual replication using a small modification. Experiment 1 of Kan et al. had participants perform intermixed trials of a sentence comprehension task and a Stroop task. The observation of a two-way interaction between preceding (sentence) congruency and Stroop task congruency was interpreted as evidence for domain-general conflict adaptation: the Stroop effect was smaller when it followed an ambiguous sentence. The original Experiment 1 used Stroop stimuli where the incongruent words ('brown', 'orange', 'red') did not match the font colours to be named ('blue', 'green', 'yellow'). The conceptual replication proposed here (Experiment 2) modifies this so that they do overlap.
Review
I found the proposal to be generally clear and well-written. I thought that the primary and secondary criteria were met, or nearly met with a potential small elaboration (point 1). The proposed replication appears to match the details of the original experiment, with the perhaps notable exception that it will be conducted with an online sample. I was satisfied with the author's justification for why this does not compromise the ability of the proposed study to detect the effect of interest. The author proposes numerous secondary analyses which I thought were sufficient quality control checks. I have a couple of minor comments on the logic of the conceptual replication study, but they are probably not critical.
1) I thought that there could be a more explicit statement about the critical hypothesis test(s), as it may not be obvious to a reader not familiar with Kan et al. From the power analysis, I assume that the main test is the two-way interaction between previous-trial and current-trial congruency. Some elaboration may be warranted on whether there is an expected direction/pattern of condition means. Further, the author proposes to analyse RTs, error rates, and arcsin transformed error rates (as in the original). With multiple outcomes, there is potential flexibility in determining whether the original findings replicated. It might be worth commenting on whether significant effects in one or all outcomes will be considered as sufficient evidence.
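As an aside on the transformed outcome mentioned above: the arcsine transform of error rates referred to in the original analysis plan is the standard arcsine-square-root variance-stabilizing transform for proportions. A minimal sketch (illustrative only; the example error rates below are hypothetical, not values from the proposal):

```python
import math

def arcsine_transform(p):
    """Variance-stabilizing arcsine-square-root transform for a proportion."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("error rate must lie in [0, 1]")
    return math.asin(math.sqrt(p))

# Hypothetical per-condition error rates for one participant
rates = [0.02, 0.08, 0.05, 0.12]
transformed = [arcsine_transform(p) for p in rates]
```

The transform maps the [0, 1] range of proportions onto [0, π/2] and spreads out values near the floor and ceiling, which is why it is often applied before running ANOVAs on error rates.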
2) The conceptual replication seems sensible on the surface, though it did raise a couple of questions in my mind that the author might want to consider.
The reasoning given is that Experiment 2 "…should result in larger conflict… and potentially larger cross-task transfer effects". One interpretation of this is that conflict needs to be sufficiently large for transfer to occur. Another is that transfer could occur in both experiments, but the effect size would be larger in Experiment 2. If it is the latter, what is the benefit of the additional Experiment over increasing the power of Experiment 1 for a smaller effect size? I don't object to the planned sample size, and it is already substantially larger than the original study. However, my reading is that it is based on the original effect size, and prior replication efforts have indicated that it is common for replications to produce smaller effect sizes than the original studies (e.g. Open Science Collaboration, 2015).
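The reviewer's concern that original effect sizes tend to be overestimated can be made concrete with a quick normal-approximation sample-size calculation for a paired (within-subject) contrast. This is only a sketch: the effect sizes below are hypothetical placeholders rather than values from the proposal, and the normal approximation slightly undershoots the exact t-based answer.

```python
import math
from statistics import NormalDist

def paired_n(effect_size_dz, power=0.9, alpha=0.05):
    """Normal-approximation sample size for a two-sided paired contrast
    (e.g. treating the adaptation interaction as a within-subject
    difference score) with effect size Cohen's dz."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for desired power
    return math.ceil(((z_alpha + z_power) / effect_size_dz) ** 2)

# Hypothetical scenario: an original-study-sized effect vs. a
# deflated estimate half that size, as replications often find.
n_original = paired_n(0.5)
n_deflated = paired_n(0.25)
```

Because n scales with 1/dz², halving the assumed effect size roughly quadruples the required sample, which is why powering on the original estimate alone can be risky.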
3) Kan et al. make the distinction between representational conflict and response conflict (Egner, 2008), and focus on representational conflict throughout their experiments. In their Stroop tasks, and in Experiment 1 here, the incongruent words were not part of the eligible response set. Experiment 2 here aims to elicit a larger conflict effect by adding response conflict (the incongruent words are part of the eligible response set). The author may want to comment on whether response conflict should also be expected to contribute to the transfer effect. Naively, I have no reason to expect that it should not, but the prediction doesn't necessarily follow from the original study.

Reviewer: 3

Comments to the Author(s)
Before going into details of the review, I have to mention that I am in a special situation, as I have been asked to review this submission while I have a Stage 2 submission under review at RSOS. In fact, my team has replicated the very same study, Kan et al. (2013). An advantage of the registered report format is that we can give advice to the submitters before too much investment in the project has been made. That said, I would discourage the author from conducting the replication, at least in the proposed form, for the following reasons: We replicated Exp. 1 from Kan et al. in three countries, by independent labs, with a greater sample size. Compared to the proposed work, we very closely followed the original protocol and conducted the experiments in the lab rather than online (some other deviations of the proposal are discussed below).
Our results were rather discouraging, as you can read in our preprint: https://psyarxiv.com/5k8rq. Besides all of this, even the original study of Kan et al. (2013) raised doubts about whether Exp. 1 is a good test of the theory; that is why they (and we) conducted Exps. 2 and 3, but the submission does not propose to replicate them. Nevertheless, our results were similarly pessimistic about the presence of a congruency sequence effect, making us believe that the cross-task congruency sequence effect is either nonexistent or that the design of Kan et al. is not a good test of the effect.
In short, I would encourage the author to use a different design to test the effect. My lab is open for further discussion about potential empirical directions.
Below, I list a few more observations about the proposed design.

Is the proposed Exp. 1 a direct replication? I think there are a few deviations from the original study. I do not mind the author calling it a direct replication, but the deviations should be made apparent:

Online vs. in-lab testing.

"The mapping of keys (G, H, J) is randomly assigned to a response colour (blue, green, yellow) for each individual participant, ...": the original study does not randomize this.

Page 10 line 59 (footnote): I believe this is an important deviation from the original study; however, I am not sure I understand what the stimulus set looked like. There were 42 non-filler sentences in the original study. The footnote suggests that participants did not see all 42 sentences but just one pseudo-randomized set of 21 sentences. In the original study, the participants saw all 42 sentences. As the materials for Experiment 1 of Kan et al. (2013) are available, the specific order of the sentences in the original study could be applied in the replication as well.

Page 11 line 26: I am not sure that I understand the reasoning behind the exclusion criterion on the comprehension probe task. The author mentions that Kan et al. excluded only one participant scoring at chance level (50%) and that all the other participants scored above 70%. The author then states that the replication will follow Kan et al. (2013) and that participants with 6 or fewer correct responses will be dropped. This indicates a 60% cutoff (given that there were 10 comprehension probe sentences), which is neither 50% nor 70%.

The whole sentence-preprocessing part is missing. This is not a problem in itself but should be noted as a deviation and justified.

Page 10 line 52: The test part consists of 197 trials. Kan et al. used 60 congruent Stroop + 60 incongruent Stroop + 21 congruent and 21 incongruent sentences, which is 162. If the filler sentences are included, that is 162 + 29 = 191.
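The trial-count discrepancy flagged in the last point can be checked with simple arithmetic (counts as quoted in the review; the source of the extra trials in the proposal is not identified there):

```python
# Trial counts as reported for Kan et al. (2013), Experiment 1
congruent_stroop = 60
incongruent_stroop = 60
congruent_sentences = 21
incongruent_sentences = 21
filler_sentences = 29

# Non-filler trials, then the grand total including fillers
non_filler = (congruent_stroop + incongruent_stroop
              + congruent_sentences + incongruent_sentences)
total_with_fillers = non_filler + filler_sentences

proposed_total = 197  # stated in the replication proposal
discrepancy = proposed_total - total_with_fillers
```

The arithmetic confirms the reviewer's totals of 162 and 191, leaving six trials in the proposal unaccounted for.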
Page 11 line 59 (footnote): The stepwise reduction and testing for the presence of the Stroop effect are problematic if NHST is used to test for the presence of the effect. Repeated testing can inflate the alpha level and hence change the results.

General comments
Page 10 line 55: The started sentence is not finished.
Page 11 line 15: The footnote mark could be moved to the sentence regarding the RT outlier exclusions.
Page 7 line 32: I am not sure that I am convinced that Exp. 2 is needed based on this one-sentence explanation.
Page 7 line 55: What was the exact effect size used for the sample-size determination?
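The alpha-inflation concern about stepwise reduction can be illustrated with a small Monte Carlo sketch: generate null data, test it, drop some trials, and retest, declaring a (false) positive if any look is significant. All numbers here (trial counts, step size, number of looks) are hypothetical, and a large-sample z approximation stands in for the t-test:

```python
import random
from statistics import NormalDist, mean, stdev

def z_test_p(sample):
    """Two-sided p-value from a large-sample z approximation to a
    one-sample test against a mean of zero."""
    n = len(sample)
    z = mean(sample) / (stdev(sample) / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def stepwise_false_positive_rate(n_sims=2000, n_trials=100, step=15,
                                 n_looks=4, alpha=0.05, seed=1):
    """Simulate null data; at each 'look', drop `step` more trials from
    the end and retest. Count a false positive if ANY look rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(n_trials)]
        for look in range(n_looks):
            subset = data[: n_trials - look * step]
            if z_test_p(subset) < alpha:
                hits += 1
                break
    return hits / n_sims

rate = stepwise_false_positive_rate()
```

With a nominal alpha of .05, the simulated familywise false-positive rate across the four looks comes out noticeably above .05, which is exactly the reviewer's point about stepwise testing.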
There is nothing about the analysis procedure in the paper, nor about the inferences that the author will make based on the results. Again, a lot of questions remain unanswered.

Do you have any ethical concerns with this paper? No
Have you any concerns about statistical analyses in this paper? No

Recommendation? Accept in principle
Comments to the Author(s) Thank you very much. All my comments have been addressed sufficiently.

Comments to the Author(s)
The revised manuscript addresses my previous comments. I have one note on a rationale the author gives in their response, though this is not used in the manuscript so it may not warrant further changes.
My previous comment 1 suggested that the author could consider how they would interpret potentially different effects in reaction times and error rates. The manuscript now states that they focus on reaction times, as they are more commonly used in the literature. However, their response to my comment also notes: "Typically, in human information processing models the measures of reaction times and error rates are often seen as measuring the same underlying cognitive process (e.g. Draheim, Hicks, & Engle, 2016)." I'm not sure this reference supports focusing on reaction times. Draheim et al. propose using a composite measure of RT and accuracy, on the basis that the latency costs typically used in task-switching are contaminated by speed-accuracy trade-offs. The way in which multiple processes contribute to patterns of RT and error effects is also something I have been interested in in my own work using evidence accumulation models. For example, in models like the drift-diffusion model, an increase in the boundary separation parameter can lead to an increase in reaction time effects and a decrease (or no observable change) in error rates. This parameter has previously been implicated in conflict adaptation in the form of post-error slowing (Dutilh et al., 2011). Though sequential congruency effects do not follow the same pattern, the point is that basing an interpretation on either reaction times or error rates alone may overlook theoretically meaningful combinations of the two.
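The boundary-separation point above can be illustrated with a small simulation sketch (a simple Euler discretization; all parameter values are hypothetical): increasing only the boundary separation `a` of a drift-diffusion process lengthens mean decision times while lowering the error rate, so an RT effect can grow while the error effect shrinks or even reverses.

```python
import random

def simulate_ddm(a, v=1.0, s=1.0, dt=0.001, n_trials=1500, seed=7):
    """Euler simulation of a drift-diffusion process with drift v and
    noise s, starting unbiased at a/2, absorbing at 0 (error) or a
    (correct). Returns (mean decision time, error rate)."""
    rng = random.Random(seed)
    rts, errors = [], 0
    sd = s * dt ** 0.5  # per-step noise standard deviation
    for _ in range(n_trials):
        x, t = a / 2.0, 0.0
        while 0.0 < x < a:
            x += v * dt + rng.gauss(0.0, sd)
            t += dt
        rts.append(t)
        if x <= 0.0:
            errors += 1
    return sum(rts) / n_trials, errors / n_trials

rt_narrow, err_narrow = simulate_ddm(a=1.0)  # low response caution
rt_wide, err_wide = simulate_ddm(a=2.0)      # high response caution
```

For an unbiased starting point the analytic error probability is 1/(1 + exp(a·v/s²)), so under these settings widening the boundaries from a = 1 to a = 2 cuts errors from roughly .27 to roughly .12 while substantially lengthening mean decision time.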

Comments to the Author(s)
In this second round of reviews, I found a lot of improvement in the submission, especially in the analysis plan section. My big-picture view is that I still think that if our own study did not replicate the evidence that the original paper showed for Exp. 1, then a new study would need more, not fewer, participants to further explore this question. Since the present proposal is not planning to use Bayesian analysis, the expected non-significant results won't allow for any assessment of the original hypothesis. For a registration, I also still miss the explicit links between result patterns and interpretations. Below, I explain these and a few other comments.
In their response, the author stated that "the data has already been collected". I might have been misled on that by the manuscript's future-tense wording, e.g., "Therefore 100 participants will be collected to achieve sufficient power of .9, given that some participants will need to be excluded...". I find it still confusing.
From our 152 participants, we lost 20 to exclusions. Online data collection might be noisier, so I would expect even more exclusions. Now that the author knows that with a greater sample we could not fully replicate the effect, I wonder whether the power analysis should be recalculated using our outputs. I appreciate that an analysis part has been added to this registered report; I think that is a crucial part. We can see that the author plans to conduct the main analyses with and without exclusions. It would be important to add how the author would interpret results that disagree between these analyses, and which analysis would be the basis of the conclusions.
I cannot seem to find how the sentences will be divided into sentence regions. That was a tough task for us. It would be important to have it worked out before the analysis.
The region-based analysis works with reading-time predictions based on the person's own reading times. It is crucial to state whether the prediction will be calculated with the inclusion of all the readings, just the congruent sentences, or the filler sentences as well.
I cannot see what pattern of results of the sentence region-based analysis will lead to what conclusions. Again, something crucial for a registration.
It would be great to add 'outcome-neutral analyses', for example to see if there is a Stroop effect at all. Without that, the rest of the analyses are not too interesting.
Two experiments are described, but we don't know how the author would interpret the results if they contradict each other. What pattern of results would replicate the original findings, and what would contradict the original hypothesis?
"was presented for xxx ms" is misleading wording for online experiments. The truth would be something like "We used the xxx method to request that the computer present the stimulus for xxx ms." For an explanation, see https://twitter.com/ceptional/status/1296686833305153536

Balazs Aczel
Decision letter (RSOS-201814.R1)

We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Dr Dudschig
On behalf of the Editors, I am pleased to inform you that your Manuscript RSOS-201814.R1 entitled "Are control processes domain-general? A replication of "To adapt or not to adapt? The question of domain-general cognitive control" (Kan et al., 2013)" has been deemed suitable for in-principle acceptance in Royal Society Open Science, subject to minor revision in accordance with the referee and editor suggestions. Please find their comments at the end of this email.
The reviewers and handling editors have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
Please submit the revised version of your manuscript within 7 days (i.e. by 23-Feb-2021). If you do not think you will be able to meet this date, please let me know immediately.
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre, where you will find your manuscript title listed under "Manuscripts with Decisions". Under "Actions," click on "Create a Revision." You will be unable to make your revisions on the originally submitted version of the manuscript. Instead, revise your manuscript and upload a new version through your Author Centre.
When submitting your revised manuscript, you will be able to respond to the comments made by the referees and upload a file "Response to Referees" in "Section 6 -File Upload". You can use this to document any changes you make to the original manuscript. In order to expedite the processing of the revised manuscript, please be as specific as possible in your response to the referees.
Full author guidelines can be found here: https://royalsocietypublishing.org/rsos/replication-studies#AuthorsGuidance. Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your revision. If you have any questions at all, please do not hesitate to get in touch.

The Stage 1 manuscript was returned to three reviewers who assessed the original submission. The reviews are broadly positive and we are now closer to IPA. Reviewer 1 is now satisfied, but there remain some final issues to address concerning the consideration of RTs and error rates (Reviewer 2), and the clarity and precision of the methods and prospective interpretation (Reviewer 3). Please respond carefully to these points and I will assess the next revision at desk before issuing a final Stage 1 decision.
Reviewer comments to Author:

Reviewer: 1
Comments to the Author(s)
Thank you very much. All my comments have been addressed sufficiently.
Reviewer: 2
Comments to the Author(s)
The revised manuscript addresses my previous comments. I have one note on a rationale the author gives in their response, though this is not used in the manuscript so it may not warrant further changes.
My previous comment 1 suggested that the author could consider how they would interpret potentially different effects in reaction times and error rates. The manuscript now states that they focus on reaction times, as they are more commonly used in the literature. However, their response to my comment also notes: "Typically, in human information processing models the measures of reaction times and error rates are often seen as measuring the same underlying cognitive process (e.g. Draheim, Hicks, & Engle, 2016)." I'm not sure this reference supports focusing on reaction times. Draheim et al. propose using a composite measure of RT and accuracy, on the basis that the latency costs typically used in task switching are contaminated by speed-accuracy trade-offs. The way in which multiple processes contribute to patterns of RT and error effects is also something I have been interested in in my own work using evidence accumulation models. For example, in models like the drift-diffusion model, an increase in the boundary separation parameter can lead to an increase in reaction time effects and a decrease (or no observable change) in error rates. This parameter has previously been implicated in conflict adaptation in the form of post-error slowing (Dutilh et al., 2011). Though sequential congruency effects do not follow the same pattern, the point is that basing an interpretation on either reaction times or error rates may overlook theoretically meaningful combinations of the two.

Reviewer: 3
Comments to the Author(s)
In this second round of reviews, I found a lot of improvement in the submission, especially in the analysis plan section. My big picture view is that I still think that if our own study did not replicate the evidence that the original paper showed for Exp 1, then a new study would need more, and not fewer, participants to further explore this question.
Since the present proposal is not planning to use Bayesian analysis, the expected non-significant results won't allow for any assessment of the original hypothesis. As a registration, I still miss the explicit links between result patterns and interpretations. Below, I explain these and a few other comments.
In their response, the author stated that "the data has already been collected". I might have been misled by the manuscript's future-tense wording, e.g., "Therefore 100 participants will be collected to achieve sufficient power of .9, given that some participants will need to be excluded..". I still find it confusing.
From our 152 participants, we lost 20 due to exclusions. Online data collection might be noisier, so I would expect even more exclusions. Now that the author knows that, with a greater sample, we could not fully replicate the effect, I wonder if the power analysis would need to be recalculated with our outputs.
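For concreteness, the kind of recalculation the reviewer suggests can be sketched with a standard paired-samples power formula. This is a normal-approximation sketch only; the effect size dz = 0.33 and the 15% exclusion rate used below are illustrative assumptions, not values taken from the manuscript or the original study:

```python
from math import ceil
from statistics import NormalDist

def paired_n(dz, alpha=0.05, power=0.90):
    """Approximate n for a two-sided paired t-test via the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance criterion
    z_power = z.inv_cdf(power)           # target power
    return ceil(((z_alpha + z_power) / dz) ** 2)

n = paired_n(dz=0.33)                  # hypothetical within-subject effect size
n_collect = ceil(n / (1 - 0.15))       # inflate for an assumed 15% exclusion rate
```

A small-sample t correction would push n slightly higher; dedicated tools such as G*Power, commonly used in this literature, give near-identical values.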
I appreciate that an analysis part has been added to this registered report. I think that is a crucial part. We can see that the author plans to conduct the main analyses with and without exclusions. It would be important to add how the author would interpret the outcome if these results are not in line, and which analysis would be the basis of the conclusions.
I cannot seem to find how the sentences will be divided into sentence regions. That was a tough task for us. It would be important to have it worked out before the analysis.
The region-based analysis works with reading time predictions based on the person's reading times. It is crucial to state whether the prediction will be calculated with the inclusion of all the readings, or just the congruent sentences or the filler sentences as well.
I cannot see what pattern of results of the sentence region-based analysis will lead to what conclusions. Again, something crucial for a registration.
It would be great to add 'outcome-neutral analyses', for example to see if there is a Stroop effect at all. Without that, the rest of the analyses are not too interesting.
Two experiments are described, but we don't know how the author would interpret the results if they contradict each other. What pattern of results would replicate the original findings, and what would contradict the original hypothesis?
"was presented for xxx ms" is a misleading wording for online experiments. The truth would be something like "We used the xxx method for requesting that the computer present the stimulus for xxx ms" For an explanation, see https://twitter.com/ceptional/status/1296686833305153536

Decision letter (RSOS-210550.R0)
We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Dr Dudschig
On behalf of the Editor, I am pleased to inform you that your Manuscript RSOS-210550 entitled "Are control processes domain-general? A replication of "To adapt or not to adapt? The question of domain-general cognitive control" (Kan et al., 2013)" has been accepted in principle for publication in Royal Society Open Science.
Please note that you must now register your approved protocol on the Open Science Framework (https://osf.io/rr), using the 'Submit your approved Registered Report' option and then the 'Registered Report Protocol Preregistration' option. Please use the Registered Report option even though your article is being accepted as a Stage 1 Replication. Further into the registration process, in the Journal Title field enter 'Royal Society Open Science (Replication article type, Results-Blind track)'. Please note that a time-stamped, independent registration of the protocol is mandatory under journal policy, and manuscripts that do not conform to this requirement cannot be considered at Stage 2. The protocol should be registered unchanged from its current approved state. Please include a URL to the protocol in your Stage 2 manuscript, and because you submitted via the Results-Blind track please note in the manuscript that the pre-registration was performed after data analysis (e.g. 'This article received results-blind in-principle acceptance (IPA) at Royal Society Open Science. Following IPA, the accepted Stage 1 version of the manuscript, not including results and discussion, was preregistered on the OSF (URL). This preregistration was performed after data analysis.') Please also note that this new registration is required even though you already preregistered your protocol on AsPredicted prior to data collection. Both registrations should be reported at Stage 2.
Following completion of your study, we invite you to resubmit your paper for peer review as a Stage 2 Replication. Please note that your manuscript can still be rejected for publication at Stage 2 if the Editors consider any of the following conditions to be met:
• The Introduction and methods deviated from the approved Stage 1 submission (required).
• The authors' conclusions were not considered justified given the data.
We encourage you to read the complete guidelines for authors concerning Stage 2 submissions at : https://royalsocietypublishing.org/rsos/replication-studies#AuthorsGuidance. Please especially note the requirements for data sharing and that withdrawing your manuscript will result in publication of a Withdrawn Registration.
We encourage you to read the complete guidelines for authors concerning Stage 2 submissions at https://royalsocietypublishing.org/rsos/registered-reports#ReviewerGuideRegRep. Please especially note the requirements for data sharing and that withdrawing your manuscript will result in publication of a Withdrawn Registration.
Once again, thank you for submitting your manuscript to Royal Society Open Science and I look forward to receiving your Stage 2 submission. If you have any questions at all, please do not hesitate to get in touch. We look forward to hearing from you shortly with the anticipated submission date for your stage two manuscript.

Are the interpretations and conclusions justified by the results? Yes
Is the language acceptable? Yes

Recommendation?
Accept as is

Comments to the Author(s)
This study presents a very clear replication attempt of Kan et al. (2013). I am impressed by the openness about the analyses and the details provided. The conclusions drawn seem to be appropriate and based on the data. I just have two comments. The first is with regard to the data and analyses shared: metadata and explanations of your data/variables would help readers understand every column of your data, and would make your analyses more reproducible and the data more reusable. Additionally, there seem to be quite a few spelling mistakes in the manuscript.

Review form: Reviewer 2
Is the manuscript scientifically sound in its present form? Yes

Do you have any ethical concerns with this paper? No
Have you any concerns about statistical analyses in this paper? No

Recommendation? Accept with minor revision
Comments to the Author(s)
Summary: This is the stage 2 version of a stage 1 replication report that I had previously reviewed. The manuscript attempts to replicate the findings of Experiment 1 from Kan et al. (2013).
Review: I thought this was a well-conducted replication and I have only a handful of minor comments. The analysis seemed to me to be consistent with the plan (though see minor comment in point 1). The authors additionally removed extremely long RTs before analysis, which was not preregistered but is entirely sensible to me. The interpretation of the results seems well-justified.

1)
The pre-registered analysis plan states that the results would focus on raw error rates (pg 12, line 5), but the results state that the focus was on the arcsine-transformed error rates (pg 14, line 26). It appears to make little difference (the manuscript states that both would be reported if they differed, and I didn't see an instance of it), but it is an inconsistency with the analysis plan.
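For readers unfamiliar with the transform at issue: error proportions are commonly stabilised as 2·arcsin(√p) before running an ANOVA. A minimal sketch, assuming this standard form (the manuscript's exact variant is not quoted in the review):

```python
import math

def arcsine_transform(p):
    """Variance-stabilising arcsine-square-root transform for a proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return 2 * math.asin(math.sqrt(p))
```

On this scale an error rate of 0 maps to 0 and an error rate of 1 maps to π; the transform stretches differences near the floor and ceiling, where raw proportions are compressed.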

Are the interpretations and conclusions justified by the results? No
Is the language acceptable? Yes

Do you have any ethical concerns with this paper? No
Have you any concerns about statistical analyses in this paper? No

Recommendation? Accept with minor revision
Comments to the Author(s)
I was happy to see that this project reached completion. Although the results are not surprising, I believe that it's important to publish data and results relevant to the investigated question. I only found minor problems with the submission: The manuscript has a good number of sentences written in the future tense. Searching for "will" or "will be" quickly shows where. Some of these are appropriate, but others must be left over from Stage 1.
p.6. "This preregistration was performed after data analysis." might be a mistake.
p.14 In "For RT, the ANOVA revealed a significant main effect of Stroop congruency with faster responses to congruent trials (654 ms)" and similar places, it must be mentioned that these are mean values.
P.16 "was not significant, F(1, 89) = 1.46, p = .231, ηp 2 = 0.02, indicating that the congruency effect was similar following compatible trials" is an incorrect conclusion as non-significant results cannot indicate such conclusion.
P.20 Please double-check the "difference in both raw and residual reading times (~11 ms difference) within the first critical region, ts(94) > 4.03, ps < .001, dz > 0.41" results. The appendix table indicates that the SD was 100 ms here, so I was surprised to see such a strong effect for an 11 ms difference.
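One possible resolution of this puzzle, sketched under the assumption that dz was computed in the usual way for paired data (mean difference divided by the SD of the paired differences, which is typically much smaller than the SD of the raw reading times when the two conditions are correlated):

```python
# Figures quoted from the review; the inference below is arithmetic only.
mean_diff = 11.0                    # ms, reported condition difference
dz_lower_bound = 0.41               # reported lower bound on Cohen's dz
raw_sd = 100.0                      # ms, SD from the appendix table (raw scores)

naive_dz = mean_diff / raw_sd                   # what the raw-score SD would imply
implied_sd_diff = mean_diff / dz_lower_bound    # SD of paired differences consistent with dz
```

So the reported dz is consistent with an 11 ms difference only if the paired differences have an SD of roughly 27 ms or less; checking which SD the appendix table reports would settle the point.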

Balazs Aczel
Decision letter (RSOS-210550.R1)

We hope you are keeping well at this difficult and unusual time. We continue to value your support of the journal in these challenging circumstances. If Royal Society Open Science can assist you at all, please don't hesitate to let us know at the email address below.

Dear Dr Dudschig
On behalf of the Editor, I am pleased to inform you that your Stage 2 Replication submission RSOS-210550.R1 entitled "Are control processes domain-general? A replication of "To adapt or not to adapt? The question of domain-general cognitive control" (Kan et al., 2013)" has been accepted for publication in Royal Society Open Science subject to minor revision in accordance with the referee suggestions. Please find the referees' comments at the end of this email.
The reviewers and Subject Editor have recommended publication, but also suggest some minor revisions to your manuscript. We invite you to respond to the comments and revise your manuscript. Below the referees' and Editors' comments (where applicable) we provide additional requirements. Final acceptance of your manuscript is dependent on these requirements being met. We provide guidance below to help you prepare your revision.
Please submit your revised manuscript and required files (see below) no later than 7 days from today's date (i.e. 27-Apr-2022). Note: the ScholarOne system will 'lock' if submission of the revision is attempted 7 or more days after the deadline. If you do not think you will be able to meet this deadline, please contact the editorial office immediately.
Please note article processing charges apply to papers accepted for publication in Royal Society Open Science (https://royalsocietypublishing.org/rsos/charges). Charges will also apply to papers transferred to the journal from other Royal Society Publishing journals, as well as papers submitted as part of our collaboration with the Royal Society of Chemistry.

Associate Editor Comments to Author (Professor Chris Chambers):
The three reviewers who assessed the Stage 1 submission kindly returned to evaluate the Stage 2 manuscript. As you will see, all are positive about the completed article and judge that the primary Stage 1 criteria are met. There are some minor issues to attend to in revision, including the curation of the online data, some minor inconsistencies with the original protocol, correct interpretation of non-significant findings, and presentational issues (e.g. concerning tenses and typos). Regarding Reviewer 3's concern with your statement, "This preregistration was performed after data analysis": since you submitted via the results-blind track, your statement is correct.
Provided you are able to respond comprehensively to all points raised in a minor revision and response, final acceptance should be forthcoming without requiring further in-depth review.

Reviewers' comments to Author:

Reviewer: 1
Comments to the Author(s)
This study presents a very clear replication attempt of Kan et al. (2013). I am impressed by the openness about the analyses and the details provided. The conclusions drawn seem to be appropriate and based on the data.
I just have two comments. The first is with regard to the data and analyses shared: metadata and explanations of your data/variables would help readers understand every column of your data, and would make your analyses more reproducible and the data more reusable. Additionally, there seem to be quite a few spelling mistakes in the manuscript.

Reviewer: 2
Comments to the Author(s)
Summary: This is the stage 2 version of a stage 1 replication report that I had previously reviewed. The manuscript attempts to replicate the findings of Experiment 1 from Kan et al. (2013).
Review: I thought this was a well-conducted replication and I have only a handful of minor comments. The analysis seemed to me to be consistent with the plan (though see minor comment in point 1). The authors additionally removed extremely long RTs before analysis, which was not preregistered but is entirely sensible to me. The interpretation of the results seems well-justified.
1) The pre-registered analysis plan states that the results would focus on raw error rates (pg 12, line 5), but the results state that the focus was on the arcsine-transformed error rates (pg 14, line 26). It appears to make little difference (the manuscript states that both would be reported if they differed, and I didn't see an instance of it), but it is an inconsistency with the analysis plan.

Reviewer: 3
Comments to the Author(s)
I was happy to see that this project reached completion. Although the results are not surprising, I believe that it's important to publish data and results relevant to the investigated question. I only found minor problems with the submission: The manuscript has a good number of sentences written in the future tense. Searching for "will" or "will be" quickly shows where. Some of these are appropriate, but others must be left over from Stage 1.
p.6. "This preregistration was performed after data analysis." might be a mistake.
p.14 In "For RT, the ANOVA revealed a significant main effect of Stroop congruency with faster responses to congruent trials (654 ms)" and similar places, it must be mentioned that these are mean values. P.16 "was not significant, F(1, 89) = 1.46, p = .231, ηp 2 = 0.02, indicating that the congruency effect was similar following compatible trials" is an incorrect conclusion as non-significant results cannot indicate such conclusion.
P.20 Please double-check the "difference in both raw and residual reading times (~11 ms difference) within the first critical region, ts(94) > 4.03, ps < .001, dz > 0.41" results. The appendix table indicates that the SD was 100 ms here, so I was surprised to see such a strong effect for an 11 ms difference.

===PREPARING YOUR MANUSCRIPT===
Your revised paper should include the changes requested by the referees and Editors of your manuscript.
You should provide two versions of this manuscript and both versions must be provided in an editable format: one version should clearly identify all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); a 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them. This version will be used for typesetting.
Please ensure that any equations included in the paper are editable text and not embedded images.
Please ensure that you include an acknowledgements section before your reference list/bibliography. This should acknowledge anyone who assisted with your work but does not qualify as an author, per the guidelines at https://royalsociety.org/journals/ethicspolicies/openness/.
While not essential, it will speed up the preparation of your manuscript proof if you format your references/bibliography in Vancouver style (please see https://royalsociety.org/journals/authors/author-guidelines/#formatting). You should include DOIs for as many of the references as possible.
If you have been asked to revise the written English in your submission as a condition of publication, you must do so, and you are expected to provide evidence that you have received language editing support. The journal would prefer that you use a professional language editing service and provide a certificate of editing, but a signed letter from a colleague who is a proficient user of English is acceptable. Note the journal has arranged a number of discounts for authors using professional language editing services (https://royalsociety.org/journals/authors/benefits/language-editing/).

===PREPARING YOUR REVISION IN SCHOLARONE===
To revise your manuscript, log into https://mc.manuscriptcentral.com/rsos and enter your Author Centre; this may be accessed by clicking on "Author" in the dark toolbar at the top of the page (just below the journal name). You will find your manuscript listed under "Manuscripts with Decisions". Under "Actions", click on "Create a Revision".
Attach your point-by-point response to referees and Editors at the 'View and respond to decision letter' step. This document should be uploaded in an editable file type (.doc or .docx are preferred). This is essential, and your manuscript will be returned to you if you do not provide it.
Please ensure that you include a summary of your paper at the 'Type, Title, & Abstract' step. This should be no more than 100 words to explain to a non-scientific audience the key findings of your research. This will be included in a weekly highlights email circulated by the Royal Society press office to national UK, international, and scientific news outlets to promote your work. An effective summary can substantially increase the readership of your paper.
At the 'File upload' step you should include the following files: --Your revised manuscript in editable file format (.doc, .docx, or .tex preferred). You should upload two versions: 1) One version identifying all the changes that have been made (for instance, in coloured highlight, in bold text, or tracked changes); 2) A 'clean' version of the new manuscript that incorporates the changes made, but does not highlight them.
--An individual file of each figure (EPS or print-quality PDF preferred [either format should be produced directly from original creation package], or original software format).
--An editable file of each table (.doc, .docx, .xls, .xlsx, or .csv).
--If you are requesting a discretionary waiver for the article processing charge, the waiver form must be included at this step.
--If you are providing image files for potential cover images, please upload these at this step, and inform the editorial office you have done so. You must hold the copyright to any image provided.
--A copy of your point-by-point response to referees and Editors. This will expedite the preparation of your proof.
At the 'Details & comments' step, you should review and respond to the queries on the electronic submission form. In particular, we would ask that you do the following: --Ensure that your data access statement meets the requirements at https://royalsociety.org/journals/authors/author-guidelines/#data. You should ensure that you cite the dataset in your reference list. If you have deposited data etc in the Dryad repository, please only include the 'For publication' link at this stage. You should remove the 'For review' link.
--If you are requesting an article processing charge waiver, you must select the relevant waiver option (if requesting a discretionary waiver, the form should have been uploaded, see 'File upload' above).
--If you have uploaded any electronic supplementary material (ESM) files, please ensure you follow the guidance at https://royalsociety.org/journals/authors/author-guidelines/#supplementarymaterial to include a suitable title and informative caption. An example of appropriate titling and captioning may be found at https://figshare.com/articles/Table_S2_from_Is_there_a_trade-off_between_peak_performance_and_performance_breadth_across_temperatures_for_aerobic_scope_in_teleost_fishes_/3843624.
At the 'Review & submit' step, you must view the PDF proof of the manuscript before you will be able to submit the revision. Note: if any parts of the electronic submission form have not been completed, these will be noted by red message boxes; you will need to resolve these errors before you can submit the revision.
If you have not already done so, please ensure that you send to the editorial office an editable version of your accepted manuscript, and individual files for each figure and table included in your manuscript. You can send these in a zip folder if more convenient. Failure to provide these files may delay the processing of your proof.
Please remember to make any data sets or code libraries 'live' prior to publication, and update any links as needed when you receive a proof to check -for instance, from a private 'for review' URL to a publicly accessible 'for publication' URL. It is also good practice to add data sets, code and other digital materials to your reference list.

Royal Society Open Science is a fully open access journal. A payment may be due before your article is published. Our partner Copyright Clearance Center's RightsLink for Scientific Communications will contact the corresponding author about your open access options from the email domain @copyright.com (if you have any queries regarding fees, please see https://royalsocietypublishing.org/rsos/charges or contact authorfees@royalsociety.org).
The proof of your paper will be available for review using the Royal Society online proofing system and you will receive details of how to access this in the near future from our production office (openscience_proofs@royalsociety.org). We aim to maintain rapid times to publication after acceptance of your manuscript and we would ask you to please contact both the production office and editorial office if you are likely to be away from e-mail contact to minimise delays to publication. If you are going to be away, please nominate a co-author (if available) to manage the proofing process, and ensure they are copied into your email to the journal.
Please see the Royal Society Publishing guidance on how you may share your accepted author manuscript at https://royalsociety.org/journals/ethics-policies/media-embargo/. After publication, some additional ways to effectively promote your article can be found here: https://royalsociety.org/blog/2020/07/promoting-your-latest-paper-and-tracking-your-results/.
On behalf of the Editors of Royal Society Open Science, thank you for your support of the journal and we look forward to your continued contributions to Royal Society Open Science.

The author proposes a study to directly and conceptually replicate Experiment 1 by Kan et al. (2013). In my opinion this is a sound replication proposal and I am impressed by the details the author provides, especially with regard to the online testing procedure.
I have only a few minor suggestions you might want to consider:
• In my opinion, details about the statistical model, the factors and levels, and the inference criteria are missing from the data analysis plan and also from the pre-registration.
I have added more details regarding the planned analysis and the statistical tests.
• For me the theoretical motivation for the conceptual replication (Experiment 2) is not entirely clear.
I have now added this motivation to the manuscript at the start of Experiment 2. The core point lies in the well-studied increase in Stroop conflict when using response-based conflict, which is due to two mechanisms typically contributing to the Stroop effect: a semantic conflict and a response competition mechanism. If the conflict observed within the stimulus-based Stroop task is small, the opportunity to observe a reduced Stroop effect following difficult-to-process sentences is also reduced. Additionally, I wanted to keep Experiment 1 and Experiment 2 as identical as possible, with the exception of the Stroop materials.
• Has the congruent/incongruent procedure been tested online? I am a huge fan of online testing but from my experience I would advise to run a short pilot to see if you can find the effect in the sentence reading task.
Yes, I agree here. I have experience using online data collection procedures and have found the results to be relatively consistent with lab-based testing. Although this testing has been with more standard reaction time tasks, I have some data from other online experiments using the moving window procedure showing robust effects. Of course, one aspect that previous experience has taught me is that participant exclusion tends to be a bit higher in online than in lab-based studies. Here, the use of predefined data-quality exclusion criteria is essential. For example, ten of the filler materials used in the sentence task are followed by comprehension questions. I will remove participants who did not perform at an adequate level on these questions (a criterion used in the original study) and, importantly, will perform additional analyses (see manuscript, follow-up analyses) which further ensure data quality beyond the data-quality-checking standards of the original paper.
• The original study does not have a big sample size, it might be possible that the original effect size is overestimated due to a lack of power in the original study. Maybe this is something to keep in the back of your mind?
Yes, I agree here. The original sample size was small, but the replication will more than double the original number of participants. In addition, as Experiment 2 is essentially the same experiment, it is possible to combine the two for an even larger sample size (I will now report such a combined analysis in an additional section).
• In my experience you might encounter some technical difficulties while data collection as well. What do you plan on doing with data sets where single trials are missing (e.g. due to instable internet connection)? What will you do with participants that start the experiment again because they might have accidentally closed the window?
Only complete data sets will be analysed. The data are stored on our local server at the end of the experiment. I have not had any issues with lost trials within a single data file across several large samples of participants so far, so I don't anticipate this being a big problem. From my experience using MTurk, participants cannot re-accept a task once they have started it. Also, given that we pay participants a comparatively high rate, the tasks are typically rather popular and the link expires quickly, not leaving many options to return to the task.
• Maybe consider using Prolific instead of MTurk? MTurk is not made specifically for academic research, whereas Prolific is (Palan & Schitter, 2018; https://doi.org/10.1016/j.). Of course, this is not at all required; I just have very good experience with Prolific and the way they try to ensure data quality.
Currently, I have more experience using MTurk and therefore ran this study on MTurk. However, I have looked at Prolific and I think I will consider it as an option in the future.
• De Leeuw (2015) himself suggests using Chrome as this is best tested in jsPsych; it also seems that Chrome (in combination with Windows) "adds the least random noise to RT measurement" (see Pronk et al. 2020; https://doi.org/10.3758/).

I do not require that participants use a specific OS/browser combination. Pilot testing ensured that the experiment runs on Windows/Mac/Linux using Firefox, Safari and Chromium-based browsers. I only specify that users should avoid Internet Explorer. Although the precision of online studies does not always match that of a lab-based environment (e.g., I would not be keen to run a low-level perception psychophysics experiment online, nor experiments involving the synchronisation of tone and visual stimuli), online studies can achieve good enough precision to be useful in the current setup. Regarding restricting participants to only Windows/Chrome, I feel that this is not necessary. For example, the precision differences across OS/browser combinations reported in Bridges et al. (2020) did not seem substantial enough to require participants to use a specific OS/browser combination with jsPsych. However, thanks for pointing this out to me, and I will definitely keep it in mind for future studies. Given that the paper has been submitted to the result-blind track of the journal, I cannot change anything regarding data collection now. But I did implement several data check options in the paper that allow assessing whether the measurements worked with regard to all basic effects (Stroop effect, conflict adaptation within the Stroop task, etc.).
• The combination of data sets as described in the pre-registration might be interesting here as well.
Yes, I agree here, I will now report this analysis.
• In the spirit of open science, and to make it easier to follow your analysis pipeline, you could also share the code and the analysis scripts together with the pre-registration, or at a later stage together with the data.
Yes, all data files and analysis scripts will be uploaded to an appropriate repository. The study was registered at "As Predicted". I plan to upload the data to ZENODO, but of course also OSF if required by the journal or reviewer.
And some very minor points:
- On p. 7 in line 20/21 it has to be Braem instead of Bream.

Reviewer: 2 Comments to the Author(s)
The author proposes to replicate Experiment 1 from Kan et al., as well as conducting a conceptual replication with a small modification. Experiment 1 of Kan et al. had participants perform intermixed trials of a sentence comprehension task and a Stroop task. The observation of a two-way interaction between preceding (sentence) congruency and Stroop task congruency was interpreted as evidence for domain-general conflict adaptation: the Stroop effect was smaller when it followed an ambiguous sentence. The original Experiment 1 used Stroop stimuli where the incongruent words ('brown', 'orange', 'red') did not match the font colours to be named ('blue', 'green', 'yellow'). The conceptual replication proposed here (Experiment 2) modifies this so that they do overlap.
Review:
I found the proposal to be generally clear and well-written. I thought that the primary and secondary criteria were met, or nearly met with a potential small elaboration (point 1). The proposed replication appears to match the details of the original experiment, with the perhaps notable exception that it will be conducted with an online sample. I was satisfied with the author's justification for why this does not compromise the ability of the proposed study to detect the effect of interest. The author proposes numerous secondary analyses which I thought were sufficient quality control checks. I have a couple of minor comments on the logic of the conceptual replication study, but they are probably not critical.

1) I thought that there could be a more explicit statement about the critical hypothesis test(s), as it may not be obvious to a reader not familiar with Kan et al. From the power analysis, I assume that the main test is the two-way interaction between previous-trial and current-trial congruency. Some elaboration may be warranted on whether there is an expected direction/pattern of condition means. Further, the author proposes to analyse RTs, error rates, and arcsin transformed error rates (as in the original). With multiple outcomes, there is potential flexibility in determining whether the original findings replicated. It might be worth commenting on whether significant effects in one or all outcomes will be considered as sufficient evidence.
Thanks for pointing this out to me. It is indeed the case that the original Kan paper mentions two ANOVAs for the error rates (raw proportions and an arcsin transformation), and also two analyses for the RT data (one excluding outliers and one replacing the outliers). However, for each dependent variable, the authors report that the results across all analyses showed consistent patterns, and they chose to report the details of only one of these analyses for the error rates and RTs.
With regard to the question what conditions need to be fulfilled to call it a successful replication, your comment was very helpful to think about this more clearly. Typically, in human information processing models the measures of reaction times and error rates are often seen as measuring the same underlying cognitive process (e.g. Draheim, Hicks, & Engle, 2016). Nevertheless, there is a strong tradition to focus more heavily on the reaction time patterns.
Therefore, I focus on the two-way interaction between previous sentence congruency and current Stroop congruency in reaction time. In line with the conflict adaptation effect, it is predicted that the conflict effect is reduced following incongruent sentences (= direction/pattern of means). Against the background of the conflict monitoring literature, and specifically the empirical evidence demonstrating these conflict adaptation effects, the effects (if existent) are almost always present in the reaction time data (and, if reported, typically mirrored in the error data). In an extreme case, participants could focus on accuracy in the current study and commit hardly any errors. Therefore, my assessment of whether the effect replicates or not will be biased towards the data pattern observed in reaction time. I state all these things more clearly in the manuscript now.

Draheim, C., Hicks, K. L., & Engle, R. W. (2016). Combining reaction time and accuracy: The relationship between working memory capacity and task switching as a case example. Perspectives on Psychological Science, 11.

2) The conceptual replication seems sensible on the surface, though it did raise a couple of questions in my mind that the author might want to consider.
The reasoning given is that Experiment 2 "…should result in larger conflict… and potentially larger cross-task transfer effects". One interpretation of this is that conflict needs to be sufficiently large for transfer to occur. Another is that transfer could occur in both experiments, but the effect size would be larger in Experiment 2. If it is the latter, what is the benefit of the additional Experiment over increasing the power of Experiment 1 for a smaller effect size? I don't object to the planned sample size, and it is already substantially larger than the original study. However, my reading is that it is based on the original effect size, and prior replication efforts have indicated that it is common for replications to produce smaller effect sizes than the original studies (e.g. Open Science Collaboration, 2015).

As we use the same set of sentences in both experiments, the size of conflict experienced from sentences to Stroop trials should be similar (which is the critical effect reported by Kan et al.). However, in order to observe a difference in the size of conflict experienced in the Stroop trials depending on previous sentence congruency, I wanted to increase the size of the conflict effect specifically within the Stroop trials. I still agree with the reviewer that another option would be to increase the sample size of Experiment 1. However, as also noted by reviewer 1, the high similarity between Experiments 1 and 2 does not exclude the possibility of a combined analysis. Indeed, I have now added a section to the paper where such a combined analysis will be conducted to deal with potential power issues due to the true effect size potentially being smaller than the one originally reported in the study of Kan et al.
3) Kan et al. make the distinction between representational conflict and response conflict (Egner, 2008), and focus on representational conflict throughout their experiments. In their Stroop tasks, and in Experiment 1 here, the incongruent words were not part of the eligible response set. Experiment 2 here aims to elicit a larger conflict effect by adding response conflict (the incongruent words are part of the eligible response set). The author may want to comment on whether response conflict should also be expected to contribute to the transfer effect. Naively, I have no reason to expect that it should not, but the prediction doesn't necessarily follow from the original study.
I agree that the original paper focuses on representational conflict. Importantly, the Stroop conflict used in Experiment 2 of my conceptual replication contains both response and stimulus (i.e. representational) conflict. Given that the representational conflict in the Stroop task originates on a semantic level, and the sentence conflict originates on a syntactic level, I don't see a reason for the conflict transfer effect to be limited to this specific Stroop conflict type. I have now clarified this in more detail within the manuscript.

Reviewer: 3 Comments to the Author(s)
Before going into the details of the review, I have to mention that I am in a special situation, as I have been asked to review this submission while I have a Stage 2 submission under review at RSOS. In fact, my team has replicated the very same study, Kan et al. (2013). An advantage of the registered report format is that we can give advice to the submitters before too much has been invested in the project. That said, I would discourage the author from conducting the replication, at least in the proposed form, for the following reasons: We replicated Exp 1 from Kan et al. in three countries by independent labs with a greater sample size. Compared to the proposed work, we very closely followed the original protocol and conducted the experiments in the lab and not online (some other deviations of the proposal are discussed below).
Our results were rather discouraging, as you can read in our preprint: https://psyarxiv.com/5k8rq

Thanks for pointing this out. I now cite that there is another replication study and that it failed to replicate the core findings of Kan et al. (2013).
Besides all of this, even the original study of Kan et al. (2013) raised doubts that Exp. 1 is a good test of the theory; that is why they (and we) conducted Exps. 2 and 3, but the submission doesn't propose to replicate them. Nevertheless, our results were similarly pessimistic about the presence of the congruency sequence effect, making us believe that the cross-task congruency sequence effect is either nonexistent or that the design of Kan et al. is not a good test of the effect.
In short, I would encourage the author to use a different design to test the effect. My lab is open for further discussion about potential empirical directions.
Response: Thanks for all these suggestions. I chose to replicate specifically Experiment 1, with a sample size resulting in sufficient power, as my research is specifically interested in linguistic conflict and how linguistic conflict interacts with other types of conflict processing. As the other reviewers pointed out, the changes between my two experiments are so small that it is also an option to combine the analyses to see whether there really is an underlying power issue. I have now added this planned analysis to the manuscript. It would be great to be in contact about potential further directions (thanks very much for that option); however, the current paper has been submitted on the result-blind track of the journal. This means that the data had already been collected before this submission, but the outcome is not revealed to the reviewers (which is an option given by the journal). I read your replication; it is really interesting, and I also learned about a few things which seem not to be reported in the original Cognition paper (more details below). I just wrote down what stood out to me, but there might be some points interesting to discuss.
Below, I list a few more observations about the proposed design. Is the proposed Exp1 a direct replication? I think there are a few deviations from the original study. I do not mind them saying that it is a direct replication but it should be apparent that there are deviations.

Online vs in lab
Thanks for the careful reading and assessment of the submitted replication. I have now highlighted every potential deviation between the current replication and the original study, and your comments were really helpful there. The question of whether this is a direct replication or not is difficult. I believe that all deviations from the original article are small (except for online vs. lab) and are now clearly stated in the revision. What was very surprising to me was the following: in the process of revising the current re-submission and carefully reading the replication of your group, I came to the conclusion that there must be slight issues with the description of the original study. I will outline these details below, but the only reasonable way for me to explain the discrepancies between your replication and the Kan paper's method section is that Kan et al. didn't report precisely what materials were used within the baseline Stroop task. I also don't think that this is a major issue, but it is very surprising. Therefore, it is questionable whether the original study and the pre-print replication followed the details reported in their respective method sections (see comments below). I suspect that the authors of the pre-print used the materials sent by Kan, which weren't reported to all final details in the paper. Anyway, I don't think this is an issue for the effect we are interested in. But I see, for example, one benefit of my online data collection: the online format made it possible for me to test US native speakers, and therefore, regarding language background, a sample very similar to the original study, which is not the case in the other replication (testing people in Australia, Singapore, UK). I did not check whether you adjusted your items to British English spelling, but in either case this would be a deviation from the original study.
I know this soon becomes a philosophical question what a direct replication is, but in summary I don't think that the differences in my replication are that much more critical than in the other replication.
"The mapping of keys (G, H, J) is randomly assigned to a response colour (blue, green, yellow) for each individual participant,..." The original study doesn't randomize it.
Yes, that is a difference. However, it is a very uncommon procedure not to randomize assignments of stimuli to response keys. If this is the driving force behind the Kan study effects, it would be very surprising. I now further highlight this discrepancy.
Page 10 line 59 (footnote): I believe this is an important deviation from the original study; however, I am not sure I understand what the stimulus set looked like. There were 42 non-filler sentences in the original study. The footnote suggests that participants did not see all 42 sentences but just one pseudo-randomized set of 21 sentences. In the original study, the participants saw all 42 sentences. As the materials for Experiment 1 of Kan et al. (2013) are available, the specific order of the sentences in the original study could be applied in the replication as well.
This is a misunderstanding caused by the footnote; I now write this more clearly. Participants do see 42 non-filler items (21 congruent/21 incongruent sentences) in the current replication, which is in line with the original study. I present them in a randomised order (constrained such that every non-filler sentence is followed by a Stroop trial). I think the requirement that they be presented in the same specific order as in the original study is questionable. Indeed, it is not clear from the original study that a fixed random order was used for all participants. I chose to randomise materials and sentence order, as this is again the more standard procedure, using a randomisation procedure that ensures each critical sentence is followed by a Stroop trial (so no critical trials relevant to testing the hypothesis are lost). Indeed, I think the fixed order of items across all participants in the original study is actually questionable (resulting in potential confounds of the specific order).
Again, I now highlight this discrepancy regarding the randomization more clearly in the paper.
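For concreteness, the constrained randomisation described above could be sketched as follows. This is a minimal illustrative sketch, not the actual experiment code; the function and item labels are hypothetical, and the exact placement of fillers and extra Stroop trials in the real procedure may differ.

```python
import random

def build_trial_sequence(critical_sentences, filler_sentences, stroop_trials):
    """Constrained shuffle: every critical sentence is immediately
    followed by a Stroop trial (illustrative sketch only)."""
    critical = random.sample(critical_sentences, len(critical_sentences))
    fillers = iter(random.sample(filler_sentences, len(filler_sentences)))
    stroop = random.sample(stroop_trials, len(stroop_trials))
    sequence = []
    for sent in critical:
        # occasionally place a filler sentence before a critical item
        if random.random() < 0.5:
            nxt = next(fillers, None)
            if nxt is not None:
                sequence.append(nxt)
        sequence.append(sent)
        sequence.append(stroop.pop())  # constraint: Stroop follows critical
    # remaining Stroop trials and fillers are appended in random order
    # (a real implementation would interleave them throughout the block)
    rest = stroop + list(fillers)
    random.shuffle(rest)
    sequence.extend(rest)
    return sequence
```

The key point is that the constraint is enforced structurally, so no critical sentence can ever be followed by anything other than a Stroop trial, regardless of how the remaining items are shuffled.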
Page 11 line 26: I am not sure that I understand the reasoning behind the exclusion criteria on the comprehension probe task. The author mentions Kan excluding only one participant at a chance level (50%) and that all the other participants score above 70%. The author then states that the replication will follow Kan et al. (2013) and participants with 6 or fewer correct responses will be dropped. This indicates a 60% (given that there were 10 comprehension probe sentences) cutoff level which is neither 50% nor 70%.
I do not understand this point. I believe this is the same comprehension criterion as in the original paper. If I remove participants with 6 or fewer correct responses, the accuracy rate of the remaining participants will be at least 70%. From Kan et al.: "The remaining participants (n = 39) scored at or above 70% (M = .90, SD = .09)". I clarified this in the paper.
The whole sentence preprocessing part is missing. This is not a problem in itself but should be noted as a deviation and justified.
I plan to analyse the sentence task following the procedure described in Kan et al. Specifically, raw reading times 2.5 SDs beyond the subject's mean across all conditions are replaced with the 2.5 SD cutoff value. Subsequently, regression is used to predict raw reading time from sentence region length (number of characters). Each individual's predicted reading time is then subtracted from their actual reading time to give a residual reading time. T-tests are then performed on the residual reading times in each sentence region, most importantly within the temporarily ambiguous and disambiguating regions. I now clarify this. Thanks for pointing this out; I did not have it in the original submission, as the focus was on the transfer effect to the Stroop trials.
Page 10 line 52: The test part consists of 197 trials. In Kan they used 60 congruent Stroop + 60 incongruent Stroop + 21 congruent and 21 incongruent sentences, which is 162. If the filler sentences are included, that is 162 + 29 = 191.

This is correct. My test phase will consist of 63 congruent and 63 incongruent Stroop stimuli. This change was made in order to balance the possible combinations of relevant and irrelevant dimensions within the Stroop task. Interestingly, this is not something that seems to be balanced within the original study. For example, within subjects, the word "red" did not appear in blue the same number of times as the word "orange" did. Whilst this is not critical to the hypothesis under investigation, it is the reason for my slight change in the number of Stroop stimuli presented. This is not something that can be ascertained from the method section of the original study. I also further highlight this discrepancy.
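The balancing rationale can be made explicit with a small enumeration. The word and colour sets below are the Experiment 1 sets described earlier in this correspondence; the divisibility arithmetic is my own illustration of why 63 (rather than 60) trials per congruency level works:

```python
from itertools import product

ink_colours = ["blue", "green", "yellow"]          # colours to be named
incongruent_words = ["brown", "orange", "red"]     # Experiment 1 word set

congruent = [(w, w) for w in ink_colours]                    # 3 combinations
incongruent = list(product(incongruent_words, ink_colours))  # 9 combinations

# With 63 trials per congruency level, every word/colour combination can
# appear equally often: 21 repetitions of each congruent pairing and 7 of
# each incongruent pairing. The original 60 trials cannot be divided
# evenly over the 9 incongruent combinations.
assert 63 % len(congruent) == 0 and 63 // len(congruent) == 21
assert 63 % len(incongruent) == 0 and 63 // len(incongruent) == 7
assert 60 % len(incongruent) != 0
```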
In addition, I think it is also relevant to highlight another difference between my replication on the one hand and the original Kan et al. and pre-print replication manuscripts on the other. Again, I do not think this aspect of the experiment was clear in either the original study or the replication, and it only became evident when looking at the results of the pre-print replication (NB. I do not know for certain if this was the case for the original Kan paper). The baseline "Stroop" task contained non-Stroop-like items, such as the words 'tax', 'sounds', 'hungry'. In total, 30 words were presented, of which 6 were colour words. Thus, the reported numbers of congruent vs. incongruent trials within the baseline Stroop blocks are incorrect. Specifically, from Kan: "Then they completed a baseline block of 145 Stroop trials (intermixed congruent and incongruent)", and from the pre-print replication: "It was followed by a baseline trial with 145 intermixed congruent and incongruent Stroop stimuli." I do not think this is correct, as the actual baseline block contained 61 congruent and 60 incongruent stimuli, with 24 items being non-Stroop-like items. My baseline block contains only Stroop-like items, with 72 in each congruency condition. Again, I do not think this is overly problematic regarding the hypotheses tested, but I think it should be made clear within the method section. Why this was not clearly indicated in the original paper and the pre-print replication is unclear.
Page 11 line 59 (footnote): The stepwise reduction and testing for the presence of the Stroop effect are problematic if NHST is used to test the presence of the effect. Continuous testing can modify the alpha level hence the results.
I agree this is potentially an issue. However, it is not really a step-wise reduction of the 2x2 interaction of interest, but rather of participants who do not show a sentence congruency effect, nor a clear Stroop conflict adaptation effect in the baseline. This is only used as an additional test of the 2x2 interaction; specifically, if the 2x2 interaction is absent even within a subset of participants who do show both a clear sentence congruency effect and a clear Stroop-to-Stroop adaptation effect within the baseline, the evidence against cross-task adaptation would be rather strong (as long as the power remains sufficient). I have now more clearly stated the critical tests in line with the original study, and that these step-wise tests are really just seen as an additional source of information to strengthen the conclusions.

General comments
Page 10 line 55: The started sentence is not finished.
Thanks, now corrected.
Page 11 line 15: The footnote mark could be moved to the sentence regarding the RT outlier exclusions.
I moved the footnote and have now clarified the whole analysis section to improve the structure.
Page 7 line 32: I am not sure that I am convinced that Exp 2 is needed based on this one-sentence explanation.
I have now clarified the motivation of experiment 2 at the start of Experiment 2.
Page 7 line 55: What was the exact effect size used for the sample size determination?
The effect size was calculated from the 2x2 interaction of previous sentence congruency and current Stroop congruency on RT. The number of required participants was calculated from the partial eta squared value, using the R package Superpower. I clarified this in the paper.

There is nothing about the analysis procedure in the paper. Nor about the inferences that the author will make based on the results. Again, a lot of questions remain unanswered.
Thanks for pointing this out. The paper now includes these details. The planned analysis is, in a first step, identical to the original study, with the critical tests being on the reaction times. The revised version makes now clear what results will be interpreted as a successful replication.
The reviewers and handling editors have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, I invite you to respond to the comments and revise your manuscript.
Associate Editor Comments to Author (Professor Chris Chambers): The Stage 1 manuscript was returned to three reviewers who assessed the original submission. The reviews are broadly positive and we are now closer to IPA. Reviewer 1 is now satisfied but there remain some final issues to address concerning the consideration of RTs and error rates (Reviewer 2), and clarity and precision of the methods and prospective interpretation (Reviewer 3). Please respond carefully to these points and I will assess the next revision at desk before issuing a final Stage 1 decision.
Reviewer comments to Author: Reviewer: 1 Comments to the Author(s) Thank you very much. All my comments have been addressed sufficiently.

Response: Thanks!
Reviewer: 2 Comments to the Author(s) The revised manuscript addresses my previous comments. I have one note on a rationale the author gives in their response, though this is not used in the manuscript so it may not warrant further changes.
My previous comment 1 suggested that the author could consider how they would interpret potentially different effects in reaction times and error rates. The manuscript now states that they focus on reaction times, as they are more commonly used in the literature. However, their response to my comment also notes: "Typically, in human information processing models the measures of reaction times and error rates are often seen as measuring the same underlying cognitive process (e.g. Draheim, Hicks, & Engle, 2016)." I'm not sure this reference supports focusing on reaction times. Draheim et al. propose using a composite measure of RT and accuracy, on the basis that the latency costs typically used in task-switching are contaminated by speed-accuracy trade-offs. The way in which multiple processes contribute to patterns of RT and error effects is also something I have been interested in in my own work using evidence accumulation models. For example, in models like the drift-diffusion model, an increase in the boundary separation parameter can lead to an increase in reaction time effects and a decrease (or no observable change) in error rates. This parameter has previously been implicated in conflict adaptation in the form of post-error slowing (Dutilh et al., 2011). Though sequential congruency effects do not follow the same pattern, the point is that basing an interpretation on either reaction times or error rates may overlook theoretically meaningful combinations of the two.