Effects of situated learning and clarification of misconceptions on contextual reasoning about natural selection

Natural selection is a core principle of evolution. Understanding natural selection enables students to think about the evolution and the variability of life. Despite its great importance, understanding natural selection is challenging for students. This is evident in the phenomenon of contextual reasoning, in which students can often explain natural selection in one context (e.g., trait gain) but not in another (e.g., trait loss). The study pursues the following aims: First, to examine the link between contextual reasoning and situated learning. Second, to explore whether different instructional strategies differ in their associated cognitive load. Third, to investigate whether clarifying common misconceptions about natural selection (no vs. yes) is an effective addition to regular instruction when aiming to increase the use of key concepts and reduce misconceptions. Fourth, to exploratively examine the effectiveness of different instructional strategies. In a 2 × 2 factorial intervention study with a total of N = 373 secondary school students, we varied the instructional material of a 90-min intervention in terms of the evolutionary context (trait gain vs. trait loss) and the availability of additional support in the form of a clarification of misconceptions (no vs. yes). We measured students’ cognitive load immediately after instruction and assessed their ability to reason about natural selection (i.e., use of key concepts and misconceptions) later. We documented low knowledge about evolution in the pre-test and persisting misconceptions in the post-test. The results showed that the intervention context of trait loss elicited a higher intrinsic and extraneous cognitive load than trait gain. Moreover, when the clarification of misconceptions is analyzed in connection with the intervention context, it reveals a potential for reducing misconceptions in some contexts. Students who had learned in trait gain contexts with a clarification used significantly fewer misconceptions in later reasoning than students who had learned in trait gain contexts without a clarification of misconceptions. Our study creates new insights into learning about natural selection by outlining the complex interplay between situated learning, cognitive load, clarification of misconceptions, and contextual reasoning. Additionally, it advises researchers and educators on potential instructional strategies.


Aptyka et al. Evolution: Education and Outreach (2022) 15:5. Open Access.
*Correspondence: j.grossschedl@uni-koeln.de. Present Address: Institute for Biology Education, Faculty of Mathematics and Natural Sciences, University of Cologne, Herbert-Lewin-Straße 10, 50931 Cologne, Germany. Full list of author information is available at the end of the article.

Introduction
Natural selection is a core principle of evolution and plays a pivotal role in evolution education (National Research Council [NRC] 2012; Secretariat of the Standing Conference of the Ministers of Education and Cultural Affairs of the Länder in the Federal Republic of Germany [KMK] 2020). Understanding natural selection enables students to think and reason about the emergence and existence of biological variability and the evolution of species on earth. Concurrently, the principles of evolutionary biology can offer helpful approaches and strategies for facing global challenges (Carroll et al. 2014; Smith 2010). Despite its general importance, understanding and reasoning about evolutionary processes pose challenges that manifest themselves in unscientific or naïve ideas (in the following, referred to as misconceptions). These misconceptions can be traced back to early childhood and are also observed in secondary school students (e.g., Bishop and Anderson 1990; Opfer et al. 2012; Beggrow and Sbeglia 2019; Ha and Nehm 2014; Nehm and Ha 2011; Evans 2001; Kampourakis and Zogza 2009; Beniermann 2019; Kuschmierz et al. 2020a). Research with German secondary school students showed that they have low (Beniermann 2019; Kuschmierz et al. 2020a, b) to moderate (Kuschmierz et al. 2020a; Fenner 2013; Lammert 2012) knowledge about evolution and often rely on teleological and 'Lamarckian' conceptions to explain natural selection (Beniermann 2019; Kuschmierz et al. 2020b; Fenner 2013; Lammert 2012). Additionally, other studies indicated that students use key concepts and misconceptions in a context-related manner when reasoning about natural selection (also termed contextual reasoning; Nehm and Ha 2011; Nehm et al. 2012). Context herein refers to the underlying content or topic. For example, when the context differs concerning the polarity of trait change, students use the key concept heritability of variation ('heritability') and the misconception use or disuse of particular body parts ('use/disuse') more often in trait loss than in trait gain scenarios (see also Opfer et al. 2012; Ha and Nehm 2014; Nehm et al. 2012).
Although normative reasoning about natural selection would require using the same key concepts in all contexts, context-related reasoning occurs (Nehm and Ha 2011; Nehm et al. 2012; Federer et al. 2015). To promote normative reasoning in all contexts, there is an urgent need to ascertain factors linked to the phenomenon of contextual reasoning and identify effective instructional strategies that increase the use of key concepts and reduce prevailing misconceptions. Current research suggests that the phenomenon of contextual reasoning could be related to situated learning (Beggrow and Sbeglia 2019; Nehm and Ha 2011; Kirsh 2009). The situated learning approach (closely associated with situated cognition) posits that learning and knowledge are inextricably linked to a specific situation (Kirsh 2009; Brown et al. 1989; Reder et al. 1994; Sutton 2008). This approach can also be applied to instructional situations. Again, situational entities determine learning and knowledge acquisition. For example, if students are learning with tasks on a certain context, this underlying context is decisive. As a result, acquired knowledge may only be accessible to a limited extent in new contexts. However, tasks in new contexts can be solved more effectively when students are familiar with similar ones (Kirsh 2009; Reder et al. 1994). If there is a lack of attention to situated learning, context-dependent lessons can occur, which means that natural selection may only be taught in one context but not in another. Context-dependent instructions do not appear to be uncommon, as Nehm and Ha (2011) ascertained, for example, that scenarios involving trait gain are covered more frequently in the curriculum than scenarios involving trait loss. Hence, students have more opportunities to develop key concepts of natural selection in trait gain contexts, while they lack practice in contexts of trait loss. Consequently, a mix of key concepts and misconceptions or relatively intuitive ideas (e.g., from childhood) may remain or be reinforced in trait loss contexts (Evans 2001; Kirsh 2009; Reder et al. 1994). Notwithstanding, there is a lack of empirical evidence on whether contextual reasoning about natural selection is demonstrably related to situated learning.
Nehm and Ha (2011) extend the view on situated learning and contextual reasoning by pointing to a possible link between persistent misconceptions in less familiar contexts and a high load on working memory capacity. The emerging patterns in contextual reasoning may indicate that some contexts impose a greater cognitive load and are more difficult for students than others (Nehm and Ha 2011; Federer et al. 2015; Nehm 2018). For instance, it is more challenging for students to reason about trait loss or evolution in the plant kingdom than trait gain or evolution in the animal kingdom since individuals use fewer key concepts and more misconceptions (Nehm and Ha 2011; Federer et al. 2015; Nehm 2018; Ha et al. 2006; Großschedl et al. 2018). Similar patterns in concept use emerged throughout the study of the history of biology (Ha and Nehm 2014). Accordingly, difficulties in reasoning about trait loss scenarios are not necessarily due to familiarity. Some contexts may have always been more difficult than others (Ha and Nehm 2014). The link between different evolutionary contexts and their associated cognitive load has been a neglected aspect in previous studies, which makes this a worthwhile research direction for biology education (Nehm and Ha 2011; Klepsch and Seufert 2020).
Along with ascertaining factors associated with the phenomenon of contextual reasoning, it is important to consider instructional strategies to promote normative reasoning about natural selection. Andrews et al. (2011) highlighted that instructions could be more effective when key concepts of natural selection are taught and educators provide their students with an additional clarification of misconceptions. Clarifying misconceptions can enable students to recognize their misconceptions and compare them to scientific key concepts (Gartmeier et al. 2008; Tulis et al. 2016). Recognizing discrepancies when comparing concepts can play a vital role in restructuring inherent conceptualizations and developing scientific knowledge about natural selection (Kampourakis and Zogza 2009; Gartmeier et al. 2008; Tulis et al. 2016; Limón 2001; Oser et al. 2012; Rea-Ramírez and Clement 1998; Nelson 2008). Hence, it remains to be ascertained whether the clarification of misconceptions will lead to an increase in key concepts while reducing misconceptions.
In addition, a new perspective emerged from the literature suggesting that the aforementioned situated learning approach should be considered in combination with the instructional strategy of clarifying misconceptions. Studies on situated learning argue that concepts are generally developed in a context-related manner, including key concepts and misconceptions (Kirsh 2009; Goel et al. 2010; Barsalou 2016; Bechtel et al. 2009; Sadler 2009). Consequently, further developing situated concepts could be meaningful in contexts similar to those in which they originated (Barsalou 2005, 2016; Barsalou et al. 2009). Thus, it appears to be of importance to investigate situated learning in different contexts in combination with a clarification of misconceptions.
This study combines fundamental and application-oriented research to address the previously outlined research gaps. We investigate factors related to contextual reasoning and explore instructional strategies for strengthening key concepts and reducing misconceptions about natural selection. Our main aims are to clarify (a) whether situated learning is linked to differences in contextual reasoning and (b) whether instructions and underlying contexts differ in the associated cognitive load. Additionally, (c) it is essential to investigate if an explicit clarification of misconceptions increases the probability that students use fewer misconceptions and simultaneously more key concepts. Finally, (d) there is a need to explore whether considering learning as a holistic, situated process could shed light on seminal instructional strategies in terms of different contexts (trait gain vs. trait loss) and the clarification of misconceptions (no vs. yes). By engaging with these aims, our study considers the conglomerate of interdependencies. It can shed light on the hitherto unexplained manifestations of contextual reasoning and allows for a more profound exploration of effective instructional strategies.

Background
The classical situated learning approach emphasizes that learning and knowledge acquisition are tied to the situation in which they occur. Thus, knowledge acquisition relates uniquely to situation-specific entities such as the underlying activity, setting, and culture (Reder et al. 1994; Goel et al. 2010; Barsalou 2016). Moreover, situated learning determines how individuals integrate new knowledge into existing knowledge. In particular, the context is a central element for the instructional situations, as it is associated with different amounts or types of concepts. Since the use of concepts is context-related, the development or transformation of concepts should occur in a context similar to the one of interest (Reder et al. 1994; Goel et al. 2010; Barsalou 2016; Hendricks 2001; Schaffernicht 2006). For example, if the goal is to develop concepts in the context of trait gain, it may be helpful to tailor the instructional material to trait gain scenarios, especially for novices. The learning situation is also decisive for the extent to which students can retrieve knowledge in new situations (Goel et al. 2010; Barsalou 2016; Johnson-Laird 1980; Paas and Ayres 2014). Much of what is acquired is often not directly accessible and transferable from one situation to another (e.g., Brown et al. 1989; Lave and Wenger 1991). Anderson et al. (1996) propounded that students who have learned in a specific situation ('source') can recall the acquired knowledge more easily in similar target situations ('target'). For instructional situations, this means that the likelihood of transferability depends on how educational situations are framed (e.g., in a meaningful and authentic context) and how familiar the situations are. Therefore, transfer also requires less effort for novices when contexts are similar. The more experienced students are with knowledge transfer, the higher the probability that the transfer will be effective, even between somewhat dissimilar contexts (Reder et al. 1994; Anderson et al. 1996). An empirical study in educational sciences showed that situated learning is associated with an increase in the immediate learning effect on conceptual knowledge, for example, on causal reasoning (Hendricks 2001). Further multi-faceted empirical findings from cognitive sciences on situated learning and conceptualization are presented in Barsalou (2005). Empirical research on these theoretical approaches to situated learning in evolution education is scarce. Nevertheless, Nehm and colleagues suggested that the situatedness of prior learning can be related to different patterns in the reasoning about natural selection (e.g., Opfer et al. 2012; Nehm and Ha 2011). Thus, it is worth examining whether students who learn in one evolutionary context transfer their knowledge equally well to familiar and unfamiliar contexts.
In addition to the relationship between situated learning and contextual reasoning, previous studies revealed that cognitive load is associated with instructions (e.g., Klepsch and Seufert 2020; Klepsch et al. 2017; Sweller and Chandler 1994; Klahr and Robinson 1981; Cooper 1998). Different instructional situations can bind different amounts of cognitive resources. Depending on the nature of the learning situation, the type and extent of the load can vary. To be more specific, the cognitive load is composed of three interrelated types: intrinsic, germane, and extraneous cognitive load (Chandler and Sweller 1991). Intrinsic load increases as tasks become more complex (e.g., large numbers of interacting elements; underlying context). Relating these findings to insights from evolution education, the difficulties associated with trait loss (compared to trait gain) may also translate into an increased intrinsic cognitive load (Nehm and Ha 2011; Federer et al. 2015). Germane load represents the capacity used for cognitive processes during the acquisition of knowledge or concept development. It is the only load that promotes learning. Germane load increases when students learn with effective instructions (e.g., Klepsch and Seufert 2020; Mayer 2002; Moreno and Park 2010; Sweller 2011). An example of an effective instructional strategy may be to add a clarification of misconceptions to regular teaching (Andrews et al. 2011; Choi and Hannafin 1995). Extraneous load depends on design variation and represents the artificially induced cognitive load. Taking the three loads together, the general rule for cognitive load during learning is that instructions should be neither over- nor under-challenging because this can lead to ineffective learning, erroneous reasoning, and misconceptions (Klepsch and Seufert 2020; Goel et al. 2010; Johnson-Laird 1980; Taber 2017; Schneeweiß and Gropengießer 2019).
Overall, assessing cognitive load can improve our understanding of how individual learning outcomes emerge. It can aid in clarifying which modifications in instructional materials (e.g., context, instructional strategy) elicit which type of cognitive load. Furthermore, a cognitive load assessment can indicate which instructional materials must be modified to obviate detrimental cognitive load and prevent students from developing misconceptions. Modifying instructions could free up working memory capacity for students to acquire key concepts effectively (Goel et al. 2010; Barsalou 2016; Johnson-Laird 1980).
Although adjusting instructional materials in terms of cognitive load can facilitate learning about key concepts, acquiring key concepts is complex. It is not merely a matter of replacing one concept with another (Limón 2001; Sinatra et al. 2014; Posner et al. 1982). Students may hold key concepts and misconceptions in mixed models (e.g., Opfer et al. 2012; Evans 2001). As they are not mutually exclusive, misconceptions can persist if they are not recognized. Recognizing misconceptions is challenging for students because they are shaped by the individual conceptual ecology (KMK 2020; Nelson 2008; Sinatra et al. 2014). The conceptual ecology provides a frame with variables that affect individual thinking and reasoning about natural selection, such as knowledge about evolution, dualistic thinking, attitudes towards evolution, and personal religious faith (Park 2007; Deniz et al. 2008; Großschedl et al. 2014). For example, knowledge about evolution can have a facilitating or inhibiting effect on learning gains, depending on its nature. If a student's conceptual ecology exhibits a low level of knowledge about evolution and many misconceptions, these misconceptions can inhibit the ability to develop scientific knowledge about natural selection. As these misconceptions are deeply rooted, educators can incorporate instructional strategies such as an explicit clarification of misconceptions into common teaching. Clarifying misconceptions explicitly simplifies the recognition of misconceptions and requires less effort on the part of students (Gartmeier et al. 2008; Oser et al. 2012). Knowledge about misconceptions can cause dissatisfaction with one's concepts (Nelson 2008) and enables students to distinguish between key concepts and misconceptions. Furthermore, students most likely revise and reconstruct their misconceptions, translating into an increase in germane cognitive load (Klepsch and Seufert 2020; Mayer 2002).
It is important to note that when students understand that certain concepts are misconceptions and they recognize them in a given context, this does not necessarily mean that students will draw the connection to similar contexts. However, if they recognize misconceptions in one context and notice similarities to another context, this can prevent them from repeatedly using misconceptions and making similar mistakes in other contexts (Oser et al. 2012; Otto and Mandorli 2018).
Empirical studies on the clarification of misconceptions yielded heterogeneous results, showing no (e.g., Aptyka and Großschedl 2019), positive (e.g., Kampourakis and Zogza 2009; Nehm and Reilly 2007; Colton et al. 2018), and mixed (e.g., Heemsoth and Heinze 2016) effects on conceptual knowledge. After one semester of teaching about natural selection and actively addressing misconceptions, Nehm and Reilly (2007) found small effects as the diversity of key concepts significantly increased, and misconceptions decreased. Nevertheless, students continued to use context-related misconceptions, which accounted for a significant part of their conceptual knowledge. Similarly, Colton et al. (2018) reported small to large pre-post-test effects after one semester of instructions on natural selection and clarifications of misconceptions. The results showed that students used more key concepts and fewer misconceptions. The strength of the effects differed depending on the underlying context of the pre- and post-test tasks. Heemsoth and Heinze (2016) obtained mixed results in their study on conceptual knowledge in mathematics. They identified prior knowledge as a determinant of learning outcomes. In their research, clarifying what is wrong promoted conceptual knowledge when students had high prior knowledge about the underlying topic. Conversely, students with low prior knowledge showed higher learning success after only focusing on key concepts (Heemsoth and Heinze 2016). These empirical results indicate that clarifying misconceptions can benefit students' conceptual knowledge. Moreover, the findings suggest that students' prior knowledge and the underlying context could be decisive for students' learning outcomes. What remains empirically unexplained is whether the context is already important when clarifying misconceptions is used as an instructional strategy.
To summarise, analyzing learning from several vantage points shows that different situated learning contexts involve different concepts and levels of cognitive load. A high cognitive load makes it more difficult for students to recognise misconceptions, so misconceptions may persist. When misconceptions are clarified (in targeted contexts), this can stimulate the germane cognitive load and accelerate the recognition and reconstruction of misconceptions. Eventually, this may lead to students having fewer misconceptions and developing normative reasoning.

Current study
Based on the theoretical background, we pursue four aims: First, to investigate the effect of situated learning in different evolutionary contexts on later contextual reasoning (cf. H1). Second, to examine whether instructional materials and underlying contexts vary significantly in associated cognitive load (cf. H2). Third, to explore the effectiveness of a clarification of common misconceptions for later reasoning about natural selection (cf. H3). Fourth, to analyze which instructional strategy effectively supports learning about natural selection and shows increased use of key concepts and fewer misconceptions in trait gain and trait loss scenarios in later reasoning (cf. exploratory analysis). Thus, the following hypotheses and exploratory analysis guided our research:
H1 Students learning in a specific evolutionary context use more key concepts and fewer misconceptions in later reasoning about the same context than students who learned in a different one before. Hence, we hypothesize that students learning in the instructional context of trait gain perform better in subsequent trait gain scenarios than in trait loss scenarios. Students learning in the instructional context of trait loss perform better in subsequent trait loss scenarios than in trait gain scenarios.
H2 Given that trait gain contexts appear easier than trait loss contexts, we suppose students perceive lower intrinsic cognitive load when provided with the instructional context of trait gain than trait loss. Furthermore, supplying a clarification of misconceptions should ease conceptual development. Thus, students who receive no clarification of misconceptions should perceive a lower germane cognitive load than students who receive a clarification. Generally, we do not expect the extraneous load to differ between instructional materials. However, it should be noted that the context of instruction and the clarification of misconceptions might interact and affect differences in the three types of loads. Working memory is limited, and the three loads are mutually dependent.
H3 Since clarifying misconceptions should promote reconstructing conceptions, we expect students who received instructions without a clarification of misconceptions to use fewer key concepts and more misconceptions in later reasoning about natural selection than students who received a clarification.

Sample
Overall, we recruited N = 373 upper secondary school students, of whom nine had to be excluded from the data analyses as preliminary tests showed strong outliers for these students. The students were on average M = 16.7 years old (SD = 1.0 years, 61% female), enrolled in 16 upper secondary schools in Germany (gymnasium, comprehensive school, and vocational training), and attended grades 10-13 (M = 10.9, SD = 0.7).
Upper secondary education is equivalent to the third level of the International Standard Classification of Education (ISCED 3; Eurydice 2021). It covers natural selection to a comparable extent for the school types concerned (Eurydice 2021

Research design and procedures
We conducted a 2 × 2 factorial intervention study with an experimental design. The study consisted of three phases: a pre-test, an intervention, and a post-test (cf. Fig. 2). It was conducted in students' regular school environments to ensure ecological validity. To warrant comparability across schools, we reviewed the core curricula of the lower secondary school (previous education) and upper secondary schools (current education) regarding the evolutionary contexts (Eurydice 2021; KMK 2016; MSB NRW 2019; MSW NRW 2011a, b, 2013a, b, 2015). We instructed the implementers of the study on standardized procedures. The design and instructions (e.g., paper-based material, amount of time to work on tasks, content, attention to key concepts, group-specific attention to misconceptions, learning form) were similar between the schools and classes.
In the pre-test (45 min), we collected variables considered essential for analyzing learning, including knowledge about evolution, dualistic thinking, attitudes towards evolution, personal religious faith, and demographic data (Park 2007; Deniz et al. 2008; Großschedl et al. 2014). In the subsequent intervention (90 min), students of each class were randomly assigned to one of four intervention groups (i.e., random assignment was at the student level). Each group received instructional material covering natural selection, which was manipulated in terms of the evolutionary context (trait gain vs. trait loss) and the clarification of misconceptions (no vs. yes). Directly after learning with the instructional materials, students reported their cognitive load using the cognitive load questionnaire that we provided to them on the last page of the instructional materials. A few days after the intervention (on average M = 1.4 days), the post-test (45 min) was administered to gain insights into students' contextual reasoning about natural selection. We used three test versions comprising the same four tasks but in different orders to reduce possible influences of task sequencing effects (Federer et al. 2015). Each student received all four tasks.
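The student-level random assignment described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the group labels, the function names (`assign_within_class`, `assign_all`, `group_sizes`), the fixed seed, and the balancing rule (group sizes within a class differ by at most one) are all assumptions introduced for illustration. The text only states that students of each class were randomly assigned to four groups.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical labels for the 2 x 2 design (context x clarification).
CONTEXTS = ("trait_gain", "trait_loss")
CLARIFICATION = ("no", "yes")
GROUPS = list(product(CONTEXTS, CLARIFICATION))  # four intervention groups


def assign_within_class(student_ids, rng):
    """Randomly assign the students of one class to the four groups.

    Students are shuffled and dealt to the groups in turn, so group sizes
    within a class differ by at most one (an assumed balancing rule).
    """
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Shuffle the group order as well, so that leftover students do not
    # always end up in the same condition.
    groups = GROUPS[:]
    rng.shuffle(groups)
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


def assign_all(classes, seed=0):
    """Apply the within-class assignment to every class (dict: id -> students)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    assignment = {}
    for student_ids in classes.values():
        assignment.update(assign_within_class(student_ids, rng))
    return assignment


def group_sizes(assignment, student_ids):
    """Count how many of the given students landed in each group."""
    return Counter(assignment[sid] for sid in student_ids)
```

Randomizing within each class rather than across the whole sample keeps class-level influences (teacher, school, prior curriculum) spread evenly over the four conditions.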

Instructional materials for the intervention
The instructional materials covering natural selection (see Additional file 1) commenced with definitions of basic terms, such as 'population', 'individual', and 'trait', to prevent comprehension problems. For presenting natural selection in a didactically valuable way, we developed the tasks concerning research on situated learning, knowledge building, and cognitive load theory. In the writing tasks, students had to explain processes of natural selection based on the four trait change scenarios. To ensure the validity of the conclusions drawn from the intervention and the post-test and to establish familiarity with the response format of the post-test, the intervention materials already contained tasks inspired by the Assessment of Contextual Reasoning about Natural Selection (ACORNS) instrument. For example, one task was defined as follows: 'Explain how a locust population without DDT resistance could develop into a locust population resistant to DDT. First, try to answer this question for yourself'. The best-practice solutions introduced an explanation of the underlying scenarios based on the following key concepts: Presence and cause of variation ('variation'), the heritability of variation ('heritability'), differential survival of individuals ('individual fitness'), and limited resources ('resource limitation'; for short definitions of the concepts, see Additional file 2). The manipulation of the instructional materials in terms of the evolutionary context (i.e., trait gain vs. trait loss) and the clarification of misconceptions (i.e., no vs. yes) was implemented as follows: The groups learning in trait gain or trait loss scenarios only differed in the fact that the first-mentioned group received scenarios of trait gain and the second-mentioned group received scenarios of trait loss (e.g., 'Random changes (mutations) […] may have led to some locust individuals to gain/to lose resistance to DDT').
Besides, the differences in clarification of misconceptions were realized through four additional reasoning tasks embedded in the existing four trait change scenarios. These tasks consisted of three steps: First, the material presented an example of a misconception about one of the four existing scenarios (e.g., locust). Second, the tasks encouraged the students to explain why the statement was technically incorrect. Third, the misconception in the example was explained by an informational text. Accordingly, half of the materials contained global informational text on the following misconceptions: Need as a driving force for evolution ('need'), the use or disuse of particular body parts ('use/disuse'), the intentionality to change ('intentionality'), and the active adaptation to environmental conditions ('adapt'; e.g., The 'adaptation' to environmental conditions has no effect on the hereditary characteristics of individuals in a population. It, therefore, plays no role in the transmission of traits to the next generation; for short definitions of the concepts, see Additional file 2). Aside from the presented manipulations, other parts of the instructional materials were isomorphic.

Measures
The measures outlined below are appropriate for the examined target sample (Beniermann 2019; Kuschmierz et al. 2020b; Nehm et al. 2012; Klepsch et al. 2017). For the measures that students rated on a 5-point or 7-point Likert scale, a high score represents an increased expression of the characteristic in question. Table 1 presents reliability scores and descriptive statistics of all measures.

Pre-test variables
Knowledge about evolution
We employed the Knowledge About Evolution 2.0 (KAEVO 2.0; Kuschmierz et al. 2020b) to assess students' knowledge about evolution, especially about different aspects of micro- and macroevolution. We used the KAEVO 2.0 as this measure was validated using multiple sources of validity evidence (e.g., content validity, internal structure, and reliability), was developed for German high school students, and is thematically oriented towards biology curricula and textbook content (Beniermann 2019; Kuschmierz et al. 2020b; Beniermann et al. 2021). Moreover, its underlying context is similar to the post-test assessment (see the following section 'Contextual reasoning about natural selection'). Both allow the evaluation of knowledge about concepts of natural selection (e.g., variation) in animal and plant scenarios concerned with the gain or loss of traits. The KAEVO 2.0 comprises multiple-choice questions, true/false statements, and distractors based on existing misconceptions about evolution. In total, students could score up to 24 points by answering all tasks correctly (Kuschmierz et al. 2020b). Assessing knowledge about evolution as a covariate is crucial for the following analyses, as previous studies showed a positive effect of knowledge on learning. Depending on its nature, knowledge can act as a filter or reinforcer in learning (Barsalou 2016; Deniz et al. 2008).

Dualistic thinking
We assessed dualistic thinking with the Short Dualistic scale (SD-scale; Beniermann 2019). This scale represents an abridged version of Stanovich's Dualism scale (Stanovich 1989). The substantive, content, and internal validity were ensured using expert interviews, a pre-test for testing comprehensibility as well as reliability, and factor analyses (Beniermann 2019). The scale comprised five items on a 5-point Likert scale and has been applied in previous studies to examine the extent to which students hold dualistic theories about the brain and the mind and reject materialistic accounts (Beniermann 2019; Beniermann et al. 2021; Stanovich 1989). This variable is an important covariate because it is part of the belief system of individuals and thus of the conceptual ecology (Deniz et al. 2008). Furthermore, current research emphasises the need to investigate dualistic thinking because it negatively relates to knowledge about evolution (Beniermann 2019; Beniermann et al. 2021).
Attitudes towards evolution
We captured the attitudes towards evolution by employing the Attitudes Towards Evolution 2.0 (ATEVO 2.0; Beniermann 2019). Internal validity for this version was determined using statistical analyses such as principal component analysis (Beniermann 2019). Overall, this measure contains eight items on a 5-point Likert scale that address attitudes towards the philosophical position of evolutionary epistemology. Four items focus on attitudes towards the human spirit and four on attitudes towards evolution in general (Beniermann 2019; Beniermann et al. 2021). Prior studies showed that assessing the attitudes towards evolution as a covariate is crucial for an informed interpretation and analysis of students' learning process. They are positively related to knowledge about evolution and learning (Kuschmierz et al. 2020; Fenner 2013; Lammert 2012; Deniz et al. 2008; Beniermann et al. 2021).

Table 1 Reliability scores and descriptive statistics of the measures
* We calculated scores with the RStudio packages 'psych' (ω) and 'irr' (κ). We included the reliability indices for α and ω in this table to indicate the scales' internal consistency and ensure comparability with previous studies such as Klepsch and Seufert (2020), who only used ω for reporting the reliability. α = Cronbach's alpha, κ = Cohen's Kappa (Inter-Rater-Reliability), M = mean score, SD = standard deviation, ω = McDonald's omega
Personal religious faith
We used the Personal Religious Faith 2.0 (PERF 2.0; Beniermann 2019) to assess monotheistic faith and religious behaviors (Beniermann et al. 2021). This measure comprises ten items on a 5-point Likert scale. Experts from philosophy, theology, religion, psychology, and sociology evaluated and modified the instrument to guarantee content validity (Beniermann 2019). It is essential to determine personal religious faith and use it as a covariate in analysing learning, as this variable is part of the conceptual ecology. The examination of this variable is also essential as current studies on the correlation between personal religious faith and knowledge about evolution are inconsistent. They mostly show no or a negative correlation between the two variables (Beniermann 2019; Kuschmierz et al. 2020; Deniz et al. 2008).

Intervention variable
Cognitive load
We assessed students' cognitive load or mental effort by using the cognitive load questionnaire of Klepsch et al. (2017). The substantive and content validity of this questionnaire was demonstrated by deriving its items from literature, comparing it to former instruments, and contrasting informed as well as naïve ratings. Also, Klepsch et al. (2017) tested the internal structure of this questionnaire through statistical analyses such as expert rating agreements and confirmatory factor analysis (Klepsch and Seufert 2020; Klepsch et al. 2017). The questionnaire consists of seven items on a 7-point Likert scale. Since the cognitive load encompasses three types, three scales are assessed. Two items of the questionnaire measure the intrinsic, two measure the germane, and three measure the extraneous cognitive load. The questionnaire has proven valuable in identifying which parts of tasks are cognitively challenging for students (Klepsch and Seufert 2020; Klepsch et al. 2017). Similarly, we used this measure to gain insight into the perceived difficulties of the interventions' instructional materials.
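The three-scale structure of the questionnaire can be illustrated with a minimal Python sketch. The assignment of item positions to scales below is an assumption made for illustration only (the text states just the number of items per scale), as is the use of the mean rating as the scale score.

```python
from statistics import mean

# Hypothetical item-to-scale assignment (0-based positions). The text only
# states that two items measure intrinsic, two germane, and three extraneous
# load, not which items belong to which scale.
SCALES = {
    "intrinsic": (0, 1),
    "germane": (2, 3),
    "extraneous": (4, 5, 6),
}


def cognitive_load_scores(ratings):
    """Return the mean rating per load type for one student's seven ratings."""
    if len(ratings) != 7:
        raise ValueError("expected seven item ratings")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 7-point scale (1 to 7)")
    return {
        scale: mean(ratings[i] for i in items)
        for scale, items in SCALES.items()
    }


# Invented ratings of a single student, for illustration only.
example = cognitive_load_scores([5, 6, 3, 4, 2, 3, 4])
```

With these invented ratings, the student would receive one score per load type, each on the original 1 to 7 metric, so that the three scales stay comparable despite their different item counts.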

Post-test variable
Contextual reasoning about natural selection
We applied the ACORNS in the post-test. The ACORNS was originally validated by providing convergent validity. This instrument is used regularly to gain insights into students' contextual reasoning about natural selection. It differs from the previously presented KAEVO 2.0 in, among other things, the response process (Nehm and Ha 2011; Federer et al. 2015).
We applied four open-ended tasks and prompted students to reason about natural selection in written form. The four tasks had an isomorphic structure but differed in the evolutionary contexts (trait gain in a species of snails [toxicity] or elms [winged seeds]; trait loss in a species of penguins [ability to fly] or roses [spines]). Trained independent human raters manually coded each item according to Nehm et al. (2010). We used the manual to identify and quantify the number of key concepts and misconceptions used in trait gain and trait loss scenarios. For this analysis, we focused on the four key concepts that were used in the instructional material, namely 'variation', 'heritability', 'individual fitness', and 'resource limitation', as well as the four misconceptions 'need', 'use/disuse', 'intentionality', and 'adapt' (Bishop and Anderson 1990; Nehm and Ha 2011; Großschedl et al. 2018; Nehm et al. 2010; Rachmatullah et al. 2018; Rector et al. 2013). The presence of a concept was tallied and dichotomously coded for each task (e.g., concept absent = 0, concept present = 1). We calculated a sum score for the number of used key concepts for each of the four post-test tasks and assigned them to the two different scenarios, trait gain and trait loss. Correspondingly, we obtained scales for the number of key concepts used in trait gain scenarios and trait loss scenarios, where the maximum score for each scale was eight (e.g., '8 concepts used'). We calculated the number of misconceptions used in trait gain and trait loss scenarios in the same way. To ensure the reliability of the open response rating, we compared the codings of the two raters and obtained a substantial overall agreement (cf. Table 1; Landis and Koch 1977).
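The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script: the data layout (one set of coded labels per task) and all function and variable names are assumptions, and the example student is invented.

```python
KEY_CONCEPTS = ("variation", "heritability", "individual fitness", "resource limitation")
MISCONCEPTIONS = ("need", "use/disuse", "intentionality", "adapt")

# The four post-test tasks and their evolutionary context, as described in the text.
TASK_CONTEXT = {"snails": "gain", "elms": "gain", "penguins": "loss", "roses": "loss"}


def scale_scores(coded):
    """Sum dichotomous codes (present = 1, absent = 0) per context.

    coded: {task name: set of concept labels a rater marked as present}.
    Each returned scale can reach at most 8 (4 concepts x 2 tasks per context).
    """
    scores = {f"{kind}_{ctx}": 0 for kind in ("key", "mis") for ctx in ("gain", "loss")}
    for task, present in coded.items():
        ctx = TASK_CONTEXT[task]
        scores[f"key_{ctx}"] += sum(c in present for c in KEY_CONCEPTS)
        scores[f"mis_{ctx}"] += sum(m in present for m in MISCONCEPTIONS)
    return scores


# Invented codings of one student's four written answers.
student = {
    "snails": {"variation", "individual fitness", "need"},
    "elms": {"variation"},
    "penguins": {"use/disuse", "need"},
    "roses": {"resource limitation", "adapt"},
}
result = scale_scores(student)
```

For this invented student, the sketch yields three key concepts and one misconception in trait gain scenarios, and one key concept and three misconceptions in trait loss scenarios.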

Data analysis
We analysed the data using IBM SPSS Statistics (version 27.0) and resorted to RStudio for calculating the reliability measures ω and κ. Initially, we screened the data and its distribution. Missing values were partly missing at random (MAR), so we excluded the respective cases from individual data analyses. The data also showed extreme statistical outliers for the attitudes towards evolution (n = 3) and age (n = 6), which we eliminated from all analyses (cf. Fig. 2). We used descriptive statistics for the overall sample specification. We explicitly examined knowledge about evolution in the pre-test and the type of used key concepts and misconceptions in the post-test. We set up a formal model description as a guideline for our following analyses (cf. Additional file 3). For null hypothesis significance testing, we used inferential statistics and set the significance level for the statistical analyses of our research hypotheses to 5%. Previous studies have shown that pre-test knowledge about evolution, dualistic thinking, attitudes towards evolution, and personal religious faith are related to thinking and reasoning about natural selection (Park 2007; Deniz et al. 2008; Großschedl et al. 2014). We applied these pre-test variables as covariates for all following analyses related to reasoning about natural selection to improve our power and reduce unexplained variability between groups (Maxwell et al. 2017). Since the study design allowed examining the first hypothesis, the third hypothesis, and the exploratory analysis in a single step, we performed only one analysis to answer these. Although the analysis was carried out jointly, we present the results in the order of the previously stated hypotheses (see the section 'Current study').
We conducted a two-way multivariate analysis of covariance (MANCOVA). Explicitly, we investigated the effect of the evolutionary context of the intervention (trait gain vs. trait loss; H1), the impact of the clarification of misconceptions (no vs. yes; H3), and the interaction effect of both (exploratory analysis) on the later use of key concepts and misconceptions, both in trait gain and trait loss scenarios of the post-test. The two-way MANCOVA was used, as it is essential for identifying the most effective instructional materials. Even if the two main effects of this analysis are not significant, the interaction term can still be significant. From a statistical perspective, the significant interaction means that the two factors should not be interpreted globally but in combination to obtain a holistic analysis (Maxwell et al. 2017). Afterward, we performed a two-way multivariate analysis of variance (MANOVA) to address our second hypothesis (H2). We examined the effects of the intervention material, explicitly the evolutionary context of the intervention (trait gain vs. trait loss), the clarification of misconceptions (no vs. yes), and the interaction effect of both on students' perceived intrinsic, germane, and extraneous cognitive load.
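As a reading aid, a two-way MANCOVA of this kind is commonly written in the generic textbook form below. This is a sketch under assumed notation, not the authors' formal model description from Additional file 3: the symbols $\boldsymbol{\alpha}_i$ (context), $\boldsymbol{\beta}_j$ (clarification), $\mathbf{B}$ (covariate slopes), $\mathbf{H}$, and $\mathbf{E}$ are introduced here for illustration only.

```latex
% y_{ijk}: vector of the four outcomes (key concepts and misconceptions in
%          trait gain and trait loss scenarios) for student k in context i
%          and clarification condition j
% x_{ijk}: vector of the four pre-test covariates
\[
\mathbf{y}_{ijk} = \boldsymbol{\mu} + \boldsymbol{\alpha}_i + \boldsymbol{\beta}_j
  + (\boldsymbol{\alpha\beta})_{ij} + \mathbf{B}^{\top}\mathbf{x}_{ijk}
  + \boldsymbol{\varepsilon}_{ijk},
\qquad i, j \in \{1, 2\}
\]
% Wilks' lambda for an effect with hypothesis SSCP matrix H and error SSCP matrix E
\[
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})}
\]
```

Values of $\Lambda$ close to 1 indicate that an effect explains little of the multivariate variation, which is why the main effects and the interaction term $(\boldsymbol{\alpha\beta})_{ij}$ are each tested with their own $\Lambda$.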
As a continuation of the exploratory analysis, we also aimed to provide a more general statement about the effects of the four experimental conditions of the intervention (differing in the evolutionary context [trait gain vs. trait loss] and the clarification of misconceptions [no vs. yes]) on the use of key concepts and misconceptions in the post-test per se. Thus, we calculated a total score for the used key concepts (key concepts in trait gain plus trait loss scenarios) and misconceptions (misconceptions in trait gain plus trait loss scenarios). Next, we applied a two-way MANCOVA. In this analysis, we were primarily interested in the interaction effect of the evolutionary context of the intervention (trait gain vs. trait loss) and the clarification of misconceptions (no vs. yes) for the use of key concepts and misconceptions in the post-test.

Baseline description
We used descriptive statistics to visualize the comparability of the pre-test variables and demographic data among the four intervention groups (cf. Table 2).
Regarding the pre-test variables, we examined knowledge about evolution in more detail. Our sample achieved M = 12.39 points in the pre-test, which is classified as low knowledge according to the score categories of Kuschmierz et al. (2020b). Moreover, we examined four items of the KAEVO 2.0 that are similar to the ACORNS in that the items cover evolutionary adaptation and natural selection. In the tasks, students used more key concepts when trait gain (37.2%) compared to trait loss (28.8%) was addressed. Most of the misconceptions used in all items are based on teleological ideas, especially concerning the organism (22.4%) itself.
Subsequently, we analyzed the mean frequency of the used key concepts and misconceptions when reasoning about evolutionary scenarios within the four ACORNS items in the post-test (cf. Fig. 3). The figure shows that students used about twice as many key concepts as misconceptions when reasoning about natural selection. They used the key concept 'resource limitation' most frequently and 'heritability' least frequently. In addition, the misconceptions 'adapt', 'need', and 'use/disuse' occurred nearly equally often, with 'intentionality' being used least frequently.

Effects of situated learning on later reasoning
We hypothesized that students use more key concepts and fewer misconceptions in a context if they had previously learned in a similar one. We interpreted only the factor context of the intervention (trait gain vs. trait loss) of the two-way MANCOVA and used Wilks' statistic to test our first hypothesis. Results did not reveal a significant effect of the evolutionary context of the intervention (trait gain vs. trait loss) on the use of key concepts and misconceptions, both in trait gain and trait loss scenarios in the post-test, Wilks' Λ = .99, F(4,301) = .69, p = .603, ηp² = .01. The results did not support our first hypothesis since groups that learned in the context of trait gain (or trait loss) did not use significantly more key concepts or fewer misconceptions in trait gain (or trait loss) tasks after instruction. Likewise, descriptive statistics only showed marginal differences. Both groups used more key concepts and fewer misconceptions in trait gain scenarios than in trait loss scenarios (cf. Table 3).
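For readers who want to relate the reported F statistic to the effect size, partial eta squared can be recovered from F and its degrees of freedom with the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the values reported above; the function name is illustrative, and small deviations from published values can arise because the reported F is itself rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Effect of the intervention context reported above: F(4,301) = .69
eta_context = round(partial_eta_squared(0.69, 4, 301), 2)
```

Here the computation gives about .01, which matches the effect size reported for the context factor.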

Cognitive load while learning
Regarding our second hypothesis, we expected that only the intrinsic load would be lower in trait gain than in trait loss contexts of the intervention, that only the germane load would be lower when receiving no clarification of misconceptions than when receiving a clarification, and that the extraneous load would not differ between the intervention groups. We also tested whether the intervention context and clarification of misconceptions interact and translate into differences in the three types of load. We conducted a two-way MANOVA using Wilks' statistic, which revealed a significant difference between the experimental conditions of the evolutionary context of the intervention (trait gain vs. trait loss) for students' perceived cognitive load, Wilks' Λ = .97, F(3,221) = 2.67, p = .049, ηp² = .04, no significant difference for the clarification of misconceptions (no vs. yes), Wilks' Λ = .99, F(3,221) = 1.07, p = .361, ηp² = .01, and a significant interaction effect, Wilks' Λ = .96, F(3,221) = 3.08, p = .028, ηp² = .04. Thus, the evolutionary context of the intervention (trait gain vs. trait loss) and the interaction effect relate to differences in students' cognitive load. To identify the types of cognitive load that differed between the intervention groups, we conducted Bonferroni-corrected post-hoc ANOVAs. The analyses revealed a significant difference for the evolutionary context of the intervention (trait gain vs. trait loss) for the intrinsic load, F(1,223) = 4.67, p = .032, ηp² = .02, no significant effect for the germane load, F(1,223) = 1.69, p = .195, ηp² = .01, and a significant effect for the extraneous load, F(1,223) = 4.90, p = .028, ηp² = .02. The post-hoc analyses for the interaction effect did not show any significant differences for the intrinsic, germane, or extraneous cognitive load (p > .05). Groups that were provided with intervention materials on trait gain contexts (GMC− and GMC+) perceived a lower intrinsic load (MDiff = −.44, 95% CI [−.83, −.04]; Cohen's d = .29) and a lower extraneous load (MDiff = −.39, 95% CI [0.04, 0.74]; Cohen's d = 0.29) than groups that were provided with trait loss contexts. The results supported our hypothesis on intrinsic but not on germane and extraneous load (cf. Fig. 4).

Table 2 Descriptive statistics of the groups
We measured knowledge about evolution on a scale ranging from 0 to 24. We assessed dualistic thinking, attitudes towards evolution, and personal religious faith on a 5-point Likert scale ranging from 1 = low to 5 = high. GMC− = group that learned in the context of trait gain without a clarification of misconceptions; GMC+ = group that learned in the context of trait gain with a clarification of misconceptions; LMC− = group that learned in the context of trait loss without a clarification of misconceptions; LMC+ = group that learned in the context of trait loss with a clarification of misconceptions

Effects of the clarification of misconceptions on later reasoning
The third hypothesis we posited was that students who received no clarification of misconceptions in the intervention would use fewer key concepts and more misconceptions in reasoning about natural selection in the post-test than students who received one. Contrary to our expectations, the results of the factor clarification of misconceptions (no vs. yes) in the two-way MANCOVA did not support our third hypothesis, as they did not show significant differences between groups in the use of key concepts and misconceptions, both in trait gain and trait loss scenarios, Wilks' Λ = .97, F(4,299) = 2.01, p = .093, ηp² = .03.

Identifying the most effective instructional strategy
To investigate whether it makes a difference in which context students received the clarification of misconceptions and to identify the most effective instructional strategies of our 2 × 2 factorial design, we analysed the interaction term of the factors evolutionary context of the intervention (trait gain vs. trait loss) and clarification of misconceptions (no vs. yes) of the two-way MANCOVA with respect to the use of key concepts and misconceptions, both in trait gain and trait loss scenarios in the post-test. We did not find any supporting evidence for an interaction effect, Wilks' Λ = .98, F(4,299) = 1.74, p = .142, ηp² = .02. From a purely descriptive perspective, the group LMC− used the most key concepts, and the group GMC+ used the fewest misconceptions, both in trait gain and trait loss scenarios (cf. Fig. 5).
Subsequently, we carried out another two-way MANCOVA to examine the interaction effect of the two factors (evolutionary context of the intervention [trait gain vs. trait loss] and clarification of misconceptions [no vs. yes]) on the general use of key concepts and misconceptions in the post-test. We found a statistically significant disordinal interaction effect, Wilks' Λ = .98, F(2,301) = 3.06, p = .049, ηp² = .04. After that, we conducted post-hoc ANCOVAs for both dependent variables. These analyses showed no statistically significant differences between the four groups regarding the use of key concepts, F(1,302) = .78, p = .377, ηp² < .01, but did show differences regarding the use of misconceptions, F(1,302) = 5.92, p = .016, ηp² = .02 (cf. Fig. 6). Bonferroni-corrected post-hoc tests revealed that students who learned in trait gain contexts without a clarification of misconceptions (GMC−) used significantly more misconceptions than students who learned in trait gain contexts with the clarification of misconceptions (GMC+), p = .004 (MDiff = 1.22, 95% CI [0.40, 2.05]; Cohen's d = .54). The other groups did not differ significantly from each other.
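The logic of a Bonferroni correction for pairwise group comparisons can be sketched as follows. With four groups there are six pairwise comparisons, and each unadjusted p-value is multiplied by the number of comparisons and capped at 1. The p-values in the sketch are invented placeholders, not results of this study, and the text does not state whether the reported p-values are already adjusted; the ASCII group labels and function name are likewise illustrative.

```python
from itertools import combinations

GROUPS = ("GMC-", "GMC+", "LMC-", "LMC+")


def bonferroni(p_values):
    """Multiply each unadjusted p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# All pairwise comparisons among the four intervention groups.
pairs = list(combinations(GROUPS, 2))

# Invented unadjusted p-values, for illustration only.
raw = dict(zip(pairs, (0.001, 0.20, 0.45, 0.03, 0.08, 0.60)))
adjusted = bonferroni(raw)

# Comparisons that remain significant at the 5% level after correction.
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

In this invented example only the first comparison survives the correction, which illustrates how a correction for six comparisons can leave a single pairwise difference significant while the others are not.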

Discussion
A large body of empirical research in evolution education has analysed key concepts and misconceptions in reasoning about natural selection (Beggrow and Sbeglia 2019; Nehm and Ha 2011; Federer et al. 2015) and found that students' reasoning depends on the underlying evolutionary context. Furthermore, previous studies stressed that it is easier for students to reason about some evolutionary contexts (e.g., natural selection covering animals; trait gain) than others (Nehm and Ha 2011; Federer et al. 2015; Großschedl et al. 2018). Moreover, previous studies repeatedly addressed the persistence of misconceptions (Ha and Nehm 2014; Nehm and Ha 2011; Nehm and Reilly 2007). However, it remained unclear which factors are linked to differences in contextual reasoning, whether students perceive different contexts to vary in difficulty, and what kind of instructional strategies promote learning about natural selection. We investigated prior situated learning to clarify which explanatory approach can explain contextual reasoning. In addition, we explored whether students perceive different contexts to vary in difficulty by monitoring their cognitive load while learning. Moreover, we explored learning instructions that differed in the underlying evolutionary context (trait gain vs. trait loss) and in the clarification of misconceptions (no vs. yes) to identify the most effective opportunities.

Fig. 4 Intrinsic, germane, and extraneous cognitive load in the intervention (mean score). Cognitive load was measured on a 7-point Likert scale ranging from 1 = low to 7 = high. Error bars show standard errors. GMC− = group that learned in the context of trait gain without a clarification of misconceptions; GMC+ = group that learned in the context of trait gain with a clarification of misconceptions; LMC− = group that learned in the context of trait loss without a clarification of misconceptions; LMC+ = group that learned in the context of trait loss with a clarification of misconceptions; *p < .05

Evolutionary contexts and the situated learning approach
Regarding our research hypothesis on situated learning, the results indicated that the intervention groups learning with trait gain contexts did not use significantly more key concepts or fewer misconceptions than the groups learning with trait loss contexts when reasoning about natural selection in trait gain scenarios in the post-test. The same was true for the students who learned in trait loss contexts. We suspect that prior knowledge, cognitive load, the design of the instructional materials, or effectiveness of knowledge transfer between tasks may explain the lack of effect as discussed below: As our pre-test results showed, we cannot assume that each student resembled a tabula rasa when participating in the study. The students showed little knowledge about evolution and used a mix of key concepts and misconceptions to answer adaptation and natural selection items. Therefore, there exists the possibility that the students either did not have sufficient prior knowledge to draw on when working on the intervention materials or that the existing misconceptions posed a barrier to learning and overwhelmed the working memory when students endeavored to overcome them (e.g., Nehm and Ha 2011;Goel et al. 2010;Barsalou 2016).
It also seems possible that the cognitive load in the intervention was the decisive factor in why students' concept use did not significantly differ. For example, the intrinsic cognitive load differed significantly between the groups that learned in trait gain and trait loss contexts. Since the intrinsic cognitive load was already relatively high in the intervention group that learned with trait loss contexts, it could have tied up a large proportion of the cognitive resources. As a result, students may have had fewer resources to acquire knowledge in the intervention and apply it to the scenarios in the post-test. Thus, the results of the second hypothesis could explain why students who learned in trait loss contexts in the intervention did not perform better in trait loss scenarios than in trait gain scenarios in the post-test. Taking this idea further, another reason could be that students benefited from the instructional materials ('source'), but the learned information was not directly accessible in other situations ('target'). This means that although students have learned about certain contexts and concepts of natural selection in the intervention, their knowledge initially remained tied to the concrete instructional situation. According to Kirsh (2009), mental representations are always bound to a specific situation. For students to transfer knowledge to other situations, they must understand the deeper structure of the given problem and internalize its abstract representation. Consequently, understanding the post-test tasks ('target') and underlying abstract structures is essential for the transfer. If students did not fully understand the post-test tasks, they might not have recognized the deeper analogies between the source and the target, and the transfer was unlikely to occur.
The transfer of knowledge between the source and target tasks could have been hampered if the students perceived the tasks as dissimilar (Reder et al. 1994;Anderson et al. 1996). Kirsh (2009) described a comparable case in which individuals could not transfer their problem-solving strategies from tic-tac-toe to the game of fifteen, even though both games were based on the same problem and differed only in surface features.
However, the opposite could also be true, meaning that students could transfer knowledge to trait gain and trait loss scenarios of the post-test. We cannot exclude that similar structures of the intervention materials and post-test tasks contributed to the students being able to transfer their acquired knowledge equally well to all post-test tasks. Therefore, students might have had the subjective perception that the post-test tasks matched in terms of required knowledge. For instance, students who learned in trait gain contexts could apply their knowledge equally well in post-test scenarios of trait gain and trait loss.

The complex interplay of intrinsic, germane, and extraneous cognitive load while learning
Regarding our research on cognitive load, the results showed that intrinsic cognitive load was significantly lower when students received interventional material on trait gain rather than trait loss contexts. This effect supports assumptions posed by the cognitive load theory concerning intrinsic load, as intrinsic load represents the perceived complexity placed on the students by context (Klepsch et al. 2017). Former results show that students use more key concepts (Federer et al. 2015) and fewer misconceptions (Nehm and Ha 2011) when explaining natural selection in trait gain scenarios. We added to the body of research by highlighting that students perceive a lower intrinsic load while working on trait gain compared to trait loss contexts.
Due to research suggesting that a clarification of misconceptions can promote conceptual transformation (Kampourakis and Zogza 2009;Andrews et al. 2011;Limón 2001;Nelson 2008;Colton et al. 2018), we expected the clarification of misconceptions to foster deeper learning and thus increase the germane cognitive load (Klepsch and Seufert 2020;Klepsch et al. 2017). Nevertheless, we did not find significant differences between the intervention groups. Klepsch and Seufert (2020) explicated that if high intrinsic and extraneous load levels are imposed on the working memory, only a little working memory capacity is available for germane load. Aligned with the researchers, we argue that all groups' intrinsic and extraneous load could have demanded substantial working memory. Compared to other studies, the assessed cognitive load was relatively high (Klepsch and Seufert 2020;Klepsch et al. 2017). Thus, possibly insufficient capacity was available to acquire new knowledge. Another reason might be the clarification of misconceptions which was not supportive for every student. Our students showed low levels of knowledge about evolution. The results of Heemsoth and Heinze (2016) supported this argument by indicating that a clarification of misconceptions can lead to cognitive overload and disadvantages for students with little prior knowledge but advantages for students with higher knowledge levels.
From a theoretical viewpoint, we also expected that the groups' extraneous cognitive load would not differ. From a practical point of view, the reality is less straightforward. Data revealed that the extraneous load was significantly lower when students learned in trait gain than trait loss contexts in the intervention. Given that the instructional design was the same in both contexts, differing only in the information that traits were gained or lost, other confounding variables must explain this difference in extraneous load. This result could be attributable to the fact that the students were generally more familiar with contexts of trait gain. The overall load in trait gain was lower than in trait loss contexts (Nehm and Ha 2011;Orru and Longo 2019). Comparably, Shen et al. (2020) found that working memory is not significantly strained when students work with familiar compared to unfamiliar components, as they investigated learning with familiar and unfamiliar icons.
In line with the findings of Klepsch and Seufert (2020), the natural complexity of the instructional materials could have prevented students from distinguishing between the complexity of tasks (intrinsic load) and instructional design (extraneous load), leading to both being rated as high (Klepsch and Seufert 2020). Overall, the interplay of the three loads may have almost exhausted the maximum capacity of the working memory. The measured values were higher than in other studies, and working memory capacity is limited (Klepsch and Seufert 2020;Klepsch et al. 2017). As a result, students could have struggled with differentiating between the loads. It is also not yet evident what exact ratio of the three loads promotes the best possible learning success and what the critical level of cognitive overload is (Jong 2010).
The loads increased mostly uniformly from the groups GMC− to LMC+, whereas in other studies, the loads tended to change individually. Thus, more research is needed to determine whether the validity of this instrument is limited when students perceive a relatively high level of cognitive load. In addition, we would like to accentuate the practical significance of the fact that trait gain contexts are associated with a lower intrinsic and extraneous cognitive load than trait loss contexts. The effect sizes of the differences for intrinsic and extraneous load are small. The interpretation of extraneous load is limited and would require further research.

The clarification of misconceptions as instructional strategy
Contrary to our expectations, the sole inclusion of clarification of misconceptions did not significantly increase the germane load or reduce the use of misconceptions in trait gain or trait loss scenarios. A plausible reason is that students did not perceive the clarification as supportive. The intervention may have been insufficiently designed or implemented. Similar to the results of Heemsoth and Heinze (2016), a clarification of misconceptions could have been effective for students with higher prior knowledge but inhibiting for students with lower prior knowledge. Since the MANCOVA only provides a mean value for the groups, no differences are apparent. The average of the students did not benefit more from the clarification than the students who did not receive it.
Furthermore, the allotted interaction time may not have been sufficient to impact learning and subsequent outcomes in the post-test. Developing scientific knowledge of natural selection can take longer than one semester (Nehm and Reilly 2007). Misconceptions about evolution are cognitively deeply rooted and resistant to teaching (Gregory 2009). Thus, students could benefit from more time to revise individual concepts and connect new with existing knowledge.

The combination of the clarification of misconceptions and different evolutionary contexts in learning instructions
Our exploratory analyses aimed to identify the instructional strategies that indicate the highest potential for learning about natural selection. Therefore, we investigated the interaction of the 2 × 2 intervention design and its effects on the use of key concepts and misconceptions, both in trait gain and trait loss scenarios. The results showed no evidence of an effect on the dependent variables. Nevertheless, from a general perspective of contextually detached use of key concepts and misconceptions in the post-test, the results revealed a small but statistically significant interaction effect. The group that learned in trait gain contexts without the clarification of misconceptions used significantly more misconceptions than students who learned in the same context and received a clarification of misconceptions. One explanation for why students only benefited from clarifying misconceptions in trait gain contexts is that the intrinsic and extraneous cognitive load was significantly lower in the groups GMC− and GMC+ than in the groups LMC− and LMC+. Accordingly, the cognitive load could have inhibited learning in the trait loss groups. An alternative explanation is that students were more familiar with the context of trait gain than with trait loss and were, therefore, better positioned to use the free working memory capacity to avoid misconceptions (Kirsh 2009; Reder et al. 1994; Barsalou 2016).
Although the students knew little about evolution at the time of the pre-test, they could have been more familiar with contexts of trait gain. Being familiar with the underlying context could have provided a basis for transforming students' prior knowledge in the intervention because familiarity can facilitate memory formation (Poppenk et al. 2010). There may be parallels to Colton et al. (2018), who observed that learning gains between pre- and post-test scores were significantly higher in the trait gain than in the trait loss scenarios.
With respect to the covariates used in this model, we noticed that knowledge about evolution was of primary importance, as it was significant for all dependent variables. More prior knowledge is associated with higher numbers of used key concepts and lower numbers of used misconceptions in the post-test (Deniz et al. 2008). The attitude towards evolution was significant for the key concepts used in the trait gain scenarios of the post-test. These results are consistent with Kuschmierz et al. (2020), who summarised that the link between knowledge about evolution and attitude appears to be absent or weak, especially in primary and secondary school students. Dualistic thinking showed significant results for key concepts used in the gain scenarios and, similarly to attitudes towards evolution, appeared to be of secondary importance. Personal religious faith was not a significant covariate. The results on religiosity suggested that religiosity is not directly associated with the learning outcomes of the sample in question and may be more closely related to other constructs of conceptual ecology, such as attitudes toward evolution (Beniermann 2019; Kuschmierz et al. 2020; Deniz et al. 2008). Overall, the pre-test knowledge about evolution was the most meaningful covariate in our analyses and should be included in future studies.

Implications for research and education
Our work presents statistical and practical significance and research relevance (Mohajeri et al. 2020). It contributes to education in schools and research on evolution education, specifically on situated learning, the cognitive load, and the clarification of misconceptions.
I. In line with current research findings (Beniermann 2019; Kuschmierz et al. 2020b), our students showed insufficient knowledge about evolution. This lack of knowledge was evident in the pre-test when students increasingly chose teleological misconceptions as answers in tasks on adaptation and natural selection (Beniermann 2019; Kuschmierz et al. 2020b; Fenner 2013; Lammert 2012). Additionally, when analysing the pre-test descriptive results, we found that students used more key concepts in trait gain tasks than in trait loss tasks (see 'Baseline description' section). In addition, our descriptive post-test results are consistent with the current state of research, as they indicated that students on average used more key concepts and fewer misconceptions when reasoning about trait gain compared to the trait loss scenarios (Ha and Nehm 2014; Nehm and Ha 2011). The pattern, which was already discovered in the historical development of biologists' knowledge of evolution, therefore also persisted throughout our study, regardless of the instructional context (Ha and Nehm 2014). Since misconceptions still account for a considerable amount of students' answers in the pre- and post-test, we would like to reiterate the need to find conducive instructional strategies for students.
II. To the best of our knowledge, previous studies have primarily investigated reasoning about natural selection using a cross-sectional study design (e.g., Ha and Nehm 2014; Nehm and Ha 2011; Federer et al. 2015). Thus, our study goes beyond previous research in that we not only looked at single events but also included prior situated learning. We found that learning instructions on trait gain or trait loss contexts do not significantly affect the subsequent use of key concepts and misconceptions. Further granular research could shed light on the effects of situated learning on factors such as transferring knowledge from one situation to another. First, research should investigate whether students understand learning concepts by, for example, investigating students' task solving while learning ('source'). Second, it is crucial to determine whether students understand subsequent reasoning tasks such as post-tests or follow-up tests ('target'). If students understand both issues, knowledge transfer should be possible if other variables do not inhibit it (e.g., negative emotions or stress; Klepsch and Seufert 2020). Third, research should inspect whether the similarity of post-test tasks can explain the lack of effects regarding situated learning.
III. This study is one of the first to use the cognitive load questionnaire by Klepsch and Seufert (2020) to examine cognitive load in a natural classroom rather than a laboratory setting. While using the instrument, we found that students learning in different evolutionary contexts (trait gain vs. trait loss) showed significantly different intrinsic and extraneous cognitive load levels. Students find it cognitively less demanding to learn in contexts of trait gain than trait loss. As the extraneous and intrinsic load increased simultaneously, we recommend that researchers and educators be aware of underlying contexts in learning instructions and their effects on the students' working memory capacity. It is essential to deliberately choose instructional contexts, material design, and methods to minimize disturbing cognitive load.
IV. In addition, we found no evidence that clarifying misconceptions about natural selection led students to use significantly more key concepts and fewer misconceptions in both trait gain and trait loss scenarios. However, since similar studies could already generate significant learning gains, we recommend conducting further research on situated learning by improving interventions with the generated knowledge and circumventing the presented limitations.
V. We detected that combining the evolutionary context of trait gain and the clarification of misconceptions holds potential for reducing misconceptions. Students used significantly fewer misconceptions in trait gain scenarios when they received a clarification than students who did not receive the extra support. We could not find the effect in trait loss contexts. Moreover, we only found this effect regarding the general use of misconceptions. For now, the clarification of misconceptions does not appear to be a universal solution as an instructional strategy in natural selection. Therefore, gaining new insights into instructions, including a clarification of misconceptions in other evolutionary contexts, especially less cognitively demanding contexts (e.g., natural selection covering animals compared to plants), is a worthwhile direction for future research.

Limitations
The results of our study must be interpreted in light of their limitations. The generalizability is limited to the chosen sample or similar group compositions (Hedges 2013). The results refer primarily to German students at the upper secondary school level, necessitating further research that supports the findings with a perspective on international school students. The sample size was generally appropriate for educational research purposes, given that such a number of students can be used to determine the average effects of educational interventions (Colton et al. 2018; Hattie 2008).
However, future intervention studies should increase the sample size to provide more statistical power, given the small effects found in this study.
For future study designs, it is advisable to collect information on how many lessons each student received on the basic topics. This is important because many German states have no fixed timelines for teaching specific topics. The curricula for German schools are primarily organized so that they specify the topics to be taught and formulate clear objectives for teaching units (e.g., MSW NRW 2013b; KMK 2016). Therefore, based on the curricula, we could only ascertain in this study that the pupils must already have had some initial experience with the topic of natural selection. However, we could not determine what proportion of the overall evolution instruction our intervention accounted for.
Furthermore, we recommend that researchers attempting to replicate this study critically evaluate the choice of research design, especially for the pre-test. Our aim with the pre-test was to collect baseline prior knowledge. We assessed knowledge about evolution with the KAEVO 2.0 to minimize a potential pre-testing effect or pre-test sensitization as best as possible (e.g., Richland et al. 2009; Salkind 2010). Hence, in this study, the intention was to avoid learning and familiarity with the ACORNS in the pre-test and to be able to attribute the differences in the post-test to the intervention. However, one disadvantage of this design is that it did not allow us to calculate the differences between pre- and post-test performance. We are aware that this could have undermined our results. Accordingly, to investigate differences in future studies, we suggest employing the same instrument to assess knowledge about evolution in the pre- and post-test. In doing so, attention should be paid to possible confounders due to, for example, different surface features of tasks (e.g., Nehm and Ha 2011; Federer et al. 2015) and situatedness (e.g., Kirsh 2009; Reder et al. 1994). Alternatively, the above limitations could be minimized by using the Solomon four-group design. This can help to screen the effect from pre-test to post-test (Solomon 1949). Nevertheless, the design has the disadvantage that it requires much effort and would have been disproportionate and uneconomical for the present study with the already existing four groups.
Moreover, according to Klepsch and Seufert (2020), researchers have mainly used the cognitive load questionnaire in studies with systematically varied variables. Consequently, the validity of this instrument was rarely investigated in classroom settings. Since the operationalization of cognitive load is highly complex, divergent results can still occur, for example, when an increasing germane load does not inevitably imply an increase in post-test performance or when students cannot clearly distinguish between the intrinsic and extraneous load (Klepsch and Seufert 2020; Klepsch et al. 2017). Therefore, the role of individual cognitive load is still debated (Klepsch and Seufert 2021). Besides, the questionnaire consists of subjective ratings, which can vary among individuals, as everyone has a different memory capacity and perception. We recommend augmenting future studies with an additional objectively scored instrument to protect the cognitive load analysis from confounding subjectivity and strengthen arguments for convergent validity (Maxwell et al. 2017). In this regard, Kalyuga and Plass (2017) present several practical methods for measuring cognitive load (e.g., dual-task measures).
Additionally, all students had equal time to work on the intervention materials. As we aimed to consider the clarification as a complement rather than an alternative to teaching key concepts, we manipulated the factor clarification of misconceptions (no vs. yes) by integrating the clarification of misconceptions into the actual materials of the respective groups. These groups received four additional tasks. This procedure could have constituted a limitation as the other groups did not receive any extra tasks during the working time. An alternative would have been to provide the students without the clarification with other tasks about natural selection, but this would probably have given them an advantage in knowledge about further contexts and key concept use. Off-topic tasks would have resulted in less time to engage with natural selection.
Thus, we anticipate that the alternative would have led to a stronger bias in the results than the current approach, in which we integrated four additional items into the intervention materials for part of the groups. Lastly, we cannot exclude the possibility that a lack of transfer hindered the effects of situated learning. Future studies should investigate how the ability to transfer contextual knowledge about natural selection can be visualized and promoted in students (Kirsh 2009;Richey and Nokes-Malach 2015;Veenman et al. 2004;Hajian 2019).

Conclusion
Overall, we replicated and corroborated current research on secondary school students' knowledge about evolution. Our descriptive results indicated that students' knowledge about evolution is low and that most of the misconceptions they resort to are teleological in origin. Furthermore, descriptive results showed that students used fewer misconceptions in trait gain scenarios than in trait loss scenarios, both in the pre- and post-test. With regard to our hypotheses, the results did not reveal a significant effect of situated learning on later reasoning about natural selection. Furthermore, the findings revealed that students who learned in the intervention contexts of trait gain perceived lower intrinsic and extraneous load than those who learned in trait loss contexts. Additionally, the clarification of misconceptions showed no benefits when disregarding the instructional contexts. The same is true when considering the interaction effect of the 2 × 2 factorial design and its effect on key concepts and misconceptions, both in trait gain and trait loss scenarios of the post-test. Nevertheless, when considering the general use of key concepts and misconceptions, learning in trait gain contexts with an additional clarification of misconceptions can lead to significantly fewer misconceptions in later reasoning about natural selection. This effect is especially notable when compared to learning in trait gain contexts without the clarification of misconceptions.
Moreover, our contribution improves understanding of natural selection learning and advises teachers on conducive instructional strategies. Our results suggest that researchers and educators should pay attention to the complex interplay of prior situated learning, differences in instructional contexts, effects of an explicit clarification of misconceptions, and cognitive load. Paying attention to this interplay can aid in developing instructional strategies with an appropriate design and allotted time. In addition, we endorse considering the situated learning approach as a valuable lens to interpret concept use. We also advocate broadening research to other contexts (e.g., animals and plants) or research in regular school environments that analyses the behavior of cognitive load and achieves learning success while learning with a clarification of misconceptions.