Interestingness is in the eye of the beholder – the impact of formative assessment on students’ situational interest in chemistry classrooms

ABSTRACT Students’ interest is considered an important learning outcome, but it is also a relevant predictor of student learning and future vocational choices. According to numerous studies, however, students’ interest in STEM fields usually declines during the course of secondary education. From the perspective of science education, it is therefore necessary to foster or at least maintain students’ interest. Despite the variety of approaches that have already been examined in order to promote student interest, the problem of low-interested students remains. Prior findings indicate that specific person characteristics and the students’ perception of the situation seem to moderate the effectiveness of many approaches. The current intensive repeated-measures intervention study investigates a possible interest trigger (formative assessment) and the process that influences the perception of this trigger. Based on a sample of 9th-grade chemistry students (N = 200), three different interventions of formative assessment were implemented in regular classrooms. Students’ situational interest was assessed repeatedly in short time intervals. Multilevel analyses show that not all interventions were perceived as equally interesting by the students. While students’ individual interest influences the perception of all interventions positively, the impact of gender, chemistry grade, and enjoyment varies across the interventions.


Introduction
Chemistry education aims to support students in gaining scientific literacy. Empirical research points out that students who are more interested in chemistry attain higher achievement (Jansen et al., 2016) and are thus more likely to participate in public discourse about scientific problems and to develop a reasoned stance in this discourse (Laugksch, 2000). Interest is also considered an important predictor for academic and vocational choices (Eitemüller & Walpuski, 2018; Maltese & Tai, 2011). However, in their review, Potvin and Hasni (2014) conclude that students' interest in STEM fields declines during the course of secondary education. Notably, the observed decline in students' interest is more pronounced in STEM subjects than in other domains (Krapp & Prenzel, 2011), which indicates that this decline is not merely due to an interest differentiation during puberty. These results are in line with the comparatively low number of students who choose a career in STEM fields.
To meet the shortage in the STEM workforce on the one hand and to promote scientific literacy on the other, instructional approaches must be developed that foster students' interest. However, despite the variety of approaches that have already been examined in order to promote interest, the problem of low-interested students remains. Too little is known about the underlying mechanisms of how promoting students' interest actually works in the classroom. Hence, besides developing and evaluating new instructional approaches and learning materials to foster students' interest, another focus should be to better understand the process of fostering students' interest in real classroom contexts. Therefore, the aim of the current study is to investigate the effectiveness of formative assessment as a potential trigger of students' situational interest in real classrooms. In addition, students' evaluation and perception of the various interest triggers will be considered to better understand the potential of each trigger to catch learners' attention in the classroom.

Interest and interest development
Interest research can look back on a long tradition. This has led to numerous changes or expansions of the central construct's definition. Renninger and Hidi (2011) reviewed these parallel definitions and summed up the following five characteristics of interest as a motivational variable on which most researchers agree: interest is specific for a content or an object (1). The person and the object/content are connected by a relation which is shaped by the predisposition of the person and the environment (2). Nevertheless, the person is not always metacognitively aware of their interest in a specific object or content (3). Furthermore, interest is based on neurological processes and closely connected to the reward circuitry of the brain (4). Activities linked to interest (e.g. information-seeking) are perceived as intrinsically rewarding. Following these characteristics, everyone can develop interest, regardless of age, gender, or socio-economic status etc. Once developed, interest manifests itself in a person's predisposition to re-engage in certain activities or contents over time. This engagement is paired with both affective and cognitive components (5).
When a learner is confronted with a new topic in a specific situation and interest evolves as a result of this confrontation, the generated interest is usually only temporary. In addition, this type of interest is characterised by a higher proportion of affective than cognitive components. Because of its potentially transitory nature, it is called 'state' or 'situational interest'. In the four-phase model of interest development, situational interest is typical of the early phases of interest formation (Hidi & Renninger, 2006). In the two initial phases of less-developed interest (1. 'triggered situational interest' and 2. 'maintained situational interest'), external support can be very important, as the interest is still fragile and might fade if difficulties occur during the learning process. However, if the situational interest lasts or reoccurs, a more enduring interest might develop. This type of interest is not only a psychological state, but also a (beginning) predisposition to re-engage with a particular object or topic (Renninger & Hidi, 2016). The more developed phases of interest development are called 'emerging individual interest' (3.) and 'well-developed individual interest' (4.). They are characterised by a higher valuing of, and greater knowledge about, the content or object of interest. The behaviour of the interested person is marked by voluntary information-seeking about the content, e.g. he or she seeks the answers to self-posed questions (Hidi & Renninger, 2006). Hence, external support is less likely to be necessary during these phases. Nevertheless, interest might stagnate, decrease, or vanish in any phase if there is not enough support or application for it. In brief, individual interest is stable across situations and needs less external support than situational interest.
Many studies found empirical evidence for the proposed four-phase model of interest development (e.g. Linnenbrink-Garcia et al., 2013; Tsai et al., 2008). Knogler et al. (2015) also point out that individual interest as a trait explains a certain amount of consistency in situational interest. Situational interest is, therefore, not only a precursor of individual interest but also positively influenced by already existing individual interest.

Relation of interest to other constructs
Besides the importance of interest itself, the construct is also positively associated with other educational variables: interest, if sustained, can increase effort and thus ultimately lead to knowledge acquisition. Hence, achievement rises with increasing interest and vice versa (Marsh et al., 2005). Empirical research shows that these relations have been validated across different domains in school contexts (Jansen et al., 2016). Furthermore, students with higher interest in (school) science activities that stimulate cognitive activation and communication of knowledge (e.g. planning their own investigations or debating scientific topics with peers) also show higher achievement (Höft et al., 2019). The perception of one's own abilities (i.e. self-concept) is also linked to both constructs: higher achievement is not only linked to higher interest, but also to a more advanced self-concept (Denissen et al., 2008).
Interest is also correlated with emotions like enjoyment. According to the control-value theory, enjoyment is a positive achievement emotion and may lead to flow experiences in which learning feels effortless (Pekrun, 2006). A student with a high trait of enjoyment is therefore more likely to experience enjoyment during a specific activity than a student with a lower trait during the same activity (Ainley & Ainley, 2011; Nett et al., 2017). However, according to Ainley and Hidi (2014), enjoyment and interest often, but not exclusively, arise together.

Fostering interest in science
Since students' interest in science declines more than their interest in other subjects during secondary school on average (Krapp & Prenzel, 2011), it is necessary to foster or at least maintain students' interest, especially in science. This process of developing interest by integrating methods that catch the learners' attention is called 'triggering'. The aim of fostering interest is addressed by many researchers. This extensive research led to a great variety of possible triggers. Potential triggers are the topic students are engaged in or science-related activities (Broman et al., 2020; Habig et al., 2018; Hoffmann et al., 1998). Other studies also observed that feedback (Matthews, 2004), utility value-intervention (Hulleman et al., 2010), group work (Jack & Lin, 2017), computers/technology (Swarat et al., 2012), or science-related curiosity (Luce & Hsi, 2015) can act as triggers for interest. However, there is no consensus about the effect of most triggers: For instance, Holstermann et al. (2010) found that not every hands-on activity is perceived as interesting. Consequently, some trigger-categories might be too broad: Not every hands-on activity triggers interest in each student, nor does every integration of computers or technology (Bryan et al., 2011).
Therefore, recent studies focus not only on the effects of specific triggers, but also on the perception of these triggers by the students: Durik et al. (2015) summarised that the perception and effectiveness of a specific trigger depend, among other things, on the students' initial individual interest and self-concept. However, prior findings indicate that the processes of triggering (and maintaining) interest are rather complex and not yet fully understood.

Formative assessment
To gain a more detailed insight into the triggering process, the current study focuses solely on the trigger of formative assessment and its perception in relation to different learner characteristics. This focus is based on a former exploratory study in which we found that feedback, and more specifically formative assessment, are effective triggers for situational interest across different scenarios, e.g. interactive quizzes (see Ochsen et al., 2020).
Commonly, only the teacher's role in using formative assessment methods is considered (Black & Wiliam, 2009), but Wiliam and Thompson (2008) also acknowledge the role of the learner and the role of peers because both are vital to the learning process as well. The main difference between teacher-assessment and self-assessment is that students have to judge on their own what they are doing correctly and to diagnose the potential gap between their own work and the learning goals provided by teachers (Andrade, 2010). Typical approaches to implementing self-assessment are rubrics, learning logs, or checklists, because they explicitly state learning goals. Students can compare their own work against these goals and classify their achievement level.
The focus of studies analysing formative assessment is often the impact on achievement and learning progress (Black & Wiliam, 2009). Empirical findings regarding the effect of formative assessment on motivational variables are rather scarce. Traditional methods of formative assessment were implemented and their effect on students' interest examined by Yin et al. (2008). Contrary to their hypothesis, they found no effect on students' interest. According to the authors, a possible reason for this result could be a variation in the teachers' classroom management and in the usage of informal formative assessment settings. By using interactive quizzes to implement formative assessment in the classrooms, Wang and Tahir (2020) showed that this approach increased interest and motivation of almost all students regardless of gender, prior achievement or interest, which is mainly traced back to the game-based nature of quizzes. Besides, the perceived usefulness of feedback was shown to have a positive influence on the students' interest (Rakoczy et al., 2018).

Research questions and hypotheses
In summary, the explicit fostering of students' interest in science is an important and necessary intention in science lessons. Based on the four-phase model of interest development, students' situational interest is a promising focus when aiming to develop a more enduring individual interest in the long run (Hidi & Renninger, 2006). Despite extensive research on specific triggers of students' situational interest (Palmer, 2009), numerous findings indicate that many triggers are not effective per se but vary in their potential to catch the learners' attention in class (Durik et al., 2015; Holstermann et al., 2010). In this context, specific person characteristics (e.g. students' individual interest, Knogler et al., 2015; enjoyment, Nett et al., 2017; self-concept, Denissen et al., 2008) and also students' perception of the trigger seem to moderate the effect of the triggers on students' situational interest.
Thus, the scope of the current study is to contribute to this research field by examining the effectiveness of formative assessment as a potential trigger of situational interest in relation to learner characteristics like students' prior knowledge, gender, science self-concept, enjoyment as trait, and individual interest. Based on findings of a prior study, three different methods of formative assessment were selected and developed into interventions (Ochsen et al., 2020). These methods differ in their focus, ranging from a content-oriented approach (LBB; I1) to an inquiry-oriented approach (rubric; I2) to a game-based approach (Kahoot!-Quiz; I3). The design of this study was guided by the following research questions and hypotheses: (1) To what extent does each formative assessment-intervention (LBB, rubric, Kahoot!) affect the students' situational interest? (2) To what extent are the effects of the interventions on the students' situational interest moderated by the students' evaluation of these interventions and their overall enjoyment? (3) To what extent are the evaluations of the interventions associated with gender, grade, scientific self-concept, individual interest, and enjoyment of the students?
We expect the three formative assessment-interventions to have a positive effect on students' situational interest, as they are all different approaches to formative assessment. Nevertheless, the effects of the interventions on the students' situational interest are expected to vary. In particular, the Kahoot!-Quiz, as an approach of game-based learning and therefore probably appealing to all students, is thought to have the strongest positive effect (Wang & Tahir, 2020), while the LBB with the clearest focus on chemical content is expected to have the weakest effect.
Besides these positive effects at the cohort level, we expect that students who rate each of the interventions more positively (in terms of fun and usefulness) will also show higher situational interest than students who rate the interventions less positively (Durik et al., 2015).
Regarding the influence of individual student characteristics, a positive correlation is expected between situational interest and enjoyment (Ainley & Ainley, 2011). We expect that students who report a higher general enjoyment in chemistry classes will experience higher situational interest during the interventions. Furthermore, we anticipate the effect size to vary between the interventions because we expect the students' enjoyment to contribute more to a higher situational interest during the content-based interventions (Nett et al., 2017).
Regarding the students' evaluations of the different interventions, we expect students' individual interest to have a positive effect on evaluations of content- and inquiry-based interventions (I1 and I2). However, we anticipate no effect or a lower positive effect on the evaluation of the Kahoot! quiz (Durik et al., 2015; Yeh et al., 2019). Moreover, the association with the chemistry grade is expected to vary across the evaluations: While better grades are likely to have a positive effect on the evaluation of more content-focused interventions, the evaluation of the game-based intervention is expected to be independent of the chemistry grade. Due to the strong correlation between (science) self-concept and achievement, it is assumed that the former has similar effects on the evaluation of the interventions, but that they may not outweigh the effect of achievement on the evaluation (Denissen et al., 2008).

Sample
This study was conducted in five secondary schools in Northern Germany in 2019 and 2020. N = 200 students (53.3% female, with an average age of 14.51 years) distributed across eight classes and five schools took part (convenience sample). The classes were taught by six different teachers. All students attended Grade 9 at a 'Gymnasium' (the highest educational track in German secondary education). Participation in the study was voluntary for teachers as well as for students. The study was conducted within the unit 'acids and bases', which is embedded in the curriculum for Grade 9. The focus is solely on chemistry education because, in Germany, the science domains are taught separately from Grade 7 onwards.

Study design
In the present study, students' situational interest was assessed repeatedly in short time intervals, i.e. multiple times per lesson, across a time period of six to eight lessons (with the total length varying between classes due to organisational components). The measurement intervals were determined by subdividing each lesson in terms of the content, method, and activity into, e.g. introductory phases, experimental phases, and intervention phases (Figure 1). The division of the lesson into phases was discussed with the teachers before each lesson. Students rated their situational interest after each of these phases using a tablet they were handed at the beginning of each lesson. The number of ratings per student in each lesson ranged from 1 to 9 (M = 4.36, SD = 1.83). Across students, lessons, and classes, a total of N = 6236 ratings of situational interest were generated during this study (including missing data).
Within the course of lessons in each class, three formative assessment interventions were implemented (I1 to I3). As it was a major concern of the study to disrupt lessons as little as possible, both the regular lessons as well as the interventions were carried out by the regular teachers of the classes after they had received detailed instructions. The implementation of each intervention was considered a lesson phase and, thus, students' situational interest was also assessed with regard to each intervention. In order to monitor the fidelity of the implementation of the different interventions as well as incidents and events that might impact data collection or analysis, all lessons were protocolled by the first author or a trained student assistant.
Besides the repeated measurement of students' situational interest, the study was framed by questionnaires which were administered before and after the interventions (Figure 1). In the questionnaire before the interventions, students' individual interest in specific school science activities, their general enjoyment, and their science self-concept were assessed in addition to personal variables such as gender, age, and chemistry grade. In the questionnaire after the interventions, students were asked to evaluate the three interventions they encountered in the preceding lessons.

Formative assessment interventions
The different methods employed in this study were standardised, pre-planned formative assessment procedures aligned to the content of the lesson. While all methods contain aspects of self- and teacher-assessment, each method also places a specific emphasis on either content, inquiry, or gamification.
The first formative assessment method was administered at the beginning of the unit 'acids and bases' (Intervention 1-time 1, I1-T1). As a writing-to-learn approach, a portfolio method (called Lernbegleitbogen; LBB; see supplementary material, part 1) was used which asks students to write down their understanding and explanation of a given process or phenomenon. In the current study, the LBB was contextualised by acid reflux and the appropriate medication which contains a base to neutralise the excess gastric acid. At the beginning of the course, the students had no knowledge about neutralisation reactions. Nevertheless, they were asked to watch three different videos: All videos showed the process of a neutralisation reaction on a submicroscopic level, but only one video showed the scientifically correct process (Kelly, 2020). The other two videos visualised typical alternative conceptions of neutralisation reactions or submicroscopic processes in general. The students were asked to decide which video was correct and to explain their answer based on their prior knowledge. At the end of the unit, their own LBB was again distributed to the students and they were requested to watch the videos again (Intervention 1-time 2, I1-T2). The students were asked to decide and explain whether and why their first decision was correct and, if necessary, to revise or elaborate on their first answer based on their gained knowledge. Repeatedly working on the same task and eventually revising one's initial answer was intended to make the learning process of the students visible to themselves and to connect their prior knowledge about molecules and ionisation with their new knowledge about acids and bases.
The second formative assessment method, a rubric (I2), is not tied to a specific time during the intervention but rather to conducting an experiment which contained the typical phases of scientific inquiry, e.g. formulating hypotheses or designing and conducting an investigation to check the generated hypothesis. After having conducted these inquiry activities, the students reflected on and evaluated their performance during these activities. Specifically, they used a rubric to reflect on their use of science tools, their systematic work, and their teamwork. This rubric was adapted from White and Frederiksen (1998). Each aspect is briefly described to clarify its meaning to students. Below this description, the students commented on what they did well and where they saw potential for improvement. Finally, students evaluated their performance on a scale of 1 to 5 (see supplementary material, part 1).
The third method of formative assessment was an interactive quiz implemented with the software 'Kahoot!' (I3). This method offers not only the possibility of self-assessment, but also teacher-assessment: The teacher gets an insight into the knowledge level of the class by seeing all answers, but without assignment to individual students. Yet, individual students know whether they gave the right answer and they can place themselves in the class context because they know how many classmates have answered the specific question correctly. Furthermore, the software Kahoot! provides a ranking of all participants taking into account the correctness and speed of the answer. A list of exemplary items regarding acids and bases for this quiz is provided in the supplementary material (Table S1).
Despite the similarities between the interventions like the categorisation as standardised, pre-planned formative assessment procedures according to Shavelson et al. (2008), the foci of the interventions vary: I1 focuses strongly on chemical content (neutralisation reactions), while I2 focuses more on scientific inquiry. I3 combines the content focus with a game-based learning approach.

Situational interest
The measurement of situational interest was adapted from Palmer (2009) who used one item to survey situational interest in order to capture the ephemeral variable properly. Another benefit of this method in this context is that the disruption of the lesson was kept to a minimum. For these reasons, we decided to use the approach suggested by Palmer (2009) in preference to other options that measured students' situational interest either by multiple items (Rotgans & Schmidt, 2014) or only at the end of the lesson (Patall et al., 2016). These procedures could result in noticeable breaks in the flow of the lessons and potentially imply limitations in the validity of the interpretation of results when aiming to measure a temporally unstable construct in rather long intervals. Consequently, students were asked how interesting the previous part of the lesson was for them. They answered on a 4-point rating scale ranging from 1 = 'uninteresting' to 4 = 'very interesting'. Based on interviews and subsequent data triangulation, Palmer (2009) was also able to provide evidence for the validity of this method.

Questionnaires
Before the first lesson of the study, a questionnaire was distributed to students (a complete list of all items in the questionnaire can be found in the supplementary material, Table S2). The questionnaire aimed to cover the constructs that are expected to influence students' situational interest as well as their perception of the different interventions, but which are not expected to be significantly affected by the interventions. These constructs, i.e. students' individual interest, general enjoyment, and self-concept, have the status of covariates. The choice of instruments for measuring these constructs relied on prior studies that investigated these constructs in a comparable educational context and that provided sufficient details regarding the quality of the instrument. Individual interest was measured using the adapted RIASEC + N-Model which was validated in prior studies (Höft et al., 2019). The model is adapted from Holland's RIASEC model of vocational interest (Holland, 1997) and aims to capture a multifaceted picture of (school) science. Accordingly, the instrument measures students' individual interest in different school science activities, e.g. carrying out experiments, planning investigations, or debating about science topics with peers. Students' ratings of how much they were interested in doing these activities in chemistry lessons on a 4-point Likert scale (ranging from 1 = 'I am not interested at all in doing this' to 4 = 'I am very interested in doing this') were condensed to a general factor of their individual activity-related interest.
Students' general enjoyment was assessed using three items, e.g. 'Chemistry is fun' (Pekrun, 2018). Science self-concept was assessed using eight items from different studies (e.g. Marsh et al., 2008; Schöne et al., 2012). For both constructs, each item is rated on a 4-point Likert scale (1 = 'I totally disagree' to 4 = 'I totally agree'). The reliabilities of the three scales were analysed by calculating Cronbach's α. The results indicate a sufficient reliability for each scale (α = [.84, .88], Table 1). Personal variables such as students' gender, age, and their last chemistry grade were assessed with single items. All students identified themselves as female or male; no other identities were mentioned.
The questionnaire after the interventions included evaluations of the three interventions described above (see supplementary material, Table S4). Participants were asked to rate how strongly they agreed with each statement on a 4-point rating scale (from 1 = 'I strongly disagree' to 4 = 'I strongly agree'). Each evaluation contained the following statements: 'Intervention X was helpful for me to gain new knowledge', 'I enjoyed working on the task' and 'I would like to work on similar tasks more often.' Because I2 was tied to an experiment, the items were slightly adapted and a fourth item was added: 'The rubric showed me the aspects of experimenting that I master well and the ones with room for improvement.' Reliability was again calculated using Cronbach's α. The results indicate an acceptable reliability (α = [.77, .79], Table 1).
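As a rough illustration of the reliability check mentioned above, the following Python sketch computes Cronbach's α from a students-by-items table of ratings. The function name `cronbach_alpha` and the example ratings are hypothetical; they are not the study's data or code, and the sketch assumes complete responses without missing values.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student rating lists.

    Each inner list holds one student's ratings, one value per item.
    Assumes at least two items, at least two students, no missing values.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two students")
    # Transpose so that each tuple holds all students' ratings of one item.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)
    # Variance of each student's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings on a 4-point scale (five students, three items).
example_ratings = [
    [4, 3, 4],
    [2, 2, 1],
    [3, 3, 3],
    [1, 2, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(example_ratings)
```

Values closer to 1 indicate that the items of a scale vary together, which is the property the reported α ranges summarise.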

Treatment of missing data
We used multiple imputation (MI) to handle the missing data and conducted all analyses on the basis of the imputed data because some of the variables contained a substantial number of missing values (Table 1; Schafer & Graham, 2002). Specifically, following recent recommendations in the statistical literature, we used multilevel substantive-model-compatible MI, which allowed us to take both the nested structure of the data and the hypothesised interaction effects into account (for a discussion of these methods, see Enders et al., 2020; Lüdtke et al., 2020). To implement this procedure, we used the R package mdmb and generated 100 imputations (Graham et al., 2007). The convergence of the imputation algorithm, which we checked with the R̂ statistic and by investigating diagnostic plots, was satisfactory, with R̂ values strictly below 1.05. To pool the results across the imputed data sets, we used Rubin's rules as implemented in the R package mitml (Grund et al., 2016).
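The pooling step can be illustrated for a single coefficient. The sketch below applies Rubin's rules to one estimate and its standard error from each imputed data set; it is a simplified Python illustration, not the mitml implementation used in the study, and the function name `pool_rubin` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across imputed data sets with Rubin's rules.

    estimates  -- the coefficient estimated in each imputed data set
    std_errors -- the matching standard errors
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    pooled_estimate = mean(estimates)
    # Within-imputation variance: average squared standard error.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)


# Hypothetical results from three imputed data sets.
pooled_b, pooled_se = pool_rubin([0.10, 0.14, 0.12], [0.05, 0.05, 0.05])
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations, which reflects the additional uncertainty caused by the missing data.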

Data analysis
Multilevel analyses were conducted using the statistical software R (version 3.5.1) with the packages lme4 and mitml (Bates et al., 2015; Grund et al., 2019; R Core Team, 2018). This method accounts for variability in nested data, e.g. students nested in classes. In the context of the current study, the ratings of situational interest (Level 1) are nested within students (Level 2), who are nested in classes (Level 3; see Figure 2). In a nutshell, the central idea underlying the application of multilevel analysis to the collected data is to study the simultaneous effects of variables at the levels of single interest ratings (e.g. specific interventions preceding these ratings), students (e.g. individual characteristics like students' self-concept), and classrooms (e.g. the time course of lessons) on students' situational interest by means of 'regression-type models that comprise error terms for each of those levels separately' (Snijders, 2011, p. 880). Ignoring the hierarchically nested structure of the data set could lead to a violation of independence assumptions and to biased results (Snijders & Bosker, 2012). Another advantage of multilevel analysis is the consideration of within- and between-person variables. In the present analysis, the repeated measurement of students' situational interest represents a within-person variable, while students' individual interest, enjoyment, and self-concept represent between-person variables.
Table note: M = mean; SD = standard deviation; α = Cronbach's alpha. All items were rated on a 4-point rating scale. a Higher values indicate higher expression of the scale. b The first intervention was administered twice (T1 and T2). c Lower values indicate higher achievement, ranging from 1 = best to 6 = worst, as is common in Germany.
Due to the small sample size at Level 3, the multilevel analyses only took the first two levels directly into account (i.e. measurement occasions nested in students). The school and class levels could not be modelled directly. However, we used group mean centring to control for differences at the class level by centring all Level-1 and Level-2 variables at their class means. A random intercept model is calculated to answer Research Question 1. In this model, the slope is fixed but the intercept varies across students. This model is extended with cross-level interaction effects to answer Research Question 2. These interaction effects are calculated between two variables on different levels and the dependent variable (measures of situational interest).
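The centring step described above can be sketched as follows. This is a minimal Python illustration of centring a variable at its class mean, not the study's R code; the function name `centre_within_class`, the class labels, and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def centre_within_class(values, class_ids):
    """Subtract each class's mean from the values observed in that class.

    values    -- one numeric value per observation
    class_ids -- the class label belonging to each observation
    """
    if len(values) != len(class_ids):
        raise ValueError("values and class_ids must have the same length")
    by_class = defaultdict(list)
    for value, class_id in zip(values, class_ids):
        by_class[class_id].append(value)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [v - class_means[c] for v, c in zip(values, class_ids)]


# Hypothetical enjoyment scores of four students in two classes.
centred = centre_within_class([1.0, 3.0, 2.0, 4.0], ["9a", "9a", "9b", "9b"])
```

After centring, each value expresses a deviation from the mean of the student's own class, so differences between class means no longer contribute to the predictors.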
With regard to Research Question 3, the hierarchical structure of the data could be ignored because all relevant variables are assessed on the same level (Level 2). For this purpose, structural equation modelling (SEM) is used with the evaluations of the interventions as dependent variables. These are predicted by gender, chemistry grade, science self-concept, enjoyment, and individual interest. Estimating these interrelations while taking measurement error into account is a benefit of the SEM framework, as opposed to manifest approaches such as analysis of (co)variance or path analysis (McArdle, 2009). The statistical software R (version 3.5.1) was used with the package 'lavaan' (R Core Team, 2018; Rosseel, 2012).

Descriptive statistics and preliminary multilevel analysis
Descriptive statistics are depicted in Table 1. The variables individual interest, enjoyment, and science self-concept were measured in the questionnaire before the interventions. The interventions were rated by the students in a questionnaire after the study was conducted, and the measures of situational interest were taken continuously throughout the study. To assess the impact of the different interventions on the students' situational interest, a multilevel model was built by adding, step by step, groups of predictors that are assumed to impact students' situational interest (Table 2, Mod0 and Mod1). First, the null model was calculated to analyse the between- and within-group variance. The intraclass correlation (ICC = 0.177) indicates that 17.7% of the variance is accounted for by the student level. Afterwards, time as a control variable and learner characteristics (grade, gender, enjoyment, individual interest and science self-concept) were added to the model (Mod1). Regarding the variable time, a significant but small negative effect was identified (β = −0.05, standard error (SE) = 0.01, p < .001), indicating that students' situational interest decreased slightly over the course of the unit. Adding gender and chemistry grade as predictors revealed no effect of these variables on the students' situational interest. While enjoyment (β = 0.13, SE = 0.05, p < .01) had a positive effect on situational interest, no significant effects of individual interest (β = 0.08, SE = 0.04, p = .083) and science self-concept (β = 0.09, SE = 0.05, p = .061) were found. To assess the overall explained variance of the model independent of the level, R²₁ (SB) is used (Snijders & Bosker, 2012). The previous predictors explain 13.9% of the overall variance. Looking solely at Level 2, as all predictors are Level-2 variables, a proportion of 68.3% of the variance is explained. The more detailed model (Mod1) fits the data better than the null model (see Table 3).

Table 2. Multilevel analysis predicting students' situational interest by time, individual characteristics, three interventions (I1 with two administrations T1 and T2, I2 and I3), students' evaluations of these three interventions and interactions between evaluations and situational interest during the interventions in terms of standardised regression coefficients based on random intercept models.

To which extent does each formative assessment intervention affect the students' situational interest?
In a next step, the three interventions were added to the model as dummy-coded variables (Mod2, Table 2). Consequently, all phases in which none of the three interventions were implemented serve as a reference category in the sense of a baseline of students' situational interest during the unit. The dummy variables thus indicate the intervention-related change in students' situational interest. As I1 was administered twice, at the beginning and at the end of the unit, two dummy variables for the LBB were added. The first administration (I1-T1) has a large negative effect (β = −0.63, SE = 0.07, p < .001) on students' situational interest. This effect is smaller for the second administration (I1-T2), but remains negative (β = −0.17, SE = 0.08, p < .05). For I2 (rubric on phases of scientific inquiry), no effect on situational interest is observed (β = −0.06, SE = 0.08, p = .470). Hence, this intervention neither increases nor decreases students' situational interest compared to the baseline (i.e. the non-intervention lesson phases). In contrast, students report substantially higher situational interest in the third intervention I3 (β = 0.98, SE = 0.08, p < .001). This model fits better than Mod1 (Table 3).
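How dummy coding against a baseline reference category works can be sketched in a few lines. The sketch below is a deliberate simplification: it ignores the random intercepts and the other predictors of Mod2 and uses the fact that, in an ordinary regression containing only an intercept and mutually exclusive dummy variables, each coefficient equals the difference between that category's mean and the reference mean. The ratings are hypothetical illustration values, not data from the study, and all function names are assumptions.

```python
from statistics import mean

# Intervention labels as used in the text; "baseline" is the reference category.
INTERVENTIONS = ["I1-T1", "I1-T2", "I2", "I3"]


def dummy_code(phase):
    """Return one 0/1 indicator per intervention; baseline phases get all zeros."""
    return {name: int(phase == name) for name in INTERVENTIONS}


# Hypothetical standardised interest ratings (NOT data from the study).
ratings = [
    ("baseline", 0.1), ("baseline", -0.1), ("baseline", 0.0),
    ("I1-T1", -0.7), ("I1-T1", -0.5),
    ("I1-T2", -0.2), ("I1-T2", -0.1),
    ("I2", 0.0), ("I2", -0.1),
    ("I3", 0.9), ("I3", 1.1),
]


def phase_mean(phase):
    """Mean rating over all measurement occasions belonging to one phase."""
    return mean(y for p, y in ratings if p == phase)


# With only an intercept and the dummies, the intercept is the baseline mean
# and each coefficient is the shift of that intervention relative to baseline.
intercept = phase_mean("baseline")
coefficients = {name: phase_mean(name) - intercept for name in INTERVENTIONS}


def predict(phase):
    """Predicted rating: intercept plus the coefficient of the active dummy, if any."""
    codes = dummy_code(phase)
    return intercept + sum(coefficients[name] * codes[name] for name in INTERVENTIONS)


for name in INTERVENTIONS:
    print(f"{name}: change relative to baseline = {coefficients[name]:+.2f}")
```

Read this way, a negative coefficient means that ratings taken during that intervention lie below the ratings from non-intervention phases, and a positive coefficient means they lie above.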
To which extent are the effects of the interventions on the students' situational interest moderated by the students' evaluation of these interventions and their enjoyment?
First, the evaluations of the interventions were added to the model (Mod3, Table 2). Only the evaluation of I2 is significantly related to the students' situational interest. The positive effect indicates that students who rated the rubric better also showed a higher situational interest (β = 0.10, SE = 0.04, p < .01). Next, cross-level interactions between each intervention and the related evaluation were calculated (Mod4). A positive interaction effect between the students' situational interest during the different interventions and their evaluations would imply that students who rated the interventions more positively benefited more from these interventions. Conversely, a negative interaction effect would imply that students who rated the intervention negatively benefited more from the related intervention. The results indicate that students who perceived the LBB (I1-T2, β = 0.20, SE = 0.08, p < .05) as more helpful and positive also experienced higher situational interest during this intervention. This holds neither for the first administration of the LBB (I1-T1; β = 0.01, SE = 0.08, p = .818) and Kahoot! (I3; β = 0.22, SE = 0.11, p = .060) nor for the rubric (I2; β = −0.10, SE = 0.08, p = .210). This model fits better than Mod3 (Table 3). Cross-level interactions between the interventions and students' perceived enjoyment were also added to Mod4, but no significant effects emerged. This indicates that the amount of situational interest which students experience during the interventions seems to be unrelated to the general enjoyment that students experience during chemistry lessons.
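The reading of a cross-level interaction given above can be made concrete with a small sketch. In a model with an intervention dummy, a student-level evaluation, and their interaction, the intervention effect for a given student is the dummy's coefficient plus the interaction coefficient times that student's evaluation score. The coefficient values below are hypothetical illustration values, not the Mod4 estimates, and the function name is an assumption.

```python
def conditional_effect(b_intervention, b_interaction, evaluation_z):
    """Intervention effect for a student with the given centred, standardised evaluation."""
    return b_intervention + b_interaction * evaluation_z


# Hypothetical coefficients (NOT estimates from the study).
B_INTERVENTION = -0.2  # effect of the intervention at an average evaluation
B_INTERACTION = 0.2    # positive cross-level interaction

effects = {
    z: conditional_effect(B_INTERVENTION, B_INTERACTION, z)
    for z in (-1.0, 0.0, 1.0)
}

for z, effect in effects.items():
    print(f"evaluation z = {z:+.1f}: intervention effect = {effect:+.2f}")
```

With a positive interaction coefficient, the intervention effect rises with the evaluation score; with a negative one, it would fall, which corresponds to the two interpretations described in the paragraph above.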
To which extent are the evaluations of the interventions associated with students' gender, grade, science self-concept, individual interest, and initial enjoyment?
To gain deeper insights into the triggering process of situational interest, we were also interested in analysing which affective and motivational constructs influence students' evaluations of the interventions. Structural equation modelling revealed differences in the perceptions of the interventions by gender (Figure 3). Male students evaluated the LBB (I1; β = 0.21, p < .05) and Kahoot! (I3; β = 0.25, p < .01) more positively than female students. No gender effect was observed for the rubric (I2). There were also differences between the interventions with regard to the chemistry grade: While students with a higher grade (i.e. lower achievement) rated the LBB (I1) and the rubric (I2) better, no such effect could be found for the evaluation of the Kahoot! quiz (I3).
Analysing the affective-motivational constructs revealed additional differences between the evaluations of the interventions: While students with a higher individual interest rated all interventions higher (r = [0.27; 0.32]), higher enjoyment only affected the evaluation of the rubric positively (β = 0.29, p < .05). The students' science self-concept had no influence on any evaluation. The proportion of explained variance varies across the interventions: While it is relatively high for the rubric (I2, 31.5%) and the LBB (I1, 27.5%), it is notably lower for Kahoot! (I3, 14.2%).

Discussion
This study examined the effectiveness of formative assessment as a trigger of situational interest and the factors that potentially influence its perception.
Three different examples of formative assessment were implemented in regular science lessons to adequately reflect the variability of this method. The results showed that only one of the three methods of formative assessment had a positive effect on students' situational interest. The evaluations of the interventions were analysed further in order to gain more knowledge of the triggering process. The effects of grade, gender, individual interest, enjoyment, and science self-concept on the evaluations vary across the different interventions.

Impact of interventions on students' situational interest
The comparison of the three formative assessment interventions (RQ1) indicates that they address students' situational interest to varying degrees: While the quiz triggered students' situational interest, which is in line with other research (Wang & Tahir, 2020), the other interventions either had no effect (I2) or even had a negative impact on situational interest (I1). Hence, not all methods of formative assessment foster situational interest, contrary to our hypothesis. Yin et al. (2008) reported similar results for their formative assessment interventions. They attributed their results to several reasons, one of which was that the teachers were not involved in the process of developing the interventions. Instead, they were only asked to implement the prepared intervention materials in their lessons. Especially regarding the rubrics, this might be problematic: If a rubric is designed in cooperation with students, teachers and students can jointly define common learning goals, which can be helpful for students by fostering not only interest, but also achievement (Granbom, 2015). Explanations based on design and content are also conceivable. The LBB and the rubric are, in contrast to the game-based interactive quiz, more similar to traditional classroom activities and may therefore not be perceived as more interesting, in terms of innovativeness, than the other classroom activities (Kickmeier-Rust et al., 2014). However, quizzes also have limitations, e.g. more complex tasks cannot be integrated well. Also, the quiz only required students to choose between different answer options, while the LBB and the rubric required students to express their own thoughts and knowledge when writing down their answers. Each student's appreciation for these different demands might also be reflected in their interest in engaging in these activities.
Students who reported higher situational interest during the LBB (I1-T2) also evaluated this intervention more positively (RQ2). This holds neither for the rubric nor for the Kahoot! quiz. Nevertheless, students who generally reported higher situational interest rated the rubric better. The differentiated perception and evaluation of the interventions can be explained in more detail by taking the learner characteristics into account.

Impact of learner characteristics on students' evaluations of interventions
Considering only the students' enjoyment did not lead to meaningful results (RQ2). Contrary to our hypothesis, no significant interaction effects emerged. Because of the reported positive correlations between interest and enjoyment (Ainley & Ainley, 2011), we assumed that students with higher enjoyment would profit more from the interventions than students with lower enjoyment. In this study, however, enjoyment was measured at the beginning and therefore represents the average, trait-like level of enjoyment during chemistry lessons. Enjoyment, nonetheless, also has a situation-specific component (Pekrun, 2006). The measurement in this study does not capture the amount of enjoyment experienced in a specific situation (e.g. during the interventions), which could be the cause of the non-significant interaction effects.
Other learner characteristics might also be important for the perception of the interventions. The approach was therefore extended: besides enjoyment, grade, gender, science self-concept, and individual interest were considered as predictors of the students' evaluations of the interventions (RQ3). Here, male students rated the quiz and the LBB better than female students on average. Due to the relative novelty of the software Kahoot!, previous findings about gender differences in the perception of interactive quizzes (especially Kahoot!) are unclear. However, effects of gamification in connection with formative assessment on interest have already been examined. The element of competition was found to be more interesting to male students (Kickmeier-Rust et al., 2014). This element is very prominent in Kahoot! because an automated ranking is created after each question during the quiz. Previous research showed that game-based learning can foster interest independently of prior achievement (Yeh et al., 2019). The current study replicated this result specifically for the Kahoot! quiz. The other interventions, however, were rated higher by students with lower achievement, which could indicate that these interventions are particularly appealing to this group of students. This is in line with prior research on formative assessment (Cauley & McMillan, 2010). As a limitation, it must be mentioned that achievement was assessed in terms of students' previous chemistry grade, not by means of an achievement test.
Positive effects of individual interest on the evaluations of each intervention were found, consistent with the research hypothesis. All three interventions are therefore rated higher by students who already have a high initial interest in chemistry. Interestingly, the interventions were perceived as more helpful and interesting by low-achieving students, but not by students with low individual interest. Because of the strong positive coupling between individual interest and self-concept (Denissen et al., 2008), the latter was expected to have similar effects on the students' evaluations. This assumption could not be confirmed. Rather, no effects of the students' self-concept were found. This result is contrary to findings from Durik et al. (2015), who emphasised the role of the students' self-concept as a moderator of perceived situational interest.
This study provides evidence that the categories in which triggers (e.g. formative assessment or feedback) are often classified are too coarse. The results indicate that formative assessment methods are not per se perceived as interesting by the students. Comparing the results of this study with prior research also reveals that learner characteristics influence the emergence of situational interest to different degrees in different situations. This conclusion is supported by Grünkorn et al. (2020), who found in an international comparative study that the perception of lessons is not solely dependent on the quality of the offer (in this case, the quality of the interventions), but also to a substantial degree on the learner.

Limitations
The study was conducted in a part of a federal state in Northern Germany. Moreover, all students attended the Gymnasium, which is the highest track of secondary schools in Germany. Thus, the results are representative neither of the federal state nor of the student population in secondary education.
The measurement of situational interest consisted of one item, which might be a restriction as well. We adapted this measurement from Palmer (2009) to capture the unstable and transient state of situational interest and to disrupt the lessons as little as possible. In terms of practical implementation, the standardisation of the interventions is limited by the fact that the interventions were taught by the teachers. Before the specific lesson, they were given precise instructions, and the implementations were observed by the researcher, who also recorded the other lessons, but naturally the exact implementation varies from teacher to teacher. This decision was again made in order to disrupt the lessons as little as possible. Due to the study design, it was not possible to create a control group with students who did not receive the formative assessment interventions. Consequently, no statement can be made about whether or not students improved because of these interventions or about any effects on the students' learning process. Subsequent studies might want to focus on both students' perceptions of the interventions and learning effects due to the interventions.
Moreover, the study is limited in that measures of situational interest were only self-reports. Despite their significance, the correlations reported in the SEM are rather low. Subsequent studies could include more open questions or interviews to gain more insights into students' reasoning when evaluating classroom interventions. Also, person characteristics like gender were added as a dichotomous variable in the analysis. While no additional gender identities were mentioned by students and, thus, could not be incorporated in the analysis of this study, representing a spectrum via a categorical variable might limit the generalisability of the obtained results with regard to these rather broad and certainly heterogeneous groups of students. Here, capturing a more differentiated representation of students' social identities might provide a more detailed picture of the triggering process of situational interest on the individual level.

Conclusions and implications
The current study provides further insights into the triggering process of interest. A more fine-grained examination of this process is the goal of recent interest research (Durik et al., 2015;. In accordance with recent research, this study shows that triggers of situational interest are perceived differently by students. The perception depends on several learner characteristics and on the situation itself. The results of this study support the assumption that this process is rather complex. They provide clear indications that potential triggers should be examined in more detail to better understand for whom these triggers actually provide support to promote and sustain interest over the long term. Such knowledge is needed to identify effective ways to promote and maintain interest for larger parts of the student population.
Moreover, the study showed that formative assessment has the potential to foster situational interest during chemistry lessons. However, not every method is equally effective.