Gender gaps in the performance of Norwegian biology students: the roles of test anxiety and science confidence

Understanding student motivational factors such as test anxiety and science confidence is important for increasing retention in science, technology, engineering, and math (STEM), especially for underrepresented students, such as women. We investigated motivational metrics in over 400 introductory biology students in Norway, a country lauded for its gender equality. Specifically, we measured test anxiety and science confidence and combined students’ survey responses with their performance in the class. We found that female students expressed more test anxiety than did their male counterparts, and the anxiety they experienced negatively predicted their performance in class. By contrast, the anxiety male students experienced did not predict their performance. Conversely, men had higher confidence than women, and confidence interacted with gender, so that the difference between its impact on men’s and women’s performance was marginally significant. Our findings have implications for STEM instructors, in Norway and beyond: specifically, to counter gender-based performance gaps in STEM courses, minimize the effects of test anxiety.

Norway, as of this writing, has a female prime minister leading a cabinet of 45% women, including ministers of finance, foreign affairs, and higher education. Yet, as women move through the academic and career trajectory, they become less represented due to myriad barriers to retention. Females outnumber males in almost all college-level subjects in Norway, except STEM subjects (other than biology), in which almost 70% of the students are males (Ministry of Education and Research, 2015). Even in disciplines that have relatively high female enrollment at the undergraduate level (e.g., non-STEM, biology), women are still underrepresented at the higher levels (e.g., professors, top administrators), and this phenomenon was implicated in a recent national survey of biology students and teachers (Hole et al., 2016). Given the global demand for STEM professionals (e.g., Caprile, Palmén, Sans and Dente, 2015; National Science Board, 2015), these disparities can cause concern. The uneven female-male ratio (especially in high-status positions) is in itself a barrier to recruitment, and to equalize the field, it is important to first identify mechanisms that hinder or prevent female participation and retention in STEM and then develop instructional interventions for overcoming these. A relatively gender-equal society such as Norway provides an interesting test case for identifying and investigating the underlying causes for the less obvious and therefore more implicit barriers to progression in STEM.
Science confidence (Bussey & Bandura, 1999;Dix, 1987;Fenollar et al., 2007;Cotner et al., 2011;Nissen and Shemwell, 2016;Sawtelle, Brewe and Kramer, 2012) refers to a student's perception of their own abilities to execute specific scientific tasks and is closely related to self-regulatory learning and self-efficacy (Stankov, Lee, Luo and Hogan, 2012;Ainscough et al., 2016). Confidence plays a vital part in females' persistence, retention, and performance in STEM subjects (Macphee, Farro and Canetto, 2013;Lundeberg, Fox and LeCount, 1992), and in general, studies find that females tend to have less science confidence than males (Cotner et al., 2011;Trujillo & Tanner, 2014;Robnett et al., 2015;Ballen, Wieman, Salehi, Searle and Zamudio, 2017). Several theoretical explanations for framing the relationship between confidence, performance, and retention have been suggested, including stereotype threat (Steele 1997;Wheeler and Petty 2001;Cohen and Garcia, 2008)-whereby an awareness of a negative stereotype is subconsciously felt and operationalized-and social cognitive career theory (Bandura 1986;Lent et al., 1994)-whereby a perceived lack of belonging in a discipline informs an individual's self-evaluation and sense of a future in that discipline.
Test anxiety is defined as "the set of phenomenological, physiological, and behavioral responses that accompany concern about possible negative consequences or failure on an exam or similar evaluative situation" (Zeidner 1998). Due to performance pressure, social pressure, and time constraints, higher levels of test anxiety may reduce performance (Lundeberg et al., 1992; von der Embse, Jester, Roy and Post, 2018). Several theoretical perspectives have been advanced for framing studies of test anxiety (Zeidner, 2010; Sommer and Arendasy 2014), for example a cognitive-interference approach to this phenomenon. According to cognitive-interference theory, the experience of test anxiety diverts mental resources (e.g., short-term memory, cognitive processing, problem solving) that are otherwise needed for test-taking (Zeidner 2010; Eysenck et al., 2007; Sarason 1984). Significantly, test anxiety may not be felt equally by all students, and its impacts may vary by student characteristics. Studies in the USA indicate that underrepresented minority and female students in STEM courses exhibit more test anxiety than do their nonminority or male counterparts (Payne, Smith and Payne, 1983; Hembree, 1988; Cassady and Johnson, 2002; Chapell et al., 2005; Ballen, Wieman, Salehi, Searle and Zamudio, 2017; von der Embse et al., 2018; Harris et al., 2019). Further, Ballen, Salehi and Cotner (2017) and Salehi et al. (2019) have demonstrated that test anxiety in women, but not in men, is negatively and significantly associated with performance on exams, possibly explaining some of the performance gaps that have been documented in STEM fields (e.g., Koester, Grom and McKay, 2016; Matz et al., 2017). Harris et al. (2019) found nominal gender differences in reported test anxiety and no gender-specific effect of test anxiety on performance in a large biology class, but there was no gender gap in performance in the class under study, and hence no problem to be solved.
In this study, we draw on survey, demographic, and performance data from 3 years of an introductory biology course at a large university in Norway to explore the possible gender-specific impacts of, and interactions between, test anxiety and science confidence. Our specific research questions were:
1. In this sample of biology students, are there gender differences in test anxiety, science confidence, and performance?
2. If performance differences exist, does test anxiety or science confidence predict performance in ways that can explain these differences?
It is especially important to understand these effects because confidence and test anxiety are at least potentially responsive to interventions while other student characteristics (e.g., gender) are less so.

Participants and procedure
The present study is part of a larger project including video recordings of lectures, assessment of teachers, and student surveys initiated by the bioCEED Centre of Excellence in Biology Education (bioCEED, 2013) at the University of Bergen (UiB). The present study reports data collected in three sections of an introductory biology course taught by the same instructor in Fall 2016, Fall 2017, and Fall 2018. Participants were over 400 undergraduate students in biology. All students were asked to provide gender information. We acknowledge that gender is a complex social and biological construct, and thus the students were given the possibility to specify their gender identity if it did not fit into the category of male or female. However, none of the participants identified themselves as other than male or female, and thus gender was treated as a dichotomous variable. Gender distribution was 36% males and 64% females. The instructor of the course is male.
Critically, the focal course is taught by an acclaimed professor who typically implements evidence-based pedagogies in class. Students have multiple opportunities to contribute in class, via small-group and large-group discussion and an electronic classroom-response system, and tests employ a variety of assessment techniques.
Participants were recruited in class. The students completed a pre-course survey in the first week of the term. Students were informed about the general purpose of the study, without any reference to gender, and that their participation was voluntary. Students also consented to having their survey responses matched, by a third party not involved in the research, with their performance in the course and their overall high-school score. (The overall high-school score is the average grade derived from the final assessment in each of the students' subjects, in addition to grades on the oral and written exams; the maximum score is 60.) In the final year, the survey was administered online, but students were given time in class to complete the surveys on their web-enabled devices (computer, tablet, or phone).
Our study design was approved by The Norwegian Centre for Research Data. Specifically, students were informed that the data would be treated confidentially and anonymized in any publications and after the end of the project. Lastly, student participants had the opportunity to withdraw from the study at any time. No rewards were given for participation.

Test anxiety
We employed the 4-item measure for test anxiety retrieved from the short version of the Motivated Strategies for Learning Questionnaire (MSLQ: Duncan and McKeachie, 2005; Pintrich, Smith, Garcia and McKeachie, 1991). An item example is "I am so nervous during a test that I cannot remember facts I have learned." The participants answered on a 7-point Likert scale ranging from 1 (not at all true of me) to 7 (very true of me). Cronbach's alpha level for the composite scale was acceptable (0.841), a finding consistent with prior work (e.g., Salehi et al., 2019). Since this measure was not proximal to any course assessment, we consider it a measurement of trait rather than state anxiety (von der Embse et al., 2018). Sixteen other items from the abbreviated MSLQ were included in the survey; however, responses to those items are not included in the current analysis or discussion.
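For readers unfamiliar with the reliability statistic reported here, the following is a minimal sketch of how Cronbach's alpha can be computed for a multi-item scale, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data are hypothetical illustrations, not the study's data or the software used in the analysis.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_variances = [variance(item) for item in items]
    # Variance of each respondent's summed (composite) score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a 4-item scale on a 7-point Likert format.
responses = [
    [2, 3, 2, 3],
    [5, 5, 6, 5],
    [7, 6, 7, 7],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [6, 5, 6, 6],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which a composite score built from them is described as reliable.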

Science confidence
We used a 13-item scale to measure students' confidence in comprehending, critically assessing, and communicating scientific concepts. The items of the scale are drawn and adapted from previous studies investigating students' science confidence (Lopatto, 2004;Seymour, Hunter, Laursen and Deantoni, 2004), though the validity of the scale was not separately evaluated for this population. The scale used in the present study has been employed among biology students and found reliable (Cotner et al., 2011;Cotner, Thompson and Wright, 2017). Participants answered on a 5-point Likert scale including: 1 (not confident), 2 (a little confident), 3 (somewhat confident), 4 (highly confident), and 5 (extremely confident). An example item is "presently, I am confident I can make an argument using scientific evidence." The 13-item scale produced a satisfactory alpha level (0.872). The science-confidence items are included in Supplemental File 1.

Academic performance
Student academic performance was measured by total points earned in the class, on a 0-100 scale. Point totals are a combination of performance on four exams distributed throughout the semester: (i) multiple choice and writing definitions, (ii) numerical competence with graphical visualization and interpretation of results, (iii) an oral five-minute presentation on a self-elected topic, and (iv) an essay plus short written explanations and definitions. Assessment, and hence the score, emphasizes communication skills, mainly writing and logic, in addition to disciplinary knowledge. Evaluation criteria and assignment types were identical across the 3 years of this study.

Analytical strategy
Our analysis explored the relationships between three predictor variables (gender, test anxiety, and confidence) and academic performance. Because the data in this study were nested in semesters, we used multilevel regression modeling, with class as a random effect, to control for within-semester correlation. For all Likert scale variables, we transformed the categories into numeric values and treated the dependent variables as continuous to facilitate interpretation. Non-parametric tests have yielded similar results to those we report (Murray, 2013;Norman, 2010). The threshold for statistical significance was set at p = 0.05, with p values between 0.05 and 0.10 regarded as marginally significant. Overall high-school score was our only measure of incoming aptitude and preparation, but reporting of this measure was too unreliable to allow us to include it in our statistical models (only about 1/8 of students in this study reported a high-school score).
Because our models lacked a measure of student incoming preparation (analogous to ACT or SAT scores, or GPA in previous classes), we did not expect the models to predict a great deal of the overall variation in total points. Instead, our interest was in sorting out gender-specific effects of particular covariates.
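As a point of reference, one way to write a random-intercept model of the kind described in this section is sketched below. The notation is illustrative and not taken from the study: it assumes student i in class section j, with gender coded as an indicator variable, and it does not claim to reproduce the exact covariate set that was estimated.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Female}_{ij} + \beta_2\,\mathrm{Anxiety}_{ij}
          + \beta_3\,\mathrm{Confidence}_{ij} + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{align}
```

Here y_ij denotes total points, u_j is the section-level random intercept that absorbs within-semester correlation, and epsilon_ij is the student-level residual. If the estimated variance of u_j is close to zero, the specification is effectively an ordinary least squares regression.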

Results
Descriptive statistics showed that female students began class with significantly higher levels of test anxiety, and lower levels of confidence, when compared to male students (see Table 1). An independent-sample t test indicated that on average, female students in this class earned significantly more total points than male students did (female mean = 61.09, male mean = 57.37, p = 0.009).
Our initial mixed models produced a Hessian matrix error, indicating that the amount of variation in the outcome associated with the random variable "year" was very small, so that the random variable was not needed in the model. Accordingly, we proceeded with the analysis using ordinary least squares (OLS) regression. Because our main interest was in the differential effects of confidence and test anxiety for male and female students, we estimated separate OLS models for the genders.
Results indicated that pre-class test anxiety was negatively predictive of class performance for female students, with an effect size of about ¼ of a standard deviation, but test anxiety had no discernible predictive power for male students (Fig. 1). For women, each one-point increase in test anxiety was associated with a 2.136 point decrease in total points (Table 2).
By contrast, pre-class confidence nominally predicted class performance for male students in a negative direction, with a marginally significant effect size of about 1/6 of a standard deviation, while confidence had no predictive power for female students. For men, each one-point increase in confidence was associated with a 3.535 point decrease in total points (Table 2).
To assess the significance of the different ways in which test anxiety and confidence affected the performance of male and female students, we estimated a model for both genders combined, which included interaction variables. This model showed that the interaction between gender and test anxiety was significant at the p ≤ 0.05 level, with female students disadvantaged relative to male students by the anxiety they reported. The interaction between gender and confidence was marginally significant (p = 0.051), with female students possibly gaining an advantage relative to male students through the confidence they reported (see Table 3).
Although low N prevented us from including high-school points as a predictor in our regression models, we did examine the association between high-school points and our predictor variables of interest, namely test anxiety and confidence. These bivariate correlations suggest patterns similar to those in the main models (opposite effects of both test anxiety and confidence on the performance of females vs. males) but should be interpreted with caution, since they are based on a much smaller sample than our other analyses, a sample that may differ from the larger group in unknown ways (Table 4).
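To make the logic of a combined model with an interaction term concrete, here is a minimal sketch in plain Python. It fits points = b0 + b1*female + b2*anxiety + b3*female*anxiety by ordinary least squares through the normal equations. The helper names (`solve`, `fit_interaction`) and the noise-free data are hypothetical and constructed for illustration; they are not the study's data, and the study's model also included confidence and its interaction with gender.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_interaction(female, anxiety, points):
    """OLS coefficients [b0, b1, b2, b3] for the gender-by-anxiety interaction model."""
    rows = [[1.0, f, a, f * a] for f, a in zip(female, anxiety)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, points)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: anxiety is unrelated to points for men (female = 0),
# while each extra anxiety point costs women (female = 1) two points.
female = [0, 0, 0, 0, 1, 1, 1, 1]
anxiety = [1, 3, 5, 7, 1, 3, 5, 7]
points = []
for f, a in zip(female, anxiety):
    points.append(58.0 if f == 0 else 70.0 - 2.0 * a)

coef = fit_interaction(female, anxiety, points)
slope_men = coef[2]              # anxiety slope when female = 0
slope_women = coef[2] + coef[3]  # anxiety slope when female = 1
print([round(c, 3) for c in coef], round(slope_men, 3), round(slope_women, 3))
```

In this parameterization the interaction coefficient b3 is the difference between the women's and men's anxiety slopes, so testing whether b3 differs from zero is what tests whether anxiety predicts performance differently by gender.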

Discussion
The present study is a first step toward investigating motivational differences across gender in a Norwegian sample in higher education. The primary aim of this study was to test whether there are gender differences in two STEM-related motivational constructs, science confidence and test anxiety, in a relatively gender-equal society. We found gender differences in both test anxiety and science confidence, and we found differences in how these constructs predicted learning outcomes for the two genders. While the scope of our study (a single instructor, for a single course, at a single institution in Norway) prohibits extrapolation to Norwegian higher education in general, our findings can serve as an initial exploration into factors that may influence gender-based attrition in STEM. These findings also serve to undermine the hypothesis that the connection between test anxiety and gendered performance differences does not exist outside of the United States.
Fig. 1 Differential impact, by gender, of test anxiety on total points in the course. Note. For women, but not for men, test anxiety was a significant negative predictor of performance
First, female students started class with more test anxiety than male students did, and the anxiety they experienced negatively predicted their performance in class. By contrast, male students experienced less test anxiety than female students, and the anxiety they did experience seems unrelated to their class performance. These findings echo those of Ballen, Salehi and Cotner (2017) and Salehi et al. (2019), which suggest that female students may be subject to interference by test anxiety, "which explains depressed performance by identifying factors that disturb the process of information recall and utilization during testing situations" (von der Embse et al., 2018, p. 484). In this sample of students, however, test anxiety did not ultimately translate into a performance gap that disadvantaged women. Rather, in contrast to prior studies in the USA (e.g., Salehi et al., 2019), women outperformed their male peers, in spite of their higher test anxiety and its relationship to performance. The fact that women in this course did not underperform relative to their male peers may be a function of their sheer numbers (with more women than men, and many of these women having below-average test anxiety), the discipline (biology in Norway is not associated with the same gender-based challenges as some other STEM disciplines; e.g., physics, computer science), or the evidence-based pedagogy of the instructor (e.g., using diverse strategies to assess students). Further studies in STEM fields beyond biology, with faculty employing more traditional pedagogies, will shed light on the merits of these possible explanations.
Our data do not allow us to exclude entirely the deficit model, however, which proposes that test anxiety is the result of perceived deficits in preparation, skills, etc. on the part of students (von der Embse et al., 2018). The fact that anxiety was negatively correlated with high-school points for female students is some indication that a deficit model may explain some of the association of anxiety with class performance for female students, consistent with the findings of Salehi et al. (2019).
Second, male students started class with more confidence than female students did, and the confidence they reported was negatively (though not significantly) associated with their performance in class. By contrast, the confidence female students reported was irrelevant to their class performance. And confidence interacted with gender, so that the difference between its effects on the two genders was marginally significant. These data suggest that male students may be subject to an overconfidence effect, whereby attention and motivation are undermined by misplaced confidence in their own abilities (Marshman, Kalender, Nokes-Malach, Schunn and Singh, 2018). The fact that confidence was not correlated at all with high-school points for male students lends some credence to this supposition.
These findings are similar to the gender differences in confidence (Cotner et al., 2011; Nissen & Shemwell, 2016; Sawtelle et al., 2012) and certain motivational constructs (Glynn, Brickman, Armstrong and Taasoobshirazi, 2011) found in college students in the USA. These similarities are surprising; while there are certainly many cultural similarities between the USA and Norway, the status of women is different between the two countries according to a number of indicators (e.g., UNESCO, 2015) and we would have expected those gender differences to impact links between motivational factors, gender and academic performance. The fact that gender differences remain, and are similarly predictive, across different cultures, may suggest some biological basis to these differences. For example, men tend to be more confident with regard to almost everything; this phenomenon may be mediated by testosterone, a steroid hormone that is present at far higher levels in men than in women. Several studies have suggested a link between risk-taking (itself a proxy for confidence) and testosterone levels in both men (Booth, Johnson and Granger, 1999; Coates and Herbert, 2008; Sapienza, Zingales and Maestripieri, 2009) and women (van Honk et al., 2004).
However, the literature (discussed above) documenting tractable impacts of the environment on performance, and on gaps in performance, is extensive, and we hesitate to invoke biological explanations without ruling out environmental ones. Specifically, the classroom environment may foster the gender differences we have documented here. For example, instructors may harbor biases (e.g., implicit bias; Staats, 2015) and anxieties that lead to subtle behaviors impacting their students. Canning et al. (2019) recently documented how the courses of STEM faculty with a "fixed" mindset respecting intelligence demonstrate greater performance gaps between underrepresented students and their well-represented counterparts. And Beilock, Gunderson, Ramirez, and Levine (2010) have illustrated that K-12 teachers' math anxiety negatively predicts their female students' math performance. Others have attested to the positive power of simply revealing one's own biases (Staats, 2015; Moss-Racusin et al., 2016; but see Kalev, Dobbin and Kelly, 2006). For example, Chang et al. (2019) documented attitudinal and behavioral changes associated with bias training, but their work suggests that meaningful change likely requires more than the one-off diversity-training sessions offered at many universities. Given the critical role of awareness, and the general perception of Norway as a gender-equal society, sustained bias training at places like University of Bergen may be warranted. Further, classroom environments vary with respect to gender-equitable participation, which may be a proxy for confidence and/or sense of inclusion (Caspi, Chajut and Saporta, 2008; Eddy, Brownell and Wenderoth, 2014; Ballen et al., 2019; Neill, Cotner, Driessen and Ballen, 2018). Ballen et al. (2019), in a multi-institutional study including biology courses in Norway, illustrated that smaller class sizes and diverse teaching methods were associated with gender-equitable in-class discussions. Thus, class size and pedagogy may also be associated with confidence and test anxiety, further impacting the performance and participation of women in STEM courses.

Limitations
There are several limitations worth mentioning when interpreting our findings, in addition to the single-instructor focus of this work discussed above. First, due to a lack of randomization and experimental data, we cannot infer causation. Future studies should investigate if females, compared to males, experience test anxiety in performance situations and how this manifests itself in performance and affect. Moreover, triangulation of the data (e.g., observational data, mixed-method) could have further accounted for some of the unexplained variance in the data. Second, our model is rather simple; future studies could elaborate on our model and include more motivational constructs. Third, given the low response rate on high-school entry grades, we were unable to investigate how prior achievement impacts test anxiety and science confidence. Last, we acknowledge that other unmeasured factors (e.g., cognitive differences, socio-economic status, and personality differences) could have served as mediators or predictors in our model.

Conclusions
Despite the limitations, the present study reveals some interesting relationships between science-related gender differences and motivational variables in a population that has thus far been unexplored along these dimensions. While in this particular course, the impact of test anxiety was not manifest in lower grades among women, that may not always be the case. Different courses, in different STEM disciplines, implementing different pedagogies, may yield different outcomes. Our future work aims to address this possibility. The fact that the instructor of the sampled courses is an award-winning educator implementing several evidence-based teaching strategies-group discussion, polling for formative assessment, and diverse testing strategies-may also limit the ability to extrapolate from our findings.
In light of our results, some practical implications can be suggested, especially in contexts in which the ultimate outcome of these interactions leads to a gender-based grade difference. Biology teachers should be aware of these gender differences, and, based on our regression analysis, we suggest implementing strategies to enhance students' science confidence and reduce test anxiety. Prior work has suggested that strategic use of role models, either in the class or as embedded examples, can reduce the gaps in confidence (Cotner et al., 2011) and retention (Bettinger and Long, 2005; Hoffmann and Oreopoulos, 2009) in STEM disciplines. Also, implementing active-learning techniques in the classroom may be especially beneficial for women and underrepresented minority students (Haak et al., 2011; Lorenzo, Crouch, & Mazur, 2006). However, because the interaction between gender and confidence was relatively weak compared to that between gender and test anxiety, an emphasis on test anxiety may deliver more positive results. Mitigating the impacts of test anxiety might increase students' performance (Ballen, Salehi and Cotner, 2017) and, consequently, their science confidence. Strategies could include allowing exam re-takes to reduce perceived risk, setting realistic standards on tests and examination grades, implementing writing exercises targeting testing (Ramirez and Beilock 2011), having several low-stakes tests (rather than a few high-stakes exams; Cotner and Ballen 2017), and helping students focus on intrinsic aspects of learning, as opposed to extrinsic aspects (Deci & Ryan, 1985; Hill & Wigfield, 1984).
Assuming these gender differences with respect to science confidence and test anxiety are consistent in future studies, for example in STEM disciplines beyond biology, the next steps are to implement strategic interventions explicitly targeting known deficiencies. While it may be relatively straightforward to investigate any relationship between variation in affective traits (such as self-beliefs, engagement, and motivation) and performance and retention, designing effective interventions is more challenging. Also, interventions that show promise in one context may not apply to others. Cross-cultural comparisons may help clarify which interventions are broadly applicable, as opposed to those that are restricted to certain populations.