Investigating students’ course performance by groups’ gender compositions

Instructors and researchers have been exploring the effects of students’ groups in physics class on their experiences and course performance in a variety of ways. Here, we analyzed students’ course performance based on the gender composition of the studio group they worked in throughout a full semester. Data were collected from one semester of two calculus-based introductory courses at an engineering university. Students regularly attended studio sections where they worked in groups of three in problem solving and hands-on activities. We explored students’ exam and non-exam course performances based on the gender composition of their studio group and found little to no differences between students’ course performances based on the gender composition of their groups. Including measures of test confidence and sense of belonging mitigated average differences in exam performance between men and women. We find no evidence to support particular gender compositions of studio groups for improving students’ course performance.


I. INTRODUCTION
Effective group composition in active engagement physics classrooms has been a topic of discussion for at least the past three decades. Yet, studies find different outcomes, suggesting different classroom populations and environments may call for a variety of different techniques for approaching group compositions. However, some of these differences are due to the myriad of potential ways to define group composition, such as students' relative exam performance, incoming physics experience, gender, and preferences for group dynamics. This is in addition to differences in pedagogical approaches and many possible student outcomes to analyze (e.g., persistence in a course sequence, affective experiences, course performance). In this study, we explore students' exam and non-exam course performances based on the gender composition of their studio group. We find little to no differences between students' course performance based on the gender composition of their groups.
Some of the existing research on group composition in introductory physics classes focuses on groups' gender [1,2] and physics ability [1][2][3] compositions and analyzes a dimension of course performance (via exams or standardized diagnostics) as an outcome. For example, Heller and Hollabaugh proposed that cooperative groups are, on average, most effective with students with different incoming levels of physics ability (defined by individual exam scores) and that women are more successful in a course when grouped together [1]. In contrast, Callan et al. compared groups with heterogeneous and homogeneous physics abilities (defined by pre-instruction FMCE scores) and gender compositions and found no differences in students' post-instruction FMCE performance [2].
Other studies, especially in the introductory physics lab setting, find that students fall into gendered roles while working in groups. For example, Quinn et al. identified that men and women implicitly take up systematically different roles within their lab groups when a variety of roles are available within a group [4]. Similarly, Doucette et al. found that some roles that women tend to assume in mixed-gender groups may inhibit growth of students' physics identity [5]. These studies suggest that gender composition of groups may significantly change students' experiences in introductory physics classes.
In several biology courses, Ballen et al. demonstrated that exam and non-exam performance vary between men and women, with men, on average, outperforming women on exams and women outperforming men on non-exams [6]. The exam performance differences are mitigated by taking into account students' self-reported test anxiety. Studies have documented that women experience higher frequencies of test anxiety than men in a variety of undergraduate-level disciplines [7][8][9], which suggests that similar results to Ballen et al. may appear in introductory physics. However, Stang et al. found that in an introductory physics course test anxiety did not measurably contribute to students' final exam scores when including a self-efficacy predictor [10], indicating that attitudinal experiences other than test anxiety may play a more prominent role in students' course performance.
Several studies highlight the importance of students' feeling of belonging in physics and university-level courses (e.g., Refs. [11][12][13][14][15]), and men's and women's sense of belonging in introductory physics courses may differ within the same course (e.g., Ref. [15]). For example, Rainey et al. found that students who persist in a physics major report greater sense of belonging than those who choose to leave, with perceived competence as a contributor to belonging [13]. Students' affective experiences and performance in these courses often coincide with a strong sense of belonging in the course [11][12][13][14][15].
Therefore, in this study we also focus on gender composition of groups with the inclusion of measures of test confidence and sense of belonging because they have been shown to differently affect men's and women's course performance in science courses. We provide a preliminary example of the relationships between students' in-class group composition, self-reported testing confidence, sense of belonging, and course performance, separated by exam and non-exam performance. In doing so, we intend to highlight avenues for researchers to investigate decisions about group composition in introductory physics courses.

II. DATA AND CONTEXT
Data were collected from one semester of two calculus-based introductory courses in mechanics and electricity and magnetism (E&M) at Colorado School of Mines, an engineering university. In these courses, students attended (either remotely or in-person): (1) discussion sections where an instructor led the students through activities such as clicker questions, individual problem solving, and short discussions with peers and (2) studio sections where students worked in groups of three in group problem solving and hands-on activities. Studio groups were assigned at the beginning of the semester and maintained throughout the semester, except in the case of students withdrawing from the course. Students who opted to take the course fully remotely were grouped with students who were attending in-person. Otherwise, groups were randomly assigned with no regard to student demographics or incoming physics experience.
As part of the course, students took surveys at the beginning and end of the course that included items intended to measure students' feelings of test confidence and belonging in the course. For example, an item pertaining to test confidence (when reverse scored) prompts students to rate their level of agreement with "I have an uneasy, upset feeling when I take a physics test." An item pertaining to sense of belonging is "I feel like I can be myself in this physics class." The items are part of a larger set of items developed by Singh's group, and results from subsets of items have been presented in studies such as Refs. [16][17][18]. We performed exploratory factor analyses on students' responses to the items and found that items factored similarly to those found by Singh's group (e.g., Ref. [16]). Therefore, we made no modifications to the items corresponding to test confidence and belonging. To score test confidence and belonging, we averaged students' individual scores on items pertaining to those constructs. Each item had options ranging from strongly disagree (scored as 1) to strongly agree (scored as 5). We reverse scored test confidence items because agreement with the items was indicative of test anxiety rather than confidence. Possible scores for test confidence and belonging range from 1 to 5.
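The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in the study: the function names and the example responses are hypothetical. The only details taken from the text are the 1-to-5 response scale, the reverse scoring of test confidence items, and the averaging of items within a construct.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # strongly disagree = 1, strongly agree = 5


def reverse_score(response: int) -> int:
    """Flip a Likert response so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside the 1-5 scale")
    return SCALE_MIN + SCALE_MAX - response


def construct_score(responses, reverse=False):
    """Average one student's item responses for a single construct."""
    if reverse:
        responses = [reverse_score(r) for r in responses]
    return mean(responses)


# Hypothetical responses from one student (not data from the study).
anxiety_items = [4, 5, 3]       # agreement indicates test anxiety
belonging_items = [4, 4, 5, 3]  # agreement indicates belonging

# Reverse scoring turns [4, 5, 3] into [2, 1, 3] before averaging.
test_confidence = construct_score(anxiety_items, reverse=True)
belonging = construct_score(belonging_items)
```

Because each item is bounded by 1 and 5, the averaged construct score is bounded by 1 and 5 as well, which matches the score range stated above.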
Demographic data came from the university registrar, to which students self-reported. Unfortunately, the registrar only allows a binary selection of gender, so our analyses treat gender as a binary. In our analyses, we then used students' gender relative to their studio group gender composition as predictors: (1) single-gender (SG), (2) mixed gender with predominantly women (MGW), and (3) mixed gender with predominantly men (MGM). We excluded students from groups with gender-balanced composition due to small numbers of participants; there were no research-consenting students in these groups in mechanics and only 6 men and 7 women in E&M.
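The three composition categories can be illustrated with a short Python sketch. The gender codes "M" and "W", the function names, and the format of the combined label are hypothetical choices made for this example; only the category definitions (SG, MGW, MGM, and the excluded gender-balanced case) come from the text.

```python
from collections import Counter


def group_composition(genders):
    """Label a studio group by the binary genders of its members.

    `genders` is a list such as ["M", "W", "W"].
    """
    counts = Counter(genders)
    men, women = counts.get("M", 0), counts.get("W", 0)
    if not genders or men + women != len(genders):
        raise ValueError("expected a non-empty list of 'M' and 'W' entries")
    if men == 0 or women == 0:
        return "SG"        # single-gender
    if women > men:
        return "MGW"       # mixed gender, predominantly women
    if men > women:
        return "MGM"       # mixed gender, predominantly men
    return "balanced"      # gender-balanced; excluded from the analyses


def predictor_label(student_gender, genders):
    """Combine a student's gender with their group's composition."""
    return f"{student_gender}-{group_composition(genders)}"
```

For example, `predictor_label("W", ["M", "W", "W"])` gives `"W-MGW"`. A three-person group can never be gender-balanced, so under this sketch the balanced category only arises when a group has an even number of members.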
We modeled exam performance and non-exam performance to distinguish whether differences occurred across assessments of varying stakes. Exam performance is modeled using a repeated measures mixed model with random intercepts to account for variations in individuals' exam performances. We excluded final exam performance from these analyses because both courses had an optional final exam, so most students opted out of the final exam. Non-exam performance is modeled using multiple linear regressions and is treated as an overall outcome. We express all results as scores out of 100% for ease of interpretation for the reader.
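For readers less familiar with this model class, a random-intercept model for repeated exam scores can be written in generic textbook notation as shown here. The symbols are illustrative and are not taken from the analysis itself: $y_{ij}$ is the score of student $i$ on exam $j$, $\mathbf{x}_i$ collects that student's predictors, $u_i$ is the student-specific random intercept, and $\varepsilon_{ij}$ is the residual. The normality assumptions are the conventional ones for such models.

```latex
y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

The random intercept $u_i$ lets each student's exams share a common offset from the population-level prediction, which is how variation in individuals' exam performances is absorbed.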
For both exam and non-exam performance outcomes, we tested four models. This study focuses on the effects of gendered group composition on students' performance, so we consistently include gender relative to group composition in all models. We included combinations of test confidence and belonging in three of the four models because they are both closely tied to course performance outcomes in prior studies. These model choices allow us to disentangle possible effects of group composition, test confidence, and sense of belonging for course performance in this population.
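The four-model structure can be laid out as a small Python sketch that writes each model as an R/patsy-style formula string. The variable names (`score`, `gender_group`, `test_confidence`, `belonging`) are hypothetical, and the numbering of Models 2 and 3 is an assumption; the text only specifies that gender relative to group composition appears in every model and that Model 4 includes both affective measures.

```python
BASE = "gender_group"  # gender relative to group composition (in every model)


def model_predictors():
    """Return the predictor list for each of the four models."""
    return {
        1: [BASE],
        2: [BASE, "test_confidence"],               # numbering assumed
        3: [BASE, "belonging"],                     # numbering assumed
        4: [BASE, "test_confidence", "belonging"],  # both measures
    }


def formula(outcome, predictors):
    """Write an R/patsy-style formula string for one model."""
    return f"{outcome} ~ " + " + ".join(predictors)


for number, predictors in model_predictors().items():
    print(f"Model {number}: {formula('score', predictors)}")
```

Strings of this form cover only the fixed-effects part of a model. For the exam models, the per-student random intercept would be specified separately in whatever fitting software is used.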

III. RESULTS
Table I shows descriptive statistics for students' course performance with separations by gender and group composition. On average, women tended to outperform men on non-exam course components; however, men outperformed women on exams. No consistent trends based on group composition are apparent. Table II shows the average scores of test confidence and belonging for the same populations of students. Across both courses, women systematically self-reported less test confidence, which implies higher frequencies of test anxiety compared to men. For sense of belonging, men tend to feel stronger belonging in the course than women. However, the difference in average belonging is less than that of test confidence. For our population, test confidence and belonging are substantially correlated (Pearson's coefficient of 0.68 for mechanics and 0.55 for E&M). Similar to course performance measures, there are no apparent trends that indicate group composition is correlated with test confidence or belonging.
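For reference, Pearson's coefficient between two sets of paired scores can be computed as in the following Python sketch. The function name is hypothetical, and the sketch illustrates the standard definition rather than the software used for the values reported above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Here `x` and `y` would be students' test confidence and belonging scores, paired by student. The coefficient ranges from -1 to 1.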

A. Exam performance
Tables III and IV show predicted exam performance for mechanics and E&M, respectively. In both courses, the inclusion of the test confidence score mitigates gender differences, which agrees with results of analyses in biology education research [6]. However, a slight trend may persist in E&M for women to score lower than men even when test confidence is taken into account. Belonging appears to serve a similar role as test confidence, likely due to the relatively high correlation between the two. With the inclusion of belonging, the gender difference trend again remains in E&M, whereas in mechanics belonging appears to mostly mitigate the differences. When both test confidence and belonging measures are included (Model 4), results are similar to including only one of the measures. In these analyses, there is no indication that the gender composition of the studio groups correlates with students' exam performance.

B. Non-exam performance
Tables V and VI show students' performance on non-exam components of the courses, as weighted by course syllabi. There is a ceiling effect for scores across both courses (which explains the poor model fits), as expected for grades that are substantially participation-based. Women tended to perform better than men on non-exam components of the course with the exception of men in women-dominant groups in E&M. This effect remained even with the inclusion of students' sense of belonging in the course, suggesting that on average women outperform men on non-exam course components regardless of their group's composition. Perhaps as expected, testing confidence did not measurably affect students' non-exam performance in E&M. However, students' sense of belonging in the course was consistently a strong predictor of non-exam performance in E&M. In mechanics, test confidence appears to play a measurable role; however, we suspect this may be due to the correlation between test confidence and belonging. We suspect that students who perform better on non-exam components have a network of support and/or collaborators within the course, which leads to a strong sense of belonging.

IV. DISCUSSION
In this study, we found no measurable differences in course performance according to the gender composition of students' studio groups. In both mechanics and E&M, men tended to outperform women on exams but women outperformed men on non-exam components of the course. Students' self-reported test confidence and sense of belonging substantially mitigated gender differences in exam performance, which suggests that students' various experiences in physics classrooms affect their performance in measurable ways. However, treating test confidence and sense of belonging as continuous variables may obscure some features of the data and may play a role in the null results.
Many of the course structures are intended to reduce test anxiety for students. In most semesters, students take four exams with each weighted at about 15% of their grades. Collectively, exams count for a substantial part of overall grades; however, each individual exam is relatively low stakes. Leading up to an exam, we provide many supports for studying, including practice exams and multiple types of review sessions. However, we intend to explore additional ways to support students in reducing test anxiety in these courses, such as voluntary workshops that are tailored to our introductory physics courses.
Our courses are implicitly intended to promote students' sense of belonging by providing ample opportunities to interact with peers and instructors in both studio and discussion sections. However, we have not explicitly examined our course structures for building belonging. Neither course grades on a curve, so that students are encouraged to collaborate when working through homework and studying for exams. Instructors regularly encourage students to collaborate, both inside and outside of the classroom environments, and to attend teaching assistant and faculty office hours. Both courses use a variety of assessment methods, many of which are low-stakes and based on in-class collaboration with peers. We hope to investigate how different course structures support or detract from students' sense of belonging to further develop course structures to support belonging.
Our work is limited by the small number of participants in particular group compositions and by the use of registrar data for identifying students' gender. For example, in mechanics there were very few participants in MGM groups. Due to the typical three-person group size, we also have very few to no groups that were mixed gender in equal proportions. Additionally, the survey items for test confidence and sense of belonging were administered only at the end of the semester after the exams. Students' experiences could fluctuate throughout the semester depending on many influences such as course content, how studio group members are engaging, and support within the course as a whole. Lastly, the study took place during the COVID-19 pandemic, so the course structure was quite different and relied on studio groups communicating with each other across a video platform. Developing effective group dynamics in this environment may require new sets of skills that college students and instructors had not yet had to develop compared to most prior studies within physics education research.
Developing effective grouping in courses with collaborative components remains an open avenue for investigations in physics education research. Here, we find little evidence to support particular gender compositions of studio groups for improving students' course performance. However, much of students' experience in a course may be due to their interactions with their peers, especially those that they frequently work with in-class, and may not directly affect performance outcomes. Simultaneous investigations of students' course performance and affective experiences may begin to provide insight into how various group compositions affect students' experiences in physics.

TABLE II. Test confidence and belonging by group composition. Possible scores range from 1 (low test confidence or sense of belonging) to 5 (high test confidence or sense of belonging).

TABLE III. Mechanics repeated measures mixed models predicting exam performance. Scores are reported out of 100%. Only 13 students were in MGM groups, so we also removed those students from this analysis. The base for comparison is men in single-gendered groups.