For physics majors, gender differences in introductory physics do not inform future physics performance

Analysis of institutional data for physics majors showing predictive relationships between required mathematics and physics courses in various years is important for contemplating how the courses build on each other and whether the curriculum for the majors needs to change to strengthen these relationships. We used 15 years of institutional data at a large US-based research university to investigate how introductory physics and mathematics courses predict male and female physics majors' performance in required advanced physics and mathematics courses. We used structural equation modeling (SEM) to investigate these predictive relationships and found that, among introductory and advanced physics and mathematics courses, there are gender differences in performance in favor of male students only in the introductory physics courses after controlling for high school GPA. We found that measurement invariance fully holds in a multi-group SEM by gender, so it was possible to carry out the analysis with the effect of gender mediated by introductory physics and high school GPA. Moreover, we find that the introductory physics courses that show gender differences do not predict performance in advanced physics courses. In other words, students could be using invalid data about their introductory physics performance to decide whether physics is the right field for them to pursue, and those invalid data in introductory physics favor male students. Also, introductory mathematics courses predict performance in advanced mathematics courses, which in turn predict performance in advanced physics courses. Furthermore, apart from the introductory physics courses, which do not predict performance in future physics courses, there is a strong predictive relationship between the sophomore, junior, and senior level physics courses.

Introduction
Increasingly, universities are using evidence to improve student learning and to promote equity and inclusion so that students from diverse backgrounds and demographic groups can succeed [1-9]. Institutional data can play a critical role in understanding both the successes of physics departments and the areas where changes are needed [10].
Holistic consideration of how well physics departments are currently supporting their undergraduate majors is crucial for making appropriate, data-informed changes to the curricula and pedagogies for the majors and for ensuring that students are adequately supported and advised. These considerations include how prerequisite physics and mathematics courses predict performance in subsequent physics courses throughout the curriculum for physics majors. Such investigations are vital regardless of the theory of change [3,7] that a physics department adopts and implements based upon its institutional affordances and constraints. At the same time, with advances in digital technology in the past decade, data analytics can provide valuable information that can be useful in transforming learning for all students [11,12]. This research is motivated by the fact that many investigations in physics education have focused on evidence-based classroom practices to improve student learning at all levels of the physics undergraduate curriculum [13-21]. By contrast, significantly less attention has been paid to how student performance in later courses builds on performance in earlier courses across the physics curriculum as a whole. Information obtained from data analytics on large institutional datasets can help clarify, for example, the role that earlier courses play in later physics course performance. It can also inform strategies for strengthening these ties and for improving physics major advising, mentoring, and support.
Moreover, it is important that physics departments take a careful look at the extent to which their programs for the majors are equitable and inclusive. It is imperative that we structure our programs to provide adequate support, advising, and mentoring to all students, so that all students have sufficient opportunity to excel as physics majors. This includes women and students from ethnic and racial groups who have traditionally been left out [22,23]. Further, studies of institutional data are increasingly being used to investigate issues of equity and inclusion, especially for women and underrepresented minorities in physics and in science, technology, engineering, and mathematics (STEM) disciplines as a whole [21, …].
In this study we focus in particular on the issue of gender equity in physics education. Much important work has been done in recent years documenting the many ways in which women are not being adequately supported by their physics departments [8, 21, …]. In particular, inequitable gender differences in introductory physics classes have been documented with a variety of measures. These include motivational characteristics [40-42, 54-56, 59, 60, 62, 65-74], grades [20, 46, 47, 56, 61, 65-68, 73], and performance on conceptual inventories [8, 24, 25, 50, 51, 57, 58]. The overall conclusions of these studies are in line with broader studies of gender inequity in physics and STEM [26, 34-36, 43-45, 48, 49, 53, 64]. Those studies highlight the significant obstacles that women in STEM face as a result of societal stereotypes and biases. We seek to extend these critically important studies by examining gender differences in the grades of physics majors, not just in introductory physics but throughout a physics curriculum.
In order to gain an understanding of these issues central to improving physics education for the majors, this research harnesses data analytics in the context of a large state-related university to investigate how well the performance of physics majors in physics and mathematics courses throughout a physics curriculum predicts performance in subsequent physics courses. We note that the first-year physics and mathematics courses are very similar at most colleges in the US. Moreover, many of the advanced courses in these subjects also have well-defined curricula that are common across many colleges and universities. These courses for the majors have been offered over decades under the assumption that the later physics courses would build on the earlier ones coherently to help the majors build a robust knowledge structure of physics and develop their problem solving, reasoning and meta-cognitive skills.
Here we discuss an investigation that applies data analytics to 15 years of institutional data for physics majors. It analyzes not only these relationships between course performance in different years, but also whether there are any gender differences in these curricular relationships. Course grades are an important measure for two reasons. In the short term, they are the measure that students themselves see and use to inform their attitudes and decisions about physics. In the long term, they are the measure reflected on students' transcripts, which affects future career opportunities. Moreover, course grades merit closer study because they are consistently available in institutional data going back years or decades.
This investigation can be useful to other institutions that wish to perform similar analyses as they contemplate strategies for improving education for physics majors in a holistic manner. In particular, institutions could compare the relationships they observe among the required courses in their own curricula with the baseline results presented here from a large state-related university.

Research questions
Our research questions regarding the physics curriculum for the majors at a large state-related university are as follows.
RQ1. Where in the curriculum do gender differences in course performance occur for physics majors (i.e. in introductory and advanced physics and mathematics courses)?
RQ2. Does performance in introductory physics and mathematics courses predict performance in advanced physics and mathematics courses?
RQ3. Does the degree to which earlier course grades predict later course grades differ for men and women?

Measures
Using the Carnegie classification system, the university at which this study was conducted is a public, high-research doctoral university with balanced arts and sciences and professional schools. It has a large, primarily residential undergraduate population that is full-time and reasonably selective, with low transfer-in from other institutions [75]. De-identified data were provided by the university on all students who had enrolled in introductory physics from Fall 2005 through Spring 2019. The data include demographic information such as gender. We note that gender is not a binary construct. However, the university data include 'gender' as a binary categorical variable, and gender is therefore represented that way in these analyses. From the full 2005-2019 sample, a sub-sample was obtained by applying several selection criteria to separate physics majors from students in other majors who took introductory physics. To be kept in the sample, students were required to (1) declare a physics major at any point or be a non-engineering student enrolled in the honors introductory sequence, and (2) enroll in Modern Physics. All of the courses considered in this analysis (table 1) are required lecture courses in the curriculum for the physics major. We consider only required courses in order to maintain as consistent a population as possible. Further, we consider only lecture courses. The contemporary laboratory courses have very high and narrow grade distributions (over 90% of students receive an A), which makes them less suited for investigations of gender differences. After applying the selection criteria, the sample contains 451 students, of whom 19.5% are female. By race/ethnicity, 80.5% are White, 10.9% Asian, 2.4% Latinx, 2.2% African American, and 3.8% other or unspecified. The majority of the considered courses are taught in an active learning style, and the remainder in a traditional lecture style. Multiple instructors have taught each course in the physics curriculum within the investigated time period, because the physics department does not allow the same instructor to teach the same course for more than two years. However, in some courses, such as the honors introductory sequence, those multiple instructors have all been men.

Table 1. [Required courses considered in this analysis; table body not recovered. Surviving fragments list 'Introduction to matrices and linear algebra' (abbreviated 'Linear algebra') and 'Applied differential equations' (abbreviated 'Diff. eq.').]
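The two selection criteria described above amount to a simple Boolean filter on per-student records. The sketch below assumes a hypothetical record layout. `StudentRecord` and its field names are illustrative, not the university's data schema. It also assumes that the per-term declared majors have already been collapsed into a single "ever declared physics" flag.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical fields for illustration; not the university's actual schema.
    student_id: str
    ever_declared_physics: bool   # physics major declared in any term
    is_engineering: bool          # enrolled as an engineering student
    took_honors_intro: bool       # enrolled in the honors introductory sequence
    took_modern_physics: bool     # enrolled in Modern Physics


def is_physics_major(rec: StudentRecord) -> bool:
    """Return True if a record satisfies both selection criteria."""
    # Criterion (1): declared physics at any point, OR a non-engineering
    # student enrolled in the honors introductory sequence.
    criterion_1 = rec.ever_declared_physics or (
        rec.took_honors_intro and not rec.is_engineering
    )
    # Criterion (2): enrolled in Modern Physics.
    return criterion_1 and rec.took_modern_physics


def select_sample(records: list[StudentRecord]) -> list[StudentRecord]:
    """Keep only the records that pass both criteria."""
    return [rec for rec in records if is_physics_major(rec)]


records = [
    StudentRecord("a", True, False, False, True),   # kept: declared physics
    StudentRecord("b", False, False, True, True),   # kept: non-engineering honors student
    StudentRecord("c", False, True, True, True),    # dropped: engineering, never declared
    StudentRecord("d", True, False, False, False),  # dropped: never took Modern Physics
]
print([rec.student_id for rec in select_sample(records)])  # ['a', 'b']
```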
The data also include high school GPA on a weighted 0-5 scale that includes adjustments to the standard 0-4 scale for Advanced Placement and International Baccalaureate courses. This weighting was performed by the university prior to the data being supplied for research. Further, students' declared majors are recorded in the data separately for each term in which they are enrolled.
Finally, the data include the grade points and letter grades earned by students in each course taken at the university. Grade points are on a 0-4 scale with A = 4, B = 3, C = 2, D = 1, F = 0, where the suffixes '+' and '−' respectively add or subtract 0.25 grade points (e.g. B− = 2.75), with the exception of A+ which is reported as the maximum 4 grade points. In this study we consider course grades to be a proxy of student learning. The extent to which this assumption is true is not as important as the extent to which it aligns with how course grades are viewed by students and future employers or graduate schools. Students will use their course grades to inform decisions about their future, from their academic major to their career path.
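The grade-point scale described above can be written as a small lookup. This is a minimal sketch, and `grade_points` is a hypothetical helper. It assumes that F carries no '+' or '−' modifier, which the text does not state explicitly. It accepts either an ASCII hyphen or a Unicode minus sign.

```python
BASE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
MINUS_SIGNS = {"-", "\u2212"}  # ASCII hyphen or Unicode minus sign


def grade_points(letter: str) -> float:
    """Convert a letter grade such as 'B-' or 'A+' to grade points on the 0-4 scale."""
    letter = letter.strip()
    base, suffix = BASE_POINTS[letter[0]], letter[1:]
    if base == 0.0 and suffix:
        # Assumption: F carries no '+' or '-' modifier.
        raise ValueError(f"unexpected grade: {letter!r}")
    if suffix == "+":
        points = base + 0.25
    elif suffix in MINUS_SIGNS:
        points = base - 0.25
    elif suffix == "":
        points = base
    else:
        raise ValueError(f"unexpected grade: {letter!r}")
    # A+ is capped at the maximum of 4 grade points.
    return min(points, 4.0)


print([grade_points(g) for g in ["A+", "A", "A-", "B+", "B-", "C", "F"]])
# [4.0, 4.0, 3.75, 3.25, 2.75, 2.0, 0.0]
```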

Analysis
In order to evaluate the grades that the physics majors earn in physics and mathematics courses, we grouped students by the gender variable and computed standard descriptive statistics (mean, standard deviation, sample size) separately for each group. Gender differences in course grades were initially evaluated using Cohen's d to measure the effect size [76,77], as is common in education research [78].
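As a sketch of the effect-size computation, the function below computes Cohen's d with the sign convention used in tables 2 and 3. It takes women's mean minus men's mean, so d < 0 when men's mean is higher. The text does not specify which standard deviation serves as the denominator. The common pooled sample standard deviation is assumed here, and the grade lists are made-up values for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def describe(grades: list[float]) -> tuple[int, float, float]:
    """Sample size, mean, and sample standard deviation of a list of grade points."""
    return len(grades), mean(grades), stdev(grades)


def cohens_d(women: list[float], men: list[float]) -> float:
    """Cohen's d as (women - men) / pooled SD, so d < 0 when men's mean is higher."""
    n_w, mu_w, sd_w = describe(women)
    n_m, mu_m, sd_m = describe(men)
    # Assumed denominator: the pooled sample standard deviation.
    pooled_sd = sqrt(((n_w - 1) * sd_w**2 + (n_m - 1) * sd_m**2) / (n_w + n_m - 2))
    return (mu_w - mu_m) / pooled_sd


women = [2.0, 3.0, 3.5, 4.0]   # made-up grade points
men = [3.0, 3.25, 3.75, 4.0]
print(round(cohens_d(women, men), 3))  # -0.548
```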
The extent to which performance (i.e. grades earned) in earlier physics and mathematics courses predicts performance in later physics and mathematics courses was evaluated using structural equation modeling (SEM) [79]. SEM is the union of two statistical modeling techniques, namely confirmatory factor analysis (CFA) and path analysis. The CFA portion tests a model in which observed variables ('indicators') are grouped into latent variables ('factors'). Factors are constructed variables that represent the variance shared among all indicators that load on that particular factor. The degree to which each indicator is explained by its factor is measured by the standardized factor loading, λ (with 0 ≤ λ ≤ 1), where λ² gives the proportion of variance in the indicator explained by the factor. The path analysis portion then tests the statistical significance and strength of regression paths between these factors, simultaneously estimating all regression coefficients, β, throughout the model. This is an improvement over a multiple linear regression model, which can predict only a single response (target or outcome) variable at a time and therefore problematically disallows hierarchical structures [80]. Because all regression paths are estimated simultaneously, all estimates can be standardized simultaneously, allowing direct comparison between standardized β coefficients throughout the model.
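For a standardized indicator that loads on a single factor, the statement that λ² is the proportion of explained variance follows directly from the measurement equation. The structural part then relates factors through the β coefficients. The notation below is standard SEM notation rather than notation taken from this paper: η is a factor, θ an indicator's residual variance, and ζ a structural disturbance. The loading value is illustrative.

```latex
\begin{aligned}
x_i &= \lambda_i\,\eta + \varepsilon_i,
  \qquad \operatorname{Var}(\eta) = \operatorname{Var}(x_i) = 1,
  \quad \operatorname{Cov}(\eta, \varepsilon_i) = 0,\\
\operatorname{Var}(x_i) &= \lambda_i^{2}\operatorname{Var}(\eta) + \theta_i
  = \lambda_i^{2} + \theta_i = 1,
  \qquad \theta_i = \operatorname{Var}(\varepsilon_i),\\
\eta_j &= \sum_{k \neq j} \beta_{jk}\,\eta_k + \zeta_j,\\
\lambda_i = 0.8 \;&\Rightarrow\; \lambda_i^{2} = 0.64
  \quad \text{(64\% of the indicator's variance explained by } \eta\text{)}.
\end{aligned}
```

In the structural equation, a factor may also be regressed on observed predictors such as high school GPA, in the same form.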
In this paper, we report SEM model fit using the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA). Commonly cited standards for goodness of fit using these indices are as follows. For CFI and TLI, Hu and Bentler [81] found that many authors [81-83] suggest that values above 0.90 and 0.95 indicate a good fit and a great fit, respectively. For RMSEA, several authors [81,84] suggest that values below 0.10, 0.08, and 0.05 indicate a mediocre, good, and great fit, respectively. Finally, these model estimations can be performed separately for different groups of students (e.g. men and women) using multi-group SEM. Differences between groups are then tested in a series of steps corresponding to different levels of 'measurement invariance' in the model [79]. Each step fixes different elements of the model to equality across the groups and compares the result to the previous step via a likelihood ratio test (LRT). A non-significant p-value at a given step indicates that the newly constrained estimates are not statistically significantly different across groups. 'Weak' measurement invariance is demonstrated by fixing the factor loadings to equality. 'Strong' invariance additionally fixes the indicator intercepts to equality, and 'strict' invariance further fixes the residual error variances of the indicators to equality. If measurement invariance holds, then all remaining differences between the groups occur at the factor level, either as differences in factor intercepts or in β coefficients. Further, if no differences are found in β coefficients, then any remaining group differences in factor intercepts may be modeled by including a categorical grouping variable that directly predicts the factors.
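The three fit indices can be computed from four quantities: the model chi-square statistic, the chi-square of a baseline (independence) model, their degrees of freedom, and the sample size. The sketch below uses the textbook formulas together with the cutoffs quoted above. Some software uses N rather than N − 1 in the RMSEA denominator. The function names are hypothetical, and the chi-square values in the example are invented, because the paper reports only the resulting indices. The second print line applies the rating helpers to the indices reported for the final model in the results.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (textbook form with N - 1)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: model (m) noncentrality relative to baseline (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index, based on chi-square/df ratios."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def rate_cfi_tli(value: float) -> str:
    # Cutoffs as quoted in the text: > 0.95 great, > 0.90 good.
    return "great" if value > 0.95 else "good" if value > 0.90 else "below 0.90"


def rate_rmsea(value: float) -> str:
    # Cutoffs as quoted in the text: < 0.05 great, < 0.08 good, < 0.10 mediocre.
    if value < 0.05:
        return "great"
    if value < 0.08:
        return "good"
    return "mediocre" if value < 0.10 else "above 0.10"


# Invented chi-square values, for illustration only.
print(round(rmsea(100.0, 50, 451), 3), round(cfi(100.0, 50, 1000.0, 60), 3),
      round(tli(100.0, 50, 1000.0, 60), 3))  # 0.047 0.947 0.936
# Ratings for the indices reported for the final model (CFI, TLI, RMSEA).
print(rate_cfi_tli(0.947), rate_cfi_tli(0.933), rate_rmsea(0.053))  # good good good
```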
Using SEM, we model student progression through the physics curriculum by grouping courses into factors according to their subject (physics or mathematics) and the order in which physics majors typically take them. In particular, we group the introductory courses taken within the first year together (Physics 1 and 2 as 'Intro Physics' and Calculus 1 and 2 as 'Intro Math') and consider the remaining courses beyond the first year separately. This separation between first-year and later courses is designed to test our hypothesis that performance in introductory physics and performance in advanced physics are different constructs. The relationship between these two factors in the SEM model determines how closely related they are, controlling for performance in mathematics courses. We use multi-group SEM to test for gender moderation, i.e. to test for gender differences in the model, including mean differences in courses (indicators) and course factors. The only gender differences we found were in intercepts, not in factor loadings, residual variances, or regression coefficients. We therefore ultimately model the gender differences not with multi-group SEM, but with a categorical 'Gender' variable that directly predicts the variables whose intercepts differ.
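The factor grouping can be expressed in lavaan's model syntax. In that syntax, `=~` defines a factor from its indicators, `~` a regression, and `~~` a covariance. The Python helper below only assembles such a string. The indicator names (`phys1`, `adv_phys_2`, and so on) are hypothetical. The regression and covariance paths loosely follow the verbal description of the final model in the results and do not reproduce the published figure.

```python
def lavaan_syntax(factors, regressions, covariances):
    """Assemble a lavaan model string from factor, regression, and covariance specs."""
    lines = [f"{name} =~ {' + '.join(ind)}" for name, ind in factors.items()]
    lines += [f"{y} ~ {' + '.join(xs)}" for y, xs in regressions.items()]
    lines += [f"{a} ~~ {b}" for a, b in covariances]
    return "\n".join(lines)


# Hypothetical indicator names. The regular and honors sequences are assumed to
# have been merged beforehand into single Physics 1 and Physics 2 grade columns.
factors = {
    "IntroPhys": ["phys1", "phys2"],
    "IntroMath": ["calc1", "calc2"],
    "AdvMath": ["lin_alg", "diff_eq"],
    "AdvPhys": ["modern", "adv_phys_2", "adv_phys_3"],
}
# Illustrative paths loosely following the description of the final model.
regressions = {
    "hsgpa": ["gender"],
    "IntroPhys": ["hsgpa", "gender"],
    "IntroMath": ["hsgpa"],
    "AdvMath": ["IntroMath"],
    "AdvPhys": ["IntroMath"],
}
covariances = [("IntroPhys", "IntroMath"), ("AdvMath", "AdvPhys")]

print(lavaan_syntax(factors, regressions, covariances))
```

In R, a model string like this would be passed to `lavaan::sem()` together with `missing = "fiml"` to obtain full information maximum likelihood estimates.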
Because of the nature of institutional grade data, modeling students' progress through an entire curriculum involves a large amount of missing data. This has several causes. Students may receive credit for courses taken elsewhere (e.g. over the summer at a different college), may not complete the curriculum, or may skip normally required courses with special permission. Large datasets also inevitably contain errors. Because of strict departmental requirements, very few students receive credit for Physics 1 from Advanced Placement or International Baccalaureate, and no students can receive such credit for Physics 2. The default approach to missing data in many modeling programs, listwise deletion, is therefore undesirable, since it leaves very few students in the sample and can bias the results [85]. Considering this, we employed full information maximum likelihood (FIML) using the R package lavaan [86]. FIML estimates the SEM model from all available data for each student rather than discarding incomplete cases [79].
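The idea behind FIML is that each student contributes the log-likelihood of only the grades actually observed for that student, evaluated under the model-implied means and covariances. The sketch below shows this casewise likelihood for just two grades under a bivariate normal model and contrasts it with listwise deletion. The function names and data are hypothetical. Real SEM software maximizes this quantity over all model parameters rather than evaluating it at fixed values as done here.

```python
from math import log, pi


def case_loglik(x, y, mu, cov):
    """Log-likelihood of one student's observed grades (None = missing)."""
    (mx, my), ((sxx, sxy), (_, syy)) = mu, cov
    if x is not None and y is not None:
        # Both grades observed: bivariate normal log-density.
        det = sxx * syy - sxy * sxy
        dx, dy = x - mx, y - my
        quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return -log(2 * pi) - 0.5 * log(det) - 0.5 * quad
    if x is not None:
        # Only x observed: its marginal normal log-density.
        return -0.5 * log(2 * pi * sxx) - 0.5 * (x - mx) ** 2 / sxx
    if y is not None:
        return -0.5 * log(2 * pi * syy) - 0.5 * (y - my) ** 2 / syy
    return 0.0  # nothing observed: no contribution


def fiml_loglik(cases, mu, cov):
    """Sum of casewise log-likelihoods over all students, complete or not."""
    return sum(case_loglik(x, y, mu, cov) for x, y in cases)


def complete_cases(cases):
    """What listwise deletion keeps: students with no missing grade."""
    return [c for c in cases if None not in c]


# Hypothetical (course 1, course 2) grade points, with missing grades as None.
cases = [(3.0, 3.5), (2.5, None), (None, 4.0), (3.5, 3.0)]
mu = (3.0, 3.25)
cov = ((0.5, 0.2), (0.2, 0.5))
print(len(complete_cases(cases)), "of", len(cases), "students survive listwise deletion")
print(round(fiml_loglik(cases, mu, cov), 3))
```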
In addition to the aforementioned benefits of using SEM such as simultaneous estimation of all model elements and the ability to use FIML for missing data estimation, the basic structure of SEM also provides benefits to the modeling process. In particular, by first using CFA to group indicators into factors and then performing path analysis on those factors, the effect of measurement error is minimized since the error variance will be left at the indicator level and does not contribute to the estimation of regression coefficients at the factor level [79].
All analyses were conducted using R [87], making use of the package lavaan [86] for the SEM analysis and the package tidyverse [88] for data manipulation and descriptive statistics.

Results
To investigate gender differences in course grades and answer RQ1, we grouped students by the gender variable and first calculated the standardized mean difference, Cohen's d, to measure the effect size of the gender differences [76,77]. Table 2 shows these results for prospective physics majors in their first-year required physics and mathematics courses, regardless of whether they continued in the curriculum. Table 3 shows these results only for students who continued at least through Modern Physics. All later analyses are performed on the student population shown in table 3, namely those physics majors who persist at least through the second year. Even so, the contrast between those students and the first-year prospective physics majors in table 2 shows that on average higher than average performing …

Table 2. Descriptive statistics for prospective physics majors in introductory physics courses. To be included in this table, students need only have declared a physics major; they need not have enrolled in advanced courses. This allows a brief examination of all students who declared a physics major during their first year. The reported statistics include the sample size (N), mean grade points earned (μ), and standard deviation of grade points (σ) in each course for men and women separately, along with Cohen's d measuring the effect size of the gender difference. d < 0 indicates that the mean for men is higher; d > 0 indicates that the mean for women is higher.

Moreover, there is significant attrition in the number of students between tables 2 and 3. This attrition is measured only among students who declared the major, so it is certainly an underestimate: not every prospective physics major declares the major before leaving physics. A future study could investigate this trend and contextualize the high attrition rate in physics by comparing attrition rates among different STEM majors. We find that, on average, men performed slightly better than women in all introductory physics courses. Cohen's d ranges from −0.12 to −0.24 among all prospective physics majors (table 2), indicating a small effect size, and from −0.16 to −0.49 for those who continue to Modern Physics (table 3), indicating a small to medium effect size. Gender differences in mathematics and advanced physics courses (table 3) … The statistical significance of these gender differences is first tested using a multivariate analysis of variance (MANOVA) on three clusters of courses in table 3, namely introductory physics, advanced physics, and mathematics. Courses were clustered to keep the number of students in each MANOVA from dropping too low, since MANOVA employs listwise deletion. These results support the patterns noted above. Introductory physics (F(2, 371) = 3.13, p = 0.045) displays a consistent pattern of men earning higher grades than women, although this result is only marginally significant at the p < 0.05 level given the listwise deletion employed by MANOVA. Further, there is no consistent pattern in either advanced physics (F(6, 119) = 1.52, p = 0.179) or advanced mathematics (F(5, 108) = 1.07, p = 0.379), as evidenced by p > 0.05 for each of these tests. A more sophisticated test of these gender differences is performed in the investigation of RQ3. There, we use multi-group SEM to test for gender differences among all elements of the model, including differences in the means earned by men and women in each course.
In addition, multi-group SEM allows us to perform these tests while using FIML to handle missing data, a significant improvement over listwise deletion.

Table 3. Descriptive statistics for physics and mathematics courses taken by physics majors who have taken physics courses at least up to and including Modern Physics. Reported are the sample size (N), mean grade points earned (μ), and standard deviation of grade points (σ) in each course for men and women separately, along with Cohen's d measuring the effect size of the gender difference. d < 0 indicates that the mean for men is higher; d > 0 indicates that the mean for women is higher. Three multivariate analyses of variance (MANOVA) are reported, with courses grouped into introductory physics, advanced physics, and mathematics to reduce listwise deletion.
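With only two groups, the usual MANOVA test statistics all reduce to Hotelling's two-sample T². That statistic converts exactly to an F statistic with k and N − k − 1 degrees of freedom, where k is the number of outcome variables. For k = 2 outcomes, as in the introductory physics cluster, the F tail probability has the closed form (1 + 2F/d₂)^(−d₂/2). This means the reported F(2, 371) = 3.13 can be checked against its p = 0.045 without a statistics library. The sketch below is limited to two outcome variables, and its function names are hypothetical.

```python
from statistics import mean


def _moments(group):
    """Size, means, and sums of squares/cross-products for (x1, x2) pairs."""
    n = len(group)
    m1 = mean([g[0] for g in group])
    m2 = mean([g[1] for g in group])
    s11 = sum((g[0] - m1) ** 2 for g in group)
    s22 = sum((g[1] - m2) ** 2 for g in group)
    s12 = sum((g[0] - m1) * (g[1] - m2) for g in group)
    return n, (m1, m2), (s11, s12, s22)


def hotelling_f_2d(group_a, group_b):
    """Two-group Hotelling T^2 for k = 2 outcomes, returned as (F, df1, df2).

    Both groups must contain complete cases only (listwise deletion, as in MANOVA).
    """
    na, (a1, a2), sa = _moments(group_a)
    nb, (b1, b2), sb = _moments(group_b)
    n, k = na + nb, 2
    # Pooled within-group covariance matrix [[s11, s12], [s12, s22]].
    s11, s12, s22 = ((p + q) / (n - 2) for p, q in zip(sa, sb))
    det = s11 * s22 - s12 * s12
    d1, d2 = a1 - b1, a2 - b2
    # d' S^-1 d using the closed-form inverse of a 2x2 matrix.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    t2 = na * nb / n * quad
    f = (n - k - 1) / (k * (n - 2)) * t2
    return f, k, n - k - 1


def f_sf_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 numerator degrees of freedom."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Check against the introductory physics MANOVA reported in the text.
print(round(f_sf_df1_2(3.13, 371), 3))  # 0.045
```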

Turning then to RQ2, we use SEM to test the degree to which performance in earlier courses predicts performance in later courses in the curriculum. The full 451-student sample was used in all SEM models, with FIML employed to handle missing data. We grouped courses into four broad categories: introductory physics (with the regular and honors sequences combined), advanced physics (all physics beyond the introductory sequence), introductory mathematics (Calculus 1 and Calculus 2), and advanced mathematics (the required mathematics courses beyond Calculus 2). We allowed regression paths forward in time from introductory to advanced courses.
The final model is shown in figure 3 (CFI = 0.947, TLI = 0.933, RMSEA = 0.053), in which non-significant regression paths have been trimmed from the model. One notable feature of figure 3 is that introductory mathematics strongly predicts advanced mathematics, as expected, which covaries strongly with advanced physics. However, introductory physics does not predict advanced physics at all while introductory mathematics does, indicating that the primary predictor of success in advanced physics courses is success in mathematics courses. Figure 3 is not the only possible model for the relationships among courses. In particular, the majority of students take all of the advanced mathematics courses either before or concurrently with all advanced physics courses beyond Modern Physics. A model in which advanced mathematics predicts rather than covaries with advanced physics is shown in figure 4 (CFI = 0.946, TLI = 0.934, RMSEA = 0.053). Yet another model is shown in figure 5 (CFI = 0.950, TLI = 0.936, RMSEA = 0.052), in which the advanced physics factor has been split according to the typical time-order in which students take the courses. No models tested show introductory physics predicting advanced physics when controlling for introductory and/or advanced mathematics, including those not shown here such as a model in which introductory mathematics is allowed to predict introductory physics, rather than covary with it.
To test for gender differences and answer RQ3, we first used multi-group SEM to estimate the model separately for men and women, and then used a series of likelihood ratio tests to test for differences in the model [79], first testing factor loadings, then indicator intercepts, then residual variances, then finally regression paths. In each step, the model fit was moderate to good, with CFI > 0.90, TLI > 0.90, and RMSEA < 0.08. Each step produced statistically non-significant changes from the previous according to LRTs, indicating that the estimates could be fixed to equality across the two groups (p > 0.10 for each step). The only statistically significant gender differences occurred in the intercepts of high school GPA, which is not an indicator for any factor, and the introductory physics factor. Since there were no statistically significant gender differences in regression coefficients, we converted the model from a multigroup SEM to a model that includes gender as a binary categorical variable (1 for 'F' and 0 for 'M') predicting high school GPA and introductory physics. In all three models (figures 3-5) the gender differences take on the same form: on average, women have slightly higher high school GPA (β = 0.14, p = 0.002), while men are predicted to have slightly higher grades in introductory physics (β = −0.19, p < 0.001) when controlling for the high school GPA difference, and no other gender differences are predicted anywhere else in the model. To expand further, the statistically significant path from gender to introductory physics means that men are predicted to have higher grades in introductory physics than women with the same high school GPA. 
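Each step in the invariance sequence is a chi-square difference test between a more constrained model and the previous one. The sketch below implements that test with an exact chi-square tail probability for integer degrees of freedom, using only the standard library, and then walks through a sequence of fitted models. The first, unconstrained multi-group model is labeled 'configural' here, which is common usage rather than the paper's term. The remaining step names follow the text. The chi-square values are hypothetical because the text does not report them. In lavaan, `lavTestLRT()` performs this comparison for nested fitted models.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer k >= 1 df."""
    if k < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even df: finite Poisson-type sum.
        return exp(-half) * sum(half**i / factorial(i) for i in range(k // 2))
    # Odd df: normal tail plus a finite correction sum.
    total, denom = 0.0, 1.0
    for j in range((k - 1) // 2):
        denom *= 2 * j + 1  # builds 1, 1*3, 1*3*5, ...
        total += x**j / denom
    return erfc(sqrt(half)) + sqrt(2.0 * x / pi) * exp(-half) * total


def invariance_sequence(fits, alpha=0.05):
    """Compare each model with the previous, less constrained one via an LRT.

    fits: list of (step_name, chi2, df), ordered from least to most constrained.
    """
    results = []
    for (_, chi2_prev, df_prev), (name, chi2, df) in zip(fits, fits[1:]):
        d_chi2, d_df = chi2 - chi2_prev, df - df_prev
        p = chi2_sf(d_chi2, d_df)
        results.append((name, round(d_chi2, 2), d_df, round(p, 3), p >= alpha))
    return results


# Hypothetical chi-square values for the sequence of constraints in the text.
fits = [
    ("configural", 310.0, 220),
    ("loadings", 318.5, 228),
    ("intercepts", 327.0, 236),
    ("residual variances", 339.0, 248),
    ("regressions", 346.0, 254),
]
for row in invariance_sequence(fits):
    print(row)  # (step, delta chi2, delta df, p, equality not rejected?)
```

The `alpha` argument can be set to 0.10 to mirror the p > 0.10 criterion reported for each step.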
For the courses other than introductory physics, this means that the inconsistent gender differences observed in mathematics and advanced physics courses in table 3 are statistically non-significant when controlling for high school GPA, which either directly or indirectly predicts every other course in the model.

Figure 3. A diagram of the SEM model designed to test the relationships between physics and mathematics courses in the physics curriculum, as well as gender differences therein. Reported next to each line are the standardized values for factor loadings, regression coefficients, and covariances. The gender variable was coded as 1 for 'F' and 0 for 'M', so paths from gender with β > 0 and β < 0 indicate a higher mean for women and men, respectively, in the predicted variable. All drawn paths are significant at the p < 0.001 level except those denoted with a superscript *, which are significant at the p < 0.01 level. Paths that are not drawn are not statistically significant (p > 0.05).

Discussion
In answering each of the research questions, the introductory physics sequence stood out as behaving differently from the other courses. The overall picture paints the introductory sequence as the only gender-imbalanced part of the entire physics curriculum in terms of the differential performance of men and women. In particular, answering RQ1, tables 2 and 3 together with the gender differences observed in the SEM models in figures 3-5 show that introductory physics courses are the only ones in the curriculum with statistically significant gender differences, with men earning higher grades on average than women. The SEM models provide further context, showing that all other gender differences are non-significant when controlling for high school GPA, which is higher on average for female physics majors than for their male counterparts. Thus, although men earn higher grades in introductory physics with only a small effect size, that effect is slightly larger in magnitude than, and opposite in sign to, the effect of women's higher average high school GPA. One hypothesis for why there is a gender difference in performance in introductory courses is that those courses are taken in the first year in large-enrollment classes. Due to societal stereotypes and biases associated with physics, women may have a lower sense of belonging and self-efficacy [89-91] in those types of impersonal, non-equitable, and non-inclusive learning environments, which can impact learning [38-42, 54-56, 59-64, 92-94].
Even though the observed gender differences occur only in the introductory courses, this situation is pernicious and deeply troubling. These women's first experience of physics is in first-year courses where they face enormous pressure from societal stereotypes [21, 38-45, 48, 49, 53-56, 59, 60, 63]. This results in course performance that is inconsistent with their experiences in high school as well as with their concurrent experiences in their mathematics courses.

Figure 4. The SEM model of figure 3, with the Advanced Math factor allowed to predict Advanced Physics. Reported next to each line are the standardized values for factor loadings, regression coefficients, and covariances. The gender variable was coded as 1 for 'F' and 0 for 'M', so paths from gender with β > 0 and β < 0 indicate a higher mean for women and men, respectively, in the predicted variable. All drawn paths are significant at the p < 0.001 level except the one denoted with a superscript *, which is significant at the p < 0.01 level. Paths that are not drawn are not statistically significant (p > 0.05).
To be precise, the women in these introductory physics courses will be given inaccurate data about their ability to succeed in physics at the very time when they are making crucial decisions about their future. This situation is extremely problematic and only serves to perpetuate the societal stereotypes and biases that put an undue burden on these women to begin with [21, 38-45, 48, 49, 53-56, 59, 60, 63].
In answering RQ2, the SEM model in figure 3 shows that, when controlling for performance in introductory mathematics, performance in introductory physics does not predict grades earned in advanced physics courses, and this is true for both men and women. We also note that whether we allow advanced mathematics to covary with advanced physics (figure 3) or to predict advanced physics directly (figure 4), we find no statistically significant regression path from introductory to advanced physics. However, allowing advanced mathematics to predict advanced physics (via a regression path), rather than merely covary with it, leads to advanced physics being predicted solely by advanced mathematics (and not by introductory mathematics). That is, in figure 4, introductory mathematics strongly predicts advanced mathematics, which in turn strongly predicts advanced physics. One reason why introductory mathematics predicts advanced physics only via advanced mathematics (figure 4) may be that the content of the Calculus 1 and 2 courses (e.g. evaluating limits and simple differentiation and integration) is less directly relevant to success in advanced physics courses. While students are expected to know simple differentiation and integration in advanced physics courses, most of the variance in advanced physics performance is due to student proficiency in vector calculus, linear algebra, and differential equations (in fact, in these physics courses, students generally get full credit for leaving the final answer as an integral if the limits and integrand are correct).
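How introductory mathematics can drop out as a direct predictor once advanced mathematics is allowed to predict advanced physics can be sketched with a product-of-coefficients decomposition. The sketch below is a simplified observed-variable analogue of the SEM (OLS regressions on simulated composite scores rather than latent factors), with hypothetical path values chosen so that introductory physics is correlated with introductory mathematics but has no downstream effect, mirroring the pattern described above. All names and coefficients are assumptions for illustration, not values from this study.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares fit of y on an intercept plus the given predictors.
    Returns the slope coefficients only (intercept dropped)."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical observed composites standing in for the latent course factors.
intro_math = rng.normal(size=n)
# Intro physics: correlated with intro math, but plays no role downstream.
intro_phys = 0.6 * intro_math + rng.normal(size=n)
# Assumed chain: intro math -> adv math -> adv physics.
adv_math = 0.7 * intro_math + rng.normal(size=n)
adv_phys = 0.7 * adv_math + rng.normal(size=n)

# Model 1: adv physics on intro physics and intro math (adv math omitted).
b_ip, b_im = ols(adv_phys, intro_phys, intro_math)
# Model 2: adv math added as a direct predictor of adv physics.
b_ip2, b_im2, b_am2 = ols(adv_phys, intro_phys, intro_math, adv_math)
# Path a: intro math -> adv math, holding intro physics fixed (same covariates as Model 1).
a = ols(adv_math, intro_phys, intro_math)[1]
# Indirect effect of intro math on adv physics through adv math.
indirect = a * b_am2

print(f"Model 1: intro phys {b_ip:+.2f}, intro math {b_im:+.2f}")
print(f"Model 2: intro phys {b_ip2:+.2f}, intro math {b_im2:+.2f}, adv math {b_am2:+.2f}")
print(f"indirect effect via adv math: {indirect:+.2f}")
```

In OLS, the intro-math coefficient from Model 1 equals its direct coefficient in Model 2 plus the indirect product `a * b_am2` exactly, provided `a` is estimated with the same covariates; here the direct part is near zero, so essentially all of the effect runs through advanced mathematics. A latent-variable SEM additionally models measurement error and multiple indicators per factor, which this sketch ignores.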
Further, figure 5 explores the relationships among later physics courses and finds statistically significant regression paths from Modern Physics (the sole required 2nd-year physics course) to 3rd-year physics to 4th-year physics, even when controlling for advanced mathematics. Still, figure 5 shows no connection from introductory physics to any other course. This makes the lack of a connection from introductory physics to later physics courses unique in the physics sequence.

[Figure caption, truncated at start] … figure 3, with the Advanced Physics factor split by the year in which the courses are typically taken. Reported next to each line are the standardized values for factor loadings, regression coefficients, and covariances. The gender variable was coded as 1 for 'F' and 0 for 'M', so paths from gender with β > 0 and β < 0 indicate a higher mean for women and men, respectively, in the predicted variable. All drawn paths are significant to the p < 0.001 level except the one denoted with a superscript *, which is significant to the p < 0.01 level. All missing paths are not statistically significant, with p > 0.05.
Considering our findings from RQ1, this lack of a connection between introductory and advanced physics makes the situation even more problematic. The inaccurate feedback that women receive in introductory physics, which pressures them away from majoring in physics, has no predictive power for how well these women could succeed in advanced physics courses.
One hypothesis for why advanced mathematics courses predict performance in advanced physics courses while introductory physics courses do not is that advanced physics courses essentially test student facility with mathematical procedures, as opposed to conceptual understanding, which is typically the focus of introductory physics courses. In particular, students can typically do very well in advanced physics courses if they know just enough physics to recognize which mathematical procedure to use (e.g. solving a boundary value problem), even if their conceptual foundation in physics (the focus of introductory physics) is weak. The lack of gender differences in all mathematics courses as well as in the advanced physics courses is consistent with this hypothesis, suggesting that there is a fundamental difference in how introductory physics courses are taught. In fact, our earlier investigation pertaining to conductors and insulators suggests that advanced physics students on average do not perform better than introductory physics students on conceptual questions at the level of introductory physics [95]. Moreover, in another investigation, many students in advanced graduate courses did not perform significantly better than introductory students and admitted that they had no time to think about concepts and were essentially solving mathematics problems without learning physics from their advanced courses [96].
Finally, turning to RQ3, we find that in all three SEM models tested (figures 3-5), there were no significant gender differences in any predictive path in the model. The gender differences occurred in only two places: the intercepts of high school GPA (higher on average for women) and of the introductory physics factor (higher on average for men). Introductory physics did not predict any later course factor in the SEM models, whereas high school GPA predicts every course factor in the model either directly or indirectly. This means that any gender differences elsewhere in the model (i.e. outside introductory physics) are consistent with those observed in high school GPA.
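The claim that downstream gender differences are inherited from high school GPA follows from the standard path-tracing rule for recursive path models with standardized coefficients. As a hedged sketch with hypothetical symbols (G for gender coded 1 = 'F', 0 = 'M'; H for high school GPA; M for a single intermediate course factor such as introductory mathematics; Y for a later course factor), and assuming no direct path from G to M or Y:

```latex
% Total effect of gender G on a later course factor Y, summed over all paths.
% Hypothetical symbols; a single intermediate factor M is assumed for simplicity.
\begin{align}
\text{total}(G \to Y) &= \beta_{GY} + \beta_{GH}\,\beta_{HY} + \beta_{GH}\,\beta_{HM}\,\beta_{MY} \\
                      &= \beta_{GH}\left(\beta_{HY} + \beta_{HM}\,\beta_{MY}\right)
                         \qquad \text{when } \beta_{GY} = 0
\end{align}
```

Assuming β_GH > 0 and positive course-to-course paths, the implied downstream difference favors women and has the same sign as the high school GPA difference, while the difference favoring men in introductory physics does not propagate, because introductory physics has no outgoing paths. In the fitted models the sum would run over every path from H to Y.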
We note that this model focuses on the relationships between grades earned and does not account for other ways in which gender disparities in introductory physics can affect students (e.g. through self-efficacy, sense of belonging, etc.). However, grades earned play a key role in students' crucial decisions about whether to remain in college and which major to pursue [97][98][99][100]. In particular, one mechanism by which this occurs is the feedback loop between course grades and self-efficacy [89][90][91][100][101][102][103]. Other studies at this same university have found significant gender differences favoring men in the physics self-efficacy of students in large introductory physics courses [41,42,55,56,65,67,68], consistent with studies at other universities [104,105]. Although our analysis includes only students who had both declared a physics major and completed the modern physics course in the sophomore year, the gender differences in grades earned in introductory physics courses could have a large, gender-differential impact on students' choice to pursue physics and related majors, even though performance in introductory physics does not predict performance in advanced physics courses. These findings further suggest the need for efforts to improve equity and inclusion in introductory physics courses, including interventions designed to boost students' self-efficacy, growth mindset, and sense of belonging in physics [106][107][108][109].
In conclusion, a completely cohesive curriculum for physics majors should be consistent from year to year not only in academic content but also in its positive and inclusive environment, so that students from all demographics can excel, including groups that have traditionally been underrepresented in physics. We urge researchers at other institutions to perform similar analyses in order to evaluate the validity of the assumptions underlying the curriculum for physics majors and how well the various courses required for physics majors cohere. Furthermore, it is critical that other physics departments investigate gender differences within their curricula in order to counteract situations such as the one presented here, where women in introductory courses may be using inaccurate data about their performance in physics to inform decisions about their future. The situation observed in this study indicates that actions to remedy these pernicious gender differences are imperative, and it is crucial that all physics departments make every possible effort to seek out and stamp out similar trends at their own institutions. Ignoring such trends only serves to perpetuate the biases and stereotypes that disproportionately affect women in physics.