An investigation into the impact of question structure on the performance of first year physics undergraduate students at the University of Cambridge

We describe a study of the impact of exam question structure on the performance of first year Natural Sciences physics undergraduates from the University of Cambridge. The results show conclusively that a student’s performance improves when questions are scaffolded compared with university style questions. In a group of 77 female students we observe that the average exam mark increases by 13.4% for scaffolded questions, which corresponds to a 4.9 standard deviation effect. The equivalent observation for 236 male students is 9% (5.5 standard deviations). We also observe a correlation between exam performance and A2-level marks for UK students, and that students who receive their school education overseas, in a mixed gender environment, or at an independent school are more likely to receive a first class mark in the exam. These results suggest a mis-match between the problem-solving skills and assessment procedures between school and first year university and will provide key input into the future teaching and assessment of first year undergraduate physics students.


Introduction
The Department of Physics (the Cavendish Laboratory) at the University of Cambridge is committed to the advancement of women in science. The Department holds Institute of Physics (IoP) Juno Champion status (Institute of Physics) and is the first physics department in the UK to be recognized with an Athena SWAN Gold award (Athena Swan). One aspect of the department's gender equality activities is the encouragement of women to study physics as part of the Natural Sciences undergraduate course and to ensure that their learning and achievements are not affected by their teaching environment or assessment procedures.
Gender gaps, particularly in the highest undergraduate degree classification (first class mark), have been documented by many studies over many decades, and across many institutions (Rudd 1984, McNabb et al 2002, Richardson and Woodley 2003, Simonite 2005, Barrow et al 2009). These previous studies cite many possible reasons for the gender differences they document. The most compelling hypothesis is the dependence on the entry qualifications of the cohort, in particular that female cohorts have a narrower distribution of entry qualifications with a mean lower than that of their male counterparts. This gender specific pre-entry distribution is consistent with observations in these previous studies that proportionally fewer women get firsts but proportionally fewer women get thirds and below. Rudd (1984) comments that from the entry qualifications the female cohort has 'fewer geniuses but fewer dunces'.
The previous studies cited here combine very broad investigations into gender differences, independent of subject (Barrow et al 2009) and independent of institution (Rudd 1984 and McNabb et al 2002), as well as those that have a particular focus on subject (Simonite 2005). With such breadth in epoch and detail one would expect a cohort dependence in the results of the studies; however, the conclusions describe a consistent underperformance of females at the highest level, with evidence that factors such as male prejudice in marking (Newstead and Dennis 1990), institutional attributes, academic aptitude, and medical and psychological characteristics are not responsible (Rudd 1984, McNabb et al 2002). As part of the Cavendish Laboratory's continued monitoring of undergraduate achievement, it is observed that there is a gender difference in the distribution of marks achieved by undergraduates at the end of their first year of study. This leads to approximately 11% of women attaining a first class mark compared with 30% of their male counterparts. (Many studies into such observations for the Universities of Oxford and Cambridge have previously been documented, both independent of subject (McCrum 1996, Leman 1999, Mellanby et al 2000, Surtees et al 2002) and subject specific (Simonite 2005).) The observations of our first year cohort, regarding the deficit of firsts in the female population, agree with the picture presented in previous studies. However, our results differ if we also consider the gender difference down to the 2:1 level. Most of the studies we cite (Rudd 1984, McNabb et al 2002, Richardson and Woodley 2003, Barrow et al 2009) record that while there is a gender gap in the proportion of firsts, women proportionally outperform men at the 'good' degree levels of 2:1 or above.
The relative underachievement of female students in first year physics at Cambridge is contrary to the known exam performance at school and is not observed in later years when the students begin to focus on their chosen subjects. This is a particularly puzzling result in light of the aforementioned research and its conclusion that pre-entry qualifications correlate well with degree performance at university.
The hypothesis that we therefore present, supported by focussed discussions with the undergraduate community, is that the gender difference in exam performance may arise from a difference in the structure of exam questions, which is highly scaffolded at school level compared to a less-structured form in first year undergraduate physics. To assess the impact of exam question structure on the performance of undergraduates, we have developed and conducted a mock first year physics exam. Here, we report the key findings of the mock exam. The results will inform the teaching of physics and other subjects within the University of Cambridge Natural Sciences course in the future.

Undergraduate physics at the University of Cambridge
The University of Cambridge consists of both subject specific departments and colleges. The colleges are a vibrant and academically supportive residential environment where students live, work, eat, have access to resources (like libraries) and also receive some teaching from affiliated academic staff. It is through the colleges, rather than the departments, that the University of Cambridge admits its undergraduate students for each subject.
Over 600 undergraduates are admitted annually to read the Natural Sciences course, which includes a wide range of physical and biological subjects, and ultimately leads to a degree in one of 16 subjects. All students studying physics do so through the Natural Sciences degree. In the first year, students study three experimental subjects (physics is one of eight options) and mathematics. In the academic year 2013-14, 448 students chose physics as one of their options. In the second year, students develop a stronger subject focus; approximately 150 continue to read physics and one or two other options. The Department's studies show that, as a fraction of students who intended to study physics in all four years at entry to the University, approximately 50% of women and 70% of men declare their intention at the end of the first year (and prior to exams) to continue to study physics in the second year. Students choose their specialist subject at the start of the third or fourth year. Approximately 120 and 100 students read physics in the third and fourth year, respectively. The university runs three 8 week terms, October-December, January-March and April-June. End of year examinations typically take place at the end of May and start of June.

Method
To investigate our hypothesis, that providing scaffolding within physics problem solving exam questions increases a student's mark and that this increase will be greater for female students, we constructed a mock exam using questions taken from previous first year physics papers. While our primary aim for the study is to investigate the effects of question structure on student performance by gender, it also gives an opportunity to investigate whether other correlations exist. The mock paper consisted of two sections; the first (Section A) contained a set of four short questions; the second (Section B) contained two longer and more involved questions. Two versions of the paper were produced, which contained the same questions placed in the same order. However, in the first paper (Paper S), the first and following alternate questions were written in a scaffolded form. In the second paper (Paper U), the second and following alternate questions were scaffolded. The order of the scaffolded and university style questions differed between the two papers to remove any bias that may occur as a result of the order in which the students met the scaffolded questions. The two papers, both time-limited to 2 h, are shown in the appendix. The first year physics students volunteered to sit the exam at the start of their final term. The students were randomly assigned one of the two papers and were required to answer all questions, such that no bias was incurred through question choice.
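The random allocation of papers described above can be illustrated with a short sketch. The text states only that students were randomly assigned one of the two papers; the function name `assign_papers`, the use of a seed, and the choice to balance the two groups to within one student are all assumptions made for illustration, not a record of the procedure actually used.

```python
import random


def assign_papers(student_ids, seed=None):
    """Randomly allocate each student to Paper S or Paper U.

    Assumption: the two groups are balanced to within one student.
    The study itself only states that allocation was random.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Alternate down the shuffled list so group sizes differ by at most one.
    return {sid: ("S" if i % 2 == 0 else "U") for i, sid in enumerate(shuffled)}


if __name__ == "__main__":
    allocation = assign_papers(range(320), seed=2014)
    n_s = sum(1 for paper in allocation.values() if paper == "S")
    print(f"Paper S: {n_s}, Paper U: {len(allocation) - n_s}")
```

Shuffling before alternating keeps the allocation independent of the order in which students registered, which is the property that matters for removing order bias.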

Volunteer data
Volunteer students registered for the mock exam via an online form two months in advance, before their vacation, to minimize any bias that may arise from the questions asked in the form (for example, their declaration of gender). The registration form asked them to submit the supplementary data listed in table 1. This included whether their school was based in the UK or overseas, the type of school (independent, state, academy or other), whether it was a single-sex or mixed environment, and their final examination results. Within this last category we specifically asked for the examination type (A2-level, international baccalaureate (IB), Scottish highers, Pre-U), and where possible their numerical mark. The summary of our findings is presented below. Only one student declined to declare their gender.

Question choice and structure
All questions chosen for the mock exam were selected from past physics papers taken between 1993 and 1999. The following criteria were used to select the questions:
• They needed to test topics covered in the first two terms of the first year physics course.
• In the original form of the question, the majority or all of the marks were allocated at the end of the question with little or no scaffolding present.
• Some questions focussed on topics that females were deemed less confident with according to anecdotal opinions of the students and university tutors (supervisors).
• For the two longer (Section B) questions, one was set on a topic (reference frames and kinematics) previously studied at school and the other was on a topic (special relativity) introduced within the first year University physics course.
Once selected, the questions were restructured into a scaffolded format, reminiscent of current A2-level style questions. Each question was broken down into multiple parts with a small number of marks allocated for each part, rather than indicating the total number of marks at the end of the question. Different types of questions suggested different forms of scaffolding. For example, some explicitly asked the candidate to draw a diagram, and others to define terms at the beginning of the question or to calculate numerical answers at each step of a long question. This last type of scaffold is contrary to the usual recommendation for undergraduates, which is to perform calculations symbolically and only substitute numbers at the end.
The two exam papers were produced with half of the questions in the original university style format and half in the scaffolded question style. Each paper alternated between university and scaffolded styles. The mock papers, containing all the questions, can be found in the appendix.

Conduct of exam and marking
The exam took place under end-of-year exam conditions on the 22 April 2014 in the lecture theatres at the Cavendish Laboratory.
The completed papers were marked during the two days following the mock exam. This enabled feedback to be given to the students and their colleges in advance of the start of the third University term. While this was an education research exercise, it also provided an important learning experience for the students and preparation for the end of year exams six weeks later. To facilitate the timely marking, we engaged seven people to mark the papers. Each marker was provided with a mark scheme for the questions. First year physics laboratory demonstrators were chosen (and paid demonstration rates) to mark the papers. They already had experience of the level of learning and understanding of the cohort of students. The gender distribution of the markers was roughly equal with three men and four women.
All the marking took place in a single room, thereby enabling any questions about the marks to be allocated to be discussed between markers. The markers did not have access to the individual student information summarized in table 1. The authors of this paper looked over each mock exam paper as the marking was completed to check the summation of marks, to enter the marks into a spreadsheet, and to ensure consistency across markers. The identity of the marker was also recorded in the spreadsheet against each paper. The resulting mark distributions of all seven markers are all consistent.

Student cohort
The total number of first year Natural Sciences students who chose physics as one of their options in October 2013 was 448. Of these, 320 students (the cohort) volunteered to sit the mock exam paper. The cohort consisted of 26% women, which can be compared to the national average of 20.6% who sat physics at A2-level in 2011 (Institute of Physics 2012). The majority (80%) of the students were educated in UK schools. Although educational diversity is expected within the University's large undergraduate population, the previous qualifications of the cohort are weighted towards the UK A2-level system, as shown in table 2. Since the students were admitted through the same admissions process, we consider the A2-level mark distribution as representative of the cohort as a whole. Of the 251 students who sat A2-levels, all had taken physics and mathematics, and 70.9% (73.0%) of the women (men) had also taken further mathematics. Figure 1 and table 3 illustrate the high average A2-level marks (from a total of 600), and relatively small standard deviations, in the class's starting knowledge. They also indicate that A2-level further maths has a greater dispersion than single maths and physics and therefore discriminates more between students of high ability. Figure 1 and table 3 also show that the female students have performed equally well (if not better) at A2-level than the male students in their year group. It is apparent from this table that the pre-entry characteristics of our cohort are therefore quite different from the results of students featured in previous larger and more general studies (Rudd 1984

Results
Our primary objective is to establish whether or not scaffolding in examination questions preferentially assists female students compared with their male counterparts for a cohort who had experienced the same physics course. We also investigate the effect of scaffolded questions according to previous examination performance and school background.

Analysis by gender
The mock exam mark distribution for the cohort by gender is shown in figure 2(a). The overall mean is (55 ± 14)%, which is comparable to, but slightly lower than, previous end of first year exam mark distributions. For example, the corresponding mean and standard deviations in 2010 and 2013 were (58 ± 15)% and (59 ± 14)%, respectively. We therefore conclude that the paper was set at an appropriate level and marked accordingly. In the subsequent analysis we consider the distributions of first, second and third class degree marks. Since we do not apply scaling of marks, as is done for the end of year exams, we set the first, second and third class boundaries to > 67%, > 47% and > 37%, respectively. The class distribution for the whole cohort and by gender is shown in figure 2(b). In figure 2 we observe the phenomenon that prompted our study; the percentage of female students receiving a first (13.0%) is significantly smaller than that of their male counterparts (21.6%), with the average mark on the paper also differing by 5.6% in favour of the male students. This difference in the mean marks by gender corresponds to a 3.2σ effect.
Figure 2. Distributions of (a) marks and (b) degree class (1st (> 67%), 2nd (> 47%), 3rd (> 37%), and fail (< 37%)) for the mock exam cohort.
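The class boundaries quoted above translate directly into a small classification routine. The following is a minimal sketch; the function names are hypothetical, and the treatment of a mark lying exactly on a boundary (assigned to the lower class, so that exactly 37% counts as a fail) is an assumption, since the text quotes only strict inequalities.

```python
def degree_class(mark_percent):
    """Map an unscaled percentage mark to a degree class.

    Boundaries from the text: 1st > 67%, 2nd > 47%, 3rd > 37%.
    Assumption: a mark exactly on a boundary falls into the lower class.
    """
    if mark_percent > 67:
        return "1st"
    if mark_percent > 47:
        return "2nd"
    if mark_percent > 37:
        return "3rd"
    return "fail"


def class_fractions(marks):
    """Return the fraction of marks falling in each degree class."""
    counts = {"1st": 0, "2nd": 0, "3rd": 0, "fail": 0}
    for mark in marks:
        counts[degree_class(mark)] += 1
    total = len(marks)
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    # Illustrative marks only; these are not data from the study.
    print(class_fractions([72.0, 55.5, 49.0, 41.0, 30.0]))
```

Applying `class_fractions` separately to the marks of each gender would give the kind of class distribution compared in figure 2(b).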
Each of the two mock examination papers allocated half of the marks to scaffolded questions and the remainder to university style questions. The separate class distributions for the scaffolded and university style questions are shown in figure 3, and illustrate the dramatic differences between the marks achieved in the two different styles of questions, in particular in the extreme degree classifications of first class marks and fails. The average percentage mark achieved for the university style questions (49.6%) is 10.1% below that for the scaffolded questions (59.7%), equivalent to a 7.2 standard deviation effect. In addition, a 14.3% difference in the percentage of first class marks achieved between scaffolded and university style questions is observed.
The effect of scaffolding of questions is also considered by gender. The average percentage mark attained by female students for scaffolded questions is 13.4% higher than for university style questions (4.9 standard deviations). This can be compared to 9.0% (5.5 standard deviations) for the male students. Overall, 19.5% more females achieve first class marks for the scaffolded questions compared to the university style questions, with 31.2% fewer of them failing. For the male students the difference was slightly less marked with 13.2% more achieving first class marks for the scaffolded questions with 14.4% fewer failing. We therefore conclude that scaffolding of exam questions is beneficial to all undergraduate students and that the female students benefit preferentially.
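The text quotes effect sizes in standard deviations without stating how they were computed. One common way to express a difference in mean marks in this form is to divide it by its standard error. The sketch below shows two variants under that assumption: an unequal-variance comparison of two independent groups (for example, marks by gender) and a paired comparison (for example, each student's scaffolded mark against their university style mark). Both function names are hypothetical, and neither is claimed to be the calculation used in this study.

```python
import math
from statistics import mean, stdev


def effect_in_sigma(group_a, group_b):
    """Difference in means of two independent groups over its standard error.

    Assumption: unequal-variance (Welch-type) standard error.
    """
    standard_error = math.sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def paired_effect_in_sigma(marks_a, marks_b):
    """Mean per-student difference over its standard error (paired data)."""
    differences = [a - b for a, b in zip(marks_a, marks_b)]
    standard_error = stdev(differences) / math.sqrt(len(differences))
    return mean(differences) / standard_error


if __name__ == "__main__":
    # Illustrative marks only; these are not data from the study.
    scaffolded = [60.0, 70.0, 65.0, 75.0]
    university = [50.0, 58.0, 57.0, 61.0]
    print(round(paired_effect_in_sigma(scaffolded, university), 2))
```

The paired form is usually the more sensitive of the two when every student answers both styles of question, because differences in overall ability between students cancel in the per-student difference.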

Analysis by A2-level performance
The results presented so far strongly agree with the hypothesis that scaffolding in questions correlates with exam performance. To further support this evidence we make an additional two comparisons. The first is to establish whether the degree classification is correlated to A2-level examination performance. The second is to investigate the correlation between A2-level performance and the scaffolded and university style questions. Figure 4(a) shows the correlation between the marks scored at A2-level (physics, mathematics and further mathematics) and the mock exam. A correlation is observed for all three A2-level subjects. For those students who took both physics and mathematics at A2-level, the correlation between their average A2-level mark and the mark they obtained in the mock exam for the scaffolded and university style questions is shown in figure 4(b). It can be seen that the performance of students depends strongly on the style of exam question, and that this is apparent for all A2-level marks. For those students who took physics, mathematics and further mathematics at A2-level, the correlation between their average A2-level mark and the mark they obtained in the mock exam for the scaffolded and university style questions is shown in figure 4(c). Once again, it can be seen that the performance of students depends strongly on the style of exam question, and that this is apparent for all A2-level marks.
Figure 4. Distributions of the average mock mark versus the A2-level performance in (a) physics, mathematics and further mathematics, (b) university and scaffolded style questions for students who took A2-level physics and mathematics, and (c) university and scaffolded style questions for students who took A2-level physics, mathematics and further mathematics. Students were sorted into bins of size 20 according to their A2-level mean mark across subjects and the mean mock mark for each bin was then calculated to produce the distributions shown here.
In addition, there is an indication that the scaffolded style questions partly reduce the correlation between the A2-level mark and the mock exam mark.
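The binning used for figure 4 (students sorted into bins of size 20 by A2-level mean mark, with the mean mock mark calculated per bin) can be sketched as follows. The function name, the record layout, and the alignment of bin edges to multiples of the bin width are assumptions; the text specifies only the bin size.

```python
from collections import defaultdict


def mean_mock_by_a2_bin(records, bin_width=20):
    """Average the mock exam mark within bins of A2-level mean mark.

    records: iterable of (a2_mean_mark, mock_mark_percent) pairs.
    Assumption: bin lower edges are multiples of bin_width.
    Returns a dict mapping each bin's lower edge to its mean mock mark.
    """
    bins = defaultdict(list)
    for a2_mark, mock_mark in records:
        lower_edge = int(a2_mark // bin_width) * bin_width
        bins[lower_edge].append(mock_mark)
    return {edge: sum(marks) / len(marks) for edge, marks in sorted(bins.items())}


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    sample = [(585, 60), (590, 70), (561, 40), (579, 50), (600, 80)]
    for edge, average in mean_mock_by_a2_bin(sample).items():
        print(f"A2 mark {edge}-{edge + 19}: mean mock mark {average:.1f}%")
```

Running the same function separately on the scaffolded and university style marks would give the two curves compared in figures 4(b) and 4(c).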

Analysis by previous education
With a large cohort of 320 students we also investigate further diversity and dependencies of the results. In particular, we consider school location, school type and mixed or single-sex schooling. Figure 5(a) shows the degree class distribution by gender and by location (UK or overseas). It can be seen that the proportion of overseas students attaining a first class mark is higher than that for the UK students. When we further divide the students by gender, we also see a marked difference between the first class marks of male and female students; 20.7% of UK males and 12.1% of UK females attain a first class mark, compared to 25.6% of overseas males and 15.8% of overseas females.
The dependence of the performance on single-sex versus mixed school education is also analysed, independently of UK or overseas teaching, as shown in figure 5(b). Although the number of females who received single-sex teaching pre-university (28) is small, the single-sex schooling appears to have a negative effect on the fraction achieving first class marks in the mock exam, even though the average percentage is slightly higher. For the male students there is a small difference in the percentage achieving first class marks and a negligible difference between the average marks.
Finally, we consider only those students educated in the UK as a function of school type (independent, state, academy and other). Since the numbers of students are small for the academy and other school categories, we consider independent school versus state school background only. The proportions of UK students in the cohort taking the mock exam are broadly representative of the state-independent school distribution at the University of Cambridge (62% state and 38% independent). The distribution of class marks is shown in figure 5(c). The average percentage mark is 6.4% (3.4 standard deviations) higher and the fraction of students achieving a first class mark is 10.2% higher for independent school students compared to the results for state school students. Figure 5(d) shows that women from an independent school background perform as well as, if not better than, their male counterparts.

Discussion and implications
The structure of the Natural Sciences degree at the University of Cambridge has provided us with unique access to a broad cohort of students who, on entrance, are undecided about their future scientific specialization. The results and experience the students gain in this first year can strongly influence their choices. This study shows that there is a need to help them to bridge the gap between the skills development and assessment they experience at school and that which is expected at university. Our results have shown that providing scaffolding helps both genders achieve better results but builds the confidence of women preferentially. Our future aim therefore will be to help students, throughout the year and through all our avenues of teaching, develop their thinking skills so that they are able to create their own scaffolding and conceptual structure. As students develop their confidence and enjoyment of physics their choice to take physics at the next level will be positively impacted. The Isaac Physics project (Warner and Jardine-Wright 2014) provides problem solving practice for school students to positively impact on their experience and confidence, enabling them to begin constructing their own strategy gradually.
Eur. J. Phys. 36 (2015) 045014 V Gibson et al
Research has shown that a student's belief in their own ability in science is positively linked to their desire to continue to study and a lack of self-confidence was recorded for girls in particular (Kahle et al 1993). Furthermore, a paper for the United Nations Division for the Advancement of Women on the barriers to the realization of the potential in gifted girls states that 'lower confidence in one's abilities and/or lower self-esteem, which were often found in gifted female teenagers, might have long-term impact on their achievement in future' (Brankovic 2006). While we only consider here the effects within gender for physics, previous research links lack of confidence to performance in mathematics, as discussed in Meece and Jones' paper on gender differences in motivation and strategy use in science (Meece and Jones 1996).
Figure 5. Distributions of degree class (1st (> 67%), 2nd (> 47%), 3rd (> 37%), and fail (< 37%)) for students educated in schools (a) in the UK and overseas, (b) in single-sex or mixed, (c) as a function of school type and (d) as a function of gender and school type.
Through continued support, reinforcement of structure and the identification of concepts we believe that students will not only get better marks but also develop a better understanding: it is difficult to identify a strategy to solve a problem unless you really understand the concepts underpinning that problem.
The results presented in this study are limited by one year of data for a cohort who have yet to complete their first year exams. Therefore, while we strongly suspect that at the end of the first year the percentage of women achieving firsts will be as reported here and for many years previously, we have yet to track this particular cohort. From previous cohorts we have evidence that as students progress, and specialize in physics, the percentage of women achieving firsts increases and the effect reported here is reduced in years two, three and four.
Our future work will include setting a mock exam, in this template, for our first year students and continuing to collect data to verify the consistency of these initial findings. A larger sample of data will also enable us to study a statistically significant sample of students who progress to university through examination systems other than A-levels (for example, IB, Pre-U). Furthermore, we will be able to test the impact of changes in the support and teaching methodologies we implement to help students self-scaffold and test the hypothesis that if they develop these skills through the first year of physics study and through pre-university intervention the difference in results between scaffolded and university style questions will be minimized. As we in physics collect evidence to support our hypothesis and prove that the students' development of strategic thinking impacts positively on their results and understanding, we hope that other departments within the university (chemistry, mathematics and engineering) whose first year gender distributions mirror those of physics will follow our example.

Conclusions
As part of our Department of Physics activities directed towards gender equality, we have investigated the impact of exam question style on the performance of first year Natural Sciences students who take physics as one of their options. The exam questions are designed to bridge the gap between the traditional scaffolded school style questions and the less-structured style questions commonly encountered as part of the first year assessment procedure.
We report a number of key findings:
• There is no gender bias in the performance of the cohort who took A2-level subjects (physics, maths and further maths) at school, with the women performing as well as (if not better than) their male counterparts.
• The mock exam mark distribution confirms the same trend as observed in the end of first year exams, with the percentage of women receiving firsts significantly lower than that of their male counterparts.
• Scaffolded type questions significantly improve the performance of both men and women from all school backgrounds, with the women benefiting preferentially compared to the men.
• There exists a correlation between the performance at A2-level (physics, maths and further maths) and the mock exam. The correlation is less pronounced for the scaffolded questions compared to the university style questions.
• Students who received their school education overseas or in a mixed education environment are more likely to receive a first class mark in the first year physics exam.
• Students who received a UK independent school education performed better in the mock exam than those from a state school background, with women from independent schools performing as well as the men from independent schools.
These results suggest a mis-match between the problem-solving skills and assessment procedures between school and first year university, and are consistent with the findings of Warner (2013) and Hyde and Mertz (2009). They will provide key input into the future teaching and assessment of first year undergraduate physics students.

Acknowledgments
We would like to thank the Institute of Physics (IoP) for funding this research. We also gratefully acknowledge the support of the Cavendish Laboratory's undergraduate Teaching Committee, the markers of the exam scripts and the students who volunteered to undertake the mock exam.

2. In a poorly maintained train, the thin cavity of a double glazed window is partially filled with rain water. As the train decelerates along a horizontal track, a passenger notices that the water surface is at an angle of 15° to the horizontal.
(a) Draw a labelled diagram of the forces on a single water molecule. [3]
(b) Find the deceleration of the train. [2]
3. Why does the front end of a car dip upon braking? [5]
4. The wave function for an electron is split by a barrier into two parts which follow paths differing in length by 1 μm before they merge again. When the electron energy is 10 MeV the interference is constructive.
(a) Write down the requirements for constructive and destructive interference. [1]
(b) What is the wavelength of the electron of energy 10 MeV? [1]
(c) By how much must the energy be increased for the interference to become destructive? [3]

Section B
5. (a) Discuss the use of the zero momentum frame for treating problems of collisions between particles in two dimensions. Your answer should include appropriate diagrams. [3]
(b) A collision occurs between two (non-relativistic) bodies of equal mass m and velocity vectors v1 and v2.
(i) Find the velocity vectors of the bodies in the zero momentum frame. [1]
(ii) Write down an expression for the kinetic energy that can be lost in the zero momentum frame. [1]
(iii) How much kinetic energy is available for conversion to other forms of energy? [1]
(c) A particle of mass m travelling with speed V along the +x direction collides elastically with a stationary particle of mass 2m. The particle of mass m is deflected through an angle of 30°.
(i) Draw a diagram of the particles before the collision, in the laboratory frame. [1]
(ii) Draw a diagram of the particles before the collision, in the zero momentum frame. [1]
(iii) Draw a diagram of the particles after the collision in the zero momentum frame. [2]
(d) Transform back to the laboratory frame and, using velocity triangles or otherwise,
(i) find the velocity vector for mass m. [2]
(ii) find the velocity vector for mass 2m. [3]
6. Explain what is meant by the relativistic effect of time dilation and give an example of an experiment that demonstrates this effect. [5]
Twins Alice and Bob go travelling in space. They each carry a clock to record how much they age during the trip. Alice leaves Earth and travels at a steady speed of 5c/13 to a space station 1 light year away. Bob leaves Earth at the same time as Alice, but travels at a speed 5c/13 in the opposite direction. When Alice reaches the space station she immediately turns around and travels at a speed of 12c/13 towards Bob, eventually catching up with him. Find the elapsed time on (a) Earth's clocks, (b) Bob's clock and (c) Alice's clock between leaving Earth and meeting in space. [10]

Paper S
Section A
Section B
5. Discuss the use of the zero momentum frame for treating problems of collisions between particles in two dimensions. Your answer should include appropriate diagrams. [3]
A collision occurs between two (non-relativistic) bodies of equal mass m and velocity vectors v1 and v2; how much kinetic energy is available for conversion to other forms of energy? [3]
A particle of mass m travelling with speed V along the +x direction collides elastically with a stationary particle of mass 2m. The particle of mass m is deflected through an angle of 30°. What are the final velocity vectors of the two particles in the laboratory frame? Your answer should be illustrated by appropriate diagrams in both the laboratory and zero momentum frames. [9]
6. (a) Explain what is meant by the relativistic effect of time dilation. [3]
(b) Give an example of an experiment that demonstrates this effect. [2]
Twins Alice and Bob go travelling in space. Alice leaves Earth and travels at a steady speed of 5c/13 to a space station 1 light year away. Bob leaves Earth at the same time as Alice, but travels at a speed of 5c/13 in the opposite direction. When Alice reaches the space station she immediately turns around and travels at a speed of 12c/13 towards Bob, eventually catching up with him.
(c) Draw a space-time diagram indicating four events: Alice and Bob leave Earth (A), Alice reaches the space station (B), Alice passes Earth (C) and Alice and Bob meet again (D). [2]
(d) For a clock on Earth:
(i) What time has elapsed between events A and B? [1]
(ii) What time has elapsed between events A and C? [1]
(iii) What time has elapsed between events A and D? [1]
(iv) How far have Alice and Bob travelled? [1]
(e) Alice and Bob each carry a clock to record how much they age during the trip.
(i) What is the elapsed time on Bob's clock between events A and D? [1]
(ii) What is the elapsed time on Alice's clock between events A and D? [2]
(iii) What is the age difference between Alice and Bob when they meet? Who is older? [1]