Effect of Reflective Exercises on Academic Performance and Course Evaluations in a Biomedical Sciences System Course

Introduction: Reflection is an integral component of self-directed learning and may enhance empathy, self-awareness, professionalism, and attitudes about learning. We sought to determine if the addition of reflective exercises also enhances academic performance in first-year dental students taking a systems-based, case-based biomedical science course. Methods: This is a mixed-methods, retrospective study evaluating the academic performance and lived experience of students enrolled in the "Blood and Lymphatic System" course of 2015 and 2016. The courses were identical except for the addition of reflective exercises in 2016. We evaluated class characteristics, academic performance as determined by ungraded pre/posttest surveys and graded written assessments, and student perception of the course through Likert-type scale survey ratings and both manual and machine-coded textual analysis of written student comments. Results: Addition of reflective exercises to this course did not produce any significant changes in academic performance in two statistically similar student cohorts. However, the reflection exercises increased themes of empathy, professionalism, and interdisciplinary collaboration in course evaluations, and the reflections themselves provided insights into student learning. Discussion and Conclusion: While reflective exercises seem to have little effect on academic performance, they can be used to gauge the themes and sentiments of students enrolled in a biomedical system course.


Boehm T. MedEdPublish. https://doi.org/10.15694/mep.2021.000012.1

Introduction
Health sciences education has traditionally focused on knowledge acquisition and clinical apprenticeship. However, since the development of various learning theories, health science educators have sought to improve student learning and nurture the development of lifelong learners. Lifelong learners can determine their own learning needs and set their own learning goals through reflection on their own learning progress (Pee et al., 2000).
Reflection is a key element underpinning learning theories used in healthcare education, such as experiential learning (Kolb, 1984; Schon, 1987) or deep learning (Biggs and Moore, 1993). For example, in the experiential learning model, learning occurs through a cycle of 1) identifying and describing a relevant experience, 2) analyzing and evaluating one's personal response by reflecting on that experience, 3) considering and adapting one's working hypotheses and assumptions to the experience, and 4) responding to a new experience with the gained insights. As seen here, reflection is an essential piece in the process of learning.
Reflection is a metacognitive activity (Hewson, Jensen and Hewson, 1989) that is deliberative (Shulman, 1985) and involves evaluating existing knowledge, attitudes and skills for a given field of practice (Cruickshank, 1987). It involves re-experiencing the last experience for the purpose of articulating what went well, and what did not, and for making normally tacit knowledge explicit, and thus scrutinizing existing knowledge (Collins, Brown and Newman, 1987). Reflection enables self-directed learning, which is a process in which individuals diagnose their own learning needs, formulate learning goals, choose resources and learning strategies, and evaluate learning outcomes (Knowles, 1975). Self-directed learning is a hallmark of lifelong learners, and this skill can be encouraged at the course design level with problem-based learning and self/peer evaluation (Towle and Cottrell, 1996).
On a smaller scale, reflection can also be encouraged with various assignments and classroom activities. One method is the use of "progress files", "clinical blogs" or portfolios that contain student/doctor achievements, reflective diaries, self-assessments and encounter evaluations (Pee et al., 2000; Shaughnessy and Duggan, 2013; Wakeling et al., 2019). A commonly reported reflection exercise is reflective writing, where students write about their experiences after role playing or patient encounters (DasGupta and Charon, 2004; Beylefeld, Nena and Prinsloo, 2005; Shapiro, Kasman and Shafer, 2006; Levine, Kern and Wright, 2008; Westmoreland et al., 2009; Bradner et al., 2015; Borgstrom et al., 2016; Smith, 2018). A similar, but verbal, reflection exercise is debriefing after a patient encounter with a preceptor or faculty member (Lewin et al., 2014; Okubo et al., 2014). Last, a simple yet powerful reflective exercise involves sending single or periodic email surveys that ask students what they learned and how they felt about a particular class session or activity, or what they would do differently next time in a challenging patient encounter (Henderson and Johnson, 2002; Mori, Batty and Brooks, 2008).
There may be multiple benefits to these reflective exercises. Most reported benefits were gains in soft skills, such as increased empathy and self-awareness (DasGupta and Charon, 2004; Levine, Kern and Wright, 2008), improved attitudes about self-directed learning (Mori, Batty and Brooks, 2008), increased awareness of professionalism and empathy (Dhaliwal, Singh and Singh, 2018), increased learning satisfaction (Westmoreland et al., 2009), and a deeper understanding of patients and improved talent for humanizing the health care experience (Bradner et al., 2015). Generally, these were found by analyzing student/doctor responses qualitatively for recurrent themes. However, one report suggested that debriefing also leads to improved clinical reasoning (Okubo et al., 2014), as evidenced through an objective structured clinical examination, and the authors suggested that reflection improved knowledge acquisition.
Given the relative lack of studies that evaluate the effect of reflective exercises on knowledge acquisition, we aimed to assess if the addition of repeated reflection exercises in the context of a case-based system course produced any quantifiable increase in academic performance. In addition, we explored qualitatively what effect reflective exercises may have on students' perception of the course and the course material itself.

Methods
The reflective exercises added in 2016 included: (1) a reflective paper on "What have you learned that you did not know before, and how would you apply it to your practice?"; (2) a peer grading exercise that involved developing a management plan for a medically compromised patient and having other students provide feedback on it; and (3) debriefing at the end of two case studies in the middle of the course that specifically asked reflective questions such as "Why do you think a medical consult was requested?", "Why was this treatment done?", "How would you go about telling the patient about this condition?" or "Would you have treated this patient differently?". To ensure participation, each reflective exercise was incentivized by awarding students a small credit (5%) towards the course grade. To make up for these additions, the contribution towards the course grade was reduced from 25% to 20% for quizzes and the literature seminar. Contributions towards the course grade remained unchanged between 2015 and 2016 for attendance (5%), pre/posttest completion (5%), the midterm (15%) and the final exam (25%).

Pretest/Posttest survey:
The survey consisted of multiple-choice items designed to test detailed knowledge recall of the various disciplines represented in this course (see supplemental file "Supplementary File 1.pdf"). Pre/posttests were scored individually against a key, and the result was recorded as the pre/posttest survey score for each student in both years.

Data collection:
We obtained from the Western University of Health Sciences College of Dental Medicine's Office of Academic Affairs the number of students in each class, along with the percentage of males, average age, ethnicity/race, incoming Grade Point Average (GPA) and incoming Dental Aptitude Test (DAT) scores. We also obtained pre and post-test survey results with the number of correct answers, along with exam grades and course grades.
Qualitative analysis: Where applicable, we followed the COREQ guidelines for qualitative research (Tong, Sainsbury and Craig, 2007) for qualitative analysis of student course evaluations. For course evaluations, a female Western University of Health Sciences staff member (college-educated, with five years of experience collecting these surveys, but without a dental background and not otherwise engaged in teaching) sent an email instructing students to fill out an online form powered by Qualtrics XM (Provo, Utah, and Seattle, Washington).
Students received this survey in the same form as for other courses and were accustomed to taking this type of survey. The students were not aware of being research participants but may have been aware of the addition of reflective exercises through conversations with more senior students. All students of both classes submitted these course evaluation surveys, as their completion was required for course completion.
The relevant questions for this study were "Q5 - Please provide any additional constructive feedback regarding the instructor's teaching" and "Q6 - Comments"; the preceding questions solicited Likert-style ratings of course resources. The course evaluation survey content was manually coded by author TB, searching for specific keywords connected to expected benefits of reflection (e.g. "learning", "reflection", "understanding") and various spelling permutations tied to specific categories (e.g. "learning") as determined prior to analysis. Search results were verified against the context of the entire comment and tabulated as one instance per student for each category using Microsoft Excel® (Redmond, Washington). In addition, the total number of comments, the number of comments related generally to course content or specific to the professor teaching the course, and the general tone of these comments were recorded.
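The manual coding described above was done by hand in Excel, but the logic ("search for a keyword family, verify in context, count at most one instance per student per category") can be sketched programmatically. The sketch below is an illustration only, not the authors' actual procedure: the category names and stem patterns are assumptions modeled on the keywords named in the text, and the sample comments are invented.

```python
import re

# Hypothetical keyword families: each regex covers spelling permutations
# (e.g. "learn", "learned", "learning") of one coding category.
CATEGORIES = {
    "learning": re.compile(r"\blearn(?:ing|ed|s)?\b", re.IGNORECASE),
    "reflection": re.compile(r"\breflect(?:ion|ive|ed)?\b", re.IGNORECASE),
    "understanding": re.compile(r"\bunderstand(?:ing)?\b|\bunderstood\b", re.IGNORECASE),
}

def code_comments(comments):
    """Tally at most one instance per student comment for each category."""
    counts = {name: 0 for name in CATEGORIES}
    for comment in comments:
        for name, pattern in CATEGORIES.items():
            if pattern.search(comment):
                counts[name] += 1
    return counts

# Invented example comments, for illustration only.
counts = code_comments([
    "I learned a lot from the cases.",
    "Self-reflection helped solidify information",
    "Learning the material was easier with reflection.",
])
# counts -> {"learning": 2, "reflection": 2, "understanding": 0}
```

Note that automated matching replaces only the search step; the in-context verification the authors describe would still be manual.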
To validate the manual coding, machine-based textual analysis was used to capture prominent themes and sentiments using NVivo 12 Plus (QSR International, Burlington, MA). For this, student responses were imported as PDF files and coded using the auto-code wizard functionality.
We also applied the same methodology to analyze the content of students' reflection assignments to characterize common themes. NVivo 12 Plus was used to generate a word cloud from the thousand most frequent stemmed words.
Statistical analysis: For statistical calculations and data representation, the R statistical environment was used (R Foundation, Vienna, Austria). Student average ages, incoming GPA, incoming DAT scores and pre/posttest survey means were compared with the unpaired Student's t-test, while student gender percentages, ethnicity and course grades were compared with Fisher's exact test. Exam scores were not normally distributed and were compared using the Kruskal-Wallis rank sum test across groups. Sentiment and keyword count proportions among student comments for the 2015 and 2016 years were compared using chi-square proportion testing.
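The battery of tests above can be sketched as follows. The study itself used R; this Python/SciPy version is an illustration under assumed placeholder data (the score vectors and contingency tables below are invented, not study data).

```python
import numpy as np
from scipy import stats

# Invented placeholder data standing in for the two cohorts.
rng = np.random.default_rng(0)
scores_2015 = rng.normal(80, 8, 67)    # hypothetical 2015 exam scores
scores_2016 = rng.normal(78, 12, 69)   # hypothetical 2016 exam scores

# Continuous means (age, GPA, DAT, pre/posttest): unpaired Student's t-test.
t_stat, p_t = stats.ttest_ind(scores_2015, scores_2016)

# Categorical make-up (e.g. gender counts per cohort): Fisher's exact test.
odds, p_fisher = stats.fisher_exact([[30, 37], [28, 41]])

# Non-normally distributed exam scores: Kruskal-Wallis rank sum test.
h_stat, p_kw = stats.kruskal(scores_2015, scores_2016)

# Keyword/sentiment count proportions between years: chi-square test.
chi2, p_chi2, dof, expected = stats.chi2_contingency([[26, 41], [12, 57]])
```

Each call returns the test statistic alongside a p-value, mirroring the p<0.05 threshold used in the Results.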

Results/Analysis
The classes taking the 2015 and 2016 DMD 5715 Blood and Lymphatic System course were similar in class size, gender and ethnic/racial make-up, age, and incoming GPA and DAT performance (Table 1). Given that no statistically significant differences were observed, we assumed that any differences in academic performance between the 2015 and 2016 courses would be due to the addition of reflective exercises.

Table 1: Class characteristics, including incoming GPA and Dental Admissions Test (DAT) scores. Abbreviations: "BCP": biology, chemistry, physics; "G.Chem": general chemistry; "O.Chem": organic chemistry; "P. Ability": perceptive ability; "Quant. Res.": quantitative reasoning; "Read. Comp.": reading comprehension; "Total Sci.": total sciences.
We used identical pre/posttest surveys to measure factual knowledge recall during the course. In both the 2015 and 2016 courses, students answered significantly more items correctly on the posttest (p<0.05), suggesting that both courses were effective in teaching factual knowledge (see Figure 2 for the 2015 class performance; the boxplot for the 2016 pre/posttest scores appears similar). The pre/posttest only evaluates recall skills, whereas the written assessments in this course also test higher-order learning skills such as application and analysis of clinical scenarios. Nevertheless, posttest performance correlates weakly with exam performance (R² = 0.15), and similarly with the course grade.
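The R² reported above comes from relating each student's posttest score to exam performance. A minimal sketch of that computation, using invented paired scores rather than study data (the invented data happen to correlate strongly; only the computation is the point):

```python
import numpy as np

# Invented paired scores: posttest correct-answer counts and exam percentages.
posttest = np.array([12, 15, 10, 18, 14, 16, 11, 17])
exam = np.array([72, 80, 70, 85, 74, 78, 69, 88])

# Pearson correlation coefficient between the two score vectors.
r = np.corrcoef(posttest, exam)[0, 1]

# For a simple linear regression of exam on posttest, R^2 equals r^2.
r_squared = r ** 2
```

An R² of 0.15, as in the study, means the posttest explains only about 15% of the variance in exam scores, consistent with the two instruments testing different skill levels.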
To determine if reflective exercises aided academic performance on assessments evaluating knowledge acquisition, we compared pre/posttest surveys, exam grades and course grades. No statistically significant differences were found for the pre/posttest surveys (Figure 3A) or course grades (Figure 3B). Exam scores varied greatly with no apparent trend (2015: mid-term average 80%, standard deviation 8%; final exam 73%, standard deviation 8%; 2016: mid-term 65%, standard deviation 19%; final exam 82%, standard deviation 12%), with all scores statistically significantly different from each other (p<0.001, Kruskal-Wallis). This suggests that the addition of reflective exercises had no measurable effect on academic performance in this course.

Figure 3: (A) Boxplot of pre- to posttest score increases for the 2015 course lacking reflective exercises and the 2016 course with the added reflective exercises. Addition of reflective exercises did not lead to any statistically significant differences in pre/posttest score gains. (B) No statistically significant differences were seen in course grade distribution (shown) or average course grade. This confirms the pre/posttest survey findings and suggests that the addition of reflective exercises had no significant effect on academic performance in this course.
Since reflective exercises may improve professionalism, satisfaction, and student attitudes, we sought to determine if this was reflected in the course evaluations. Therefore, we analyzed the free-text responses in the course evaluation for manually coded nodes such as "learn", "understand", "professionalism" or "self(-directed learning)" that correspond to the intangible benefits of reflection reported previously. Only the 2016 course, with its reflection exercises, elicited a student comment on the reflection exercises, albeit a single one: "(response no. 19) Self-reflection helped solidify information". Surprisingly, the overall tone of the course evaluation comments felt more negative (Table 2), and a machine-based sentiment analysis confirmed this impression (Figure 4), with a significant reduction in mean sentiment (p<0.05, Mann-Whitney U test). This impression was also reflected in the more negative responses on the quantitative portion of the course evaluation survey (Table 3).

Table 3: Quantitative Ratings of the Course Experience as provided by the Course Evaluation Survey
Survey item with Likert-type rating (1 = strongly disagree to 5 = strongly agree) | - reflection (N=67) | + reflection† (N=69)
"I reviewed the specific learning objectives for the course" | 63 (93%) "YES" | 67 (97%) "YES"
"The content was presented in a sequence that effectively supported my learning" | 68 (100%) "YES" | 63 (91%) "YES"
"Overall, the course was well designed" | 4.5 ± 0.63 | 4.0 ± 0.92*
"The large group sessions were effective in summarizing the important facts" | 4.5 ± 0.55 | 4.3 ± 0.65

Given that the unexpectedly negative feedback was associated with no course changes other than the addition of the reflective exercises, we sought to determine if the reflection exercises themselves provided clues for this shift in attitude. Using the same approach, we analyzed the individual responses given during the email survey. Surprisingly, manual coding showed a generally positive response, with students recognizing the value of reflection as reported in previous literature (Table 4).

Table 4: Themes expressed by students in their email survey
Keyword/Theme | Reflection assignments
Total written comments excluding "n/a" | 69
"learn" / about a new specific condition | 26 (38%)
"learn" / about treatment | 32 (46%)
"understand…" / deeper understanding | 11 (16%)
"empathy" / empathy | 5 (7%)
"collabora…" / collaboration | 6 (9%)
"self…" / self-awareness | 1 (1%)
"profession…" / professionalism | 6 (9%)
Course mechanics* | 1 (1%)
Professor-specific* | 0 (0%)

Manually coded themes seen in the individual students' reflection exercises suggest that students see the value of reflective exercises, and the more critical course evaluation feedback seems to have been triggered by some factor other than the reflective exercises. * Course mechanics and professor-specific comments were coded manually with a variety of words (e.g. "allotted hours", "grading", and any wording referring to the professor).

Machine-based coding was not useful for validating the sentiment of the reflection exercises, as it wrongly coded statements about disease and medical treatment as "negative". Instead, machine coding of the responses demonstrated themes that reflected practical application of the knowledge acquired in this course (Figure 5). The themes expressed in the reflection assignments also provided valuable feedback on what the students remembered as important in this course.

Figure 5: Word cloud of synonyms for the 1000 most frequently mentioned three-word combinations in the student responses. While "patient" and "learned" were expected from the setup of the assignment, note that many terms tie directly to specifics of patient care for the various blood conditions mentioned here.

Discussion
In this study, we failed to see a significant difference in pre/posttest score gains or class performance between cohorts, suggesting that the reflective exercises had no immediate effect on academic performance. Yet the reflective exercises themselves provide useful insights into possible benefits that are not readily measured by standard academic measures and course evaluation surveys.
Reflection as practiced in this course did not significantly improve academic performance, possibly because the course may already have maximized student learning through other methods. As a systems-based course using multiple teaching methods, the 2015 course iteration already contained case-based learning, interaction with faculty in classroom discussions, group projects and assessments, all of which are known to enhance learning in some students. However, it is possible that the addition of reflective exercises may still produce measurable academic performance improvement in settings that favor passive learning (Wingfield and Black, 2010).
Another explanation may be that the reflective exercises were not specific enough to reinforce the learned material for better academic performance. We opted to provide a variety of reflection exercises ranging from very broad to very specific so that general reflection skills applicable to the remainder of a career could be nurtured. It is conceivable that more specific, repetitive, guided reflection on key topics assessed on the exams would yield a measurable increase in exam performance. Yet we rejected such a strategy, as it also risks rote memorization of facts, which we felt was inappropriate for the goals of this course.
A third alternative explanation may be that improvements in learning triggered by reflective exercises in a health care education setting were not captured by the assessment tools of this study. Analogous to this study, Miller Juve (Miller Juve, 2012) aimed to determine learning improvement in anesthesiology residents as a consequence of eight weekly, formalized reflection sessions using Gibbs' model of reflection (Gibbs, 1988). Instead of measuring specific attainment of knowledge, Miller Juve aimed to determine improvements in attitudes, abilities and characteristics of learning of these residents using a validated formal assessment tool, the Self-Directed Learning Readiness Scale/Learning Preference Assessment (SDLRS/LPA) (Guglielmino, 1978), and a follow-up survey on attained knowledge. Neither showed any statistical difference before and after the reflective sessions, despite the SDLRS/LPA usually being regarded as a reliable and valid measure of self-directed learning. Given the dearth of quantitative data on the effects of reflection, it is possible that measurement of the positive effects typically attributed to reflection awaits the development of a validated and reliable assessment tool.
A fourth alternative explanation may be that improvements in learning caused by reflective learning are of a different nature than an increase in factual knowledge. Many studies suggest more intangible benefits that could not have been measured with the assessment tools used in this study. For example, George et al. report that self-reflection leads to the setting of more complex learning goals (George et al., 2013), and there are several studies suggesting an increase in self-awareness, professionalism and understanding, as described above. The reflection assignments seem to confirm these effects, and further studies may confirm this.
Last, it is possible that the tools used in this study to measure knowledge acquisition were not adequate for the task and therefore failed to show differences in knowledge acquisition. The pre/posttest survey tool may not have had enough questions to uncover subtle differences, and it does not correlate strongly with exam performance. A tool with more questions covering a broader area of knowledge or clinical applications might have uncovered some measurable effects of reflection, but it would have been too intrusive in teaching; our tool was designed as a compromise between low intrusiveness and high comprehensiveness. While a high degree of correlation between pre/posttest survey score increase and final exam performance would be an ideal validator of this survey tool, this was likely not possible as the exams and surveys were independent: each exam was unique, no questions were reused, and none of the survey questions appeared on the exams. In addition, the survey tool featured questions focused on simple recall of detailed knowledge, while a high proportion of exam questions tested clinical application and critical thinking skills.
Qualitative analysis of course evaluations revealed several significant trends associated with the introduction of reflection exercises. Compared to the 2015 course that lacked reflection exercises, introduction of reflection exercises was accompanied by a significantly reduced number of comments, a lower count of "enjoyment" and "learning" keywords, and a reduction in professor-specific comments. The general tone of the comments changed from quite positive to more critical, and the length of the comments decreased. It is unknown if the addition of reflection exercises led to this change, or if the 2016 class simply had a more reluctant and critical character than the previous class. Comparing student responses between 2015 and 2016 for another course, the system course "DMD5130 Musculoskeletal System", taught immediately after the course described in this study and including the author of this study as instructor, the same pattern was seen and was confirmed with machine-based sentiment analysis. This musculoskeletal system course was taught the same way in both years. Yet the 2015 class provided more comments (18 vs. 8) than the 2016 class, and the general tenor was somewhat more positive in 2015. Consequently, the shift in sentiment was unlikely to have been caused by the addition of reflective exercises.
Instead, the reflection exercises seem to have been positively received, based on their content, and demonstrated notions of empathy, awareness, and clinical application. In addition, these exercises provide valuable insights into what students remember from the course, which can inform future course design. As these reflective exercises are simple to create and take little classroom time, they likely provide intangible benefits that deserve further study.

Conclusion
In the context of a case-based system course, the addition of reflective exercises did not produce any significant effect on student learning as measured with pre/posttest surveys, exam scores and course grades. We observed a significant decrease in positive sentiment on course evaluations when we introduced reflective exercises, but a connection to the reflection exercises seems unlikely, as the same shift was also observed in another course taken by the same student cohorts that lacked reflective exercises. Based on textual analysis, the effect of reflective exercises in early professional students is most likely seen in skills such as professionalism, interdisciplinary collaboration, empathy and learning attitudes.

Take Home Messages
Reflective exercises may not produce measurable gains in academic performance.
Class sentiment may vary significantly between course iterations.
Class sentiment changes and the addition of reflection exercises may not be correlated.
Reflective exercises may encourage collaboration, professionalism, and empathy in some students.

Notes On Contributors
Dr. Boehm is an associate professor at Western University of Health Sciences, United States of America, where he teaches microbiology, the blood and lymphatic system course mentioned in this manuscript, and a variety of preclinical and clinical courses in periodontics. He also practices full-time as a periodontist at the Western University of Health Sciences Dental Center. ORCID ID: https://orcid.org/0000-0002-5200-6178