Evidence-based medicine skills that last: A transferable model utilizing integration, spaced learning, and repetition with a single study design among second-year medical students

Introduction. The medical education literature lacks descriptions of evidence-based medicine (EBM) curricula with competency-based learning outcomes. The objective of this report is to describe an approach to designing, implementing, and assessing long-term learning in an integrated second-year EBM curriculum. Methods. Two complementary approaches were used. The primary deliberate approach incorporated large-group randomized controlled trial (RCT) critical appraisal sessions into existing organ system modules. The second approach added brief applications of EBM content to small-group case-based learning sessions. To assess learning, an open-response written examination mapped to EBM competencies was administered at the beginning of the third year. Results. Data were available for 241 students. Using only walking knowledge, 47% of students at the beginning of the third year discussed two major weaknesses of an RCT; an additional 39% did so for only one. The ability to formulate a clinical question, describe elements of an appropriate search strategy, and determine applicability to different patients was demonstrated by 84%, 87%, and 81% of examinees, respectively. Conclusion. This early work demonstrates that durable learning of EBM skills, including critical appraisal, is achievable among second-year medical students. Further work to improve learning in the second year and extend learning into subsequent years is forthcoming.

Lupi C, Lefevre F, Ward-Peterson M. MedEdPublish. https://doi.org/10.15694/mep.2017.000221


Introduction
The evidence-based medicine (EBM) movement started in the 1990s and coalesced around the three domains of clinical practice guidelines, systematic reviews, and critical appraisal in the mid-2000s. Over this time, medical educators have recognized the value of EBM education for trainees in developing critical thinking, statistical reasoning, and practice-based learning. With its place long secured among graduate medical education (GME) competencies, a high bar for competency in EBM has now been set for undergraduate medical education (UME). The Association of American Medical Colleges (AAMC) has designated evidence-based medicine skills as one of the Core Entrustable Professional Activities (EPAs) for Entering Residency, an effort being undertaken in the United States "to provide expectations for both learners and teachers that include 13 activities that all medical students should be able to perform upon entering residency, regardless of their future career specialty" (AAMC).
This designation sets an important goal for the current evolution toward competency-based curricular design. Yet UME programs are falling short of graduating students with the desired EBM skills. A survey of internal medicine program directors' views on the AAMC EPAs showed that EPA 7 (Form clinical questions and retrieve evidence to advance patient care; AAMC, 2014) was among those with the greatest gap between expected and observed performance on day 1 of internship (Angus et al., 2017). A review of the national level 1 milestones (for a beginning intern) across Accreditation Council for Graduate Medical Education (ACGME) residency specialties reveals a very low bar for functional EBM skills, with almost none describing basic proficiency in asking clinical questions or in searching or appraising the literature. These low expectations were presumably based on the experience of milestone developers with their own interns (ACGME). It is worth noting that all of the current milestones describe a learner with a much less developed EBM skillset than that promised by EPA 7.
These findings from GME are not surprising. EBM educators at the UME level have long faced substantial challenges, including learner difficulty with uncertainty, suboptimal role models, and the complexity of the skillset. Learning this complex skillset requires applying the best approaches of a competency-based curriculum at a time when medical education struggles to shed its old structure as a collection of individual time-bound courses (Holmboe et al., 2017; Djulbegovic and Guyatt, 2017). This timing offers an opportunity for teaching in a field that coalesced only a decade ago, and it brings the challenge of determining the developmental path to competency that should inform the design of competency-based curricula.
Survey data confirm the creativity of EBM educators as they begin to address these challenges through integration, faculty development, and longitudinal approaches (Maggio et al., 2016). Indeed, a handful of descriptions of longitudinal curricula have emerged in recent years, including reports of both clinical and preclinical efforts. The majority, however, do not include learning data at all, or provide only limited data on short-term knowledge gains rather than longer-term acquisition of knowledge or skills (Rao and Kanter, 2010; Mojica, 2013; Chitkara et al., 2014; Elçin et al., 2014; Ahmadi et al., 2015).
To the authors' knowledge, only two reports of longitudinal curricula have offered long-term learning outcomes. Aronoff et al. (2010) exposed third-year students to online modules over 18 weeks, and provided faculty feedback on four structured assignments over the clerkship year. The validated Fresno test, which covers some (although not all) EBM skills, was used as the assessment tool. Students achieved a statistically significant gain of 10 points, from 66.6 to 77.7 points out of a 212 maximum, between pre- and post-tests. The educational significance of this gain was unclear (Aronoff et al., 2010). The second report of a longitudinal curriculum described a dedicated course focusing on key study design types at the end of the second year, followed by individually graded assignments integrated into subsequent coursework (West et al., 2011). Neither of these studies reported the learning outcomes according to sub-competencies or skills.
To determine which educational approaches could prove both feasible and effective at their own institutions, educators urgently need descriptions of a variety of teaching interventions along with relevant learning-outcome data. The objectives of this report are to describe our approach to designing, implementing, and assessing early competence in an integrated second-year EBM curriculum, with a strategic focus on critical appraisal of randomized controlled trials (RCTs), and to report associated long-term learning outcomes aligned to UME-level expectations.

Methods
Prior to implementation of the integrated curriculum, our EBM coursework consisted of two traditional foundational stand-alone courses in epidemiology and evidence-based medicine at the beginning of the first and second years, respectively, both delivered by physician-epidemiologists. In addition to these courses, we implemented the integrated curriculum in second-year organ system modules and the year-long clinical skills course (Figure 1). Neither the organ system modules nor the clinical skills course originally included EBM instruction.
We chose to focus efforts toward longer-term learning of the skills that comprise the functions of EPA 7 appropriate to the second-year student: 1) formulating the clinical question; 2) assessing validity of reference sources; 3) practicing early skills in critical appraisal of medical literature; and 4) practicing early skills in assessing the generalizability of the evidence to a specific patient (AAMC, 2014). We selected content and pedagogical approaches that were feasible within our limitations of curricular time and faculty resources, as well as theoretically capable of achieving longer-term learning.
We took two complementary approaches to integration over the second year, one deliberate and the other opportunistic (Figure 1). Our primary deliberate and innovative approach incorporated large-group RCT-based sessions into the existing organ system modules. In the second approach, we took the more common method of incorporating EBM learning, as "EBM moments," into small-group case-based learning sessions embedded in the Clinical Skills course (Maggio et al., 2016). Several factors facilitated these approaches, starting with a committed and skilled group of EBM faculty, some with PhDs in clinical epidemiology and others with formal training in EBM education. The process was also supported by the significant role of one of us (CL) in overall curricular management, support from other curricular deans, and our institution's selection for participation in the AAMC's consortium of schools piloting implementation of the EPAs.
For the large-group EBM sessions, we obtained 1-2 hours of class time in most organ system modules. We chose to expose students repetitively to only one study design, the RCT. We chose one design, rather than multiple designs, to create several opportunities for learners to engage in the comparative thinking necessary for the higher order cognitive tasks of EBM-making judgments of validity and applicability. We chose RCTs for two reasons. First, we reasoned that a strong understanding of how each of the purposeful elements of proper RCT design reduces bias prepares learners to better recognize the potential for bias in other designs, as well as the complexities of the appraisal process for systematic reviews. Second, since RCTs underlie the majority of treatment guidelines, we expected that a reasonable foundation in appraisal of RCTs would enhance students' understanding of the preappraised secondary sources that will inform many of their point-of-care clinical decisions, preparing them to more judiciously select those pre-appraised sources. Thus, the curriculum could prepare them as "users" of EBM, while also providing important initial instruction for those learners who may choose to become "doers" of EBM.
We employed a large-group active learning format. Our approach permitted repetitive and spaced learning of the EBM process and included multiple points of expert feedback in each learning session, necessary components in moving learners toward competence in complex skills such as critical appraisal. The sessions followed the pedagogical format for case-based RCT appraisal used at the Duke Teaching and Leading EBM Workshop (Duke, 2017) and employed the FRISBE heuristic, a mnemonic instrument that supports long-term retention of critical appraisal elements (Maggio et al., 2015). The heuristic consists of the following elements: follow-up, funding, randomization and allocation concealment, intent-to-treat, similarity at baseline, blinding, and equal treatment (Duke, 2017). During the large-group sessions, students were provided a short patient case followed by the relevant RCT. Students worked in pairs with a version of the article marked to highlight the FRISBE information and then completed a worksheet addressing PICO construction, basic searching, critical appraisal, statistical calculations, overall estimation of internal validity, and application to the patient. The cases used for the large-group sessions addressed disease processes emphasized in that organ system module (Table 1), thus reinforcing core course material. In seven of the organ system modules, summative examinations were developed in-house, and course directors utilized 2-3 questions from our content. Studying for these examinations thus provided students additional EBM learning opportunities over the year.

For the small-group case-based EBM moments, we identified opportunities for integration into existing clinical skills case-based learning. Most focused on the use of likelihood ratios to establish a post-test probability of disease (Table 2).
The EBM moments necessitated recall and application of basic EBM knowledge and concepts appropriate for this level of learner.
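The diagnosis math practiced in these EBM moments follows the odds form of Bayes' rule: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. A minimal sketch of that calculation (the pre-test probability and likelihood ratio values below are hypothetical illustrations, not drawn from any of our cases):

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Post-test probability of disease via the odds form of Bayes' rule."""
    # Convert probability to odds, apply the likelihood ratio,
    # then convert the post-test odds back to a probability.
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# Hypothetical example: 30% pre-test probability, positive likelihood ratio of 6
print(round(post_test_probability(0.30, 6), 2))  # 0.72
```

A test result with a likelihood ratio of 6 thus raises a 30% clinical suspicion to a 72% post-test probability, the kind of calculation students worked through within the small-group cases.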
To assess learning, we developed a tool that used an authentic RCT and assessed the four EBM skills (ask, acquire, appraise, and apply) that can be assessed using a patient case. We used the validated and widely-used Fresno test to assess asking and acquiring (Ramos et al., 2003), and added internally-developed open-response questions to assess reasoning in critical appraisal, applicability, and communication with the patient. We built the examination on a patient case relevant to an authentic RCT with two major methodological weaknesses. A question on applicability provided three hypothetical patients, one presenting with an overt exclusion criterion, another clearly fitting the study criteria, and a third who had reasons to both justify and not justify application of results. Another question asked for a verbatim explanation of the study results to a patient. Through faculty consensus, we developed rubrics with cutoffs and required behavioral criteria to align with EPA 7 entrustable behaviors. We assessed quantitative communication using 7 of 9 items in a tool developed by Han and colleagues (Han et al., 2014).
All students completed the assessment at the beginning of their third year. We deliberately chose to make this a formative test and to provide no other incentive or opportunity for preparatory study. Therefore, results represented "walking knowledge," or longer-term rather than episodic learning. Each open-response item was graded by two raters; prior research demonstrated acceptable to excellent interrater reliability for these questions (Lupi et al., 2016). Descriptive statistics included frequency distributions, and mean and standard deviation for Fresno Test scores; analysis was conducted using Stata 14 (StataCorp, College Station, Texas). We obtained ethical approval from the Florida International University Health Sciences IRB (protocol number 16-0056).

Curricular Design, Integration, and Faculty Development
We utilized existing curriculum management structures to facilitate integration over two years. For the large-group sessions, course directors provided 1-2 hours per organ system module. While development of these sessions incurred the usual substantial burden of active-learning design, delivery of each session by one faculty member and the option for use over subsequent years improved feasibility. For the small-group EBM moments, course directors had only to approve content additions to existing cases that did not alter the remainder of the case.

Student EBM Competency
Since implementation in 2016, all students have been required to complete the EBM assessment; data were available for a total of 241 students (two classes). The average score on the Fresno Test was 152.1 ± 22.3 (range: 82-205). Using only walking knowledge, approximately 47% of students at the beginning of the third year correctly identified and discussed the two major weaknesses of the RCT; an additional 39% did so for only one (Table 3). Thus, approximately half the class performed an assessment of internal validity independent of faculty guidance. Performance on the arguably easier skills of question formulation, searching, and applying was almost uniformly above 80%. We also uncovered relative strengths and deficiencies in learning among our domains. For example, nearly 81% of the class correctly ranked the applicability of study results to a set of specific patients. On the other hand, skills in quantitative communication with patients, an area not covered in the large-group sessions owing to limited time, were clearly lacking. The large-group sessions did not succeed in training the majority of students to calculate perhaps the most clinically important statistical outcome of an interventional trial, the number needed to treat (NNT), although nearly 75% could provide the absolute risk difference (ARD). A summary of these and other student learning outcomes is shown in Table 3.
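For readers less familiar with these two statistics, the calculations themselves are simple; it is their retention and correct application that proved difficult. A brief sketch (the event rates below are hypothetical and not taken from the trial used in our assessment):

```python
def absolute_risk_difference(control_event_rate, treatment_event_rate):
    """ARD: the absolute reduction in event rate attributable to treatment."""
    return control_event_rate - treatment_event_rate

def number_needed_to_treat(control_event_rate, treatment_event_rate):
    """NNT: patients who must be treated, on average, to prevent one event."""
    return 1 / absolute_risk_difference(control_event_rate, treatment_event_rate)

# Hypothetical example: events in 20% of controls vs. 15% of treated patients
print(round(absolute_risk_difference(0.20, 0.15), 2))   # 0.05
print(round(number_needed_to_treat(0.20, 0.15), 1))     # 20.0
```

In this illustration, an ARD of 5 percentage points corresponds to treating 20 patients to prevent one additional adverse event.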

Discussion
We have described a longitudinal integrated approach to teaching and assessing the skills of clinical question formulation, searching, appraising, and applying evidence, and have provided outcome data of retained learning mapped to these competencies for two classes of students. Our outcomes for material taught through the large-group sessions are promising, and suggest that for a sizeable proportion of the class, this relatively efficient approach was successful.
The learning impact of the large-group curriculum may have been limited by a lack of individual student accountability that could have heightened student engagement. We are now considering aggregation of the completed session worksheets into a portfolio that will be reviewed with individual students in the third year.
Results for the one skill specific to the small-group sessions that we assessed (diagnosis math) failed to demonstrate retained learning. As the results for NNT and ARD calculation were also mostly disappointing, this likely reflects the need for even more frequent and/or focused exposure to achieve retained learning of this material. Students had the opportunity to practice these calculations in seven sessions. It should also be noted that the small-group sessions did not have the benefit of consistent faculty expertise. One important and unaddressed barrier in the small-group cases has been the lack of opportunity to provide faculty development. While the facilitator guides included written material and links to videos for faculty preparation, anecdotal evidence suggested that some faculty de-emphasized the EBM moments owing to lack of knowledge and training, resulting in wide variations in the coverage, practice, and feedback accorded to these "moments." Another factor may have been student de-emphasis of material that was not summatively assessed over the course of the year.
The main limitation to our work is the lack of a comparison group; this was not feasible within the constraints of our curriculum. While we do not have a baseline assessment of classes prior to implementation of the longitudinal curriculum for comparison, others have documented Fresno benchmarks among EBM learners and practitioners. Smith et al. (2016) collected data on Fresno test performance from 417 third- and fourth-year medical students, a population not substantially different from our own. The average score achieved by this group of learners was 100.1 (Smith et al., 2016). Ramos et al. (2003) administered the Fresno test to family practice residents and another group of self-identified EBM experts. The average scores were 95.6 for the residents and 147.5 for the EBM expert group (Ramos et al., 2003). Although we recognize the limitations of comparisons with populations that are not identical to our own, we believe it is worth noting that at the beginning of their third year, our students performed at or above the mean scores in both of these studies.
In addition to promoting retained learning, our approach has several other strengths. By integrating the large-group sessions within organ system modules, we were able to reinforce didactic material that had been introduced earlier in the course. The placement of this material in a clinical context and with an EBM focus allows students to better appreciate the links between the foundational and clinical sciences. To the greatest extent possible, we chose RCTs for these sessions that represented landmark studies in the area. This increased the students' familiarity with studies that were likely to be cited during clinical training, and exposed them to the existing research base for each chosen topic.
Another limitation in the transferability of our assessment tool is the faculty effort required for the scoring of open-response questions. We were able to enlist the help of staff with advanced degrees in public health and the sciences to mitigate the burden. To reduce faculty time required for grading, we will likely remove some of the Fresno questions on skills and knowledge and rely exclusively on our own internally-developed items, which already test this content at a higher cognitive level.
Further work is underway to expand faculty development for small-group facilitators and to create additional learning opportunities in the second year through increased use of periodic summative assessment. We are also working with faculty overseeing the communication skills curriculum to address quantitative communication skills earlier and with some frequency. Finally, we plan to continue to validate our assessment tool and to use it as a progress test in the third and fourth years. Beyond aiming to improve learning in the second year, we are working to extend learning into the third year with multiple strategies: case-based EBM assignments, short oral article presentations, blended online and face-to-face active learning instruction in clinical practice guidelines, systematic reviews and the judicious use of point-of-care resources, and an EBM objective structured clinical examination.

Conclusion
This early work demonstrates that significant learning of the complex EBM skillset, including the foundational skill of critical appraisal, is likely achievable among second-year medical students using the strategies of repetitive, spaced, and active learning, integrated into a traditional curricular structure under the guidance of a faculty expert. We have not demonstrated impact from incorporating brief applications of EBM principles into small-group discussions. We have designed an open-response, competency-based test that simulates the application of EBM skills and produces data mapped to undergraduate competencies. Further work to improve on the learning from this segment of the curriculum, and to extend learning and application into subsequent years, is underway.

Take Home Messages
Case-based learning of critical appraisal of authentic trials can be successfully integrated into early-stage traditional undergraduate medical curricula. Repeated, spaced appraisal of RCTs using a large-group active learning strategy over one year leads to retained skills in formulating questions, and in appraising and applying the evidence. Integration of brief EBM content into case-based learning was less successful in achieving retained learning. Faculty development continues to challenge the delivery of effective EBM education.

Notes On Contributors
Dr. Carla S. Lupi is the Associate Dean for Faculty and Professor of Obstetrics and Gynecology at Florida International University Herbert Wertheim College of Medicine, where she also serves on the team working with the AAMC Pilot for the Core Entrustable Professional Activities for Entering Residency.