Looking into the future of clinical skills assessment in undergraduate medical education during the COVID-19 Pandemic

In light of the COVID-19 pandemic in 2020, the United States Medical Licensing Examination (USMLE) announced the temporary suspension of its Step 2 Clinical Skills (CS) examination. This suspension brought to attention the need to evaluate current methods of clinical skills assessment. Objectively, this period in medical education marks a time for change and improvement. Although this may seem radical, medical education has been continuously changing over the past few decades. The use of long case, short case, and viva voce examinations for clinical skills assessment evolved into the Objective Structured Clinical Examination (OSCE) and Step 2 CS. While OSCEs and Step 2 CS are currently mainstay assessment methods in medical education, the new challenges that COVID-19 has imposed require medical educators to improve these methods in order to comply with social distancing guidelines. Special consideration should be given to incorporating modalities such as video conferencing, artificial intelligence, virtual reality, and workplace-based assessments. The sudden suspension of medical school activities was clearly unexpected, but it is vital that medical educators continue to adapt clinical skills assessment to the present times.

Miller's pyramid of clinical competence defines "knows" as the recall of factual knowledge, "knows how" as the appropriate actions taken when presented with a specific circumstance in an exam, "shows how" as the ability to perform in a particular clinical circumstance, and "does" as a physician's behavior in the real-life clinical setting (Dauphinee, 1995). For example, performance-based testing is based on a candidate's ability to "show how" (Dauphinee, 1995). Prior to the 1970s, clinical testing within medical education focused on "knows" and "knows how" via the administration of multiple choice questions and patient management problems (Dauphinee, 1995). However, the advent of the OSCE by R.M. Harden ushered in a new era in medical education that focused on enhancing clinical skills at the bedside in conjunction with medical knowledge gained in lecture halls (Harden et al., 1975).
Over the following decades, the OSCE has served as the primary method of clinical skills assessment at the medical school level. Since its introduction, its format has continually evolved in response to questions about its reliability and validity as an assessment method. The number of clinical encounters and the time allotted for each station have been two important variables affecting the psychometric properties of the examination. However, despite repeated changes to the OSCE format, the range of reported reliability and validity scores remained too broad. Thus, there was a need for a standardized assessment of clinical skills that could improve upon the psychometric properties of the OSCE. In the early 2000s, the focus on developing clinical skills in future physicians gained traction nationwide and resulted in the development of Step 2 Clinical Skills (CS) by the USMLE as a means of standardizing the testing of clinical skills.
However, due to the public health risks posed by the coronavirus pandemic, in May 2020 the USMLE announced the suspension of Step 2 CS for the following twelve to eighteen months. This brief pause allows us to revisit the current method of evaluating clinical skills and consider how it may change going forward, with special consideration given to creating a "COVID-free" testing environment as an adjunct to the evolving use of technology in modern medicine.

A brief history of clinical skills assessment
Prior to the 1970s, medical education depended largely on assessment of clinical knowledge rather than its application at the bedside (Epstein, 2007). Short case, long case, and viva voce (oral) examinations were commonly used to test clinical skills in candidates during this period. Short case assessments required candidates to perform a focused examination on five to six patients, while long case assessments allocated thirty to forty-five minutes for the candidate to perform a history and complete physical examination, followed by questions regarding clinical management (Khan et al., 2013). Oral examinations consisted of a period for candidates to assimilate the given clinical material followed by ten to fifteen minutes of questioning by examiners (Khan et al., 2013). The psychometric properties of these examinations were questionable at best. Viva voce assessments have poor content validity, high inter-rater variability, and inconsistency in marking, rendering this assessment method highly unreliable (Tabish, 2008). Given the relatively unstructured nature of the examiners' questioning in each format, there was a clear need for an assessment method that would improve on the reliability and validity of these examinations.
The introduction of OSCEs in the late 1970s served to remedy this problem. Simply defined, OSCEs consist of multiple standardized patient (SP) stations that assess medical students' interactions with patient-related medical issues. Student performance is determined by clinical reasoning and problem-solving abilities in these clinical scenarios (Courteille et al., 2008). Harden's first OSCE, conducted in 1972, was a 100-minute examination consisting of eighteen testing stations and two rest stations, with each station lasting 4.5 minutes (Khan et al., 2013). The composition of OSCEs has since varied in the number of SP stations and the length of time at each station. Importantly, research has consistently demonstrated the need for multiple SP stations for each defined clinical problem (Stillman et al., 1982), owing to findings of highly variable reliability scores between 0.20 and 0.95 (Harasym, Mohtadi and Henningsmoen, 1997). However, other studies have suggested that station length plays a minimal role in reliability (Schuwirth and van der Vleuten, 2003), unless extreme changes in length are implemented. Consequently, research efforts have focused on identifying factors that increase or decrease reliability (Doig et al., 2000), such as test content, design, and implementation factors (Turner and Dankoski, 2008). For example, checklists were initially implemented in OSCEs to minimize inter-rater unreliability; however, various studies suggested that the choice between checklists and global rating scales has little influence on reliability (Schuwirth and van der Vleuten, 2003). Although OSCEs were implemented to improve on the reliability and validity of long and short case examinations, there are clear complications that require modern solutions.
Bapatla N, Pearson S, Stillman S, Fine L, Bauckman K, Rajput V. MedEdPublish. https://doi.org/10.15694/mep.2021.000171.1

The development and controversy of USMLE Step 2 Clinical Skills
Prior to 2004, clinical skills assessments at the national level were not administered to US medical graduates. Such an assessment, the clinical competence assessment (CCA), was only administered to graduates of foreign medical schools (Sutnick et al., 1994). In 1992, the Educational Commission for Foreign Medical Graduates (ECFMG) conducted a series of pilot projects to assess clinical competency of foreign medical graduates (FMG). The CCA consisted of integrated clinical encounters with ten standardized patients, which provided profiles of clinical competencies including data gathering, interviewing and interpersonal skills, diagnosis and management skills, interpretation of laboratory and diagnostic procedures, written communication, and spoken-English proficiency (Sutnick et al., 1994). These scores proved useful to residency directors, and they supplemented these results with scores on written examinations (Sutnick et al., 1994).
Through collaborative efforts of the ECFMG and the National Board of Medical Examiners (NBME), the USMLE Step 2 CS was developed to administer an examination to US medical seniors testing competency in clinical skills through a multiple-station, standardized patient modality (Hawkins, 2005). The development of Step 2 CS was largely driven by the need to promote public safety by ensuring that US-trained physicians were appropriately prepared for the clinical setting, similar to the already established CCA for FMGs.
Step 2 CS accomplishes this by testing vital physician competencies such as taking a medical history, performing physical examinations, communicating effectively with patients, accurately documenting findings, and identifying appropriate initial diagnostic studies (Hawkins, 2005). Data interpretation scores from this examination proved to provide useful information for predicting the clinical performance of physicians in supervised practice (Cuddy et al., 2016).
In 2016, medical students from Massachusetts initiated a movement to end Step 2 CS, which quickly gained traction and the support of the Michigan State Medical Society, the Massachusetts Medical Society, and the American Medical Association (AMA) Student Section (Elder, 2018). Arguments for cessation of Step 2 CS cite expense, limited accessibility, and questionable psychometric properties (Kashaf, 2017). Indeed, the examination is administered in only five major US cities, which adds travel costs to the already steep registration fee. Additionally, opponents argue that the pass/fail nature of the examination does not add value to medical assessment; rather, quantitative results might better assist residency directors in identifying clinical strengths and weaknesses (Nguyen, 2018). Conversely, a survey of residency program directors indicated the high utility of Step 2 CS in screening residency applicants for professionalism, communication skills, and translation of knowledge into clinical practice (Paniagua et al., 2018). Furthermore, there are concerns that without Step 2 CS it would be difficult to ensure a minimum entry standard of clinical competence in international medical graduates (IMGs) (Elder, 2018).
In spite of the surrounding controversy, in May 2020 Step 2 CS was suspended for twelve to eighteen months in accordance with CDC recommendations regarding the COVID-19 pandemic. With the pandemic raging in wards across the globe, medical schools have also had to adapt to the new normal and find creative ways to continue educating students. Although there is hope that we may eventually contain this novel virus and return to normalcy, it is prudent to consider a new era of medical education that not only incorporates a "COVID-free" environment but may also solve issues of cost and standardization.

Accomplishing a "COVID-free" testing environment
The Centers for Disease Control and Prevention issued social distancing guidelines with the arrival of COVID-19 in early 2020. As a result, there was a major upheaval in medical education. Navigating social distancing while assessing clinical skills has proven difficult, yet not impossible. From a logistical standpoint, OSCEs may be administered in environments that ensure strict infection control and personal hygiene, no large group gatherings, and social distancing of individuals (Boursicot et al., 2020). Despite the potential for these control measures to mitigate the spread of COVID-19 among examiners, students, and SPs alike, the USMLE chose the more cautious route of suspending Step 2 CS. While COVID-19 may be contained in the near future, it is imperative that we consider altering the methods of assessing clinical skills in order to avoid complete suspension of medical education in similar future scenarios.
Video conferencing has been a readily adopted modality during the pandemic that may be useful as a permanent method of assessment. Using software such as Zoom to conduct a web-OSCE requires access to a reliable Internet connection and personal devices with built-in audio and video capabilities (Major et al., 2020). With proper technical support and a trained pool of SPs, clinical skills such as accurate history taking, communication, and critical reasoning can be assessed virtually (Major et al., 2020). Web-OSCEs may further serve a purpose in training medical students in telemedicine (Lara et al., 2020), which has grown increasingly important in modern clinical care. Additionally, with video conferencing, faculty may observe students from any geographic location (Cantone et al., 2019), which lowers out-of-pocket costs for students and eases scheduling for examiners and examinees.
Unfortunately, video conferencing is inherently unable to assess the performance of an adequate physical examination. Artificial intelligence (AI) may be a better-suited modality for this. With AI, evaluation can be more objective, fast, and cost-effective (Masters, 2019). Additionally, AI can provide more extensive, individualized feedback on medical student performance (Masters, 2019). Virtual reality (VR) may also be useful in implementing a COVID-free environment. VR applications allow students to view 360-degree renderings of real or simulated places. Software could be developed to simulate patient care scenarios and allow students to perform history taking and physical examinations while mitigating the spread of infection. VR has been shown to be an interactive and engaging educational tool that supports knowledge retention and skills acquisition (Sultan et al., 2019).
Recently, the use of workplace-based assessment (WPBA) has gained traction in graduate medical education. In terms of Miller's pyramid of clinical competence, WPBA targets "does", collecting information on physician performance in the everyday clinical setting through commonly used tools such as direct observation of procedural skills (DOPS), the Mini-Clinical Evaluation Exercise (mini-CEX), and case-based discussions (Liu, 2012). WPBA also addresses a key limitation of OSCEs: the deconstruction of the doctor-patient encounter into isolated aspects performed in sequence (Liu, 2012). In this respect, WPBA is considered more authentic than OSCEs in assessing a resident's ability to perform in a real-life clinical setting. It may be advantageous to apply WPBA to medical student education, especially considering how many institutions have had to halt clinical skills assessment during the COVID-19 pandemic. If precautions were taken to ensure the safety of medical students in the hospital setting in the event of another pandemic, trained physicians could continue clinical skills assessment of students.
The evolution of clinical skills assessment has mirrored the needs of its time. The development of the OSCE in the 1970s reflected the need for a standardized clinical examination that distinguished the assessment of clinical skills from that of medical knowledge. The implementation of Step 2 CS in 2004 built on the foundation laid by OSCEs, assessing medical students fairly at a national level while providing residency program directors with appropriate information to screen applicants from US and international medical schools equally. Currently, further studies are needed to develop an entirely COVID-free clinical assessment using modalities such as videoconferencing, AI, and VR in order to adapt to the ever-changing needs of medical education. However, necessity may yet lead to the invention of new, less expensive, safer, and valid teaching and assessment tools.

Take Home Messages
The suspension of Step 2 CS due to COVID-19 requires evaluation of the current methodology of clinical skills assessment in order to prevent future suspension of medical school activities in the face of similar events. OSCEs served as a solution to the low reliability and validity of short case, long case, and viva voce assessments.
Step 2 CS standardized examination of clinical competencies across medical schools. Incorporation of AI, VR, videoconferencing, and WPBA should be considered for assessment of clinical skills at the undergraduate medical school level in order to establish a "COVID-free" environment.