Empirical analysis comparing the tele-objective structured clinical examination and the in-person assessment in Australia

Purpose: This study aimed to compare the tele-objective structured clinical examination (teleOSCE) with in-person assessment in a high-stakes clinical examination to determine the impact of the teleOSCE on assessment outcomes. It also discusses which skills and domains can effectively be assessed in a teleOSCE.
Methods: This study is a retrospective observational analysis. It compares the clinical examination results of final year medical students assessed using the teleOSCE in 2020 (n=285) with those of students examined using the traditional in-person format in 2019 (n=280). The study was undertaken at the University of New South Wales, Australia.
Results: In the domain of physical examination, students in 2020 scored 0.277 points higher than those in 2019 (mean difference=-0.277, P<0.001, effect size=0.332). Across all other domains, there was no significant difference in mean scores between 2019 and 2020.
Conclusion: Except in the domain of physical examination, the teleOSCE does not negatively impact assessment in clinical examination. If the teleOSCE is the future of clinical skills examination, assessment of physical examination will require concomitant workplace-based assessment.

port or demonstrating empathy [6,7]. The implementation of an online format signals a further departure from its historical design. Recent efforts to deliver the tele-objective structured clinical examination (teleOSCE) and online OSCE orientation have been acceptable to students and examiners respectively [8,9]. A systematic review summarized the various attempts in the literature to incorporate an online element and discovered that while there is generally good validity and reliability, there is a need for systemic research to guide the ideal teleOSCE format [10]. As yet, no study has attempted to define and discuss the impact on the assessment outcome of separating students, patients, and assessors into different locations, and executing examinations online. This is of paramount concern in the context of current (and ongoing) social distancing requirements, whilst also providing useful insights into improving the accessibility of examinations to relevant stakeholders.
The challenging year of 2020 offered an unprecedented and valuable opportunity to collect data on the online assessment process. The University of New South Wales in Sydney, Australia is one of many medical schools that were forced to use an online "electronic" OSCE (teleOSCE) with minimal guidance. Final year students underwent their summative OSCE assessments in this online format to comply with COVID-19 restrictions.

Objectives
This study aims to establish whether the use of the teleOSCE has impacted assessment outcomes. Ultimately, the study aims to facilitate discussion on what skills and domains can effectively be assessed in a teleOSCE to guide future development and use of online clinical examinations.

Ethics statement
The research was conducted under the ethics approval granted by the Human Research Ethics Committee of the University of New South Wales (reference no. HC15421; HREAPG: Health, Medical, Community and Social). In compliance with the ethics approval, we used administrative assessment data held by the University of New South Wales, which required no consent from participants.

Study design
This observational study compares the results achieved by the students examined in 2020 via the teleOSCE with those of the students examined under the traditional OSCE format in 2019.

Setting
In-person clinical OSCEs have been run for many years and staff are familiar with the requirements for implementation. Converting this to an online format required several steps. Foremost among these was the technology (for all participants), which was entirely novel in this setting. Thereafter, the stations had to be adapted to suit an online format. In conjunction, personnel and procedural changes were required, followed by various stages of testing. The clinical stations are described below; the technological, personnel, procedural, testing, and training aspects are described in Supplement 1.
The traditional in-person OSCE, as run in 2019, consisted of 9 stations, each of 12 minutes' duration, drawn from 7 disciplines (medicine, surgery, primary care, emergency medicine, psychiatry, pediatrics, and obstetrics and gynecology). For each station, students are provided with a short summary of a clinical scenario, from which they are expected to take a brief targeted history and conduct a physical examination before attempting to diagnose the patient, suggest relevant investigations and management, and answer questions from the examiner. Each station is preceded by 2 minutes of reading time and followed by a short period to allow transition to the subsequent station. Within stations, students are assessed on their clinical skills in relation to the specific station content, but also on generic aspects such as communication and physical examination skills.
To ensure comparability between the teleOSCE and the in-person OSCE, core features of the latter were maintained in the online version wherever possible. For the most part, this was relatively straightforward, since initiation of the consultation with the patient, collection of the history, interpretation of clinical information, and summarization of the case by the student could all be easily achieved in the online format (Supplement 2). However, the physical examination and/or procedural skills components needed to be adapted into an activity that could be completed and assessed online. Here, the task required the student to describe how they would approach and carry out the examination of the patient, in whatever way they felt the station question and the information elicited suggested (Table 1). These changes also necessitated modification of the assessment rubrics. To this end, each adapted OSCE station was reviewed by staff familiar with clinical assessment, and recommendations were made to station authors where further changes were required to suit the online format.

Participants
For the teleOSCE in 2020, 285 senior students participated in the assessment, while 280 senior students participated in the traditional in-person OSCE in 2019. All target students participated in the examinations.

Variables
Scores of the teleOSCE examination and that of the traditional in-person OSCE.

Data source/measurement
The data were the results of the teleOSCE and the traditional in-person OSCE from the 2 years. The analysis used the raw scores given by the examiners, before any modification by the post-examination standard setting process implemented by the medicine program [11,12]. Using raw scores was essential so that the analysis focused on examiners' assessment outcomes rather than on overall student outcomes, which could be affected by the standard setting method.

Bias
There was no sampling bias because all target students participated in the OSCE and all data was included in the study.

Study size
The sample size of this study was determined by the availability of participants, all of whom were included (N = 565; combined cohorts of 280 and 285). Because the study used all available data and had no impact on the participants, it was unnecessary to limit the number of participants or to undertake an a priori sample size and power calculation [13]. The results present the effect size, which is the appropriate measure for estimating the meaning of the differences between the 2 cohorts [14]. No post-hoc sample size and power calculations were undertaken because these are not statistically appropriate [15].
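The text does not say which effect size statistic was reported. As a minimal sketch, assuming it is a standardized mean difference of the Cohen's d type, the calculation could look like the following. The function name `cohens_d` and the scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up illustrative scores; NOT data from the study.
scores_2019 = [6.0, 7.0, 8.0, 7.0, 6.0]
scores_2020 = [7.0, 8.0, 9.0, 8.0, 7.0]

d = cohens_d(scores_2019, scores_2020)
print(f"Cohen's d (2019 minus 2020) = {d:.3f}")
```

A negative value here means the second group scored higher on average; the magnitude is what is usually interpreted as the size of the effect.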

Statistical methods
Independent t-test analysis was used to compare the mean scores of student assessment results between 2019 (in-person OSCE) and 2020 (teleOSCE). The comparison was made by disciplines (all assessment domains combined) and separately by domains (all disciplines combined). This analysis was chosen to identify the impact of the assessment mode on different medical disciplines as well as assessment domains, respectively. Bonferroni correction was undertaken to correct for alpha inflation.
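The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis code: it assumes an equal-variance (pooled) t statistic, replaces the t distribution with a normal approximation for the p-value (reasonable only because the degrees of freedom are large), and uses a hypothetical number of comparisons for the Bonferroni step. All function names and scores are invented for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def two_sided_p_normal_approx(t_stat):
    """Two-sided p-value using the standard normal in place of the t distribution.

    Only a reasonable approximation for large degrees of freedom
    (here 280 + 285 - 2 = 563).
    """
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Synthetic, deterministic scores sized like the two cohorts; NOT study data.
scores_2019 = [5.0 + (i % 5) for i in range(280)]
scores_2020 = [5.3 + (i % 5) for i in range(285)]

t_stat = pooled_t_statistic(scores_2019, scores_2020)
p_raw = two_sided_p_normal_approx(t_stat)
# n_comparisons=7 is a hypothetical choice (e.g. one test per discipline).
p_adjusted = bonferroni(p_raw, n_comparisons=7)
print(f"t = {t_stat:.3f}, raw p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adjusted:.4f}")
```

Where SciPy is available, `scipy.stats.ttest_ind(scores_2019, scores_2020)` returns the same statistic with a p-value taken from the t distribution itself. The Bonferroni step multiplies each raw p-value by the number of comparisons, which is equivalent to dividing the significance threshold by that number.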

Descriptive data of participants
In 2019, 280 students undertook the in-person examinations, while in 2020, there were 285 examinees in the teleOSCE. All students completed the entirety of their examinations, and all of their unidentified raw scores were eligible for use in this study. Demographic data was not available for inclusion in this study.

Main results
In the domain of physical examination, students in 2020 scored 0.277 points higher than those in 2019 (mean difference = -0.277, P < 0.001, effect size = 0.332). Across all other domains, there was no significant difference in mean scores between 2019 and 2020. These results are illustrated in Table 2 and Dataset 1.
When the results were analyzed by discipline, the 2020 (teleOSCE) students scored 0.216 points higher than the 2019 (in-person) students in medicine (mean difference = -0.216, P < 0.05, effect size = 0.257) and 0.390 points higher in emergency medicine (mean difference = -0.390, P < 0.01, effect size = 0.363). Across all other disciplines, there was no significant difference between the mean scores in 2019 and 2020. These results are depicted in Table 3 and Dataset 1.
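The sign of the reported mean differences is easy to misread. One reading consistent with the text (an assumption, since the order of subtraction is not stated explicitly) is that each difference is the 2019 mean minus the 2020 mean, so that a negative value means the teleOSCE cohort scored higher. The symbols below are introduced for illustration only:

```latex
\Delta = \bar{x}_{2019} - \bar{x}_{2020}, \qquad
\Delta_{\text{physical examination}} = -0.277
\;\Longrightarrow\;
\bar{x}_{2020} = \bar{x}_{2019} + 0.277
```

Under the same reading, the differences of -0.216 in medicine and -0.390 in emergency medicine correspond to 2020 means that are 0.216 and 0.390 points higher, respectively.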

Key results
The cohorts in this study were of comparable size (280 versus 285), and no significant differences were seen in the outcomes of almost all domains (communication, clinical history and background, medical knowledge, interpretation, diagnosis, and management) and disciplines. This suggests that the change to an online platform did not influence the performance of students, or the information available to assessors, in these areas. However, significant differences were noted in the disciplines of medicine and emergency medicine, and in the domain of physical examination.

Interpretation
The method of performance assessment utilized in 2019 has been in practice at the University of New South Wales for many years, and minor adaptations in methodology have previously been well described in the literature [11]. There are several potential explanations for the higher mean scores in medicine (mean difference = -0.216, P < 0.05) and emergency medicine (mean difference = -0.390, P < 0.01) in the 2020 cohort. Most notably, the in-person OSCE at the University of New South Wales was the product of successive annual reviews and moderation informed by local data and educational research. Conversely, the teleOSCE was the culmination of 6 months of targeted design and innovation with a paucity of guidance from the literature. This may have limited the case or task complexity incorporated into several of the teleOSCE stations, which could have contributed to the improvement in assessment outcomes observed in the medicine and emergency medicine stations. For instance, the adaptation of multiple procedural tasks performed in the emergency medicine station required greater consideration than in, for example, a psychiatry station, in which the patient history and mental state examination translated quite straightforwardly to the teleOSCE format. Additionally, there are multiple established examiner-specific variables that could have further influenced the difference observed. For instance, examiner training has been shown to decrease marking variation [16]. Examiner expertise may be inversely correlated with generic scores in communication, while seniority may be associated with lower scores in general [17]. With the rapid development and implementation of the teleOSCE, expertise, training, and issues around seniority of examiners were less pertinent in a brand-new platform. This could foreseeably impact disciplines unevenly, given the varied suitability of each discipline to the teleOSCE platform (e.g., emergency medicine versus psychiatry).
Exploring adjustments made to the method of examination provides insights into why the physical examination domain results may have changed (Table 1). In both the in-person and online formats, students are required to plan the physical examination they would like to undertake and in the teleOSCE this plan had to be clearly articulated to the examiner. In the latter, the physical findings were provided upon specific student request. In contrast, students undertaking the in-person iteration were required to gather this information using their physical examination skills. In both formats, students then used their clinical reasoning to construct an appropriate list of differential diagnoses. As such, poor performance in physical examination may have hampered information gathering for students undertaking the in-person assessment, whereas the online format negated this deficiency by affording the student the necessary clinical information on request to formulate their diagnosis.
Additionally, the information available to the examiner is less in the teleOSCE. Previously, the in-person format permitted an examiner a single observation of the student, with multiple facets, including witnessing the students carry out the steps of an examination. By removing this specific data point, examiners simply have less information with which to make a judgement, a well-established hindrance to validity [18]. Observation of physical examination has been the cornerstone of performance examination in the OSCE format for decades, and this result challenges the value of assessing physical examination in the teleOSCE in its current format. Amending the configuration of the teleOSCE to permit the appropriate performance of physical examination skills represents a potential alternative. For instance, this could be achieved by placing the student together with the patient in a single room, leaving only the assessor in a remote location. This has been studied previously, with positive results in terms of acceptability of participants and information gleaned by examiners [19]. However, with the physical distancing requirements in place during the COVID-19 pandemic, this would not have been achievable in this study. Moreover, this method negates the improved accessibility to patients, a key benefit of the online platform. Instead, physical examination in the online platform could focus on signs that are transferrable through a screen. There is evidence through the applications of telehealth that physical examinations which rely heavily on observation such as dermatology are highly suited to this purpose [20]. Indeed an examination should represent the context it aims to simulate [21], and the teleOSCE represents the challenges of the telehealth consultation very well. It is easy to see its utility in training future clinicians for whom telehealth may become far more prevalent. 
For its current purpose, however, namely assessing final year medical students across the various domains, the teleOSCE currently offers no way to sample physical examination skills as representatively as the in-person format may have done.
Looking more broadly at performance examination, the observation of a single skill in a single environment during an assessment has poor generalizability due to case specificity [16]. For instance, a student who can appropriately listen to the precordium of a young, fit simulated patient may not be able to reproduce the same performance with an unwell, overweight, and/or elderly patient. The only way to approach validity in the assessment of a single target domain, such as competence in physical examination, is through a multitude of observations in varied contexts [18]. To navigate the shortcomings of the teleOSCE in judging physical examination, other components of a multimodal assessment method will need to be further emphasized. We suggest that structured and cyclical clinical workplace-based assessment of physical examination is highly suited to this purpose. Repeating observed clinical assessments and utilizing a structured feedback system (e.g., mini-clinical evaluation exercise) has been shown to provide students with a means to apply, evaluate, and refine their clinical skills [22], whilst also allowing teleOSCEs to be utilized for assessment of the other domains of skills, which we have shown to be unaffected by the online format.

Limitations
Utilizing student marks to compare the impact of the in-person and teleOSCE has its limitations in that more than simply the student performance and the assessment modality will contribute to this result. Such confounders were not controlled for in this observational study. Furthermore, the in-person and teleOSCE assess different aspects of competence in the domain of physical examination and this may limit scope for comparison in this domain. Of note, the in-person OSCE requires a student to elicit findings by physical examination whereas the teleOSCE requires a student to describe this process (in stations requiring more information than can be gained by observation alone). The teleOSCE does impact assessment of physical examination competence; however, this may be due to the inherent constraints of the online format of assessment in this particular domain rather than solely ascribable to differences in student performance between 2019 and 2020.

Conclusion
Research on teleOSCE is still in its infancy. Evaluating the method used by the University of New South Wales for performance examination in 2020 has demonstrated that the transition from the in-person performance assessment to a teleOSCE platform was very successful. Bar physical examination, comparing traditional in-person OSCE outcomes with those of the teleOSCE shows no significant changes across the various domains. As such, these online summative examinations can continue to play a role in the broader multimodal assessment system, providing valuable data to examiners when making an accurate judgement about a student's competency in those domains. However, this study also revealed the need for careful consideration of the way we assess physical examination using teleOSCE. Students can no longer execute the actions of a physical examination on a patient; as such, examiners have less information with which to make an assessment, and this may account for the increase in scores in this domain. With the increased likelihood of utilizing teleOSCE more frequently in the future, we need to identify the physical examination skills that cannot be accurately assessed via teleOSCE and make sure these skills are assessed via workplace-based assessment tools.

Supplement 1. Technological, personnel, procedural, testing and training aspects of the tele-objective structured clinical examination.
Supplement 2. Video of tele-objective structured clinical examination (teleOSCE) interaction.
Supplement 3. Audio recording of the abstract.