Reliability of online dental final exams in the pre and post COVID-19 era: A comparative study

Amidst the fourth COVID-19 wave in Viet Nam, national lockdowns necessitated the closure of numerous dental schools. To assess DDS (Doctor of Dental Surgery) graduation exams, this study analyzed their 2021 online implementation in comparison with onsite exams conducted in 2020 and 2022 at the Faculty of Odonto-Stomatology, University of Medicine and Pharmacy at Ho Chi Minh City, Viet Nam (FOS-UMPH). The final online examination comprised two main sessions: a synchronous online theory examination using FOS-UMPH e-Learning (200 MCQs and three written tests, each presenting a clinical situation to be solved) and a synchronous online practicum examination using Microsoft Teams (12 online OSCE stations). Final grades were evaluated using the same metrics as the face-to-face final examinations in 2020 and 2022. A total of 114, 112 and 95 students sat the exam for the first time in 2020, 2021 and 2022, respectively. Histograms and k-means clustering were employed to analyze reliability. The histograms from 2020, 2021 and 2022 showed a striking similarity. However, fewer students failed in 2021 and 2022 (13% and 12.6%, respectively) than in 2020 (28%), with grades on the clinical problem-solving part (belonging to the theory session) notably higher in 2021 and 2022. Intriguingly, the MCQ scores showed identical patterns. The courses of orthodontics, dental public health and pediatric dentistry (in the group of preventive and developmental dentistry) stood out for their exceptional consistency across both sessions. After examining data gathered over three years, we identified three distinct clusters: the first comprising scattered average and low scores, the second characterized by high but unstable and scattered scores, and the third boasting consistently high and centered scores.
According to our study, online and traditional onsite graduation exam results are relatively equivalent, but additional measures are necessary to standardize the final examination and adapt to the new normal in dental education.


Introduction
The COVID-19 pandemic has had a widespread impact on numerous facets of society, including dental education and the assessment of undergraduate students. To cope with the crisis, dental students have had to adapt to a temporary change in their education. Clinical placements were cancelled, forcing them to switch to remote study from home. Dental schools responded proactively by introducing online lectures and teaching opportunities through virtual platforms such as Zoom, Microsoft Teams, sOLAT and Web 2.0 tools, to ensure that students' learning was not hampered [1,2].
Amid the COVID-19 pandemic, clinicians have admirably balanced their clinical duties with teaching, which is highly appreciated. E-learning courses consisted of various components, including educational materials, project tasks, practice tests, and an "ask a teacher" interface, which allowed for a comprehensive online learning experience [2]. However, the limitations of online dental exams include the inability to perform hands-on practical training and the lack of personal interaction between students and patients [3]. Dental students are also experiencing some apprehension regarding the examination process. To mitigate future disruptions, universities are contemplating remote testing as a solution [4][5][6]. Nevertheless, there are multiple aspects to consider when administering a dental school exam, such as ensuring that students meet the learning outcomes and calculating the Educational Performance Measure (EPM). The pandemic also resulted in the cancellation or postponement of all previously scheduled Objective Structured Clinical Examinations (OSCEs) due to social distancing measures. Ultimately, learning from predecessors is crucial for students to advance in the field [7]. The absence of clinical experience and skill evaluation has also sparked increasing concern. Hence, assessing the effectiveness of remote online exams in this regard is imperative.
To objectively uncover novel insights from the data, we employed k-means clustering analysis in this research. Our study offers an alternative method for assessing the reliability of final exam scores, moving beyond traditional comparative statistics such as the t-test and one-way ANOVA.

Final examination procedure
The research plan was approved by the Ethics Committee of the University of Medicine and Pharmacy at Ho Chi Minh City. The score collection was approved by the FOS-UMPH Academic Affairs Unit.
The final exam framework for 6th-year dental students who have completed their curriculum is detailed in Fig 1. The exam consists of both theory and practicum components. For the theory section, students must answer 200 multiple-choice questions (MCQs) spanning 12 disciplines, and complete written tests on three clinical problem-solving scenarios covering Restorative Dentistry, Development and Preventive Dentistry, and Oral Pathology and Surgery. To pass the MCQ portion, a minimum of 120/200 questions must be answered correctly, with a maximum score of 10/10. In the written test, students must score at least 5 points out of 10 in each scenario to pass. The practical exam is conducted in OSCE format with 12 stations corresponding to the 12 disciplines. To pass the OSCE, students must score at least 5 points out of 10 at each station, with a maximum score of 10/10. In addition, a minimum score of 5 was required for the 6th-year clinical practice. The practicum score was the average of the OSCE and 6th-year clinical practice scores, and the total score was the average of the theory and practicum scores (Fig 1). In 2020 and 2022, conventional assessments comprised in-person theoretical tests, for which students were provided with a computer system, and clinical examinations in which students rotated among different stations and were assessed face-to-face on specific tasks.
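The pass rules and score aggregation described above can be sketched as follows. This is a minimal illustration in Python; the equal weighting of the MCQ and written components within the theory score is our assumption, since the exact weighting is not stated here.

```python
def practicum_score(osce_scores, clinical_practice):
    """Practicum score: average of the mean OSCE station score and the
    6th-year clinical practice score (all on a 0-10 scale)."""
    osce_avg = sum(osce_scores) / len(osce_scores)
    return (osce_avg + clinical_practice) / 2

def passes(mcq_correct, written_scores, osce_scores, clinical_practice):
    """Pass rules: at least 120/200 MCQs correct, at least 5/10 in each of
    the three written scenarios, at least 5/10 at each of the 12 OSCE
    stations, and a 6th-year clinical practice score of at least 5."""
    return (mcq_correct >= 120
            and all(s >= 5 for s in written_scores)
            and all(s >= 5 for s in osce_scores)
            and clinical_practice >= 5)

def total_score(mcq_correct, written_scores, osce_scores, clinical_practice):
    """Total score: average of the theory and practicum scores.
    The theory score is taken here as the mean of the MCQ score (rescaled
    to 10) and the mean written-test score -- an assumed, illustrative
    weighting."""
    mcq_score = mcq_correct / 200 * 10
    theory = (mcq_score + sum(written_scores) / len(written_scores)) / 2
    return (theory + practicum_score(osce_scores, clinical_practice)) / 2
```

For example, a student with 160/200 MCQs correct, a score of 7 in each written scenario, 6 at every OSCE station and a clinical practice score of 8 would pass, with a total score of 7.25 under this assumed weighting.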
For the virtual assessments, Microsoft Teams (v1.0.23) was utilized to facilitate both the theoretical and practical examinations across multiple meeting rooms (Fig 2). We used the Team and Channel division functions in Microsoft Teams to allocate students to their respective rooms after each round of testing. To ensure a seamless virtual experience, candidates were required to verify their internet connectivity and computer compatibility against a readiness checklist before the examination. Additionally, virtual waiting rooms were equipped with supervised cameras for both sessions.

Collection and organization of data
Theory and practicum scores were collected using the same evaluation metrics in both the onsite (2020, 2022) and online (2021) examinations. There were 114 students taking the final exam in 2020, 112 in 2021 and 95 in 2022 (S1 File). For each year, scores for the 12 disciplines in each section (or station) were collected.

Data observation
Data were processed and analyzed using R version 4.1.3 and RStudio 2021.09.0 Build 351, PBC. We visualized the 2020, 2021 and 2022 scores with histograms, density plots and boxplots using the ggplot2 (v3.3.5) and lvplot (v0.2.0) R packages. Each histogram shows the scores (ranging from 0 to 10, distinguished by color). Each density plot shows the scores (ranging from 0 to 10) against the number of students obtaining each score, for the three cohorts (2020, 2021 and 2022).
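The visualization above was done in R with ggplot2; an analogous sketch in Python with matplotlib might look like the following. The scores here are synthetic stand-ins for the real data (only the cohort sizes match the study).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical total scores (0-10 scale); cohort sizes match the study
cohorts = {
    "2020": rng.normal(6.5, 1.3, 114).clip(0, 10),
    "2021": rng.normal(7.0, 1.1, 112).clip(0, 10),
    "2022": rng.normal(7.2, 1.0, 95).clip(0, 10),
}

fig, (ax_hist, ax_dens) = plt.subplots(1, 2, figsize=(10, 4))
bins = np.linspace(0, 10, 21)  # 0.5-point bins over the 0-10 scale
for year, scores in cohorts.items():
    ax_hist.hist(scores, bins=bins, alpha=0.4, label=year)
    # Simple density estimate: a normalized histogram drawn as a line
    counts, edges = np.histogram(scores, bins=bins, density=True)
    ax_dens.plot(edges[:-1] + 0.25, counts, label=year)
ax_hist.set(xlabel="Score", ylabel="Number of students", title="Histograms")
ax_dens.set(xlabel="Score", ylabel="Density", title="Density plots")
ax_hist.legend()
ax_dens.legend()
fig.savefig("score_distributions.png")
```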

K-mean clustering analysis
We applied the k-means clustering method using the factoextra (v1.0.7) and stats (v4.1.1) R packages to analyze the score data across the three years (S2 File). The factoextra package provides convenient functions to extract and visualize the output of multivariate analyses of data such as ours, including Principal Component Analysis (PCA) prior to visualization with the ggplot2 package. K-means clustering is a vector quantization method that partitions the n students per year (observations) into k clusters such that each student (observation) belongs to the cluster with the nearest mean score vector (cluster center). This results in a partitioning of the data space into Voronoi cells. K-means clustering minimizes the within-cluster variances (squared Euclidean distances); at the minimum, each cluster center lies at the mean of its Voronoi set (the set of data points nearest to that cluster center). The algorithm of Hartigan and Wong (1979) was applied by default [8]. The characteristics of each cluster were then interpreted. First, we determined and visualized the optimal number of clusters k (from 1 to 10) for each year. Next, we performed k-means clustering on the data matrix to partition the score data into k groups such that the sum of squares from the points to the assigned cluster centers is minimized.
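The two-step procedure (choose k, then partition and project with PCA) can be sketched as follows. The study used R (factoextra/stats with Hartigan-Wong); this Python sketch uses scikit-learn's KMeans (Lloyd's algorithm, which minimizes the same within-cluster sum-of-squares objective) on synthetic data shaped like one year's scores.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for one year's data: 110 students x 12 subject scores,
# built to mimic three groups (high/centered, high/scattered, low/scattered)
scores = np.vstack([
    rng.normal(8.0, 0.3, (80, 12)),
    rng.normal(7.5, 1.2, (22, 12)),
    rng.normal(5.0, 1.5, (8, 12)),
]).clip(0, 10)

# Step 1: elbow heuristic -- total within-cluster sum of squares for k = 1..10
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores).inertia_
       for k in range(1, 11)]

# Step 2: final partition with the chosen k = 3
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)

# Step 3: project onto the first two principal components for plotting,
# mirroring factoextra's PCA-based cluster visualization
coords = PCA(n_components=2).fit_transform(scores)
```

The "elbow" in the `wss` curve (the point where adding clusters stops reducing the within-cluster sum of squares appreciably) is one common way to pick k; factoextra's `fviz_nbclust` automates the same idea.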

Statistical analysis
All results were expressed as means ± standard error of the mean. Statistical differences were determined by 2-way ANOVA with Bonferroni post-test or by 2-tailed Student's t test. A p value less than 0.05 was considered significant.
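As an illustration of the pairwise comparison, a two-tailed Student's t test on two hypothetical cohort samples can be run with SciPy. The variable names and simulated scores below are ours, not the study's data; the Bonferroni-corrected two-way ANOVA would be applied analogously across subjects and years.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-student scores (0-10 scale) for one subject in two
# cohorts; sample sizes match the 2020 and 2021 cohorts
onsite_2020 = rng.normal(6.8, 1.0, 114).clip(0, 10)
online_2021 = rng.normal(7.4, 0.9, 112).clip(0, 10)

# Two-tailed Student's t test; p < 0.05 is taken as significant
t_stat, p_value = stats.ttest_ind(onsite_2020, online_2021)
significant = p_value < 0.05

# Mean +/- standard error of the mean, the summary format used here
mean_2020, sem_2020 = onsite_2020.mean(), stats.sem(onsite_2020)
```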

Distribution of component scores throughout 3 years
While the composite score spectrum was similar between years for the total theory score and total practicum score, closer observation revealed disparities in scores between years (Fig 3A, Table 1). The average scores for each subject are displayed in Table 1; the average practicum scores of the clinical subjects (oral implantology, orthodontics, prosthodontics, periodontology, oral radiology, and oral surgery) differ significantly between the online and onsite examinations (2020 vs. 2021 and 2021 vs. 2022).
Fig 3B shows that the distribution of the MCQ total score in 2022 was slightly shifted toward higher scores, and the mean score was higher than in 2020 and 2021. In particular, more students achieved MCQ scores in the range of 7.5-10 in the 2022 examination than in the 2020 and 2021 examinations. The total theory scores of 2022 showed the same tendency, with relatively little overlap (Fig 3C). Interestingly, for the clinical problem-solving scores, the 2021 mean was the highest among the three cohorts. Moreover, the average clinical exam scores of the three years were distributed very similarly (Fig 3D).

K-means clustering analysis revealed three analogous clusters in each year
To characterize the features of the scores across the three years, we performed k-means clustering analysis. The results revealed three clusters with the same characteristics in each year's data (Fig 5A and 5B): the first (cluster 2 in 2020, 2021 and 2022) consisted of scattered average and low scores; the second (cluster 3 in 2020 and 2021, cluster 1 in 2022) of high but unstable, scattered scores; and the third (cluster 1 in 2020 and 2021, cluster 3 in 2022) of high and centered scores. The sizes of clusters 1, 2 and 3 were 85, 7 and 22 students in 2020; 71, 12 and 29 students in 2021; and 30, 5 and 60 students in 2022, respectively. Students in each year were thus clearly classified, with comparable sizes for the scattered average and low score groups (7, 12 and 5 students, or 6.1%, 10.7% and 5.3% in 2020, 2021 and 2022, respectively), the high but unstable, scattered score groups (22, 29 and 30 students, or 19.3%, 25.9% and 31.6%), and the high and centered score groups (85, 71 and 60 students, or 74.6%, 63.4% and 63.2%).

Discussion
Viet Nam faced a surge of SARS-CoV-2 cases in 2021 that resulted in a public health lockdown from May 31 to September 30. The restrictions imposed included stay-at-home orders, mandatory mask-wearing, social distancing, and gathering limits [9]. The COVID-19 pandemic has brought about new challenges in dental education, and the need for an online evaluation platform has been recognized [10]. It is important to note that online evaluation platforms can be designed in many different ways, and their reliability depends on various factors, such as the number of items, the types of questions, and the methods used to prevent cheating [11]. Additionally, self-reported dental visits may not be as valid as direct clinical examinations, which are considered the gold standard for assessing dental health [12,13]. In response, our faculty had to organize tele-assessments for final-year dental students, with all participants interacting remotely online. The assessment comprised two main sessions: a synchronous online examination for theory and a synchronous online examination for practicum. The grades were evaluated using the same metrics as the face-to-face final examinations in 2020 and 2022. The findings revealed similarities in the composite score spectrum for theory and practicum scores between years, though some disparities were noted. Additionally, Development and Preventive Dentistry showed the most consistency over time, and k-means clustering analysis identified three analogous clusters in each year.
The initial difference in subject scores between years can be attributed to the variance in assessment tools: online versus onsite examination. Although online education and evaluation have been around for some time, the recent pandemic has brought them to the forefront. A study by Jagadeesan and Neelakanta (2021) utilized an online self-assessment tool for medical students in biochemistry during the pandemic, highlighting the successful transition from traditional classroom teaching to online education, as evidenced by examination results [14]. Our study contributes to the existing literature by critically examining students' evaluation performance in higher education during the pandemic online teaching period, and suggests that the transition from traditional classroom teaching to online education in dentistry was successful in terms of adaptability, as assessed by examination results. The results indicate that utilizing an online model for the final exams of dental students in Vietnam is a viable approach. The authenticity and dependability of this method have been established, rendering it apt not just for summative evaluations but also for formative assessments in the dental curriculum, particularly in the digital age and the era of remote learning. Online examinations offer several advantages, such as ease of access, time and cost savings, instant feedback, and multimedia support in question design, among others. They also encourage creative and analytical thinking, provide personalized options for question display, and offer clarity in answer reception [15,16]. The study provided valuable insights into online tests. However, online tests may have some limitations, as the uncontrollable circumstances of test takers can compromise the reliability of the test. In our case, due to the COVID-19 lockdown, the six-year undergraduate dental program was compelled to shift the final exam to online assessment within a limited timeframe, from May 31 to September 30. Insufficient time for clinical practice negatively impacted students' Restorative Dentistry scores in 2021; this particular field demands hundreds of training hours, making it challenging for students to excel without ample practice opportunities. Varied internet speed and connectivity issues may also have affected the exam process, with three cases of disconnection lasting 10 to 30 seconds.
Fortunately, all affected students could reconnect and complete their examinations. Furthermore, each student's unique examination environment may have influenced their level of focus during the exam.
Despite similarities in course material and lectures, significant differences in practicum exam scores were observed between courses. It is important to note that assessment practices are not standardized, and each instructor applies their own subject criteria for evaluation. As such, the reliability of our qualitative analysis may be influenced by external factors such as the pandemic, adjustments made by teachers to help students perform better, and the challenges posed by transitioning from in-person to online exams, which affect students' technical, emotional, and intellectual abilities.
The COVID-19 pandemic has led to the closure of many dental schools, necessitating effective and reliable online teaching and assessment tools [17]. However, the short time available for online exam preparation during the pandemic has raised concerns about the reliability of online dental exams. Our study indicates high reliability, in line with other studies [17,18]. Nevertheless, COVID-19 has limited the value of online oral exams, as they require significant time to ensure fairness and reliability [19,20].
The consequences of the COVID-19 pandemic might also be a factor that affected the students' health and psychology in their examination preparation and examination process, but this issue has not been fully investigated in this study. Hence, further research is required to understand the potential challenges and limitations of online dental final exams, and how they may differ from traditional in-person examinations. Additionally, the lack of direct observation and physical interaction with patients during an online exam may limit the ability of examiners to accurately assess a student's clinical skills and decision-making abilities. These are important considerations that need to be addressed to ensure the reliability of online dental final exams. Further studies are needed to explore these factors in depth and to develop best practices for online dental education and assessment.

Conclusion
In this study, we applied k-means clustering analysis as a new approach to evaluating the reliability of final exam scores, complementing conventional p-value-based statistics that examine each subject individually. The tool provided a comprehensive view of the data, with feature discovery for each cluster. On a positive note, our final-year written examination and clinical OSCE distance assessments proved successful in evaluating student performance. In light of this, we may consider retaining online written summative MCQs and distance assessments with extended matching questions. However, remote assessment may not be practical for certain OSCE tasks within and beyond our field. While simultaneous remote assessments help evaluate competency with great accuracy, we must prioritize in-person clinical OSCEs whenever feasible.