High-Fidelity Hybrid Simulation Not Only Optimizes Skills Acquisition But Improves Non-Technical Skills

Introduction: Knee arthrocentesis is frequently performed as a diagnostic or therapeutic procedure. Although it is considered a key competency for medical doctors, most students never perform the procedure during their training. We aimed to assess technical and nontechnical skills for knee arthrocentesis through high-fidelity hybrid simulation.
Methods: Medical students and general physicians were recruited for training; orthopedic surgeons were recruited as experts. Trainees received educational documentation before training. Trainees took a medical history and obtained informed consent from a patient-actor, then encountered a simulated knee on which to perform the procedure. We adapted a direct observation scale to assess technical and nontechnical skill performance. Personalized feedback was given after each session. Performance among trainees (learning curves) and between trainees and experts was compared using a mixed-effects model.
Results: Trainees significantly improved from the first session to the second and third. The learning curve plateaued at the third session. Performance in the third and fourth sessions was similar to expert performance. The assessment tool evaluated technical and nontechnical skills with high internal consistency and showed high interobserver reliability.
Discussion: Learning curve analysis showed that high-fidelity simulation allowed trainees to become proficient in the technical and nontechnical skills required to perform a safe knee arthrocentesis.
Level of Evidence: Level II (Prospective Cohort)


Introduction
Knee arthrocentesis is a procedure performed to aspirate fluids from, or inject medications into, the knee joint cavity. It can be used as a diagnostic or therapeutic procedure 1,2. Diagnostically, knee arthrocentesis is a cornerstone in the differential diagnosis of inflammatory knee effusions 3. As a treatment, joint injection of steroids or other medications plays a key role in the symptomatic relief of knee osteoarthritis (OA).
The American Board of Orthopedic Surgery (ABOS) included arthrocentesis as a core competency for medical students 4, meaning that it considered it a significant milestone to achieve during undergraduate medical training. Other countries have also included knee arthrocentesis as a proficiency to be acquired during medical school 4,5. In our postgraduate orthopedic program, only two of the last 16 accepted residents (13%) had performed a knee arthrocentesis at the undergraduate level.
Most last-year medical students feel ill-prepared to execute the procedure 6 . Furthermore, previous studies have shown that the success rate of palpation-guided therapeutic knee arthrocentesis among orthopedic surgery residents can be as low as 55% 7,8 .
Several medical education methodologies have been described to train knee arthrocentesis. These models have focused on the procedure's technical aspects, leaving nontechnical dimensions out of the training scenario 9,10. This results in an incomplete and underperforming educational model 11,12. Including nontechnical skills promotes meaningful experiences that have been shown to improve learning 13. Moreover, excluding them can foster the belief that nontechnical competencies are less relevant when approaching a patient for a procedure 13. Additionally, previously published teaching models have lacked explicit instruments to assess trainees, limiting their reproducibility and external validity 4,10.
The aims of this study were to (1) design and implement a high-fidelity hybrid simulation scenario of knee arthrocentesis, (2) compare the performance of last-year medical students and general physicians over a four-session workshop with that of experts in knee arthrocentesis in a simulated environment, and (3) adapt a direct observation of procedural skills (DOPS) scale to assess technical and nontechnical skills related to knee arthrocentesis.

Materials And Methods
Institutional review board approval was obtained from the Medical Ethics Committee of the Pontifical Catholic University of Chile (Nº 190107005), in accordance with ethical guidelines. All subjects were over 18 years of age and gave informed consent. Last-year medical students and general physicians were recruited to complete four nonconsecutive training sessions within a three-month period. We recruited six orthopedic surgeons with experience in knee arthrocentesis as experts against whom trainee improvement could be compared. Five days before their first training session, trainees received instructional documentation that included written directions and the rationale for performing a diagnostic knee arthrocentesis, a description of the training scenario including the assessment tool used to evaluate performance in technical and nontechnical competencies, and a video describing the procedure step by step.

Training scenario
A high-fidelity hybrid simulation scenario was created. A patient-actor was trained with a script describing a 30-year-old patient arriving at the emergency department (ED) with a two-day history of left knee pain associated with fever and joint inflammation. The patient-actor lay on a gurney; upon uncovering her left knee, trainees encountered a simulated knee (Sawbones©, Pacific Research Laboratories; Vashon, WA, USA). The model joint was a non-articulated knee with a partially mobile patella. On a side table, trainees had to select the materials required to perform the procedure, including hospital paperwork and the informed consent form. A health care assistant was present to assist the trainee upon request but limited his participation to carrying out the trainee's orders. During each session, trainees were required to take an abbreviated history and perform a physical examination of the patient. They had to explain the procedure and obtain written informed consent. After preparing the required implements, they had to perform the procedure and fill the laboratory test tubes. Finally, they completed the hospital exam forms and gave the patient postprocedure recommendations (Figure 1). A single orthopedic surgeon evaluated all trainees, and sessions were recorded for a secondary evaluation to determine the inter-rater reliability of the evaluation tool. We used a specific direct observation of procedural skills (DOPS) scale designed for the scenario (supplemental material files 1 and 2). After the procedure, all trainees were immediately taken to a debriefing room to receive feedback from the surgeon who had evaluated their performance.
The surgeon had previously been trained to give effective feedback using the Pendleton model 13. Feedback sessions were also recorded. Each trainee's DOPS result constituted a point on their individual learning curve. Trainee learning curves were compared with expert performance to measure trainee proficiency in the training scenario. Proficiency was defined as the trainee's ability to conduct the procedure safely, with careful consideration for the patient and following the best practices outlined in the educational material they received, as measured with the de novo DOPS scale 14. After feedback, each trainee completed a validated satisfaction scale 15. This tool measured trainees' perceptions of scenario realism, the quality of the instructional material sent beforehand, the feedback received, and the perceived utility of the training session. One year after training, all participants were contacted to determine whether they had performed any knee arthrocentesis. Those who had performed the procedure were given a questionnaire measuring how confident and prepared they felt to undertake the real-life procedure. Specifically, we asked whether training had enabled them to obtain patient consent and provide patient education, perform a safe knee arthrocentesis, fill laboratory tubes and complete paperwork, and explain postprocedure care to the patient. Additionally, we asked the trainees to rate the perceived utility of participating in the training sessions.

DOPS adaptation and validation
We adapted the new scale from a DOPS previously validated in the same cultural setting 14. The adaptation retained the 11 items of the original DOPS but adjusted their descriptors to assess knee arthrocentesis. We determined the content validity of the de novo DOPS through a Delphi panel composed of experts in orthopedic surgery, rheumatology, and emergency medicine. Ratings and comments for each item were recorded, and modifications were made for repeat expert assessment. Expert consultation through the Delphi panel was repeated until at least 80% agreement was obtained on all items.
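The Delphi stopping rule can be expressed as a simple check. The sketch below is illustrative only and assumes, hypothetically, that each panelist gave a binary agree/disagree rating per item and that "agreement" is the proportion of panelists who agreed; the panel's actual rating format is not described here, and the function names and ratings are invented for the example.

```python
# Sketch of a Delphi consensus check (hypothetical data format, not study data).
# Assumption: ratings[p][i] is True if panelist p agreed with item i's descriptor.


def agreement_by_item(ratings: list[list[bool]]) -> list[float]:
    """Proportion of panelists agreeing with each item."""
    n_panelists = len(ratings)
    n_items = len(ratings[0])
    return [sum(row[i] for row in ratings) / n_panelists for i in range(n_items)]


def items_needing_revision(ratings: list[list[bool]], threshold: float = 0.80) -> list[int]:
    """1-based indices of items below the consensus threshold (revise and re-rate)."""
    return [i + 1 for i, share in enumerate(agreement_by_item(ratings)) if share < threshold]


if __name__ == "__main__":
    # Hypothetical round: 5 panelists rating 3 items.
    round_1 = [
        [True, True, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, True, False],
    ]
    print(agreement_by_item(round_1))       # [1.0, 0.8, 0.4]
    print(items_needing_revision(round_1))  # [3]
```

Under this reading, only item 3 would be revised and re-rated in the next round, and consensus is reached once `items_needing_revision` returns an empty list.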
Inter-rater reliability was assessed using a second evaluation of the recorded sessions performed by another orthopedic surgeon 19. Validity analysis was carried out with the DOPS scores from consecutive sessions of each trainee. Construct validity was determined through exploratory and confirmatory factor analysis. The exploratory factor analysis detected latent variables, or constructs, underlying the observed variables 20,21. A confirmatory factor analysis was then performed to validate the factor structure identified in the exploratory analysis 21. In the exploratory factor analysis, the number of factors (or dimensions) was selected using the Kaiser-Guttman 22 and Cattell 23 criteria: factors with an eigenvalue above one and those above the inflection point of the scree plot were retained. The coefficient of determination (R²) was estimated to quantify the percentage of each item's variance explained by the two factors identified in the exploratory analysis. Internal consistency of each dimension detected in the factor analysis was assessed using Cronbach's alpha.
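Cronbach's alpha for one dimension follows the standard formula alpha = k/(k - 1) * (1 - (sum of item variances) / (variance of the total score)), where k is the number of items in the dimension. A minimal sketch in Python, assuming a complete subjects-by-items score matrix with no missing values; the function name and the example scores are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[s][i] is subject s's score on item i of one dimension."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across subjects.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each subject's summed score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical scores: 4 subjects x 3 items (not study data).
    scores = [[5, 6, 5], [6, 6, 7], [4, 5, 5], [7, 7, 6]]
    print(round(cronbach_alpha(scores), 3))  # 0.857
```

Because item and total variances use the same (sample) estimator, the choice of denominator cancels out in the ratio.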

Learning curve analysis
A mixed-effects (multilevel) model with a random intercept was constructed to study differences between consecutive DOPS results of each trainee. Multilevel models were used because each trainee's performance was assessed in repeated training sessions. DOPS scores from consecutive sessions of the same subject are expected to be correlated, and ignoring this correlation produces biased estimates of standard errors and confidence intervals. Mixed-effects (multilevel) models yield standard errors that take this clustering within subjects into account. Multilevel statistical modeling enables quantitative analysis of learning curves and has been shown to have higher statistical power than conventional repeated-measures analysis of variance (ANOVA) 16. This statistical method has also been used in previous research to analyze how trainees acquire skills 29-31.
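For readers who prefer the model written out, one plausible specification of a random-intercept learning-curve model is shown below. This is a sketch under our own assumptions, not the authors' reported equation: session enters as a categorical variable with session 1 as the reference, the random intercept and residual are assumed independent, and the symbols are our notation.

```latex
y_{ij} = \beta_0 + \sum_{s=2}^{4} \beta_s \, \mathbf{1}[j = s] + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}, \qquad j \neq j'
```

Here y_{ij} is trainee i's DOPS score in session j, β_s is the mean difference between session s and session 1, and u_i is a trainee-specific intercept. Because u_i is shared by all of a trainee's sessions, any two of that trainee's scores are correlated as in the second line, which is the within-subject clustering that an analysis treating scores as independent would ignore. The normality assumption on ε_{ij} is the one the residual check in the next paragraph addresses.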
Because the residuals of the mixed-effects model were not normally distributed, standard errors were estimated by bootstrapping (10,000 replications), and 95% confidence intervals (95% CI) were obtained using the bias-corrected and accelerated (BCa) method. Mean scores and 95% CIs are reported for each training session.
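The bootstrap step can be illustrated with a bias-corrected and accelerated (BCa) interval. The sketch below is deliberately simplified and is not the authors' procedure: it computes a BCa CI for a plain mean of hypothetical session scores by resampling individual observations, whereas the study bootstrapped estimates from a mixed-effects model (with clustered data, one would typically resample whole trainees instead). The function name `bca_interval` and the example scores are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def bca_interval(data, stat=fmean, n_boot=10_000, level=0.95, seed=0):
    """Return (estimate, lower, upper) for a BCa bootstrap CI of stat(data)."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    nd = NormalDist()  # standard normal
    theta_hat = stat(data)

    # Bootstrap replicates: resample observations with replacement.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0, from the share of replicates below the estimate
    # (clamped so inv_cdf stays finite).
    share_below = sum(b < theta_hat for b in boot) / n_boot
    share_below = min(max(share_below, 1 / n_boot), 1 - 1 / n_boot)
    z0 = nd.inv_cdf(share_below)

    # Acceleration a, from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        # BCa-adjusted percentile for nominal quantile q.
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    tail = (1 - level) / 2
    lo_idx = min(n_boot - 1, int(adjusted(tail) * n_boot))
    hi_idx = min(n_boot - 1, int(adjusted(1 - tail) * n_boot))
    return theta_hat, boot[lo_idx], boot[hi_idx]


if __name__ == "__main__":
    # Hypothetical DOPS scores for one session (not study data).
    session_scores = [5.2, 5.9, 6.1, 6.4, 6.8, 6.0, 5.5, 6.3]
    est, lo, hi = bca_interval(session_scores)
    print(f"mean {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Compared with a plain percentile interval, the BCa adjustment shifts the chosen percentiles to correct for bias (z0) and skewness (a) in the bootstrap distribution, which is why it is often preferred when normality cannot be assumed.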

Results

Adaptation and validation of the DOPS scale
The Delphi panel was composed of 17 orthopedic surgeons, four rheumatologists, and two emergency department physicians. Panelists received the 11-item DOPS with knee arthrocentesis descriptors. Two rounds were necessary to obtain over 80% agreement on every item.
In the exploratory factor analysis, two factors (or dimensions) with an eigenvalue greater than one were identified, indicating a bidimensional structure. These factors explained 91.7% of the variance observed in the de novo DOPS and could be classified into a technical domain (two items) and a nontechnical domain (nine items). In the confirmatory factor analysis, all standardized factor loadings were above the 0.3 cut-off point and statistically significant. The R² ranged from 0.28 (item one) to 0.93 (item ten); that is, 93% of the variability of item ten's scores was explained by its factor. Internal consistency was α=0.86 for the technical domain and α=0.70 for the nontechnical domain. Inter-rater reliability was almost perfect (weighted kappa = 0.87).
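For reference, the weighted kappa (wK) reported above compares observed and chance-expected disagreement between two raters on an ordinal scale, giving partial credit for near-misses: kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij). The weighting scheme used in the study is not stated, so the sketch below assumes linear weights by default, with quadratic weights as an option. The function name, the assumed 1-7 scale, and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    r1, r2: paired ratings; categories: ordered list of possible ratings.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be paired and non-empty")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    def w(i, j):
        # Disagreement weight: 0 for exact agreement, 1 for maximal disagreement.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = Counter((pos[a], pos[b]) for a, b in zip(r1, r2))
    m1 = Counter(pos[a] for a in r1)  # rater 1 marginal counts
    m2 = Counter(pos[b] for b in r2)  # rater 2 marginal counts

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(w(i, j) * observed[(i, j)] / n for i, j in cells)
    exp_dis = sum(w(i, j) * (m1[i] / n) * (m2[j] / n) for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa undefined: no chance-expected disagreement")
    return 1 - obs_dis / exp_dis


if __name__ == "__main__":
    # Hypothetical item scores from a live rater and a video rater (not study data).
    live = [5, 6, 7, 6, 4, 7, 6, 5]
    video = [5, 6, 6, 6, 4, 7, 7, 5]
    print(round(weighted_kappa(live, video, categories=list(range(1, 8))), 2))
```

With only two categories, linear weighting reduces to the unweighted Cohen's kappa.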

Learning curve analysis
Twenty-eight trainees were recruited (10 last-year medical students and 18 general physicians). Only two of them (7%; both general physicians) had performed a single knee arthrocentesis prior to the training sessions. We also recruited six orthopedic surgeons to serve as experts.
A second mixed-effects model was designed to compare each trainee session with the performance of the six expert orthopedic surgeons. Surgeons had a significantly higher mean score in their session (mean score 6.94 [95% CI 6.88-7.00]) than trainees did in their first (p<0.01) and second sessions (p<0.01). The third and fourth trainee sessions did not differ significantly from the experts' performance (p=0.21 and p=0.31, respectively).

Training satisfaction survey and follow-up
After debriefing, 24 of the 28 trainees (86%) answered a questionnaire evaluating the training experience.
They all agreed that (1) the instructional material they received before the training sessions was useful, (2) the training scenario allowed them adequate training, (3) the feedback they received was useful, (4) the assessment tool allowed them to focus on improving specific tasks, (5) the addition of a clinical case at the beginning improved scenario fidelity, and (6) the use of a trained patient-actor helped improve nontechnical (communication) skills. The only item that received less than 100% agreement was whether the knee model used was realistic (only 19 of 24 agreed; 79%).
Seven of the 28 trainees (25%) had performed a knee arthrocentesis on a real patient by the one-year follow-up. All seven considered the training useful and felt it allowed them to perform the procedure safely. They felt confident regarding patient consent and education, performing the procedure, and explaining postprocedure care to the patient. The only item on which two of them (29%) felt insecure was selecting the correct laboratory tubes and completing paperwork.

Discussion
Our high-fidelity simulation scenario allowed student trainees to improve the technical and nontechnical skills required to perform a safe knee arthrocentesis.
Repetition is key to training and learning. Previous studies have focused on demonstrating differences between students before and after training 4,10 or on measuring satisfaction after a single use 17. Repeated training and evaluation improve student performance and reduce the risk of drawing conclusions from results that could be due to chance 18,19. This is the first study to determine how many training sessions are required to achieve proficiency in knee arthrocentesis. Trainees' learning curves reached a plateau after three sessions. Measuring the fourth session allowed us to confirm that performance was sustained and not just a one-session peak 18. The short learning curve could be explained by the detailed training documentation received before the sessions, the trainees' prior knowledge of the direct observation scale, and the personalized feedback received after each session, which allowed trainees to focus on their mistakes and gave them advice on how to improve. This learning curve reflects proficiency in the simulated scenario; expertise in knee arthrocentesis requires clinical experience and therefore cannot be obtained in a brief simulated training course 20.
Teaching combined technical and nontechnical abilities through simulation has also proven effective 13,21. Simulated procedures and situations offer a structured teaching method that is replicable, objective, and safe for both patient and student 22,23. Including nontechnical aspects in a procedure usually considered technical has many proven benefits 13. First, it allows the teaching and learning of nontechnical aspects that are usually left out of technical training. Communication plays a key role in the doctor-patient relationship and in good clinical practice 24, and communication and other nontechnical skills have been shown to be trainable 25. Second, including these skills allowed us to incorporate written consent as part of the procedure. Explaining the risks, benefits, and steps of the procedure to the patient helped our trainees internalize important theoretical concepts of knee arthrocentesis. Finally, as mentioned previously, hybrid training based on clinical scenarios adds meaning to the interaction. The presence of a live patient-actor raises the stakes for the trainee while keeping the setting teaching friendly, an approach that has been shown to improve the learning experience 26. Nontechnical aspects had not been trained in previous studies of knee arthrocentesis 9,10,17. In our study, the nontechnical dimension also improved between training sessions.
The scenario required adapting a direct observation of procedural skills (DOPS) scale, which proved to be consistent, reliable, and useful for trainee learning (assessment for learning) 27,28. Designing a simulated training scenario requires an objective measurement tool. Other studies have used generic observation tools or unvalidated checklists 29. The main limitation of a tool that is not procedure-specific is that it lacks the nuances of the procedure and does not incorporate nontechnical skills, limiting specific item-by-item feedback and therefore improvement. We therefore decided to adapt a tool developed for a similar purpose. Delfino et al. had already created a direct observation tool for a specific technical procedure (tracheal intubation) that included nontechnical skills 14. We adapted their scale, keeping the 11 items but changing the descriptors used in each. First, we determined that the tool had internal consistency and inter-observer reliability. This is important because it supports the reproducibility of our results 30,31. The tool we created also proved to be bidimensional.
Obtaining two dimensions through exploratory and confirmatory factor analysis reaffirmed that we were measuring technical and nontechnical competencies in the same procedure. Finally, we decided to make the DOPS scale available to trainees before the sessions. Traditionally, training and teaching scenarios withhold the items and descriptors of the scale from students. Using the assessment tool for session preparation improves understanding of the key elements of each step, allowing students to learn from the assessment tool and thereby streamlining feedback and improvement 32. This might help explain the high initial rating for trainees' first session (mean score 5.89) and probably contributed to the steep learning curve they followed.
At the one-year follow-up, trainees who had performed the procedure on real patients reported doing so confidently thanks to the abilities trained. Student perception has been shown to affect learning 33. Positive perception leads to improved performance and increases completion of training programs 34. Our scenario was perceived positively by trainees both immediately after training and one year later. Trainees valued the feedback given to them in the debriefing room. The feedback was structured, and the surgeon giving it had received prior training in feedback structure and technique. We believe that our experience and the previous literature 35-37 make personalized feedback a key element of simulated training.
One of our main limitations was that the training sessions were not evenly spaced, either across trainees or for individual trainees. This is a frequent limitation when training is not integrated into the curriculum. After its initial success, the program will be included in student training, allowing sessions at regular intervals. Second, although trainees rated the model favorably, the knee model used is relatively simple and did not allow trainees to experience how variability between patients' knees can increase arthrocentesis difficulty.
In this research, we applied multilevel statistical models to account for the repeated measures obtained from each trainee and to compare groups. Multilevel models allowed us to estimate differences in DOPS performance between sessions and between trainees and experts. Identifying the point on the learning curve where the slope flattens (inflection point) is critical 27. This point marks where progressively greater effort is required for further learning gains, before trainees reach a performance level similar to that of an expert 27. Using this statistical approach, we could graphically represent the learning curves and identify their critical points. Future research could use multilevel statistical models incorporating an interaction between individual-level variables (e.g., last-year medical student versus specialist resident) and group-level variables (e.g., different simulation scenarios).

Conclusion
A high-fidelity simulation scenario allowed student trainees to improve the technical and nontechnical skills required to perform a safe knee arthrocentesis. After three training sessions, trainee performance was similar to that of the experts we assessed. The scenario required adapting a direct observation of procedural skills (DOPS) scale, which proved to be consistent and reliable.

Declarations
Data availability
The datasets generated and analysed during the current study are available from the corresponding author upon request.

Author Contributions
CR: Substantial contributions to the conception of the work, the acquisition, analysis, and interpretation of data for the work; drafting the work; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
SI: Substantial contributions to the design of the work and interpretation of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
PB: Substantial contributions to the conception and design of the work, the acquisition, analysis, and interpretation of data for the work; drafting the work; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
CV: Substantial contributions to the design of the work, the acquisition, analysis, and interpretation of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
CN: Substantial contributions to the design of the work and interpretation of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
JV: Substantial contributions to the design of the work and interpretation of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
IV: Substantial contributions to the design of the work and analysis of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
EF: Substantial contributions to the design of the work and analysis of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
AR: Substantial contributions to the design of the work and interpretation of data for the work; revising it critically for important intellectual content; and final approval of the version to be published. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Figure 1
Illustration showing key steps of the training scenario following the medical history and informed consent: (a) selecting required materials; (b) sterile environment preparation; (c) patient and skin preparation; (d) puncture site selection; (e) puncture and fluid extraction; and (f) tube filling and paperwork.

Figure 2
Mean performance and standard error (vertical axis) for each trainee session (one through four of the horizontal axis) and the experts' performance ("Experts" on the horizontal axis)

Supplementary Files
This is a list of supplementary files associated with this preprint.