Development of a new valid and reliable microsurgical skill assessment scale for ophthalmology residents

Background Concerns are growing about the ability of new medical graduates to meet the demands of today's practice environment. In this study, we aimed to develop a valid, reliable and standardized assessment tool for evaluating the basic microsurgical skills of residents in a microsurgery laboratory, so that they are well prepared before entering the surgical realm of ophthalmology. Methods Twenty-three experts with teaching experience reviewed the assessment scale. Constructive comments were incorporated to ensure face and content validity. Twenty-one attendings from different specialties then graded eight corneal rupture suturing videos with the scale to investigate interrater reliability. Fourteen of them graded the same videos 3 months later to investigate intrarater reliability (repeatability). Results A total of 280 assessment scales were completed. All ICC values for interrater reliability were greater than 0.8, with 75% of them greater than 0.9 (range 0.860–0.976). All ICC values for intrarater reliability (repeatability) were also greater than 0.8, with 63% of them greater than 0.9 (range 0.833–0.954). Conclusions The assessment scale we developed is valid and reliable. This tool could be useful to ensure that junior residents achieve a certain level of microsurgical technique in a laboratory environment before training in the operating room. Hopefully, this tool will provide a structured template for other residency programs to assess their residents' basic microsurgical skills.


Background
As ophthalmic medical education has developed, surgical skills training has become a key part of it. More and more educators have recognized the importance of residents' competence in the operating room; however, the traditional methods for assessing surgical skills are largely subjective. These methods lack standardization, consistency and reliability. Moreover, the residents being assessed often do not know the standards and goals of their surgical training. To address this, educators worldwide have done a great deal of work. International ophthalmic educators have developed a variety of surgical competency assessment tools, such as OASIS (Objective Assessment of Skills in Intraocular Surgery), GRASIS (Global Rating Assessment of Skills in Intraocular Surgery), OSACSS (Objective Structured Assessment of Cataract Surgical Skill) and OSCAR (Ophthalmology Surgical Competency Assessment Rubric), and both expert feedback and the application of those assessments have shown excellent results [1][2][3][4][5][6][7]. So far, most of these assessments focus on the performance of residents during real-life operations, especially cataract surgery.
China is a developing and industrialized country. Ocular rupture, especially corneal rupture, is a common and dangerous ophthalmic emergency, and its repair is usually a resident's first independent real-life surgery. Prompt and meticulous wound management may reduce severe postoperative complications such as wound leak and endophthalmitis [8]. Residents should therefore be well prepared before they enter the operating room. In addition, suturing technique is a critical and fundamental part of microsurgery. Standardized and adept micromanipulation and suturing pave the way for entering the surgical realm of ophthalmology. Therefore, in Shanghai, suturing a corneal rupture on pig eyes is a mandatory periodic examination in the residency program. Appropriate evaluation of this procedure is essential because weaknesses in training and teaching are difficult to correct without factual data [9,10]. Since no rating assessment for suturing corneal rupture has been created before, Chinese ophthalmic educators need to develop a comprehensive assessment scale in response to the current demand. In this study, we aimed to establish an efficient and reliable assessment scale for suturing corneal rupture to ensure the basic surgical competency of residents.

Methods
This study was approved by the Ethics Committee of Shanghai General Hospital. All operations were performed in a microsurgery laboratory using pig eyes (Fig. 1a). Each resident was given detailed information about the procedure they were to perform. The ruptures were "L"-shaped and involved the limbus. First, we made a full-thickness horizontal incision (about 6 mm) from the 9 o'clock limbus to the central cornea. The incision was then extended downward vertically for another 3 mm (Fig. 1b). All necessary instruments, as well as distractor instruments, were laid out on the table. The whole process, from gloves on to gloves off, was videotaped and stored for later viewing. Senior attendings from different specialties were asked to watch the recorded videos and complete the assessment scales accordingly. The videotapes were chosen from residents at different rotation levels to include a range of surgical skills, and evaluators were blinded to each resident's level of training. Three months later, each attending was asked to watch the same videos and complete the scales again. To limit recall of the earlier scores, the playing order of the videos was changed.

Validity of the assessment scale
A questionnaire was created (Fig. 2) to evaluate the scale's face validity (i.e., the extent to which the components address the vital aspects) and content validity (i.e., the extent to which the components assess resident competency and skill) [3,7]. The questionnaire along with the assessment scale was sent to experts from several teaching and research offices including one member of the committee of Shanghai standardized residency program, and then the scale was revised according to their comments and suggestions.

Reliability and repeatability of the assessment scale
Senior attendings from different specialties were included in this evaluation to achieve a broad representation. The interrater reliability of different observers and the intrarater reliability of the same observer (repeatability) were tested using the intraclass correlation coefficient (ICC) [11]. The ICC is defined as the ratio of the between-subjects variance to the sum of the within-subjects and between-subjects variance [12]. The ICC can vary between 0 and 1, with 1 indicating perfect agreement. It should be greater than 0.7 for a newly developed scale to be considered reliable [13][14][15]. We calculated the ICC using SPSS version 13.0 (Chicago, IL, USA). Because both the observers and the cases were samples, we used the Two-Way Random model. The Single Measures results were used to evaluate repeatability, and the Average Measures results were used for reliability. The significance level and confidence level were set to 0.05 and 0.95, respectively.
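As an illustration of the calculation described above, the following is a minimal sketch of a two-way random ICC computed from the mean squares of a two-way ANOVA without replication. It is not the SPSS routine used in the study. The function name `icc_two_way_random` is hypothetical, the absolute-agreement definition is an assumption (the text does not state whether absolute agreement or consistency was selected), and the ratings are illustrative numbers rather than study data.

```python
def icc_two_way_random(ratings):
    """Return (single-measures ICC, average-measures ICC).

    ratings: list of rows, one row per rated case (e.g. a video),
             one column per rater. Assumes absolute agreement.
    """
    n = len(ratings)       # number of cases
    k = len(ratings[0])    # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares: between cases, between raters, and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((v - grand_mean) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single measures: reliability of one rater's score.
    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Average measures: reliability of the mean score across all k raters.
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Illustrative ratings only: 6 cases scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

single_icc, average_icc = icc_two_way_random(example)
print(f"single measures: {single_icc:.2f}")    # single measures: 0.29
print(f"average measures: {average_icc:.2f}")  # average measures: 0.62
```

In this example the raters rank the cases similarly but differ systematically in leniency, which is why the single-measures value is much lower than the average-measures value under absolute agreement.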

Results

Validity of the assessment scale
Twenty-three experts completed the questionnaire, and the results are shown in Table 1. Four experts recommended adding an assessment of "preoperative preparation and postoperative cleaning up" to the scale, since the videotapes contained those parts and they are aspects of surgical skill. Two experts felt that some of the descriptors were too explicit and burdensome to read and that simplification would be better. Three experts suggested using separate rating scales for "knotting", "knots tightness", and "knots exposure". One expert suggested adding "Suturing" to the scale to assess the students' general suturing performance, such as needle loading and needle entry. Five experts felt there was no need to include an assessment of "abnormal events management". All comments and suggestions were considered, and appropriate suggestions were incorporated into the assessment scale, thus establishing a level of face and content validity [6].
The finalized assessment scale is shown in Table 2. It includes 6 measures of basic surgical skills (preoperative preparation, microscope use, instrument handling, hands coordination, postoperative clean up and overall performance) and 9 measures of the stages of suturing (suturing, suturing order, sutures interval, sutures width, sutures depth, knotting, knots tightness, knots exposure, and wound leakage and anterior chamber formation), each rated on a 5-point Likert scale, with each point anchored by explicit behavioral descriptors.

Reliability and repeatability of the assessment scale
Twenty-one attendings from different specialties viewed 8 videotaped corneal suturing procedures and completed the assessment scales accordingly in the first round. Specialties represented were cataract (4), glaucoma (3), cornea (3), strabismus (1), and retina (10). Only 14 attendings completed the scales again 3 months later. A total of 280 assessment scales were completed. All experts reported that they could complete the scale within 5 min. The interrater reliability of each surgical procedure step and the overall score, considering all 21 observers together, is summarized in Table 3. All ICC values were greater than 0.8, with 75% of them greater than 0.9. "Microscope use" showed the highest reliability (0.976, 95% CI 0.942-0.994). The intrarater reliability (repeatability) of each step and the overall score is listed in Table 4. All values were greater than 0.8, with 63% of them greater than 0.9. "Suturing order" showed the highest repeatability (0.954, 95% CI 0.934-0.968).
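The reported total can be checked with simple arithmetic. The sketch below assumes that every attending scored all 8 videos in each round in which they took part; the variable and function names are illustrative only.

```python
VIDEOS = 8
RATERS_FIRST_ROUND = 21
RATERS_SECOND_ROUND = 14  # attendings who re-scored the videos 3 months later

# Specialty counts as reported for the first round.
SPECIALTIES = {
    "cataract": 4,
    "glaucoma": 3,
    "cornea": 3,
    "strabismus": 1,
    "retina": 10,
}


def completed_scales(videos, raters_per_round):
    """Total scales completed if each rater scores every video in each round."""
    return sum(videos * raters for raters in raters_per_round)


# The specialty counts should add up to the number of first-round raters.
assert sum(SPECIALTIES.values()) == RATERS_FIRST_ROUND

total = completed_scales(VIDEOS, [RATERS_FIRST_ROUND, RATERS_SECOND_ROUND])
print(total)  # 280 (168 in the first round + 112 in the second)
```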

Discussion
Investigations have suggested a trend towards enhanced acquisition of microsurgical skill in students allowed to practice microsurgery on various simulators and/or in the wet laboratory [16][17][18]. Nevertheless, in the early twenty-first century, the ophthalmic education of residents in China was unstructured and of variable quality. Concerns were growing about the ability of new medical graduates to meet the demands of today's practice environment. Thus, China started its residency program about 10 years ago, with Shanghai as one of the pilot cities. To date, each city is still responsible for its own resident training and examination. In Shanghai, the committee of ophthalmic resident training standardized the program as 3 years of ophthalmology education, and residents take an ophthalmology residency-in-training examination every year. The major purpose of these examinations is to evaluate residents' competence in 4 aspects: (1) medical knowledge, (2) patient care and communication skills, (3) case-based learning and analyzing, and (4) surgical skills. Suturing technique is a critical and fundamental part of microsurgery. Standardized and adept micromanipulation and suturing pave the way for entering the surgical realm of ophthalmology. Therefore, the surgical skills of junior residents are assessed by their performance in suturing corneal ruptures on pig eyes. This examination has been held for 5 years, and ophthalmic educators have found that the traditional scoring method may be unreliable because of grade inflation and overtly subjective assessments [10,19,20]. A residency examination is supposed to support competence in all aspects by collecting performance data that reliably and accurately reflect the resident's real ability. Thus, a valid and reliable assessment tool is urgently needed.
To our knowledge, this is the first thorough assessment scale for corneal rupture suturing in the wet laboratory. Fisher et al. [1] developed a phacoemulsification/wound construction and suturing technique assessment scale for ophthalmology residents, but suturing technique assessment was only part of the scale containing 8 general items. The scale was simple, with only 2 choices (not done/incorrect and done correctly). There was no behavioral or skill-based rubric for observers to use when assessing the resident's performance. Feldman et al. [21] used a corneal laceration repair assessment to evaluate microsurgical skill improvement after training on a simulator. However, the assessment was entirely objective and measured only suture depth, bite size and suture spacing. In this study, we created a comprehensive, globally applicable assessment scale to evaluate the key components of corneal rupture suturing. The scale breaks down into 15 essential items, including 6 measures of basic surgical skills and 9 measures of the stages of suturing, with basic skill measures similar to those employed in GRASIS and OSCAR. Moreover, each item is rated on a 5-point Likert scale with behavioral anchors for each level in each step of the surgical procedure.
The reliability and repeatability of the assessment tools mentioned above were seldom examined. In this study, we investigated the validity, reliability and repeatability of our assessment scale. For validity, we consulted 23 experts from different teaching and research offices; all comments were considered, and appropriate suggestions were incorporated into the assessment scale. A level of face and content validity was therefore established. For the entire group of 21 observers, the ICC values were higher than 0.8 (range 0.860-0.976) in all 15 individual categories as well as the overall score, indicating reliability of the tool as a whole. The assessment scale also yielded very good repeatability, with ICC values ranging from 0.833 to 0.954. An assessment scale is considered to give almost perfect agreement when the ICC value is 0.75 or above [13,15,22].
The drawbacks of the assessment scale are that it is relatively simple and cannot provide information about a resident's judgment and handling of complications in real operations. However, it is a standardized tool that can be used to determine whether a resident is adequately prepared, in terms of basic microsurgical skills, to enter the operating room. The "passing" threshold could be set at a score of > 3 for each item on the 5-point Likert scale. In addition, the process in the wet laboratory can be standardized so that each resident is assessed under comparable circumstances, and ophthalmic educators can easily track residents' improvement or adjust the complexity for residents at different rotation levels by changing the rupture (straight/"Y"-shaped rupture, with/without limbal involvement).
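The suggested passing rule can be sketched as follows. This is only an illustration of the "> 3 on every item" threshold proposed above, not a scoring procedure used in the study. The function name `passes_threshold` and the dictionary layout are hypothetical, and the item names are taken from the description of the finalized scale.

```python
# The 15 items of the finalized scale: 6 basic skills + 9 suturing stages.
SCALE_ITEMS = [
    "preoperative preparation",
    "microscope use",
    "instrument handling",
    "hands coordination",
    "postoperative clean up",
    "overall performance",
    "suturing",
    "suturing order",
    "sutures interval",
    "sutures width",
    "sutures depth",
    "knotting",
    "knots tightness",
    "knots exposure",
    "wound leakage and anterior chamber formation",
]

PASS_THRESHOLD = 3  # an item is passed only if its score is strictly above 3


def passes_threshold(scores, threshold=PASS_THRESHOLD):
    """Return (passed, failed_items) for one completed 5-point scale.

    scores: dict mapping every item name in SCALE_ITEMS to a score from 1 to 5.
    """
    missing = [item for item in SCALE_ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in SCALE_ITEMS:
        if not 1 <= scores[item] <= 5:
            raise ValueError(f"score out of range for {item!r}: {scores[item]}")

    failed = [item for item in SCALE_ITEMS if scores[item] <= threshold]
    return len(failed) == 0, failed


# Hypothetical resident: 4 on every item except a 3 for knotting.
sheet = {item: 4 for item in SCALE_ITEMS}
sheet["knotting"] = 3
print(passes_threshold(sheet))  # (False, ['knotting'])
```

Returning the list of failed items, rather than only a pass/fail flag, keeps the feedback tied to specific steps, which matches the scale's purpose of telling residents what is expected of them.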

Conclusions
In this study, we aimed to create a standardized tool to assess basic surgical skills and to improve the overall process of early surgical education. In summary, the assessment scale we developed is valid and reliable. It is an analytical scoring system that contains observable and measurable components of surgical performance. It will help educators reduce the subjectivity of assessment and clearly convey to residents what is expected of them to attain competence. Hopefully, this tool will provide a structured template for other residency programs to assess their residents' basic surgical skills.
Abbreviations GRASIS: Global rating assessment of skills in intraocular surgery; ICC: Intraclass correlation coefficient; OASIS: Objective assessment of skills in intraocular surgery; OSACSS: Objective structured assessment of cataract surgical skill; OSCAR: Ophthalmology surgical competency assessment rubric