Trainee Satisfaction with Feedback provided using an Entrustment Scale: A Survey of Internal Medicine Residents

Background: Numerous scales have been developed to evaluate medical learners, including entrustment scales. Little is known about resident satisfaction with entrustment scales. The objective of this study was to evaluate residents’ perceptions of entrustment scales as a method of assessment in comparison to traditional evaluation methods. Methods: Residents (n=102) at the University of Pennsylvania Internal Medicine residency program were asked to participate in a survey in June 2016 regarding perceptions of an entrustment scale, which was incorporated into end-of-rotation resident clinical evaluations in 2014. The survey assessed assessment utility in various domains, including overall perception of the scale, overall satisfaction, and preference in comparison with other rating scales. Qualitative comments were elicited via free text to further clarify residents’ perceptions. Results: Sixty of the 102 residents (59%) completed the survey. Most residents positively rated the usefulness of questions asked (n=54, 92%) and comments provided (n=48, 98%). Similarly, of those previously evaluated by numerical scales (n=29), numerical scales with behavioral anchors (n=26), and numerical scales with adjectives (n=35), 72%, 88%, and 83%, respectively, found the entrustment scale to be equivalent or superior. Qualitative comments supporting the entrustment scale noted improved ease of interpretation, objectivity, and intuitiveness. Conclusions: Residents are satisfied with entrustment scales, noting the entrustment scale to be superior to traditional scale alternatives. Residents found that entrustment scales provided a more objective assessment, allowed for easier interpretation, and were more intuitive than traditional scales.


Background
Competency-based medical education has evolved in response to the growing demands by the public that medical educators be held accountable for the preparation of trainees for unsupervised practice. (Choe et al., 2016;Holmboe, Sherbino, Long, Swing, & Frank, 2010;Olle Ten Cate & Billett, 2014) The Accreditation Council for Graduate Medical Education has promoted the shift to competency-based training in graduate medical education, recommending assessment via a criterion-based milestones framework. (Kwan et al., 2016;Nasca, Philibert, Brigham, & Flynn, 2012) However, optimal formative assessment within this framework has been challenging for medical educators and trainees alike. Specifically, educators have noted a disconnect between competency-based training and evaluation, particularly given the challenges of direct observation necessary within this framework, and added confusion translating competencies into meaningful metrics of assessment. (Holmboe, 2015;Williams, Dunnington, Mellinger, & Klamen, 2015) To address this issue, entrustment scales have been proposed to bridge the gap between the theoretical aspect of competency-based education and real-world clinical practice. (Englander & Carraccio, 2014;Kwan et al., 2016;Rekman, Gofton, Dudek, Gofton, & Hamstra, 2016;Olle Ten Cate, 2013a, 2013b;O. Ten Cate et al., 2016) Entrustment-based assessment aims to evaluate trainees against what they will actually do when independent. This form of assessment has been implemented within graduate medical education using entrustment scales, which are defined as behaviorally anchored ordinal scales detailing one's progression to competence. When using these scales, observers are asked to select the necessary level of supervision required for the trainee to complete a specific task.
Therefore, entrustment scales align with the construct of competency-based education when compared to more traditional evaluation scales, such as ordinal scales with adjectives or normative scales, and reflect a judgment with relevant clinical meaning for assessors. (Crossley, Johnson, Booth, & Wade, 2011;Rekman et al., 2016;Williams, Klamen, & McGaghie, 2003;Yeates, O'Neill, Mann, & Eva, 2013) In addition to intuitiveness for assessors within clinical evaluation, early uses of entrustment scales suggest improved inter-rater reliability as compared to alternative evaluation scales (Gofton, Dudek, Wood, Balaa, & Hamstra, 2012;Kogan, Conforti, Iobst, & Holmboe, 2014;Mink et al., 2017), particularly as these scales provide an assessment structured around the way evaluators already make day-to-day clinical entrustment decisions. In response, residency programs have begun to institute these as an adjunct to their traditional milestone evaluations. (Mink et al., 2017;O. Ten Cate et al., 2016) Little work, however, has been done to assess the overall utility of this assessment tool via the conceptual framework proposed by van der Vleuten, specifically focusing on validity, reliability, educational impact, feasibility and cost of assessment metrics. (Vleuten & Schuwirth, 2005) Trainee satisfaction with feedback, embedded within this framework, is paramount to the success of any assessment strategy and overall assessment utility. In particular, trainee satisfaction (and related trainee acceptability) is an important component of an assessment's ultimate educational impact, an essential metric of overall assessment utility. Despite this, residents' perceptions of entrustment scale assessments have yet to be formally evaluated. This information is critically important, as prior research has established the beneficial impact of trainee satisfaction with a given assessment metric on subsequent trainee growth and learning. (Watling & Lingard, 2012)
To address this gap, the objective of this study was to assess internal medicine (IM) residents' perception of the entrustment scale as a method of evaluation on end-of-rotation clinical evaluations completed by faculty, in comparison to traditional assessment methods.

Setting and Participants
The IM residency program at the Hospital of the University of Pennsylvania implemented an entrustment scale (Table 1) for evaluation of interns and residents as a metric of competency assessment in 2014. Prior to the implementation of the entrustment scale, the residency program used faculty and peer evaluations completed at the conclusion of each inpatient rotation and semi-annually for outpatient practices. These former assessments consisted of 11 to 15 questions varying by post-graduate year (PGY) level (intern forms distinct from resident forms) rated on a 9-point Likert scale with adjectives; additional space was provided for open-ended mandatory comments. In 2014, these assessment forms were revised to use a 5-point entrustment scale (modifying both questions and scale) based on residency curricular milestones, which served as the basis for the current survey study. Other assessments of residents, including nursing evaluations, medical student evaluations, and peer evaluations of handoffs, remained structured as per the traditional assessment form (numerical scales with adjectives or behavioral anchors).
At the completion of the 2015/2016 academic year, all categorical IM residents (PGY 1 to PGY 3) were asked to complete an anonymous paper survey to assess their perception of the entrustment scale, both as an evaluation scale itself, and in comparison to alternative rating scales. Residents were invited to participate following each resident's individual biannual meeting with the IM program director, during which the program director reviews the summary of the trainee's evaluation over the prior six months. Mean values for each assessment question item were compared to PGY-matched means.

Survey Design
The survey instrument was developed by the principal investigators (CJD, JRK) and distributed to participating residents at the completion of the 2015/2016 academic year.
In the survey, residents were asked to evaluate their perception of the entrustment scale using a 5-point Likert scale, assessing the usefulness of the questions asked (where 1 = 'very useless'; 5 = 'very useful'), usefulness of evaluators' comments (where 1 = 'very useless'; 5 = 'very useful'), usefulness of assessment scores when comparing to peers (where 1 = 'very useless'; 5 = 'very useful'), and overall satisfaction (where 1 = 'very dissatisfied'; 5 = 'very satisfied'). Residents were asked about prior experience with other evaluation scales, including numerical scales (prompted with 'have you received evaluations where the rating scale was only numerical without any words under the scale'), numerical scales with behavioral anchors (prompted with 'have you received evaluations where the rating scale had numerical ratings with examples of behaviors that describe what the numbers mean'), and numerical scales with adjectives (prompted with 'have you received evaluations where the rating scale had numerical ratings with adjective descriptions that describe what the numbers mean'). Detailed examples of each assessment type were provided within the survey. Residents were then asked to compare the overall helpfulness of each prior assessment method with the entrustment scale ('Compared to this type of scale, how does this current evaluation using the entrustment scale compare,' where 1 = 'much less helpful'; 5 = 'much more helpful'). Finally, qualitative comments were elicited using open space for free-text comments to further clarify perceptions of the entrustment scale, both in general and in comparison to each alternative rating scale.

Data Analysis
Data were imported into a Microsoft Excel spreadsheet file (Microsoft Corp., Redmond, WA) for quantitative assessment of survey responses. Thematic analysis of the free-text responses was performed by the principal investigator (CJD) using a post-positivist paradigm. Initial codes were identified manually, with themes identified iteratively until there was saturation of themes. The findings of the thematic analysis were independently confirmed by the primary author (JKH). The study protocol (number 824162) was reviewed by the institutional review board at the University of Pennsylvania and was determined to be exempt from review.
Heath J, Kogan J, Dine J MedEdPublish https://doi.org/10.15694/mep.2018.0000040.1

Results
Sixty of the 102 residents completed the survey (59% response rate). The majority of respondents rated the entrustment scale positively (Table 2), noting the questions asked were either 'somewhat useful' or 'very useful' (n=54, 92%). Additionally, the majority of residents described the open-text evaluator comments provided in addition to the scale as 'somewhat useful' or 'very useful' (n=48, 98%). Residents also found the entrustment scale 'somewhat useful' or 'very useful' (n=48, 85%) when used as a method of comparison among their peers. Overall, the majority of respondents were either moderately or very satisfied with the use of the entrustment scale (n=52, 90%).
Approximately half of respondents indicated that they previously had been evaluated using an alternative scale during their medical training. Of those respondents previously evaluated using numerical scales without anchors (n=29), 48% (n=14) found the entrustment scale to be superior, and 24% (n=7) found the scale to be equivalent to alternatives. In comparison to numerical scales with behavioral anchors (n=26), 50% of residents (n=13) felt the entrustment scales were superior, while 38% (n=10) felt the scales were equivalent. Finally, of those previously evaluated using numerical scales with adjectives (n=35), 54% (n=19) found the entrustment scale to be superior, while 29% (n=10) found the entrustment scale to be 'about the same.' Of those residents who had previously been evaluated by alternative metrics, qualitative comments preferentially supporting the entrustment scale focused on improved ease of interpretation, objectivity and intuitiveness, as shown in Table 3, which were consistent themes across each alternative scale comparison.
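The response rate and the combined 'equivalent or superior' percentages can be re-derived from the counts reported above. The following is a minimal Python sketch and not the authors' analysis, which was carried out in Microsoft Excel; the dictionary keys and the helper name percent are illustrative, and rounding to the nearest whole percent is an assumption.

```python
# Counts as reported in the Results for residents with prior experience of
# each alternative scale. Key names are illustrative, not from the study.
comparisons = {
    "numerical": {"n": 29, "superior": 14, "same": 7},
    "numerical with behavioral anchors": {"n": 26, "superior": 13, "same": 10},
    "numerical with adjectives": {"n": 35, "superior": 19, "same": 10},
}


def percent(count, total):
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# 60 of 102 invited residents completed the survey.
response_rate = percent(60, 102)

# Share of each subgroup rating the entrustment scale as about the same or better.
combined = {
    scale: percent(c["superior"] + c["same"], c["n"])
    for scale, c in comparisons.items()
}

print(f"Response rate: {response_rate}%")
for scale, pct in combined.items():
    print(f"{scale}: {pct}% rated the entrustment scale equivalent or superior")
```

Under these assumptions the combined figures come to 72%, 88%, and 83%, matching the values given in the abstract, and the response rate comes to 59%.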
Ease of interpretation and relevance was a predominant theme identified throughout the comments. Specifically, residents commented that the entrustment scale 'is easier to interpret and understand,' and 'makes much more sense.' Another resident noted that the entrustment scale 'allows [them] to identify areas where [one] actually need[s] improvement,' as compared to alternative metrics. As one resident noted, 'I think it makes more sense to rate [residents] practically on whether or not we can complete a task, rather than nebulously rating how well [residents] do the task.' Finally, as one resident noted, '…the entrustment scale forces the evaluator to focus on what is important. Feedback is more relevant and applicable.' In addition to ease of interpretation, several residents commented on the objectivity of the scale, noting that the entrustment scale 'seems like a more objective way to scale people than simply a number scale.' Despite these perceived benefits, trainees commented that the entrustment scale was not appropriate for all skills, noting that the use of this scale might be 'more applicable to describe skills or tasks,' 'dependent on context,' or may 'appear a little artificial in the context of the questions asked.' It was also perceived as inadequate for performance improvement without evaluator comments, with one resident specifically noting 'the success of this scale requires thoughtful commentary to identify areas of improvement.'

Discussion
Although residency programs have begun to adopt entrustment scales as a component of evaluation, trainee satisfaction with feedback, an essential component of the overall utility of any assessment strategy, has not been evaluated. This is the first study addressing trainee satisfaction with entrustment scales in an IM residency program. After implementation of entrustment scales within our IM residency program in 2014, the majority of residents surveyed were moderately or very satisfied with the entrustment scale as a general method of assessment. In addition to reporting overall satisfaction with this scale, our results suggest that entrustment scales are considered to be an equivalent or superior method of assessment by residents, particularly when compared to alternative metrics such as numerical scales, numerical scales with behavioral anchors, and numerical scales with adjectives. Within the era of competency-based assessment, this provides crucial evidence supporting the overall utility of this form of assessment in residency training.
Furthermore, among survey respondents, our qualitative analysis elucidated themes regarding the perceived benefits of the entrustment scale, which included ease of interpretation, objectivity, and intuitiveness. This is consistent with the conceptual framework of entrustment scales, which were developed to align with the way evaluators already make day-to-day clinical entrustment decisions.
To our knowledge, this is the first study to assess residents' perceptions of entrustment scales, and is the first investigation highlighting trainee satisfaction with this form of assessment, a paramount component of the overall assessment utility. (Vleuten & Schuwirth, 2005) Our findings suggesting trainee preference for the entrustment scale likely reflect the challenges learners face interpreting traditional assessment scales within competency-based medical education, as noted by prior critics of these scales. (Hauer et al., 2016;Rekman et al., 2016) Theoretically, these challenges in interpretation of traditional metrics might limit the impact of formative feedback on trainee development. As trainee satisfaction and acceptability are crucial to an assessment's ultimate educational impact, our finding of resident preference for the entrustment scale may have broader implications in graduate medical education. The addition of entrustment scales, given their intuitive nature, ease of interpretation, and alignment with outcomes-based medical education (leading to improved construct alignment), may ultimately allow for improved influence on trainee learning and development. (Watling & Lingard, 2012) Of note, in addition to the positive findings of interpretability and objectivity, our qualitative analysis also revealed potential limits to the broad applicability of entrustment scales within residency training. Specifically, trainees noted that some of the components of the scale appeared 'a little artificial in the context of the question asked,' and 'perhaps [were] more applicable to describe skills or tasks.' Consistent with this observation, the majority of prior studies evaluating entrustment scales as a method of workplace assessment have been implemented within procedural specialties, such as obstetrics and gynecology, anesthesiology, and general surgery. (George et al., 2014;Gofton et al., 2012;Weller et al., 2014) Therefore, the use of entrustment scales for assessment of non-procedural tasks, while still valuable in non-procedural settings, may warrant scale modification to optimally assess all trainee competencies within the milestones framework.
In addition to the novelty of the finding of trainee satisfaction with entrustment scales, the strengths of our study include the combination of quantitative assessment of residents' perceptions with further evaluation using qualitative methods. This allowed for expansion of our understanding regarding the perceptions of residents, and can serve as a basis for future qualitative studies beyond our institution.
The limitations of the study include generalizability, as it was performed at a single IM residency program. We did have a significant non-response rate in our survey; however, our response rate of 59% falls within the expected range for physician groups. Furthermore, emerging evidence indicates that response rates are poorly correlated with response bias, particularly within this group. (Asch, Jedrziewski, & Christakis, 1997;Cull, O'Connor, Sharp, & Tang, 2005;James, Ziegenfuss, Tilburt, Harris, & Beebe, 2011) Additionally, overall perceptions of the entrustment scale may have been biased by distribution of the survey following programmatic review of evaluations (within the program director meeting), which may have affected residents' perception of the scale. Finally, while the improved perception of the entrustment scale has potential for improved trainee development, our study did not directly assess this outcome.

Conclusions
In conclusion, entrustment scales were perceived positively by trainees within a single IM residency program. Trainees found benefit in terms of the ease of interpretation, objectivity, and intuitiveness of the scale. Our study provides early data supporting the acceptability and educational impact of entrustment scales, a necessary component in the assessment utility index.

Take Home Messages
Entrustment scales have been implemented as an assessment metric within competency-based medical education. Resident perception of entrustment scales has not been evaluated, and is an essential component of the assessment utility. Our study highlights the overall satisfaction of residents with entrustment scales, particularly in comparison to other metrics of assessment. Qualitative comments supporting the entrustment scale noted improved ease of interpretation, objectivity, and intuitiveness.

Notes On Contributors
JD and JK each made substantial contributions to the conception or design of the work, as well as the acquisition, analysis, and interpretation of the data. JH made substantial contributions to the acquisition, analysis, and interpretation of the data. Additionally, they each played a role in the drafting and revisions of the manuscript, and have provided final approval for the version to be published. Finally, they each agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.