The relationship of Grit to faculty evaluations and standardized test scores of anesthesia residents: a pilot study

Results: Correlation estimates between standardized test scores and grit ranged from -0.12 to 0.31 and were not statistically significant at the 1% level of significance (p values ranging from 0.16 to 0.85). Correlation estimates between PGY-4 faculty evaluations and grit scores ranged from 0.43 to 0.49 but were not statistically significant at that threshold; p-values ranged from 0.021 to 0.044, which may suggest a trend towards significance.


Introduction
The ability to predict which residency applicants will be successful resident trainees, and eventually consultant anesthesiologists, is an area of active interest and research, especially for program directors. Residency programs compete for top resident applicants not only to provide excellent clinical care to patients, but also to enhance a program's reputation, which can have a positive impact on a program's ability to recruit top resident physicians in the future. There are economic considerations as well, as departments may expend significant money, time, and energy (Metro et al. 2005) to recruit successful residents, in addition to the investment in training once residents arrive at a program. Consequently, there is interest within academic training programs in identifying the personal attributes or other pre-residency characteristics that may be predictive of high-level performance and success during training. Unfortunately for program directors and selection committees, predicting which applicants will become high-performing residents has proven challenging.
Performance on standardized tests is used extensively for screening applicants to residency programs (Boyse et al. 2002;Dirschl et al. 2006;Brothers and Wetherholt 2007;Tolan et al. 2010; Results of the 2014 NRMP Program Director Survey 2014), and results from standardized tests such as the United States Medical Licensing Examination (USMLE) or Comprehensive Osteopathic Medical Licensing Examination (COMLEX) provide some of the few pieces of residency application data that are standardized amongst applicants. Many studies have found that USMLE scores are predictive of performance on subsequent in-training exams (ITE) and board certification tests (Bell et al. 2002;Boyse et al. 2002;Dirschl et al. 2006;McCaskill et al. 2007;Shellito et al. 2010;Guffey et al. 2011). Additionally, a meta-analysis of eighty studies confirmed that a resident selection strategy based on standardized examinations is most strongly correlated with examination-based outcomes (Kenny et al. 2013). However, achievement on pre-residency standardized exams does not reliably correlate with clinical performance (Bell et al. 2002;Boyse et al. 2002;Dirschl et al. 2006;Brothers and Wetherholt 2007;Tolan et al. 2010). In response to this dichotomy, at least one group has submitted a "plea" to training programs to stop using the USMLE as a screening tool for residency (Prober et al. 2016).
A multitude of other factors have been studied in the quest to find a pre-residency variable (or combination of variables) that may be predictive of clinical and/or overall performance, not just performance on standardized exams. These factors often focus on "non-cognitive factors" (Farrington et al. 2012) (sometimes now called "soft skills," "metacognitive learning skills," or "agency") or "emotional intelligence" (Cherniss et al. 1998), especially empathy, integrity, conscientiousness, emotional stability, tolerance, and communication skills. Commonly used approaches to assess these personality characteristics include face-to-face interviews (George et al. 1989;Altmaier et al. 1992;Metro et al. 2005;Dubovsky et al. 2008;Alterman et al. 2011), the Medical School Performance Evaluation (MSPE), and letters of recommendation (LOR) (Harfmann and Zirwas 2011;Kenny et al. 2013), but studies of their ability to predict future performance are mixed. Personality testing/surveying has also been studied as a possible tool for selecting applicants with a high likelihood of success during training (Gough et al. 1991;McDonald et al. 1994;Merlo and Matveevskii 2009;Lubelski et al. 2016), and positive correlations have been found in certain sub-scales of personality.
Duckworth et al. aimed to define a personality trait that would be specific to, and consistent with, high levels of success in any domain (Duckworth A. L. et al. 2007). This group conducted a series of investigations suggesting that grit, defined as "perseverance and passion for long term goals" (Duckworth A. L. et al. 2007), is essential to success. Grit requires an individual to work "strenuously toward challenges, maintaining effort and interest over years despite failure, adversity, and plateaus in progress" (Duckworth A. L. et al. 2007). Grit has been studied across a wide range of domains (Duckworth A. L. et al. 2007;Eskreis-Winkler et al. 2014). In each of these domains, individuals with higher grit scores had higher levels of success, even when compared to other common predictors such as intelligence or the Big Five personality traits (i.e., openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism).
As noted, the scientific literature is mixed regarding the ability of a wide array of pre-residency variables to predict success during residency training. Grit appears to have the ability to predict high levels of success, even among already high-achieving individuals. To assess whether grit is associated with better performance in residency training, our group conducted this pilot study of previous anesthesiology residents. We hypothesized that higher grit scores would be positively correlated with two markers of success in residency: standardized test scores and faculty evaluations of clinical performance.

Methods
The institutional review board (IRB) approved this study. An invitation email was sent to anesthesiologists who graduated from the Duke anesthesiology residency program between the years 2009 and 2012. Recently graduated residents were studied because all performance outcome variables would be complete by nature of having finished the program. Potential subjects were directed to a website to consent and then complete the "Short Grit Survey" (Grit-S) tool. The short version of the Grit Survey (6 questions, Figure 1) was used because its brevity may make participation less burdensome to subjects. The Grit-S was shown to have internal consistency, test-retest consistency, and predictive validity similar to the longer, original Grit Survey. Other studies of grit in residency (Burkhart et al. 2014;Halliday et al. 2017;Salles et al. 2017) also used the Short Grit Scale, but it is important to note that we used the original 6-item Grit-S, whereas these other studies used the revised 8-item Grit-S. De-identification of the data was performed such that the program director was not aware of the Grit Survey results. Those who chose to participate consented to have specific data from their educational record included in the study. Data collected from consenting participants' files included results from the pre-residency USMLE Step 1, Step 2 Clinical Knowledge, and intra-residency Step 3, reported as the numerical score. Results were also collected from the Anesthesia Knowledge Test (AKT), an anesthesiology-specific, intra-residency exam administered at 0, 1, 6, and 24 months of training. AKT results were recorded as the percentile based on the national average for the year of the test. Reporting of AKT24 required some additional normalization. Results of AKT24 are reported to programs in 7 sub-sections. For each participant, an "AKT24 Overall Percentile" was calculated by averaging the 7 sub-section percentiles based on the national average.
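The AKT24 normalization described above can be sketched as follows; the function name and the sub-section percentile values are hypothetical, for illustration only.

```python
# Sketch of the AKT24 normalization: a participant's "AKT24 Overall
# Percentile" is the mean of the 7 reported sub-section percentiles.
def akt24_overall_percentile(subsection_percentiles):
    """Average the 7 AKT24 sub-section percentiles into one overall percentile."""
    if len(subsection_percentiles) != 7:
        raise ValueError("AKT24 reports 7 sub-section percentiles")
    return sum(subsection_percentiles) / 7

# Hypothetical sub-section percentiles for one participant:
example = [62, 55, 71, 48, 66, 59, 60]
print(round(akt24_overall_percentile(example), 1))  # → 60.1
```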
In addition to standardized test scores, we also examined faculty ratings of residents. Resident clinical performance was assessed by faculty evaluation of ACGME Core Competencies (on a scale from 0 to 9). The average score for each core competency was calculated for each Post-Graduate Year (PGY) 2 through 4 (the years of clinical anesthesia training).
To protect participant confidentiality of sensitive academic information, a neutral-party data manager completed matching of grit scores to performance data such that a de-identified data set (lacking participant name/identifying information) was sent to the study team for analysis. The primary study end-points were the correlation of grit score with standardized test results (USMLE and AKT), and of grit score with faculty evaluation of clinical performance as assessed by the ACGME Core Competencies. SAS (SAS Institute, Cary, North Carolina) and R (ver. 3.3.3) were used for statistical tests. Descriptive statistics were calculated for grit score, standardized test scores, and faculty evaluations. Pearson correlation statistics were used to assess the relationship between the standardized test scores and grit scores, and between faculty evaluations and grit scores. We used Pearson correlations to summarize the strength of a linear association (or trend) between our variables. The p-values are not adjusted for multiple comparisons. In addition, we included locally weighted scatterplot smooth (LOWESS) curves in our figures. These curves do not assume a linear association between variables; instead, they compute a smooth curve that allows for easier visual evaluation of the type of association between the variables.
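The Pearson correlation step above can be sketched as follows. This is a minimal illustration, not the study code (which used SAS and R); the paired arrays are synthetic stand-ins for grit scores and one outcome measure.

```python
# Minimal sketch of the correlation analysis: Pearson r between grit
# scores and one outcome, with an unadjusted p-value, as described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
grit = rng.uniform(2.0, 4.5, size=22)              # synthetic grit scores
evals = 5 + 0.8 * grit + rng.normal(0, 0.5, 22)    # synthetic faculty evaluations

# Pearson r summarizes the strength of the linear association; the
# p-value is reported without adjustment for multiple comparisons.
r, p = stats.pearsonr(grit, evals)
print(f"r = {r:.2f}, p = {p:.4f}")
```

A LOWESS curve for the same paired data (e.g., via `statsmodels`' `lowess`) makes no linearity assumption and is useful for visually checking the shape of the association.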

Results/Analysis
Forty-eight previous residents were contacted and 23 elected to participate, for a response rate of 48%. One participant was missing key information, thus some measures included data on only 22 participants. Demographic information of study subjects is shown in Table 1. Sixty-one percent of participants were male. The mean grit score was 3.52 ± 0.65 (range, 2.0-4.33). As a comparison, a grit score of 3.5 is at the 40th percentile of a large sample of adults in America (Duckworth A. L. 2016). In other studies of resident trainees, the mean grit scores were 3.61-3.67 (Halliday et al. 2017), 3.64 (Salles et al. 2014), and 3.87 (Burkhart et al. 2014).
Descriptive statistics for the standardized test scores are shown in Table 2. AKT results are presented as a percentile based on the national average. Thus, the small decrease in mean AKT24 compared to the other AKT results could signify either a lower average score (the learners performing worse on the exam) or an increase in the average scores of other national test-takers, such that the subjects' mean percentile was lower for a similar exam performance. However, the results are not extreme and could also be the result of chance. Correlation estimates between the standardized test scores and grit scores ranged from -0.12 to 0.31 (Table 3 and Figure 2). These associations, however, were not statistically significant at the 1% level of significance (p values ranging from 0.16 to 0.85).
For each Core Competency in the PGY-4 year, the final year of anesthesia residency, there was a trend towards a positive correlation between PGY-4 faculty evaluation score and grit score (Table 4 and Figure 3). Correlation coefficients ranged from 0.43 to 0.49, and the associations were marginal: p-values for the correlation between PGY-4 faculty evaluations and grit ranged from 0.021 to 0.044. There were no statistically significant associations between PGY-2 or PGY-3 faculty evaluations and grit score (data not shown).
Figure 3: Correlation of grit score with PGY-4 ACGME Core Competency, average faculty evaluation score. Each dot represents a study participant.
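The significance call above can be illustrated with the reported p-value range. This sketch applies the pre-specified 0.01 cut-off (used to account for multiple comparisons) to the two reported endpoints; the full set of per-test p-values is not given in the text.

```python
# The reported PGY-4 p-values ranged from 0.021 to 0.044; under the
# stricter 0.01 threshold chosen for multiple comparisons, none of
# these correlations reaches statistical significance.
pgy4_pvalues = [0.021, 0.044]   # reported range endpoints only
alpha = 0.01                    # pre-specified cut-off

significant = [p for p in pgy4_pvalues if p < alpha]
print(significant)  # → [] : no PGY-4 correlation meets the 0.01 bar
```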

Discussion
As previously noted, the scientific literature on success in residency training is quite clear: performance on standardized tests may predict performance on future standardized exams but does not reliably predict clinical performance. "Non-cognitive" characteristics, especially empathy, integrity, tolerance, and communication skills, are essential to the successful practice of medicine. While these attributes are often assessed by subjective means and thus may be more difficult to standardize and study, the assessment of an applicant's non-cognitive attributes may hold more promise as a predictive tool for success than cognitive factors (such as examination scores or medical school grades). Studies investigating "non-cognitive skill"-focused or "behavioral interview" techniques show correlations with future resident performance (Altmaier et al. 1992;Olawaiye et al. 2006;Brothers and Wetherholt 2007;Strand et al. 2011). For example, in a study of a surgical residency program, ratings of applicants' "Personal Characteristics," including attitude, motivation, integrity, and interpersonal skills, were strongly correlated with resident clinical performance as assessed by faculty, while cognitive measures (medical school GPA and USMLE scores) were negatively correlated (Brothers and Wetherholt 2007). An interview selection process for an Obstetrics/Gynecology training program that was heavily weighted toward the interview and non-cognitive qualities showed a positive correlation with subsequent resident clinical performance (Olawaiye et al. 2006). The Multiple Mini Interview (MMI) (Eva et al. 2004) is probably the most well-known standardized technique for trying to assess the non-cognitive, or "soft," skills of applicants, and is often used in medical school admission processes. In studies looking at predicting post-graduate performance, the MMI may hold promise, but the literature is still in its infancy (Hofmeister et al. 2009;Dore et al. 2010;Burkhardt et al. 2015;Sklar et al. 2015).
Personality testing is another area of applicant or trainee assessment that has been gaining popularity. In reviews of the academic performance literature, "conscientiousness" (often described as how a person controls, regulates, and directs their impulses, with a focus on organization, thoroughness, and reliability) is a personality trait consistently found to be associated with academic performance, both in general (Poropat 2009) and in medical school specifically (Doherty and Nugent 2011). For residency training, multiple investigators have found that extensive personality testing may yield positive correlations with performance. For example, the group of Gough and McDonald (Gough et al. 1991;McDonald et al. 1994) used the 462-item California Psychological Inventory and the > 200 question Strong Interest Inventory to study anesthesiology residents and found that certain sub-scales of personality were associated with high-performing residents. Merlo et al. (Merlo and Matveevskii 2009) used the 300-item International Personality Item Pool Representation (IPIP-NEO) and found that personality traits such as confidence and conscientiousness were associated with success in anesthesiology residents. Lubelski et al. (Lubelski et al. 2016) used the 574-item Hogan Personality assessments in neurosurgery residents, and while they did not specifically study markers of performance, they concluded that personality testing may help predict future resident behavior.
The Grit Scale is a personality survey that focuses on the propensity of an individual to doggedly pursue goals they find valuable over the long term, despite setbacks; grit is an assessment of endurance and perseverance (Duckworth A. L. et al. 2007). In stark contrast to the previously mentioned personality assessments, the Grit Survey is short: only 8 questions in the current version. Notwithstanding the survey's short length, higher grit has been found to be associated with greater success in many domains (Duckworth A. L. et al. 2007;Eskreis-Winkler et al. 2014). For example, individuals with higher average grit scores obtain more education and tend to have higher GPAs. Of competitors in the National Spelling Bee, children with higher grit progressed further in the competition; military cadets with higher grit scores were more likely to finish summer training or complete a Special Operations course; grittier men stayed married longer; employees with more grit kept their jobs longer and had fewer career changes; and grittier teachers were more effective at their jobs (Duckworth A. L. et al. 2007;Eskreis-Winkler et al. 2014). While a variety of personality characteristics (and other factors) may interact to mediate behaviors associated with success, like self-control/self-discipline (Duckworth A. L. and Seligman 2005), need for achievement, and self-efficacy, the relationship between conscientiousness and success seems to have the most support in the medical literature (Merlo and Matveevskii 2009;Doherty and Nugent 2011). Conscientiousness is likely closely related to grit, but at least in some sub-groups of people studied, such as US Military Academy training retention (Duckworth A. L. et al. 2007) and children in the National Spelling Bee (Duckworth A. L. and Quinn 2009), the Grit Survey shows predictive validity over measures of conscientiousness.
Using the Grit Survey in a pilot study of anesthesiology residents, we found a trend toward a positive correlation between increasing grit and final-year (PGY-4) faculty evaluation of residents' ACGME Core Competencies. There was no significant association between grit and PGY-2 or PGY-3 faculty evaluations of core competencies. Faculty evaluations tend to encompass a wide view of the individual resident and their characteristics and abilities, including technical skills, knowledge, judgment, empathy, and professionalism. In this context, it seems reasonable that grit, which encompasses many aspects of personality that relate to success (motivation, endurance, perseverance, and passion), may correlate with residency performance evaluations by faculty. While this is a preliminary finding and was not statistically significant in our study, future studies with more power to detect differences are ongoing to assess whether grit is associated with resident performance. In addition, it may be possible to assess grit in the pre-residency period and use it as a residency selection tool to help predict which future trainees may perform well. This will have to be evaluated in future studies.
The fact that only the final year's evaluations showed a trend toward correlation with grit could be explained in multiple ways. It could be that years of exposure to a trainee are needed for faculty evaluators to gain a comprehensive appraisal of resident performance. This finding could also mean that these gritty characteristics are most notable in the final year of training, when most senior residents are given greater responsibility and autonomy.
Finally, this could also mean that residents became grittier as they progressed through training. While it does appear that grit increases with age (on the order of decades) (Duckworth A. L. et al. 2007;Duckworth A. L. 2016), over shorter periods of time (year to year) grit appears to be stable. In a study of surgical residents, Salles et al. found grit to be stable over multiple years of training (Salles et al. 2014), but in our sample grit scores were collected one to four years after training was complete; thus we do not know whether grit changed either during residency or after.
One of the ultimate goals of this line of study is to assess whether grit may be able to help predict which residents will do well in a training program. In the current literature on grit and residency training, there is some evidence that the Grit Survey may be used for this purpose, and specifically that low grit may be a marker for attrition. Burkhart et al. studied general surgery trainee grit in relation to attrition and found that those with below-median grit were twice as likely to consider leaving a training program (Burkhart et al. 2014). Of the three trainees who did leave their training program, all had below-median grit (but the difference was non-significant due to the low attrition rate) (Burkhart et al. 2014). Another study in general surgery residents had similar findings: those with higher grit were less likely to consider leaving a program, but the true attrition rate was low and there was no significant correlation between leaving a program and grit (Salles et al. 2017). While higher grit has been found to be associated with higher levels of success in other fields, low grit may also be a predictor of poor resident performance or attrition (although attrition does not always result from poor performance).
We found that grit did not correlate with performance on standardized exams taken either during medical school (the USMLE) or during residency (the AKT). This is consistent with another study of grit in the medical field that showed no correlation between the grit of surgical residents and performance on the American Board of Surgery In-Training Examination (ABSITE) (Burkhart et al. 2014). In addition, having more grit is not necessarily indicative of having stronger "cognitive" abilities, usually identified with intelligence and the ability to solve abstract problems (as often measured by the IQ test and standardized tests) (Brunello and Schlotter 2011); in fact, grit predicts success more strongly than measures of intelligence (Duckworth A. L. et al. 2007). Given that overall resident performance encompasses many more aspects than merely cognitive skills like knowledge acquisition, it seems plausible that grit may correlate with measures of overall performance (such as faculty evaluations) and not test scores.
This study has several limitations. In this pilot study, a primary limitation was that we had a relatively small sample size and thus a low level of power to detect significant associations. Given the multiple comparisons, we used the more rigorous p-value cut-off of 0.01 to indicate statistical significance. As participation in this study was voluntary, selection bias is of concern: grit scores and residency performance may differ between those who chose to participate and those who did not. We did not collect performance data of those not participating, thus we cannot know whether performance differed between participants and non-participants.
The self-report nature of the grit questionnaire makes it susceptible to social desirability bias (the desire to "look good" (Gnambs and Kaspar 2017)), and questions on the Grit Survey are transparent. We attempted to minimize this effect by emphasizing the de-identified nature of the study, with multiple safeguards in place to protect participants' privacy; however, it was up to each study participant to decide how honestly to answer each question. Despite these safeguards, other more objective measures of grit may be desirable. In one series of studies on grit, it was found that family and friends could reliably assess a person's grit, and grit as scored by a blind rater based on résumé data correlated with teacher effectiveness (Robertson-Kraft and Duckworth 2014). Therefore, in future studies it may be possible to assess grit in a manner independent of a self-report score, which would be essential if a grit score is to be used for applicant screening.
We cannot rule out the possibility that grit changed over time between when the study subjects were applicants and when the in-residency exams, faculty evaluations, and assessment of grit occurred. While grit does increase with life experience and age, it seems to be relatively stable over short periods of time, including during residency training (Salles et al. 2014), and an ongoing study by our group will assess this same question.
Finally, faculty evaluations have well-known limitations (Holmboe 2004). We hoped that by averaging all faculty evaluations each year, multiple evaluations by many faculty members would be captured and would represent an accurate picture of resident clinical performance. Most residency training programs have now transitioned to Milestone evaluations, and it is unclear how these changes may affect future assessments of clinical performance and grit.

Conclusion
This pilot study showed a trend toward higher grit being associated with higher overall faculty evaluation of resident performance in the final year of training. This finding was non-significant and thus should be considered preliminary. There were no statistically significant associations between grit and performance on standardized tests. Future larger-scale studies are needed to confirm the association between grit and overall performance in a larger cohort of trainees, to assess whether low grit can predict risk of poor performance, and to investigate whether grit can be assessed by objective means (i.e., not a self-report scale). Our ultimate goal is to find applicant characteristics predictive of future success in residency training and beyond.