Assessing medical student empathy in a family medicine clinical test: validity of the CARE measure

Introduction: The Consultation and Relational Empathy (CARE) measure, developed and validated in primary care settings and used for general practitioner appraisal, is a 10-item instrument with which patients assess doctors' empathy. The aim of this study is to investigate the validity of the CARE measure in assessing medical students' empathy during a formative family medicine clinical test. Method: All 158 final-year medical students were assessed by trained simulated patients (SPs), who completed the CARE measure, the Jefferson Scale of Patient Perceptions of Physician Empathy (JSPPPE), and a global rating score to assess students' empathy, as well as a checklist to assess history-taking ability. Results: Exploratory and confirmatory factor analysis identified a unidimensional structure. The CARE measure correlated strongly with both convergent measures, the global rating (ρ = 0.79, p < 0.001) and the JSPPPE (ρ = 0.77, p < 0.001), and weakly with the divergent measure, the history-taking score (ρ = 0.28, p < 0.001). Internal consistency was excellent (Cronbach's α = 0.94). Conclusion: The CARE measure showed strong construct validity and internal reliability in a formative, undergraduate family medicine examination. Its role in higher-stakes examinations and other educational settings should be explored.

At the heart of a meaningful doctor–patient relationship is empathy (1). More than an expression of sympathy or a character trait, empathy in a clinical setting is a multifaceted concept. It includes emotive, moral, cognitive, and behavioral components (2) that can be articulated as a professional skill or competency, which in turn can be learned, demonstrated, and assessed. Empathy has a direct, positive impact on the quality of patient care (3) in terms of patient and doctor satisfaction, patient enablement, and possibly health outcomes (4).
Given its recognized importance in patient care, nurturing empathy from the earliest stages of medical training has been widely advocated, and the Association of American Medical Colleges has recommended that empathy be an essential objective of undergraduate education (5). One of the key aims of the undergraduate medical curriculum at the University of Hong Kong (HKU) is to develop students who will be able to 'engage in productive, empathic relationships with patients, and display effective communication skills' (6). Indeed, researchers have found that medical student empathy predicts future doctor–patient empathy, underlining the importance of cultivating empathy in medical students during their training (7).
In family medicine, final-year medical students at HKU are expected to be able to conduct a primary care consultation properly. This includes acquiring relevant information, generating diagnoses, and negotiating a management plan, all using a humanistic, patient-centered approach. Empathy is a central element of the patient-centered approach and key to the development of a therapeutic doctor–patient relationship. Since it is ultimately the patient's perception that determines the success and effectiveness of the clinical relationship, patients' perception of empathy is highly relevant.

Medical Education Online 2015. © 2015 Julie Y. Chen et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.
On this premise, a patient-centered measure of empathy tailored to primary care was developed in the UK. Known as the Consultation and Relational Empathy (CARE) measure, this 10-item questionnaire was designed to capture the set of physician competencies that patients perceive as important in holistic and empathic care (8). It has subsequently been validated in primary care settings in both the UK (9) and Hong Kong (10) and is capable of distinguishing between doctors' interpersonal competencies (11). In the UK, it also plays a role in quality assurance and training, where it is used for the workplace appraisal and training of general practitioners (12).
Other measures of empathy, mostly general self-report instruments (e.g., the Interpersonal Reactivity Index (13), the Empathy Scale (14), and the Emotional Empathy Scale (15)), have been used in research contexts. The Jefferson Scale of Patient Perceptions of Physician Empathy (JSPPPE) (16) is a generic scale that has been used in medical education, but it was not specifically designed for primary care. As our focus is on clinical consultations conducted within a family medicine framework, the CARE measure is a more fit-for-purpose instrument that, if valid, may be a useful assessment tool for identifying deficiencies in medical students' relational empathy as perceived by their future patients.
The aim of this study, then, is to establish the validity of the CARE measure in assessing the empathy of final-year medical students during a formative family medicine clinical competency assessment.

Subjects and setting
All final-year medical students taking the formative family medicine clinical competency test (CCT) in 2013 comprised the target population. Administered at the end of each of six annual family medicine rotations, the CCT requires students to conduct a 15-min consultation with a simulated patient (SP) in the presence of an examiner. Every SP is trained to assess students on their interpersonal skills and empathy, and to assess students' acquisition of key history-taking information using a case-based checklist.
SP training sessions were conducted prior to each CCT, and the content of the CARE measure was reviewed to ensure SPs understood each element they were required to assess. The SPs were encouraged to respond according to how the student actually made them feel during the consultation.
A total of nine SPs (three males and six females) assessed 8–10 students across 1–4 clinical rotations; all SPs were 20–30 years of age and of Chinese descent. Different SPs were used depending on the gender requirement for the case and/or SP availability.
All cases were structured similarly and based on a common complaint encountered in family practice (e.g., cough, headache, and palpitations), requiring students to identify and address (in a management plan) a biopsychosocial problem list. Although some scenarios were more conducive to showing empathy, the elements of the CARE measure (e.g., Does the student make you feel at ease? Does the student really listen to you? Does the student explain things clearly?) pertained to general interpersonal skills required in any consultation.
Written informed consent was obtained from students prior to the CCT, permitting the use of their assessment scores in the study.

Study instrument
The CARE measure is a 10-item consultation process measure shown to produce valid scores of patients' perceptions of relational empathy in primary care contexts (9). Each item is rated on a 5-point Likert scale ranging from 1 (poor) to 5 (excellent), and the item scores are summed to give a total score ranging from 10 to 50. Missing values were handled as recommended in the scoring guidance notes (12): where a questionnaire had two or fewer missing or 'not applicable' responses, these were replaced with the mean score of the remaining items in that individual's questionnaire. Questionnaires with more than two missing responses were excluded from the analysis.
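The scoring and missing-value rule above can be expressed as a short function. This is a minimal sketch, assuming each questionnaire is stored as a list of ten integers (1–5), with `None` standing for either a missing or a 'not applicable' response; the function name `care_total` and this encoding are hypothetical and are not part of any published CARE scoring software.

```python
from statistics import mean
from typing import Optional, Sequence

N_ITEMS = 10      # the CARE measure has 10 items
MAX_MISSING = 2   # more than two missing / 'not applicable' answers -> exclude


def care_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the CARE total score (10-50), or None if the questionnaire is excluded.

    `responses` holds one entry per item: an integer from 1 (poor) to 5 (excellent),
    or None for a missing or 'not applicable' answer (an assumed encoding).
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    answered = [r for r in responses if r is not None]
    if any(not 1 <= r <= 5 for r in answered):
        raise ValueError("item scores must lie on the 1-5 Likert scale")
    n_missing = N_ITEMS - len(answered)
    if n_missing > MAX_MISSING:
        return None  # excluded from the analysis
    # Replace each missing item with the mean of this respondent's answered items,
    # then sum all ten items.
    imputed_value = mean(answered)
    return sum(answered) + n_missing * imputed_value


# Illustrative responses (not study data): nine answered items sum to 36 (mean 4),
# so the one missing item is imputed as 4 and the total is 40.
print(care_total([5, 4, 4, 3, None, 4, 5, 4, 3, 4]))
```

Treating missing and 'not applicable' answers identically mirrors the rule as stated in the text; a scorer that distinguished them would need a second sentinel value.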

Comparison instruments
The global rating of empathy is a single question that asks patients to give their overall impression of the student's empathy, interpersonal connection, and attitude on a 5-point Likert scale. This item is based on a global rating scale for empathy that has been used to assess physician empathy in the domains of patient connectedness, allowing patients to share their feelings and perspectives, and showing empathic expression (17). A similar summated global rating of senior medical student performance in the domains of empathy, coherence, and verbal/non-verbal expression has been shown to have good psychometric properties in an objective structured clinical examination (OSCE) setting (18).
The JSPPPE is a 5-item scale, rated on a 7-point Likert scale, that describes the physician's empathic engagement as perceived by patients. Its use in medical education is supported by psychometric evidence from studies involving postgraduate medical trainees (16). It correlates significantly with patients' satisfaction, interpersonal trust, and adherence to physicians' recommendations (19), and it has also been used in a US medical school to assess empathy during a third-year OSCE (20).
A 10-item history-taking checklist documented each student's elicitation of key clinically relevant information from the SP. These items reflect solely factual information and are unrelated to interpersonal skills or empathy. Checklists completed by SPs or other observers have been useful in assessing history-taking and other domains in the realm of general medical practice (21).

Ethics approval
Ethical approval of this study was granted by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (Reference No.: UW 12-102).

Data analysis
To identify potential floor or ceiling effects in the CARE measure, the proportions of students receiving the minimum and maximum possible scores were calculated to see if either exceeded 15% (22).
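The floor/ceiling check described above amounts to two proportions compared against the 15% threshold. The following is a minimal sketch, assuming CARE totals are already computed (range 10–50); the function name `floor_ceiling` is hypothetical and the example totals are invented for illustration, not taken from the study.

```python
from typing import Dict, Sequence

CARE_MIN, CARE_MAX = 10, 50   # possible range of the CARE total score
THRESHOLD = 0.15              # >15% of respondents at an extreme = floor/ceiling effect


def floor_ceiling(totals: Sequence[float], lo: float = CARE_MIN, hi: float = CARE_MAX,
                  threshold: float = THRESHOLD) -> Dict[str, object]:
    """Proportion of scores at the minimum and maximum, and whether either exceeds the threshold."""
    if not totals:
        raise ValueError("no scores supplied")
    n = len(totals)
    floor_prop = sum(t == lo for t in totals) / n
    ceiling_prop = sum(t == hi for t in totals) / n
    return {
        "floor_prop": floor_prop,
        "ceiling_prop": ceiling_prop,
        "floor_effect": floor_prop > threshold,
        "ceiling_effect": ceiling_prop > threshold,
    }


# Hypothetical totals (not study data): 3 of 10 at the maximum, so a ceiling effect is flagged.
print(floor_ceiling([36, 50, 42, 28, 50, 39, 45, 33, 50, 41]))
```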
Using Spearman rank-order correlation coefficients (ρ), the construct validity of the CARE measure was established via its relationship to: 1) the JSPPPE and global empathy rating (convergent validity) and 2) the history-taking checklist (divergent validity). Convergent validity was supported if the CARE measure, the global empathy rating, and the JSPPPE scores were moderately to highly correlated (ρ ≥ 0.3). Divergent validity was supported if the CARE measure was only weakly correlated (ρ < 0.3) with the history-taking checklist score.
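The convergent/divergent decision rule above can be sketched in plain Python: Spearman's ρ is the Pearson correlation of the ranks, with tied scores (common on Likert data) given their average rank. This is an illustrative sketch only; the helper names are hypothetical, and it does not compute p-values (a library routine such as `scipy.stats.spearmanr` would normally be used for significance testing).

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


CUTOFF = 0.3  # threshold separating moderate-to-high from weak correlation


def supports_convergent(rho: float) -> bool:
    return rho >= CUTOFF


def supports_divergent(rho: float) -> bool:
    return rho < CUTOFF


# Applied to the coefficients reported in the Results section; prints: True True True
print(supports_convergent(0.79), supports_convergent(0.77), supports_divergent(0.28))
```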
Exploratory factor analysis (EFA) using a principal components method with Varimax rotation was used to establish the underlying factor structure of the CARE measure and to compute the factor eigenvalues and individual factor loadings. Loadings ≥0.5 indicated that an item correlated with a factor, while items that loaded <0.5 or loaded on multiple factors (i.e., cross-loaded) were removed from further investigation. Eigenvalues describe the amount of variance attributable to each factor; factors with eigenvalues >1 were retained in the structure (23).
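The extraction and retention rules above can be illustrated with NumPy. This is a minimal sketch under stated assumptions, not a reproduction of the LISREL analysis: it uses Pearson correlations, extracts principal components, keeps those with eigenvalue >1, and flags items whose loadings fail the ≥0.5 single-factor rule. Varimax rotation is omitted; with a single retained component rotation would not change the loadings, but with several components the loadings should be rotated before applying `items_to_drop`. Function names are hypothetical and the data are simulated.

```python
import numpy as np


def kaiser_extraction(data: np.ndarray):
    """Principal components extraction with the Kaiser (eigenvalue > 1) retention rule.

    `data` is a respondents x items array with no missing values. Returns eigenvalues
    (descending), unrotated loadings of the retained components, and the proportion
    of total variance each component explains.
    """
    corr = np.corrcoef(data, rowvar=False)   # item-by-item Pearson correlations
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_keep = int(np.sum(eigvals > 1.0))      # Kaiser criterion
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
    explained = eigvals / eigvals.sum()      # trace of a correlation matrix = number of items
    return eigvals, loadings, explained


def items_to_drop(loadings: np.ndarray, cutoff: float = 0.5) -> list:
    """Indices of items that load >= cutoff on no factor or on more than one factor."""
    n_strong = (np.abs(loadings) >= cutoff).sum(axis=1)
    return [i for i, k in enumerate(n_strong) if k != 1]


# Simulated one-factor data (not study data): 10 items driven by one latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(300, 1))
items = trait + 0.5 * rng.normal(size=(300, 10))
eigvals, loadings, explained = kaiser_extraction(items)
print(loadings.shape[1], round(float(explained[0]), 2), items_to_drop(loadings))
```

The sign of each eigenvector is arbitrary, which is why the loading rule compares absolute values.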
Confirmatory factor analysis (CFA) was performed to further examine the construct validity of the factor structure proposed by the EFA and of the one-factor solution of the original (UK) version of the CARE measure (9). Polychoric correlations measured the ordinal association between item scores, and maximum likelihood estimation was used to estimate the factor loadings and the variance explained by the one-factor solution. A chi-square test (24), the goodness-of-fit index (GFI) (25), the adjusted goodness-of-fit index (AGFI) (25), the root mean square error of approximation (RMSEA) (26), and the comparative fit index (CFI) were used to assess model goodness-of-fit, which was considered adequate if: 1) chi-square test p ≥ 0.05; 2) RMSEA ≤ 0.08; 3) GFI ≥ 0.90; 4) AGFI ≥ 0.80; and 5) CFI ≥ 0.95 (27).
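The five adequacy criteria above can be encoded as a simple lookup, together with one common point estimate of RMSEA from the model chi-square. This is a sketch for checking reported indices, assuming the fit statistics themselves come from SEM software; the names `FIT_CRITERIA`, `check_fit`, and `rmsea` are hypothetical, and some programs compute RMSEA with n rather than n − 1 in the denominator.

```python
from math import sqrt
from typing import Callable, Dict

# Adequate-fit criteria as listed in the text (index name -> rule).
FIT_CRITERIA: Dict[str, Callable[[float], bool]] = {
    "chi2_p": lambda v: v >= 0.05,
    "rmsea": lambda v: v <= 0.08,
    "gfi": lambda v: v >= 0.90,
    "agfi": lambda v: v >= 0.80,
    "cfi": lambda v: v >= 0.95,
}


def check_fit(indices: Dict[str, float]) -> Dict[str, bool]:
    """Report, index by index, whether each supplied fit statistic meets its criterion."""
    return {name: FIT_CRITERIA[name](value) for name, value in indices.items()}


def rmsea(chi2: float, df: int, n: int) -> float:
    """One common point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Fit statistics reported in the Results section; GFI (0.89) falls just below 0.90,
# so 'gfi' is False while the other four criteria are met.
print(check_fit({"chi2_p": 0.09, "rmsea": 0.06, "gfi": 0.89, "agfi": 0.83, "cfi": 0.99}))
```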
For factor analysis, the sample was split into two subsamples comprising only cases with complete data (no missing responses). Data from rotations 1–3 and 4–6 were used for the EFA and CFA, respectively, to identify subscales. Cronbach's α coefficient was used to determine each subscale's internal consistency against the expected standard of ≥0.7 (28). The effect of imputed values (substituted for missing responses) on internal consistency was examined in a sensitivity analysis.
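The complete-case split by rotation and the internal-consistency statistic described above can be sketched with the standard library. This is a minimal sketch, assuming each record is a `(rotation, responses)` pair with `None` marking a missing answer; the helper names are hypothetical. Cronbach's α is computed as k/(k − 1) × (1 − Σ item variances / variance of the total score).

```python
from statistics import variance
from typing import Iterable, List, Optional, Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items table: k/(k-1) * (1 - sum(item var) / var(total))."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    item_var = sum(variance(col) for col in zip(*rows))  # sample variance of each item
    total_var = variance([sum(row) for row in rows])      # sample variance of the total score
    if total_var == 0:
        raise ValueError("alpha is undefined when every total score is identical")
    return k / (k - 1) * (1 - item_var / total_var)


def split_by_rotation(
    records: Iterable[Tuple[int, Sequence[Optional[int]]]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Keep complete cases only; rotations 1-3 go to the EFA set, 4-6 to the CFA set."""
    efa, cfa = [], []
    for rotation, responses in records:
        if any(r is None for r in responses):
            continue  # the factor analyses used complete cases only
        (efa if rotation <= 3 else cfa).append(list(responses))
    return efa, cfa


# Toy table (not study data): two items, three respondents; prints 0.667
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

For the sensitivity analysis, the same `cronbach_alpha` call can be run once on complete cases and once on questionnaires whose missing items were mean-imputed.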
Both the EFA and CFA were performed using LISREL 8.80 (Scientific Software International, Inc., Lincolnwood, IL, USA), while other statistical analyses were performed using IBM SPSS for Windows, version 21.0 (SPSS, Inc., Chicago, IL, USA).

Results
Of the 158 study subjects, 97 (61.4%) were male; ages ranged from 22 to 37 years (median 24). Across the six rotations of the 2013 CCT, the mean CARE measure score was 35.8 out of a possible 50. No floor or ceiling effects were observed. Descriptive, univariate statistics of key variables are shown in Table 1.

Exploratory and confirmatory factor analysis
The suitability of the data for factor analysis was first confirmed in the EFA: the Kaiser–Meyer–Olkin measure of sampling adequacy was 0.94, and Bartlett's test of sphericity was significant [χ²(45) = 887.8, p < 0.001]. Using principal components analysis, a one-factor solution was shown to explain 77.6% of the total variance. All 10 items loaded significantly on this single factor.
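The two suitability checks reported above can be sketched with NumPy. This is an illustrative sketch on simulated data, assuming Pearson correlations: Bartlett's statistic is −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom (45 for the 10 CARE items, matching the reported χ²(45)), and the overall KMO compares squared correlations with squared partial correlations. The p-value would come from the chi-square distribution (for example `scipy.stats.chi2.sf(stat, df)`), which is not computed here. Function names are hypothetical.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray):
    """Bartlett's test statistic and degrees of freedom for a respondents x items array.

    chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, with df = p(p - 1) / 2.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    sign, logdet = np.linalg.slogdet(corr)  # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("correlation matrix is not positive definite")
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    return float(stat), p * (p - 1) // 2


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial (anti-image) correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + q2))


rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))
items = trait + 0.5 * rng.normal(size=(200, 10))  # simulated, not study data
stat, df = bartlett_sphericity(items)
print(df, round(stat, 1), round(kmo(items), 2))
```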
Judged against conventional guidelines, the CFA indicated that this one-factor model fit the data well (RMSEA = 0.06; GFI = 0.89; AGFI = 0.83; CFI = 0.99), meeting the RMSEA, AGFI, and CFI criteria, with GFI falling just below the 0.90 threshold. The null hypothesis of the chi-square test was not rejected (χ² = 46.72; p = 0.09), suggesting an adequate fit of the data to the one-factor model. EFA and CFA loadings are shown in Table 2.

Convergent and divergent validity
The SPs' total CARE measure scores were strongly and positively correlated with both their global empathy ratings (ρ = 0.79, p < 0.001) and their JSPPPE scores (ρ = 0.77, p < 0.001), but only weakly associated with the history-taking score (ρ = 0.28, p < 0.001). These results are shown in Table 3.

Internal consistency
Internal consistency of the 10-item CARE measure was excellent, as evidenced by a Cronbach's α of 0.94. A sensitivity analysis in which missing data were mean-substituted yielded only a minuscule increase in internal consistency (α = 0.95).

Discussion
The CARE measure is a widely used means of assessing primary care doctors' relational empathy during a consultation from the patient's perspective. In this study, we extended its validity to medical students' consultations in an undergraduate family medicine setting, showing that the CARE measure retained its original unidimensional structure (9) and had excellent internal consistency and good convergent and divergent validity. These findings bring the patient perspective squarely into medical education assessment and should encourage more objective and standardized assessment of a complex attribute, empathy, in a formative (low-stakes) family medicine context.
As validated in this context, the CARE measure may have some educational benefits over shorter measures such as the 1-item global empathy rating or the 5-item JSPPPE. First, with 10 items, the CARE measure breaks a complex concept down into a set of concrete, practical elements that are clearly understood by students. Smaller components enable students to focus on particular aspects of the whole, analogous to learning a complex skill through microskill acquisition. Second, its items better articulate the interpersonal skills needed by primary care doctors, an advantage when teaching consultations in family medicine and other primary-care-oriented settings. As with some instruments used to measure healthcare outcomes, shorter, generic measures may lack the sensitivity to capture small differences or may be less responsive to changes over time in a specified population (29). Used formatively, where the focus is on helping students learn and improve, the CARE measure can serve as a guiding rubric that represents the essential elements desired in a primary care consultation. It may be used for benchmarking and for generating feedback that helps students identify specific clinical strengths and weaknesses.
Furthermore, the absence of a floor or ceiling effect in this context suggests that the instrument may be sensitive enough to differentiate among students' performances. In contrast, when the CARE measure has been used in doctor–patient or therapist–patient settings, scores have tended toward the higher end of the distribution, with more than a quarter of practitioners receiving the maximum score (9,30). Real patients are likely to seek out and establish relationships voluntarily with doctors they find 'acceptable' and whom they may already know well; for students, by contrast, the consultation is a required, one-off interaction. In addition, SPs in an undergraduate exam setting may recognize students' 'developmental' limitations and hence refrain from awarding them the maximum score.
The excellent internal consistency of the CARE measure found in this study provides some preliminary evidence for its reliability. However, assessments of the same student by multiple examiners or over time would offer additional support for its reliability. In the primary care setting, it has been suggested that 50 completed CARE assessments by patients are required to reliably assess a doctor's empathy (8), a number that would be impossible or impractical to obtain in most educational settings.
The value of assessment for learning (as opposed to assessment of learning) has been advocated in the learning of clinical competencies in medical education (31), and students' relational empathy may be best developed and improved if assessed in the same way. The CARE measure provides a valid means by which students can be assessed and can learn to improve their relational empathy. Combined with qualitative feedback from peers/supervisors and self-reflection, it can provide a more solid indication of students' acquisition of a core clinical consultation skill.

Strengths and limitations
An adequate and appropriate sample, as well as the use of external measures to establish convergent and divergent validity, are among the strengths of this study.
The main limitation relates to the unknown generalizability of the findings to other educational settings or activities. Even though the assessment was low-stakes and formative, both students and SPs may have behaved differently than they would in a real clinical setting. In addition, our study was conducted in a specific setting, within one curriculum, and at one institution, so further study is needed to examine validity in other educational settings. Finally, although the internal consistency of the CARE measure was established, further psychometric examination of test–retest and interrater reliability would greatly strengthen our findings.

Conclusion
The CARE measure was shown to have strong construct validity and excellent internal consistency in a formative, undergraduate family medicine examination. It also has some discriminatory potential in this context, given the absence of floor or ceiling effects and the ability of SPs to complete the measure under exam conditions. This study demonstrated that the CARE measure can be a useful tool for assessing students and generating feedback on specific interpersonal elements of the consultation, bringing patients' perspective into the realm of primary care consultation. Further work is needed to explore its role in higher-stakes clinical examinations and other educational settings.