White patients’ physical responses to healthcare treatments are influenced by provider race and gender

Significance
Rapid changes in the US physician workforce to include more women and people of color likely have a range of effects on healthcare. This study illustrates how long-standing societal representations of providers (i.e., as White and men) can undermine patients’ treatment outcomes. Even when White patients’ overt attitudes toward Black and women providers were positive, we found that they were less physiologically responsive to the treatment administered by these providers. These results illustrate how notions of race and gender can influence patients beneath the surface (literally under the skin) despite their professed intentions and even to their own detriment.


SI Figures
Fig. S1: Components of the effectiveness of medical treatment.
Fig. S2: Scatterplots depicting wheal size (in mm) by provider race and gender over the timecourse of the study.
Fig. S3: Participants did not detect more negative, non-verbal bias or greater patient discomfort when patients were interacting with providers of color as compared to White providers.
Fig. S4: Participants rated patients as more comfortable when interacting with women providers than men providers, and they rated patients' non-verbal reactions to women providers as more positive.

SI Analyses
Results Omitting the Control Variable of Initial Reaction Size
Exploratory Intersectional Analyses
Analyses Examining Ratings of Provider Warmth and Competence
Fig. S5: Ratings of provider competence did not differ by race, and patients rated White providers as less warm than Asian or Black providers.
Fig. S6: Patients rated women providers as warmer and more competent than men providers.

SI Discussion
Exploratory Analyses: Additional Details on Methodology and Results
Internal Motivation to Control Prejudice
Fig. S7: Patients' internal motivation to control prejudice in a follow-up survey.

SI References
Appendix S1: Measures used to assess patient engagement with providers

Fig. S2. Scatterplots depicting wheal size (in mm) by provider race and gender (NBlack_Women=27, NBlack_Men=28, NAsian_Women=34, NAsian_Men=36, NWhite_Women=30, NWhite_Men=32) over the timecourse of the study. Horizontal solid lines represent the mean of each group.

Fig. S3. Participants did not detect more negative, non-verbal bias or patient discomfort when patients were interacting with providers of color as compared to White providers.
Note. Error bars represent standard error of the mean. ns p>0.10, + p<0.10. Participants perceived patients' non-verbal reactions to Black providers as marginally significantly more positive than non-verbal reactions to White providers, B=0.31 [-0.02 …

Fig. S4. Participants rated patients as more comfortable when interacting with women providers than men providers, and they rated patients' non-verbal reactions to women providers as more positive.
Note. Error bars represent standard error of the mean. * p<0.05

Results Omitting the Control Variable of Initial Reaction Size
The analyses in the main manuscript controlled for initial wheal size in response to the skin prick test, measured before the inert treatment was administered, because research suggests that responses to skin prick tests can vary widely across individuals (2). However, the results hold when this control is omitted: analyses conducted in the same way as those in the main manuscript, but without the control variable of initial reaction size, remained significant and in the same direction as those reported in the main manuscript. These analyses are reported below.
White patients were less responsive to the standardized treatment when women providers administered it, compared to men providers, Provider Gender × Time interaction F(1, 184) …

Exploratory Intersectional Analyses
Do provider race and gender intersect to affect patient outcomes? Our study was not originally powered to pose nuanced questions of intersectionality, but we nonetheless examined this question in exploratory analyses. Specifically, we examined whether there was a significant three-way interaction between provider race, provider gender, and the timepoint at which patients' wheal size was measured (i.e., T2 to T5). If the interaction were significant, it would indicate that the impact of provider race on treatment response differed depending on whether the provider was a man or a woman. The three-way interaction between provider race, provider gender, and timepoint of measurement was non-significant, F(2, 178)=0.56, p=0.57. This suggests that provider race had a similar effect on treatment response for patients of both men and women providers.
Given that the three-way interaction was non-significant, the lower-order interactions should be interpreted with caution; for the sake of thoroughness, however, we examined patterns among men and women providers separately. Among women providers, there was a significant two-way interaction when comparing Black women providers to White women providers, … Taken together, these results suggest that although responsiveness to treatment from Black providers may have been weakest when those providers were also women, the pattern of results was similar for Black men providers. Patients of both Black women and Black men providers tended to respond less strongly to treatment over time relative to patients of White and Asian providers (see Fig. S2).
Given research showing that gender concordance can impact the outcomes of patient-provider interactions, one might predict that patient gender could influence the results reported in the main manuscript in important ways (e.g., women patients interacting with a woman provider may be more responsive to the positive expectations set by these providers). We tested this possibility by conducting analyses that included an interaction between provider gender and patient gender. We also conducted analyses that included an interaction between provider race and patient gender, although we did not predict any differential responses to providers of different races/ethnicities based on patient gender. These analyses indicated that patient gender did not influence the results in meaningful ways. Below we describe the results of these analyses.
Patient gender did not interact with provider gender or provider race to impact wheal change from T2 to T5, absolute value of all t's<0.97, all p's>0.34 (i.e., all of the three-way interactions between provider characteristics, patient gender, and timepoint of wheal measurement were non-significant), thus indicating that men and women patients responded similarly to men and women providers, and that gender concordance did not affect results.
Given that the study was not originally powered to address these nuanced questions of intersectionality and concordance, these results should be considered with caution. Future research is needed to further probe how provider race and gender, as well as patient race and gender, intersect to affect patient treatment outcomes.

Analyses Examining Ratings of Provider Warmth and Competence
Ratings of provider warmth and competence. We used provider race and gender as predictors and the same dummy codes as the analyses conducted on physiological data (i.e., change in wheal size from T2 to T5). As with the analyses of the physiological data, we controlled for patient gender in the analyses, though results do not differ when this control is omitted. Past literature on societal stereotypes about groups' warmth and competence (4), in light of research showing that provider warmth and competence can enhance placebo response (5), suggests that patients may respond less to treatment expectations set by Black and women providers because they perceive these providers as less warm and/or competent. Post-visit, patients rated provider warmth (7 items, α=.89, e.g., "The medical practitioner was friendly," "…made me feel at ease") and competence (11 items, α=.92, e.g., "The medical practitioner was intelligent," "…was skilled at the medical procedures") on 7-point scales (…
Note. Error bars represent the standard error of the mean. ***p<0.001, **p<0.01
Provider recommendations. Patients reported how likely they would be to recommend the provider to a close friend or loved one (scale points: Definitely no, Probably no, Maybe, Probably yes, Definitely yes), and we dichotomized this variable to indicate whether participants said that they would at least probably recommend the provider. We conducted chi-square tests of independence to determine how provider race and gender affected recommendations. Results are similar when recommendations are analyzed as a continuous dependent variable using linear regression.
Data regarding patients' willingness to recommend these providers to others supported these more positive perceptions of providers of color and women providers. Patients were more likely to recommend the provider if she was a woman (64 out of 91 participants, 70.3%) than if he was a man (50 out of 96 participants, 52.1%), χ²(1)=6.54, p=0.011. Provider race did not significantly predict recommendations, χ²(2)=3.41, p=0.182, although, if anything, patients were more willing to recommend providers of color (for Black providers, 36 out of 55 participants, 65.5%; for Asian providers, 46 out of 70 participants, 65.7%) than White providers (32 out of 62 participants, 51.6%).
Patients also reported how likely they would be to recommend the provider to a stranger (scale points: Definitely no, Probably no, Maybe, Probably yes, Definitely yes). We dichotomized this variable to indicate whether participants said that they would at least probably recommend the provider. Chi-square analyses indicated that participants were more likely to recommend the provider if she was a woman (67 out of 91 participants, 73.6%) than if he was a man (55 out of 96 participants, 57.3%), χ²(1)=5.50, p=0.019. A chi-square test based on provider race was non-significant, χ²(2)=0.55, p=0.758; participants were as willing to recommend providers of color (for Black providers, 35 out of 55 participants, 63.6%; for Asian providers, 48 out of 70 participants, 68.6%) as White providers (39 out of 62 participants, 62.9%).
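The chi-square tests of independence on the close-friend recommendations can be sketched from the reported counts alone. This is a minimal sketch, not the authors' analysis code: it assumes a Pearson chi-square without continuity correction, and the function name `chi_square_independence` and the variable names are hypothetical. Under that assumption, the counts should give statistics close to the reported χ²(1)=6.54 and χ²(2)=3.41.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence (no continuity correction).

    `table` is a list of rows of observed counts. Returns (statistic, df, p).
    The p-value uses closed forms that exist only for df = 1 and df = 2,
    which covers the 2x2 and 3x2 tables used here.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, df = 1
    elif df == 2:
        p = math.exp(-stat / 2)             # chi-square survival function, df = 2
    else:
        raise ValueError("closed-form p-value implemented only for df 1 or 2")
    return stat, df, p


# Rows: provider group; columns: [would recommend, would not recommend].
# Counts are taken from the close-friend recommendation results in the text.
friend_by_gender = [[64, 91 - 64], [50, 96 - 50]]                 # women, men
friend_by_race = [[36, 55 - 36], [46, 70 - 46], [32, 62 - 32]]    # Black, Asian, White

gender_stat, gender_df, gender_p = chi_square_independence(friend_by_gender)
race_stat, race_df, race_p = chi_square_independence(friend_by_race)
print(f"gender: chi2({gender_df})={gender_stat:.2f}, p={gender_p:.3f}")
print(f"race:   chi2({race_df})={race_stat:.2f}, p={race_p:.3f}")
```

The same function can be applied to the stranger-recommendation counts by swapping in the corresponding cell counts.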
In sum, patients rated Black and Asian providers as warmer than White providers, and women providers as both warmer and more competent than men providers. In addition, patients were more likely to recommend women providers than men providers and were just as likely to recommend providers of color as White providers. Thus, patients' ratings of provider warmth and competence do not explain why patients responded less to the expectations set by Black and women providers. There is an intriguing disconnect between patients' self-reported perceptions of the providers and their actual physical response to the providers' treatment, aligning with literature suggesting that people often do not report explicitly biased attitudes but may show evidence of bias in other ways (3).

Exploratory Analyses: Additional Details on Methodology and Results
Below, we include additional details about the methodology used in the exploratory analyses presented in the discussion section. We also include supplemental analyses involving only White participants on Amazon's Mechanical Turk (MTurk).
Patient non-verbal bias and anxiety. First, we created a pool of 90 videoclips of patients interacting with the healthcare providers during the experiment. Each participant watched a unique, randomly selected subset of six videoclips. Each participant watched one videoclip of a patient (i.e., the participants from the lab study) interacting with a healthcare provider of each race and gender (i.e., one Asian man, one Asian woman, one Black man, one Black woman, one White man, one White woman). Each videoclip was approximately 10 seconds long and silent and the researchers cropped out the healthcare provider so that participants were unaware of the healthcare provider's race and gender. Participants thus only saw silent videoclips of the patient's non-verbal behavior. Participants also reported the gender of the patient in the video and estimated the age of the patient in the video, as these factors might also affect interaction quality with providers.
The R package rmcorr was used to calculate repeated measures correlation coefficients, thus taking into account the fact that there were six measurements of these variables per participant. We used mixed-effects linear regression to predict ratings of comfort/anxiety and patient engagement with providers. We included provider race (two dummy codes, with White providers as the omitted base group) and provider gender (one dummy code, with men as the omitted base group) as predictors. We included random intercepts for participant and target (i.e., the particular patient that participants evaluated) to account for correlated responses across participants and across targets. We controlled for the gender (male/female) and perceived age of the patient in the video, as well as the gender of the participant, in the analyses; results are not affected if these controls are removed.
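The idea behind a repeated measures correlation can be illustrated with a short sketch. This is not the rmcorr package itself: it is a simplified Python illustration, assuming that the coefficient can be obtained by removing each participant's own means from both variables and correlating what remains, with error degrees of freedom equal to the number of observations minus the number of participants minus one. The function name and the ratings below are hypothetical.

```python
import math
from statistics import mean


def repeated_measures_corr(subjects, x, y):
    """Within-participant correlation after removing each participant's means.

    Returns (r, df), where df = observations - participants - 1.
    """
    groups = {}
    for s, xi, yi in zip(subjects, x, y):
        groups.setdefault(s, []).append((xi, yi))
    xc, yc = [], []
    for pairs in groups.values():
        mx = mean(p[0] for p in pairs)
        my = mean(p[1] for p in pairs)
        xc.extend(p[0] - mx for p in pairs)  # center x within participant
        yc.extend(p[1] - my for p in pairs)  # center y within participant
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    r = sxy / math.sqrt(sxx * syy)
    df = len(x) - len(groups) - 1
    return r, df


# Hypothetical data: three raters, three clips each, two rating scales.
raters = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
comfort = [3, 4, 6, 5, 6, 7, 2, 4, 5]
engagement = [2, 4, 5, 6, 6, 8, 1, 2, 4]
r, df = repeated_measures_corr(raters, comfort, engagement)
print(f"r_rm = {r:.2f}, df = {df}")
```

Centering within participant is what keeps stable differences between raters (some rate everything higher than others) from inflating or masking the association.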
As a secondary prediction, we expected that participants would rate patients as more uncomfortable in clips taken right after the skin prick test than in the other two types of clips (initial entry, before the skin prick test). The skin prick test involved providers touching patients' skin, and we thought that this physical contact might enhance discomfort in cross-race interactions. Accordingly, the clips were taken either during the initial interaction (i.e., right as the provider entered the door of the exam room), right before the skin prick test, or right after the skin prick test. The timing of the clip was randomized across participants. We tested whether the timing of the interaction (initial entry of provider, before skin prick test, after skin prick test) moderated the effects by including, as a fixed effect in our model, an interaction between this variable (dummy coded with clips filmed after the skin prick test as the omitted category) and provider race/gender. Clip timing did not moderate the effects, absolute value of all t's<1.50, all p's>0.139, and thus we do not discuss this variable further.
When analyses included only White participants on MTurk, results were similar to those presented in the main manuscript. Patients did not appear to show greater non-verbal bias when interacting with Black providers, F(2, 83.22)=1.50, p=0.229. Provider race also did not predict participants' ratings of patient comfort, F(2, 81.42)=0.73, p=0.484. There was no indication of negative non-verbal bias when patients interacted with women providers either. In fact, participants rated patients' non-verbal reactions to women providers as more positive than their non-verbal reactions to men providers, B=0.37 …
Patient engagement with providers. All of the measures of patient engagement in Appendix S1 correlated highly with one another (r's>0.60) except for one ("How preoccupied does the patient seem by interacting with this doctor?", r's<0.12). This question was thus omitted and all others were combined into a composite scale measuring social engagement with the providers (6 items, alpha = 0.95). Results are similar when each of these measures is examined separately.
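The reliability coefficient reported for the composite (alpha) can be illustrated with a brief sketch of Cronbach's alpha. The ratings below are hypothetical and the function name is an assumption for illustration; this is not the study's data or analysis code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    `items` is a list of score lists, one list per item, aligned so that
    position i in every list comes from the same rating occasion.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]           # composite sum per occasion
    item_var_sum = sum(variance(scores) for scores in items)   # sample variances of items
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical ratings: three engagement items scored for five clips.
item_scores = [
    [5, 6, 3, 7, 4],
    [5, 7, 3, 6, 4],
    [4, 6, 2, 7, 5],
]
alpha = cronbach_alpha(item_scores)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 when items rise and fall together across occasions, which is the pattern that justifies averaging them into a single composite.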
When analyses included only White participants on MTurk, results were similar to those presented in the main manuscript. Results suggested that White patients in the study did, in fact, engage more with providers of color than White providers, F(…

Internal Motivation to Control Prejudice
In a follow-up survey conducted approximately one month after the end of the study, 140 of the White patients from the lab study completed an established measure of their internal motivation to control prejudice against racial minorities (6) (4 items, alpha = 0.76, e.g., "I attempt to act in non-prejudiced ways toward racial and ethnic minorities because it is personally important to me", 1 = not at all motivated to 9 = extremely motivated). White patients were nearly at ceiling on this measure: on the 9-point scale, participants' modal response was 9 and their mean response was 7.74. See Figure S7 below.