Association Between Web-Based Physician Ratings and Physician Disciplinary Convictions: Retrospective Observational Study

Background: Physician rating websites are commonly used by the public, yet the relationship between web-based physician ratings and health care quality is not well understood. Objective: The objective of our study was to use physician disciplinary convictions as an extreme marker for poor physician quality and to investigate whether disciplined physicians have lower ratings than nondisciplined matched controls. Methods: This was a retrospective national observational study of all disciplined physicians in Canada (751 physicians, 2000 to 2013). We searched ratings (2005-2015) from the country’s leading online physician rating website for this group, and for 751 matched controls according to gender, specialty, practice years, and location. We compared overall ratings (out of a score of 5) as well as mean ratings by the type of misconduct. We also compared ratings for each type of misconduct and punishment. Results: Of the convicted and disciplined physicians (cases), 62.7% (471/751) had web-based ratings, as did 64.6% (485/751) of nondisciplined physicians (controls). Of 312 matched case-control pairs, disciplined physicians were rated lower than controls overall (3.62 vs 4.00; P<.001). Disciplined physicians had lower ratings for all types of misconduct and punishment, except for physicians disciplined for sexual offenses (n=90 pairs; 3.83 vs 3.86; P=.81). Sexual misconduct was the only category in which mean ratings for physicians were higher than those for other disciplined physicians (3.63 vs 3.35; P=.003). Conclusions: Physicians convicted for disciplinary misconduct generally had lower web-based ratings. Physicians convicted of sexual misconduct did not have lower ratings and were rated higher than other disciplined physicians. These findings may have future implications for the identification of physicians providing poor-quality care. (J Med Internet Res 2020;22(5):e16708) doi: 10.2196/16708


Background
The ability of patients to accurately evaluate health care quality is not well understood. Although some studies demonstrate an association between greater patient satisfaction and quality of care, others show either no relationship or even poorer outcomes with increased patient satisfaction [1][2][3][4][5][6][7][8][9][10][11][12]. Over the last decade, with the advent of physician rating websites such as healthgrades.com, ratemds.com, and vitals.com, a novel source of patient satisfaction data has emerged. Such websites have become popular forums for patients to evaluate and publicly share their health care experience.
Physician misconduct can be considered a reflection of poor quality care. Physicians are investigated, convicted, and disciplined by their professional associations for activities such as unprofessional behavior, sexual misconduct, failure to meet standards of care, fraud, abuse of drugs and alcohol, and negligence. Resultant penalties range from fines and mandatory education to license suspension and revocation. Although disciplinary proceedings are publicly posted by each province's physician regulatory college, at the time of a clinical encounter, patients are often unaware of a physician's disciplinary history.

Objectives
We were interested in whether physicians who have been convicted and punished for misconduct are rated differently than nondisciplined physician controls. We hypothesized that, for many types of misconduct, patients would accurately recognize poor-quality physicians and that, overall, disciplined physicians would have lower web-based ratings than controls. We also sought to determine whether ratings were consistently lower across all types of misconduct, and we hypothesized that associations between ratings and discipline would differ depending on the type of misconduct.

Cases and Controls
We employed a nested case-control design and matched each disciplined physician (cases) with a nondisciplined counterpart (controls) according to specialty, gender, town of listed practice, and years in medical practice (within 5 years). We developed a group of nondisciplined physician controls by searching provincial physician regulatory college websites for each disciplined physician and narrowing our search terms by the abovementioned criteria. In certain instances (ie, 2 provinces), if after controlling for the 4 matching criteria, multiple physician matches were possible, a physician was chosen at random. In total, 751 disciplined physicians were matched with 751 nondisciplined controls. A nondisciplined control was found for every disciplined physician (Table 1).
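The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Physician` record, the `find_control` helper, and the example data are hypothetical and are not the study's actual tooling, which relied on manual searches of regulatory college websites.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Physician:
    # Hypothetical record holding only the 4 matching criteria.
    name: str
    specialty: str
    gender: str
    town: str
    practice_years: int


def find_control(case, candidates, rng, max_year_gap=5):
    """Return one nondisciplined candidate matching the case, or None."""
    eligible = [
        c for c in candidates
        if c.specialty == case.specialty
        and c.gender == case.gender
        and c.town == case.town
        and abs(c.practice_years - case.practice_years) <= max_year_gap
    ]
    # When several candidates satisfy all 4 criteria, pick one at random.
    return rng.choice(eligible) if eligible else None


case = Physician("case-1", "psychiatry", "M", "Town A", 20)
pool = [
    Physician("ctrl-a", "psychiatry", "M", "Town A", 23),
    Physician("ctrl-b", "psychiatry", "M", "Town A", 31),  # more than 5 years apart
    Physician("ctrl-c", "surgery", "M", "Town A", 20),     # different specialty
]
control = find_control(case, pool, random.Random(0))
print(control.name)  # ctrl-a is the only eligible candidate
```

Passing an explicit `random.Random` instance keeps the random tie-break reproducible, which is a design choice of this sketch rather than something the study reports.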

Physician Ratings Data
RateMDs.com is a publicly accessible physician rating website founded in the United States in 2004. Since its launch in Canada (2005), it has been the country's leading physician rating website and one of the most popular physician rating websites in North America [41,43]. As of 2013, RateMDs.com included more than 640,000 ratings of over 57,000 unique physicians in Canada [29]. No registration or subscription is required to view or submit a rating, and there is no monetary reimbursement or other incentive to rate a physician. Physicians are rated on staff (typically front office staff), punctuality, helpfulness, and knowledge, on a scale of 1 to 5 (1=terrible, 2=poor, 3=okay, 4=good, and 5=excellent). Raters may provide text comments if desired. RateMDs.com does not provide disciplinary information. We reviewed all disciplined and nondisciplined control physicians on this website and recorded rating scores. Data collection took place between approximately May 2014 and September 2014, with data cleaning and quality control performed by a second party in July 2015.

Creation of the Dataset
We paired 751 disciplined physicians with 751 nondisciplined matched controls and collected information from RateMDs.com for each physician. As not all physicians were rated, this resulted in 4 groups: disciplined rated, disciplined unrated, control rated, and control unrated. When considering pairs of cases and controls who both had web-based ratings, our dataset included 312 physician pairs (Figure 1). We used only matched pairs for analysis and performed analyses only when there were more than 50 case-control matched pairs. We also grouped disciplined physicians according to types of misconduct and punishments. The number of matched pairs available for comparison varied from 2 to 254 (Table 2).
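As a rough illustration of how the 4 groups and the analyzable pairs arise, the following sketch classifies a handful of made-up records. The record layout, the `label` and `analyzable` helpers, and the numbers are assumptions for illustration only; the only values taken from the text are the 50-pair threshold and the 2 to 254 range.

```python
from collections import Counter

# Hypothetical records: (pair_id, role, number_of_ratings)
records = [
    (1, "case", 12), (1, "control", 4),
    (2, "case", 0),  (2, "control", 7),
    (3, "case", 3),  (3, "control", 0),
    (4, "case", 0),  (4, "control", 0),
]


def label(role, n_ratings):
    """Assign a physician to one of the 4 groups."""
    status = "rated" if n_ratings > 0 else "unrated"
    return f"{role} {status}"


group_sizes = Counter(label(role, n) for _, role, n in records)

# A pair enters the matched analysis only if BOTH members are rated.
rated = {(pid, role) for pid, role, n in records if n > 0}
pair_ids = {pid for pid, _, _ in records}
both_rated_pairs = sorted(
    pid for pid in pair_ids
    if (pid, "case") in rated and (pid, "control") in rated
)

MIN_PAIRS = 50  # subgroup analyses were run only above this many pairs


def analyzable(n_pairs):
    return n_pairs > MIN_PAIRS


print(dict(group_sizes))
print(both_rated_pairs)
print(analyzable(254), analyzable(2))
```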

Analysis 1: Comparison of Disciplined Versus Nondisciplined Physicians (Matched Analysis)
To compare ratings between disciplined and nondisciplined physicians, we computed an overall average rating for each physician using the mean of the available rating categories and then calculated an overall weighted mean. Generalized estimating equations (GEEs) were used to estimate the average rating by group (disciplined vs nondisciplined), and GEEs were also used for each type of misconduct or penalty. GEEs were selected to account for the matched study design. We felt it was appropriate to select GEEs over nonparametric testing, given that there was a sufficiently large number (eg, 451) of distinct average ratings, so that the ordinal data could be treated similarly to continuous data. This analysis allowed us to report 95% CIs for the estimated group means and provide a sense of the precision of the estimates in addition to significance testing. We reported the estimated mean ratings by group, 95% CIs for these estimates, and the Wald P value against the null hypothesis (no group difference). An α of .05 was used as the threshold for statistical significance. Analyses were performed using the geepack package in R version 3.0.3 (R Foundation).
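To make the logic of a matched comparison concrete, here is a simplified stand-in written in plain Python. It is not the GEE fitted with geepack in the study: it averages the available rating categories per physician and then applies a Wald test to the within-pair differences, which respects the pairing in a similar spirit. The function names and the example ratings are hypothetical, and the weighting step mentioned above is not reproduced.

```python
import math
from statistics import mean


def physician_average(category_scores):
    """Mean of whichever rating categories are available (None = missing)."""
    available = [s for s in category_scores if s is not None]
    return mean(available)


def paired_wald_test(case_means, control_means):
    """Mean case-minus-control difference, 95% CI, and two-sided Wald P value."""
    diffs = [a - b for a, b in zip(case_means, control_means)]
    n = len(diffs)
    d_bar = mean(diffs)
    # Variance of the mean difference, treating each pair as one cluster.
    var = sum((d - d_bar) ** 2 for d in diffs) / (n * (n - 1))
    se = math.sqrt(var)
    z = d_bar / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    ci = (d_bar - 1.96 * se, d_bar + 1.96 * se)
    return d_bar, ci, p


# Hypothetical per-physician averages for 5 matched pairs.
cases = [3.0, 3.5, 4.0, 2.5, 3.75]
controls = [4.0, 4.0, 4.5, 3.5, 4.0]
d_bar, ci, p = paired_wald_test(cases, controls)
print(round(d_bar, 2), p < 0.05)
```

A negative mean difference indicates that cases are rated lower than their matched controls; the normal approximation used here is only reasonable for larger numbers of pairs than in this toy example.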

Analysis 2: Comparison of Physicians Disciplined for a Specific Type of Misconduct/Punishment Versus the Rest of the Disciplined Physicians Cohort
Recognizing that the severity of physician misconduct and punishment is variable (eg, ranging from substandard recordkeeping to more egregious offenses such as sexual misconduct), we compared physician ratings for specific disciplinary offenses with those of the rest of the disciplined physician cohort. Mixed effects models were used for analyses of ratings among disciplined physicians, considering each physician's overall average web-based rating and category-specific ratings as outcomes. The presence of each type of misconduct/punishment in a physician's discipline record was used as a binary predictor. Gender, year of offense, province, professional years, and IMG status were included as fixed effects, and physician specialty was included as a random effect. The estimates reflect mean centering of the year of offense and professional years relative to the rest of the disciplined cohort. We reported the estimated mean ratings by group, 95% CIs for these estimates, and the Wald P value against the null hypothesis (no group difference). An α of .05 was used as the threshold for statistical significance. Analyses were performed using the nlme package in R version 3.0.3.
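One plausible way to write down the model described above is shown below. The notation is an assumption made for illustration and does not appear in the original analysis: $y_{ij}$ is the average rating of disciplined physician $i$ in specialty $j$, $m_{ij}$ is the binary indicator for the misconduct or punishment type under study, bars denote cohort means used for centering, and $u_j$ is a random intercept for specialty.

```latex
\[
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1\, m_{ij} + \beta_2\,\mathrm{gender}_{ij}
  + \beta_3\,(\mathrm{year}_{ij} - \overline{\mathrm{year}})
  + \beta_4\,(\mathrm{profyears}_{ij} - \overline{\mathrm{profyears}}) \\
 & + \beta_5\,\mathrm{IMG}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{province}_{ij}
  + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),\quad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{aligned}
\]
```

Under this reading, the Wald test of $\beta_1 = 0$ corresponds to the null hypothesis of no difference between physicians with the given misconduct type and the rest of the disciplined cohort.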

Sensitivity Analysis
To assess the degree to which physicians with a low overall number of ratings (ie, <5 or <10 ratings) influenced our overall results, we performed additional testing on both analyses 1 and 2 by excluding instances in which physicians had (1) fewer than 5 overall ratings and (2) fewer than 10 overall ratings.
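The exclusion step can be sketched as a simple filter applied before rerunning each analysis. The sketch assumes that a matched pair is dropped whenever either member falls below the threshold, which the text does not state explicitly; the field names and data are hypothetical.

```python
def filter_pairs(pairs, min_ratings):
    """Keep pairs in which both physicians have at least min_ratings ratings."""
    return [
        p for p in pairs
        if p["case_n"] >= min_ratings and p["control_n"] >= min_ratings
    ]


# Hypothetical rating counts for 3 matched pairs.
pairs = [
    {"pair_id": 1, "case_n": 3, "control_n": 20},
    {"pair_id": 2, "case_n": 7, "control_n": 8},
    {"pair_id": 3, "case_n": 15, "control_n": 12},
]

kept_by_threshold = {
    threshold: [p["pair_id"] for p in filter_pairs(pairs, threshold)]
    for threshold in (5, 10)
}
print(kept_by_threshold)  # {5: [2, 3], 10: [3]}
```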

Disciplined Physicians Versus Nondisciplined Physicians: Matched Analysis
We paired 751 disciplined physicians with 751 nondisciplined matched controls. Of the 751 disciplined physicians, 37.3% (280/751) did not have any web-based ratings, whereas 62.7% (471/751) had at least one rating. Of the 751 nondisciplined physician controls, 64.6% (485/751) had at least one rating, whereas 35.4% (266/751) were not rated online. When comparing rated, but unmatched, physicians, 21.1% (159/751) were disciplined, rated, but unmatched compared with 23.0% (173/751) nondisciplined, rated, but unmatched. When considering pairs of cases and controls who both had ratings, our dataset included 312 physician pairs (Figure 1). When we grouped disciplined physicians according to the types of misconduct and punishments, the number of matched pairs available varied, ranging from 2 to 254 available pairs (Table 2).
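The counts in this paragraph fit together arithmetically, as the short check below illustrates. It uses only numbers reported in the text; the variable names and the `pct` helper are introduced here for illustration.

```python
TOTAL_PER_GROUP = 751
rated_cases, rated_controls = 471, 485
both_rated_pairs = 312

# Physicians without any rating in each group.
unrated_cases = TOTAL_PER_GROUP - rated_cases
unrated_controls = TOTAL_PER_GROUP - rated_controls

# Rated physicians whose matched counterpart was not rated.
unmatched_rated_cases = rated_cases - both_rated_pairs
unmatched_rated_controls = rated_controls - both_rated_pairs

# All rated physicians across both groups.
rated_total = rated_cases + rated_controls


def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


print(unrated_cases, unrated_controls)                  # 280 266
print(unmatched_rated_cases, unmatched_rated_controls)  # 159 173
print(rated_total, pct(rated_total, 2 * TOTAL_PER_GROUP))
```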
When we compared the 312 pairs of convicted and disciplined physicians with nondisciplined controls, disciplined physicians were rated lower than nondisciplined physicians when all offenses and punishments were considered together (mean rating 3.62, SD 0.82 vs mean rating 4.00, SD 0.75; P<.001). When comparing rated, but unmatched, physicians, disciplined unmatched physicians had even lower ratings than nondisciplined unmatched physicians (mean 3.42, SD 0.98 vs mean 3.91, SD 0.82). As 12.5% (94/751) of our disciplined physicians cohort had more than one disciplinary conviction during our study period, we also looked at this group of repeat offenders. Of the 94 disciplined physicians who were repeat offenders, approximately half were available for case-control analysis, as 44 were matched to a control where both physicians had ratings. Disciplined repeat offenders had mean ratings that were also lower than controls (mean 3.44, SD 4.09 vs mean 4.00, SD 0.81; P<.01).
The mean rating for disciplined physicians was lower than that for nondisciplined matched controls for most types of misconduct (Table 2).

Comparison of Specific Type of Misconduct/Punishment Versus the Rest of the Disciplined Physicians Cohort
Sexual misconduct was the only category of misconduct in which mean ratings for this group of physicians were higher than those for other disciplined physicians. Of the 62.7% (471/751) of disciplined physicians who were rated online, 219 were disciplined for sexual misconduct. The overall mean rating of physicians disciplined for sexual misconduct was higher than that of all other disciplined physicians (3. P=.04). For punishments, suspension was the only type of punishment in which this group of disciplined physicians was rated higher than all other disciplined physicians (3.54 vs 3.33; P=.023); however, this result did not remain robust when physicians with fewer than 10 ratings were excluded in our sensitivity analysis. For all other types of misconduct and punishments, no overall mean rating differences existed compared with all other disciplined physicians (Multimedia Appendix 1).

Sensitivity Analysis
Of 312 cases and 312 controls, 48 case physicians and 68 control physicians had fewer than 5 ratings. Similarly, 84 case physicians and 117 control physicians had fewer than 10 ratings. To assess whether such ratings influenced our main results, we performed sensitivity analyses by excluding cases in which physicians had (1) fewer than 5 and (2) fewer than 10 ratings. Our main results remained robust (Multimedia Appendices 2 and 3). When we excluded physicians with few ratings, our finding that disciplined physicians had lower overall mean ratings did not change (<5 ratings: 3.61 vs 4.01; P<.001 and <10 ratings: 3.52 vs 4.01; P<.001). When broken down by type of misconduct and punishment, results also remained robust; that is, disciplined physicians had lower ratings than nondisciplined matched controls, with the exception of sexual misconduct (<5 ratings and sexual misconduct: 3.89 vs 3.94; P=.69 and <10 ratings and sexual misconduct: 3.79 vs 3.92; P=.36).
Similarly, when comparing physicians disciplined for types of misconduct with all other disciplined physicians, all results remained robust, with 2 minor exceptions. Ratings for physicians whose licenses were suspended no longer differed from all other disciplined physicians, nor did the ratings for physicians who were punished with a formal reprimand (Multimedia Appendix 3). All other results remained consistent after sensitivity analyses.

Principal Findings
Our study used a national dataset of all disciplined physicians and collected their available online ratings from RateMDs.com over a 10-year period. Of the 1502 physicians in the 751 matched pairs, 63.6% (956/1502) were rated online. For most types of misconduct, disciplined physicians were rated lower than nondisciplined controls. However, physicians disciplined for sexual misconduct were not rated differently than controls and, in fact, were rated higher when compared with all other disciplined physicians, a directional relationship that was not found with any other type of misconduct.

Comparison With Prior Work
Our results are in general agreement with other studies that show that physicians are, overall, rated positively [13,20,21,29]. Our findings are also consistent with data showing lower online ratings for physicians on probation for many types of misconduct, but not sexual offenses [34]. There may be something unique about physicians who commit sexual misconduct that distinguishes them from other convicted physicians, at least with respect to online ratings.
We found that online raters discerned a difference between disciplined and nondisciplined physicians with respect to online ratings overall; however, interestingly, sexual misconduct was the only category in which this effect was not seen. Furthermore, we found that physicians who were disciplined for sexual misconduct are rated more favorably than the rest of the disciplined physician cohort. Again, sexual misconduct was the only category of misconduct in which mean ratings were higher than all other disciplined physicians.
Our findings related to sexual offense convictions are consistent with previous findings. Only a handful of studies have compared sexual offender physicians with other physicians; however, it has been reported that some antisocial personality traits were unique to psychiatrists who were subsequently convicted of sexual boundary violations and that these characteristics were identifiable early in training [44][45][46][47].
This study adds to the body of literature on online physician ratings and extends current knowledge to include extremes of poor quality (ie, physician disciplinary convictions). This is the first study to combine 2 large, comprehensive national databases of physician discipline and web-based physician ratings over a 13-year period, using a rigorous matched control approach. We highlight the heterogeneity of disciplined physicians as a group and are among the first to identify this finding in physician sexual offenders. Although the majority of low-rated physicians are neither disciplined nor sexual offenders, we feel that the potential for patient harm in such cases is sufficient to warrant further investigation of this group of disciplined physicians. Future studies could focus on predicting patient harm or developing interventions to prevent it.

Limitations
We recognize several limitations. First, our study assumes that disciplined physicians, as a group, are poor-quality physicians. Although not perfectly synonymous, these physicians have been convicted by their professional colleges for conduct that is substandard, inappropriate, or morally not in line with professional standards. As such, this is an excellent surrogate for poor quality. Second, we cannot exclude that publicly posted ratings may, themselves, influence future ratings. Although we considered censoring ratings after a particular disciplinary proceeding became a public record, we felt a time-based analysis would decrease the number of ratings in our analysis, with no clear added benefit against potential bias. Moreover, the uncertainty of whether the rater had advance knowledge of the physician would remain, as it would be difficult to ascertain whether raters were influenced by other sources (eg, popular media attention). Interestingly, we found that physicians who were disciplined for sexual misconduct (the misconduct category frequently reported in the media) were rated no differently than controls. In fact, they were rated higher when compared with the rest of disciplined physicians, making us more likely to accept our findings. Although occasionally there was a mention of misconduct in the comments, we estimated this to reflect a small proportion (ie, <5%) of all comments. Moreover, we would argue that for our research question, timing may be less relevant, that is, a physician disciplined in 2000 and reviewed in 2005 versus a physician who was reviewed in 2000 and disciplined in 2005 are both relevant enough to merit consideration.
Third, although we used a stringent matching process, in smaller centers, it was not possible to match by subspecialty for 5 physicians. In this case, we matched as closely as possible (ie, we matched surgeons with another surgeon rather than, eg, a psychiatrist). This represented less than 1% of cases. Fourth, as not all physicians are rated on websites, data may not be generalizable. However, 63.6% (956/1502) of physicians had an online presence, which is much higher than in previous studies [13,15,31], and when we analyzed data from disciplined, unmatched physicians, overall demographics and mean ratings did not substantially differ. In fact, ratings of unmatched, disciplined physicians were lower than unmatched, undisciplined physicians. We also considered external validity concerns in potential comparisons between the 60% of physicians who are rated online versus those who are unrated. However, because our physician control group was hand selected to resemble the disciplined physician group, and not representative of the general population, such comparisons would not be particularly useful; therefore, we specifically refrained from making such direct comparisons between rated and unrated physicians. Finally, rating website users may be different with respect to access to a computer and inclination to post online ratings. However, this is an issue germane to all online ratings. Taken together, we feel that these limitations would not significantly alter our conclusions.

Conclusions
Disciplined physicians are rated lower than control physicians by those who rate their physicians online, in keeping with the hypothesis that patients can accurately appraise health care quality. However, any ability to ascertain quality becomes more difficult for physicians disciplined for sexual misconduct. Our findings suggest that this group of physicians deserves further investigation to better understand why they would be rated more favorably than all other disciplined physicians. Our research may have implications for the identification of at-risk physicians to develop interventions before patient harm can occur [48].