An Analysis of Online Evaluations on a Physician Rating Website: Evidence From a German Public Reporting Instrument

Background: Physician rating websites (PRWs) have been gaining in popularity among patients who are seeking a physician. However, little evidence is available on the number, distribution, or trend of evaluations on PRWs. Furthermore, there is no published evidence available that analyzes the characteristics of the patients who provide ratings on PRWs.
Objective: The objective of the study was to analyze all physician evaluations that were posted on the German PRW, jameda, in 2012.
Methods: Data from the German PRW, jameda, from 2012 were analyzed and contained 127,192 ratings of 53,585 physicians from 107,148 patients. Information included medical specialty and gender of the physician, age, gender, and health insurance status of the patient, as well as the results of the physician ratings. Statistical analysis was carried out using the median test and Kendall Tau-b test.
Results: Thirty-seven percent of all German physicians were rated on jameda in 2012. Nearly half of those physicians were rated once, and less than 2% were rated more than ten times (mean number of ratings 2.37, SD 3.17). About one third of all rated physicians were female. Rating patients were mostly female (60%), between 30-50 years (51%) and covered by Statutory Health Insurance (83%). A mean of 1.19 evaluations per patient could be calculated (SD 0.778). The most frequently rated medical specialties were orthopedists, dermatologists, and gynecologists. Two thirds of all ratings could be assigned to the best category, “very good”. Female physicians had significantly better ratings than did their male colleagues (P<.001). Additionally, significant rating differences existed between medical specialties (P<.001). It could further be shown that older patients gave better ratings than did their younger counterparts (P<.001). The same was true for patients covered by private health insurance; they gave more favorable evaluations than did patients covered by statutory health insurance (P<.001). No significant rating differences could be detected between female and male patients (P=.505). The likelihood of a good rating was shown to increase with a rising number of both physician and patient ratings.
Conclusions: Our findings are mostly in line with those published for PRWs from the United States. It could be shown that most of the ratings were positive, and differences existed regarding sociodemographic characteristics of both physicians and patients. An increase in the usage of PRWs might contribute to reducing the lack of publicly available information on physician quality. However, it remains unclear whether PRWs have the potential to reflect the quality of care offered by individual health care providers. Further research should assess in more detail the motivation of patients who rate their physicians online.
(J Med Internet Res 2013;15(8):e157) doi: 10.2196/jmir.2655


Introduction
In many health care systems, quality of care improvement strategies have been implemented over the last few years [1]; nevertheless, quality deficits still remain [2-4]. Several studies have further shown remarkable variability in quality of care across health care providers [1,5-7]. However, patients are not likely to be generally aware of existing quality differences [8,9]. One reason for this is the limited amount of publicly reported information on the quality of health care providers [10].
It has become a major challenge to remedy this deficiency by improving transparency about the quality of health care providers [10,11]. This is supposed to increase overall quality by steering patients to better performing health care providers [12,13] and by motivating providers to make quality improvements [9,14]. Therefore, public reporting (PR) instruments have been put in place in many countries [15-22]. These instruments generally assess the quality of care by measuring adherence to clinical guidelines and by providing additional structural information [11]. However, patients have been slow to take advantage of these comparative reports in making their health care provider choices [9]. Possible reasons for this might be found in the fact that patients are not aware of the information, do not understand it, do not believe it, or are unwilling or unable to use the information provided [23].
The newest trend in the PR movement is the use of physician rating websites (PRWs) [24]. The primary objective of these websites lies in rating and discussing physician quality online by using user-generated data [25,26]. Although the usefulness of PRWs has been seen critically from a scientific point of view [24], their popularity among patients has been increasing [24,27,28]. In contrast to traditional PR instruments, PRWs might have the advantage that the information can be more easily understood by patients. While traditional instruments report on measures such as the administration of beta blockers or angiotensin-converting enzyme inhibitors, which require a higher level of clinical knowledge than most patients have [8], PRWs concentrate on measuring patient satisfaction [24].
Although there is a vast amount of evidence regarding traditional PR instruments, little research has addressed PRWs [25]. A recently conducted systematic review has identified 9 articles published in peer-reviewed journals [25]. In them, the number, distribution, and trend of the evaluations on PRWs were investigated [11,27-34]. Most of the investigations evaluated ratings for a (non)random sample of physicians, while 1 study assessed over 386,000 national ratings from 2005 to 2010 from the US PRW, RateMDs. Furthermore, there is no published evidence available that analyzes the characteristics of the patients who provide ratings.
In this context, this paper adds to the literature by presenting an analysis of all physician evaluations posted on the German PRW, jameda, in 2012. We provide a descriptive analysis of (1) both physician and patient characteristics, and (2) the number, distribution, and results of the ratings. Inferential analyses were then applied to assess (3) the impact of physician and patient characteristics on the overall performance measure, and (4) the correlation between the number of ratings per patient/physician and the overall performance.

Analysis of Jameda
This paper presents an analysis of all 127,192 physician evaluations that were posted on the German PRW, jameda, in 2012. In total, 107,148 patients completed evaluations on 53,585 physicians. The dataset contained the following information: the medical specialty and gender of the physician, as well as the gender, age, and health insurance status of the patient. Additionally, the results of the physician ratings for all mandatory and optional questions were included. The mandatory physician rating system on jameda consists of 5 questions, rated according to the grading system in German schools on a 1-6 scale (1=very good; 2=good; 3=satisfactory; 4=fair; 5=deficient; and 6=insufficient) [35]. These relate to (Q1) satisfaction with the treatment offered by the physician, (Q2) education about the illness and treatment, (Q3) the relationship of trust with the physician, (Q4) the time the physician spent on the patient's concerns, and (Q5) the friendliness of the physician. A mean score ("overall performance") is calculated, based on the results of these 5 questions. Beyond that, a narrative commentary has to be given and 13 optional questions are available for answering (these are not addressed in this paper) [36].
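The "overall performance" score described above can be illustrated with a minimal Python sketch. The function name `overall_performance` and the example answers are hypothetical; only the five mandatory questions, the 1-6 grading scale, and the use of a mean come from the text.

```python
from statistics import mean

# Grade labels of the 1-6 scale as listed in the text.
GRADE_LABELS = {
    1: "very good", 2: "good", 3: "satisfactory",
    4: "fair", 5: "deficient", 6: "insufficient",
}

def overall_performance(answers):
    """Return the mean of the five mandatory question grades (Q1-Q5)."""
    if len(answers) != 5:
        raise ValueError("expected exactly five mandatory answers")
    if any(a not in GRADE_LABELS for a in answers):
        raise ValueError("each answer must be a grade from 1 to 6")
    return mean(answers)

# Hypothetical single rating, in the order Q1..Q5.
example_rating = [1, 2, 1, 3, 1]
print(overall_performance(example_rating))  # 1.6
```

How jameda rounds or bins this mean into the displayed grade categories is not stated in the text, so the sketch stops at the raw mean.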
We focused on jameda because it is likely to play the most significant role in the German PRW movement for the following reasons: (1) from a patient's perspective, jameda is the PRW to which a patient is most likely to be referred [24,31], (2) jameda is ranked highest in traffic among German PRWs [34], and (3) among German PRWs, jameda has been shown to contain the largest number of ratings, so far [37].

Statistical Analysis
All statistical analyses were conducted using SPSS 21.0 (SPSS for Windows, version 21.0). The median test was used for nonparametric data of groups with different distributions. The Kendall Tau-b test was used to analyze specific correlations. Differences were considered to be significant if P<.05 and highly significant if P<.001.
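For readers unfamiliar with the median test, the following Python sketch shows one common form of it (Mood's median test) for two groups. It is not the SPSS procedure used in the study: the function name, the convention of counting scores equal to the grand median as "not above", the absence of a continuity correction, and the sample scores are all assumptions made for illustration.

```python
from math import erfc, sqrt
from statistics import median

def median_test_two_groups(group_a, group_b):
    """Mood's median test for two independent samples.

    Returns (grand_median, chi2, p_value). Scores equal to the grand
    median are counted as "not above" (an assumed convention).
    """
    groups = (list(group_a), list(group_b))
    grand = median(groups[0] + groups[1])
    # 2x2 table: rows = above / not above the grand median, columns = groups.
    table = [
        [sum(x > grand for x in g) for g in groups],
        [sum(x <= grand for x in g) for g in groups],
    ]
    n = sum(len(g) for g in groups)
    row_totals = [sum(row) for row in table]
    col_totals = [len(g) for g in groups]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("test undefined: an entire row or column is empty")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return grand, chi2, p_value

# Made-up overall performance scores for two hypothetical physician groups.
scores_a = [1.0, 1.0, 1.2, 1.4, 2.0, 1.0, 1.6, 3.0]
scores_b = [1.0, 1.8, 2.4, 1.2, 3.6, 2.0, 1.0, 4.2]
grand, chi2, p = median_test_two_groups(scores_a, scores_b)
print(grand, chi2, round(p, 3))  # chi2 is 1.0 and p is about 0.317 here
```

With more than two groups (for example, medical specialties) the same table simply gains columns, but the p-value then needs the chi-square distribution with more degrees of freedom, which this sketch does not cover.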

Number and Distribution of Ratings
In total, 127,192 ratings of 53,585 physicians from 107,148 patients were posted on the PRW, jameda, in 2012. The German outpatient sector consists of approximately 146,000 physicians [38]; thus, 37% were rated in 2012. As displayed in Table 1, about one third of all rated physicians were female (34.1%). The rating patients were mostly female (60%), between 30-50 years (51%), and covered by Statutory Health Insurance (83%).
The distribution of ratings demonstrates that nearly half of the physicians were rated once and less than 2% were rated more than ten times (see Table 2). Thereby, rated physicians had a mean of 2.37 individual ratings (SD 3.169, range 1-159). It could further be shown that 88% of the patients left a single rating and 12% of them left between two and five ratings. This leads to an average of 1.19 rated physicians per patient (SD 0.778, range 1-153).
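The headline figures above follow directly from the reported counts. The short Python sketch below re-derives them; the variable names are illustrative, and the national total of 146,000 is the approximate figure cited in the text.

```python
# Counts reported in the text for jameda in 2012.
ratings = 127_192
rated_physicians = 53_585
rating_patients = 107_148
outpatient_physicians = 146_000  # approximate national figure from the text

share_rated = rated_physicians / outpatient_physicians
ratings_per_rated_physician = ratings / rated_physicians
ratings_per_patient = ratings / rating_patients

print(f"{share_rated:.0%}")                  # 37%
print(f"{ratings_per_rated_physician:.2f}")  # 2.37
print(f"{ratings_per_patient:.2f}")          # 1.19
```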
If the ratings are analyzed according to the medical specialty of the physicians in absolute terms, family physician/general practitioner, internist, and gynecologist were rated most often (13,466, 8709, and 6410, respectively) (see Table 3; [38,39]). In contrast, laboratory specialist, nuclear medicine, and child and youth psychotherapist were rated least frequently (13, 136, and 166, respectively). The distribution of ratings in relative terms, compared to the national physician composition, shows that the most rated medical specialties were orthopedists, dermatologists, and gynecologists (59.20%, 58.90%, and 56.90%, respectively). In contrast, the least frequently rated medical specialties were radiologists, anesthetists, and laboratory specialists (10.40%, 7.90%, and 2.10%, respectively).

Evaluations
Table 4 shows the evaluation results of all 53,585 rated physicians (as they are displayed on the website). It can be shown that two thirds of all evaluations were assigned to the best rating category, "very good". An additional 13% of patients rated their experience with the physician as "good". Three percent of the physicians were rated with the worst score, "insufficient" in their overall performance. The median result of all questions was "very good", while the mean varied between 1.68 for question 5 (friendliness of the physician) and 1.85 for question 3 (relationship of trust with the physician).
An analysis was performed to ascertain whether differences in the rating of a physician, regarding both the physician (ie, gender and medical specialty) and the patient characteristics (ie, gender, age, and health insurance) could be determined. The results are displayed in Table 5. They show that female physicians were rated better than their male colleagues and that the difference is statistically significant (the percentage of rated physicians below median is 61% for female and 59% for male physicians; P<.001). Furthermore, significant rating differences between medical specialties could be demonstrated (P<.001). The best rated medical specialties were laboratory specialists, anesthetists, medical practitioner without specialization, and family physician/general practitioner (85%, 76%, 74%, and 70% below median, respectively). The lowest ratings were given to neurologist/psychiatrist, ophthalmologist, orthopedist, and dermatologist (including venereologist) (47%, 45%, 35%, and 35% below median, respectively).
With respect to patient characteristics, no significant rating differences between female and male patients could be detected (percentage below median is 59% in each group; P=.505). However, it could be shown that older patients gave better ratings than did their younger counterparts (P<.001). Additionally, patients covered by private health insurance gave more favorable evaluations than did patients covered by statutory health insurance (P<.001).
Next, the correlation between the mean overall performance of a physician and the number of ratings per physician was addressed. As displayed in Figure 1, the total performance range can be observed for physicians with a low number of ratings. By contrast, physicians who received a higher number of ratings were shown to have better ratings (eg, all physicians with more than 60 ratings were rated as "very good"). As a result, the correlation between the mean overall performance of a physician and the number of ratings per physician could be shown to be statistically significant (Kendall Tau-b=0.193, P<.001). This is also true for all five mandatory questions (P<.001; data not presented here). We further investigated to find out whether similar results could be detected for the number of ratings per patient compared to the mean overall performance given by this patient. The result is displayed in Figure 1 and shows a similar correlation (Kendall Tau-b=0.178, P<.001).
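The Kendall Tau-b coefficient used above can be sketched in plain Python as follows. This is a simple quadratic-time illustration under stated assumptions, not the SPSS implementation used in the study; the function name and the sample data are hypothetical.

```python
from collections import Counter
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)).

    C and D are the numbers of concordant and discordant pairs,
    n0 = n(n-1)/2, and n1, n2 count the pairs tied in x and in y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    ties_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    ties_y = sum(t * (t - 1) // 2 for t in Counter(y).values())
    denominator = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denominator == 0:
        raise ValueError("tau-b undefined: one variable is constant")
    return (concordant - discordant) / denominator

# Hypothetical physicians: number of ratings and mean grade (1 = best).
number_of_ratings = [1, 1, 2, 3, 5, 8]
mean_grade = [3.0, 1.0, 2.0, 1.5, 1.2, 1.0]
print(kendall_tau_b(number_of_ratings, mean_grade))  # -0.5
```

The sign of the coefficient depends on how the grade is coded. In this made-up example a lower grade is better, so more ratings paired with better grades yields a negative value; the coding behind the positive coefficients reported above is not stated in the text.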

Principal Findings
In this section, the results obtained in this investigation are compared to published studies, mostly from the United States.
The evidence from this investigation shows that 37% of physicians in the German outpatient sector were rated on jameda in 2012. This number exceeded those from previously published international studies. For example, Gao and colleagues showed that 16% of US physicians received an online review on RateMDs in the period between 2005 and 2010 [27]. Lagu et al reported that out of 300 Boston physicians, 27% of them had been rated [11], while Mostaghimi et al calculated percentages of between 0.4% and 21% for a sample of 250 randomly selected internal medicine physicians [33]. In a sample of 500 randomly selected US urologists, the percentages varied between 0.4% and 53.6% [40]. Published results for German PRWs reported percentages of between 3.36% and 25.78% in 2009 [31] and between 3% and 28% in 2012 [34]. However, it is worth mentioning here that direct comparison is difficult due to the fact that data from one year was analyzed in this investigation, whereas most studies use ratings for a sample of physicians without including any time constraints.
It could also be shown that rated physicians had a mean of 2.37 individual ratings (SD 3.169, range 1-159). Published results for the US PRW, RateMDs, were quite similar, at 2.7 [30] and 3.2 [27]. More recent US studies determined numbers of 2.35 [11] and 2.4 [40], while results for German PRWs were reported to be between 1.1 and 3.9 [34]. The number decreases to 0.87 when the ratings are related to all physicians in the German outpatient sector in 2012, whether rated or not. This is slightly higher than the results obtained by Lagu and colleagues (mean 0.63) [11].
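The two means discussed above differ only in their denominator. As a worked illustration, assuming the 0.87 figure relates the 127,192 ratings to the approximately 146,000 outpatient physicians reported in the Results:

```latex
\[
\frac{127{,}192 \text{ ratings}}{53{,}585 \text{ rated physicians}} \approx 2.37
\qquad\text{versus}\qquad
\frac{127{,}192 \text{ ratings}}{146{,}000 \text{ outpatient physicians}} \approx 0.87
\]
```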
Nearly half of the physicians were rated only once, and 44% received between 2 and 5 ratings in this study. Less than 2% were rated more than 10 times and 0.1% more than 50 times. These numbers are in line with the results obtained by analyzing the ratings provided for 2010 on RateMDs. In that case, half of the physicians had a single rating and the percentage of physicians with 5 or more ratings was 12.50% [27]. Of 250 randomly selected physicians in Boston, 50 physicians (20%) had between 1 and 4 reviews on Healthgrades, 13 physicians (5.2%) on RateMDs, and 1 physician (0.4%) on Wellness. Only 3 physicians had more than 5 reviews on any of the ratings sites [33].
About one third of all rated physicians on jameda were female. This is consistent with both the gender composition of physicians in Germany (female national average 40% [38]) and with the results by Gao and colleagues [27]. If the ratings are analyzed according to the medical specialty in relative terms (ie, compared to the national physician composition), the numbers are again confirmed by other study results. For example, Gao and colleagues showed that rated physicians were most likely to be classified as obstetrician/gynecologists and least likely to be classified as other specialists such as radiologists or anesthesiologists [27].
In this study, almost 80% of all evaluations could be assigned to the two best rating categories. Less than 3% of the physicians were rated with the worst score, "insufficient". These results are in line with most other studies: Lagu and colleagues categorized 88% of quantitative reviews as positive, 6% as negative, and 6% as neutral [11]. On RateMDs, 45.80% of the physicians received the best score and only 12% were rated with the worst score [27]. Kadry et al assessed the 10 most commonly visited US PRWs and found that the percentage of reviews rated ≥75 on a 100-point scale was 61.5%, ≥4 on a 5-point scale was 57.74%, and ≥3 on a 4-point scale was 74.0% [32]. On the Canadian PRW RateMDs, 70% of the comments were reported to be favorable and about 30% of the comments were negative [41]. In the sample of 500 randomly selected US urologists, 86% had positive ratings [40]. Moreover, the median result of all questions in this study was "very good". The means varied between 1.68 concerning the friendliness of the physician (question 5) and 1.85 regarding the relationship of trust with the physician (question 3). In their study, Kadry et al determined the average rating to be 77 out of 100 for sites using a 100-point scale, 3.84 out of 5 for sites using a 5-point scale, and 3.1 out of 4 for sites using a 4-point scale [32]. For the US RateMDs, the mean scores were reported to be 3.93 [27] and 3.82 [30] on a 5-point scale, respectively. Finally, a comprehensive analysis of German PRWs showed the mean ratings to be between 1.1 and 1.5 (3-point scale, 1 "good", 3 "poor") [34].
The results of this study suggest that female physicians receive better ratings than do their male colleagues. The difference is small but statistically significant (P<.001). Ellimoottil and colleagues also found better ratings for female physicians, although that difference was not statistically significant (P=.72) [40]. By contrast, Gao and colleagues showed that male physicians received higher ratings than did female physicians (P<.001) [27]. In all three studies, however, the differences were quite small.
We can further demonstrate significant rating differences among the analyzed medical specialties. Of these, the best rated were laboratory specialists, anesthetists, medical practitioners without specialization, and family physician/general practitioners. The lowest ratings were given to neurologist/psychiatrists, ophthalmologists, orthopedists, and dermatologists. In line with the numbers obtained in this study, higher ratings were shown for physicians in primary care [27] and lower ratings for physicians in dermatology [30]. However, in another study, primary care physicians were rated at average [30]. Lagu et al found a similar percentage of positive, negative, and neutral quantitative reviews for generalists and subspecialists. They then concluded that after accounting for varying number of reviews per physician, generalists tended to have more positive reviews than did subspecialists [11].
This is the first study that allows for a closer analysis of the patients who rate their physicians. Approximately 73% of all patients provided information regarding gender, age, and health insurance. According to our results, most of the rating patients were female (60%) and were covered by Statutory Health Insurance (83%). One other notable fact could be shown: patients in the youngest age group (<30) made fewer ratings than did older patients. Whether or not this is due to more severe illness problems with increasing age cannot be assessed with this data. However, this question should be addressed in future research.
The fact that hardly any patients leave more than a single rating (mean 1.19 ratings per patient) can be regarded as even more surprising. One might expect that once they were aware of the existence of such websites, patients would use them constantly in an active (ie, rating physicians) or passive (ie, only searching for physicians) manner, especially to assist other patients with information when seeking a physician. However, we could not investigate the motivation behind the patients' ratings. Nor could we assess the reasons for not regularly rating physicians. Considering the mean of 14 [42] to 17 [43] physician contacts among Germans with statutory health insurance, there is still high potential for even more ratings. The fact that patients covered by private health insurance give more favorable ratings than do patients covered by statutory health insurance is not surprising, since they were found to have faster access to care [44]. This might well have had an effect on the ratings differences. Whether quality of care differences can be determined between the two groups and whether this leads to ratings differences should be addressed in future studies.
It could be shown that there is a significant correlation between the mean overall performance rating of a physician and the number of ratings received for that physician (P<.001). One possible explanation for this finding might be that physicians who are aware of these websites and use them as a marketing instrument specifically ask satisfied patients to leave a (positive) rating on a PRW. Another explanation might be that some physicians, who are identified by patients on PRWs, simply provide outstanding quality of care and receive favorable ratings afterwards. Although our results show a significant correlation between these variables, we cannot determine which explanation applies. This should be addressed in further studies, which should contain additional information about the physicians.

Limitations
There are some limitations that have to be taken into account when interpreting the results of this investigation. First, we analyzed online ratings from only a single PRW, jameda. Although jameda has been shown to be the most frequently used German PRW, it is possible that other PRWs have more online reviews or show other results. Second, although the data provided allowed for comprehensive analysis, there was no information available on the age of the physician, malpractice claims, or the medical school attended. This information would have allowed further analysis. Third, we were not able to present analysis conducted over a longer period of time. However, the data do reflect the entire year 2012. Fourth, we did not analyze results presented in narrative comments. Finally, there was no way to verify the validity of the analyzed reviews. Therefore, it cannot be guaranteed that the ratings were not subject to manipulation [27].

Conclusions
Finally, it can be stated that there is a limited amount of publicly reported information on quality of health care providers. To increase transparency, different approaches have been developed. There are traditional PR instruments that focus on the adherence to evidence-based guidelines. Thus, they may have the potential to reflect the clinical quality of care provided by a health care professional. However, these instruments have not yet proven to be a meaningful measure for patients. In contrast, PRWs concentrate on patient satisfaction measures. Whether or not these results have the potential to reflect the quality of care provided by a health care professional should be addressed in future research as well. Since an increasing usage of these websites has already been shown [24,27,28], PRWs might contribute to reducing the lack of publicly available information on quality, at least for those physicians who have been rated. Given that only a certain number of physicians has been rated so far, there is still no perfect transparency. However, given the increasing number of ratings on PRWs, the future impact for patients seeking a physician will continue to rise.