BACKGROUND

Overuse is care in which potential harms outweigh potential benefits. This applies both to over-testing, which may lead to downstream actions that cause harm, and to overtreatment, in which therapeutics are used under circumstances where the potential harms likely outweigh the benefits.1, 2 The Choosing Wisely Campaign was developed to address over-testing and overtreatment for commonly encountered conditions and is gaining momentum internationally.3 A working group from the American Geriatrics Society applied these principles to older adults.4 Several of the topics identified were well-established examples for which clinicians in ambulatory practice often do not follow best practices, thereby leading to patient harm.2, 5, 6, 7, 8, 9, 10 Three specific examples relevant to general ambulatory care in which clinicians deviate from Choosing Wisely recommendations include the following:4 (1) prostate-specific antigen (PSA) testing to screen for prostate cancer in older men,11 (2) overuse of urine studies (e.g., urinalysis and/or urine cultures) to detect pyuria and/or bacteriuria in older women without specific genitourinary signs and/or symptoms or conditions,12 and (3) overtreatment of diabetes mellitus in older adults to achieve a hemoglobin A1C level (HbA1c) of < 7.0% with drugs that may cause hypoglycemia.13, 14

Several factors might influence clinician decision making regarding testing and treatment related to overuse in older adults, including patient, clinician, and environmental characteristics.15, 16 A few studies have shown that clinician-level variation may influence whether a test or treatment is performed.17 To our knowledge, the extent of clinician-level variation in metrics of overuse of testing or treatment in older adults has not been well established. Understanding clinician-level variation may inform the development of targeted interventions aimed at improving adherence to evidence-based practices.

The objective of this paper is to examine the clinician-level variation for measures of three areas of potential overuse in older adults: (1) PSA screening in older men (PSA), (2) overuse of urine studies in older women without specific genitourinary signs and/or symptoms (UA/UC), and (3) diabetes mellitus overtreatment in older adults (DM).

METHODS

Design

The study was a retrospective analysis of process and intermediate outcome measures in adults aged 65 years and older evaluated in outpatient primary care or immediate care within two regions (north and central) of a health system affiliated with an academic medical center between July 1, 2016, and June 30, 2017. Northwestern University’s Institutional Review Board approved the study.

Subjects

Clinicians in this sample included physicians, physician assistants, and advanced practice nurses in primary care specialties (e.g., internal medicine, family practice, geriatrics) providing outpatient care for patients who were eligible for any of the three overuse measures below. Clinicians practicing in immediate care were included only in the UA/UC measure. Only clinicians with ten or more eligible patients attributed to them were included in the clinician-level analyses for any given measure. We describe the three specific older adult patient populations we examined during the study period below.

Measurements

The primary measures of interest were adherence to three Choosing Wisely recommendations, operationalized as electronic clinical quality measures (eCQM) calculated from electronic health record (EHR) data that represent potentially inappropriate use of medical services in older adults.

PSA testing against guidelines was defined as the presence of a PSA laboratory result in the EHR during the study period (numerator) among men aged 76 years and older without a preceding diagnosis or procedure suggesting a history of prostate cancer (denominator) (eTable 1). Patients on androgenic agents or anabolic steroids were excluded (eTable 1). Laboratory orders that constituted PSA testing against guidelines are provided in eTable 2. Patients were attributed to the primary care clinician with whom they had the greatest number of visits during the measurement period.
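The denominator, numerator, and attribution rule for the PSA measure can be illustrated with a minimal sketch. The field names below are hypothetical stand-ins for the discrete EHR extracts, and the tie-breaking rule for attribution is an assumption, since the measure description does not say how ties were resolved.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Patient:
    # All field names are hypothetical; they do not come from the study's data model.
    age: int
    sex: str
    prostate_cancer_history: bool
    on_androgen_or_anabolic_steroid: bool
    psa_result_in_period: bool
    visit_clinician_ids: list = field(default_factory=list)


def in_denominator(p):
    """Men aged 76+ with no prostate cancer history and no excluded medications."""
    return (
        p.sex == "M"
        and p.age >= 76
        and not p.prostate_cancer_history
        and not p.on_androgen_or_anabolic_steroid
    )


def in_numerator(p):
    """Denominator patients with any PSA result during the study period."""
    return in_denominator(p) and p.psa_result_in_period


def attributed_clinician(p):
    """Clinician with the most visits; ties go to the clinician seen first (assumption)."""
    if not p.visit_clinician_ids:
        return None
    return Counter(p.visit_clinician_ids).most_common(1)[0][0]
```

A clinician's rate on the measure is then the count of numerator patients divided by the count of denominator patients attributed to that clinician.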

Urine Testing for Non-specific Reasons was defined as the proportion of urinalyses (UA) and/or urine cultures (UC) that were not associated with a visit diagnostic code indicating a specific genitourinary sign, symptom18 or other potentially relevant indication among instances where a UA/UC was obtained in the interval 24 h before to 48 h after a face-to-face ambulatory care visit by a woman aged 65 years or older with a qualifying clinician. Patients with a UC order were excluded if they had a visit diagnostic code indicating a genitourinary sign or symptom (e.g., hematuria, urgency, frequency, dysuria, suprapubic tenderness, and/or costovertebral angle pain or tenderness)18, genitourinary condition (e.g., nephrolithiasis), or other indications of infection (e.g., fever) as described in eTable 3. Patients with a UA order were excluded if they had a genitourinary tract specific sign or symptom, genitourinary condition, other indication of infection, or a systemic illness or comorbidity for which a UA may be appropriate (e.g., hypertension, rheumatologic condition) as described in eTable 3. Testing episodes were attributed to the clinician who ordered the test. Only the first testing episode was included in patients with multiple qualifying visits.
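The timing window and the test-specific indication logic for the urine testing measure can be sketched as follows. The category labels are hypothetical groupings of the diagnostic codes listed in eTable 3, the window endpoints are assumed to be inclusive, and the sketch reads the exclusions as indications that keep a test from being counted as non-specific.

```python
from datetime import datetime, timedelta


def within_visit_window(visit_time, test_time):
    """True if the test falls from 24 h before to 48 h after the visit (inclusive)."""
    earliest = visit_time - timedelta(hours=24)
    latest = visit_time + timedelta(hours=48)
    return earliest <= test_time <= latest


def is_nonspecific(test_type, visit_dx_categories):
    """Flag a UA or UC as lacking a qualifying indication.

    visit_dx_categories is a set of hypothetical labels derived from visit codes.
    A systemic illness or comorbidity justifies a UA but not a UC.
    """
    justifying = {"gu_sign_or_symptom", "gu_condition", "infection_indication"}
    if test_type == "UA":
        justifying = justifying | {"systemic_illness_or_comorbidity"}
    return not (justifying & set(visit_dx_categories))
```

For example, a UC ordered at a visit coded only for hypertension would be flagged, whereas a UA at the same visit would not.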

Diabetes Overtreatment in the Elderly was defined as having a most recent hemoglobin A1C, measured during the study period or the year prior to the study period, of less than 7.0% among adults aged 75 years and older with a diagnosis of diabetes mellitus who had insulin or an oral hypoglycemic (e.g., sulfonylurea or meglitinide, eTable 4) on their active medication list at the end of the study period. Patients only on medications without much potential for hypoglycemia (e.g., metformin) were excluded. Patients were attributed to the primary care clinician with whom they had the greatest number of visits during the measurement period.
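A minimal sketch of the diabetes measure logic follows. The function and parameter names are hypothetical, the medication set passed in stands in for eTable 4, and treating patients with no HbA1c result as not evaluable is an assumption rather than something the measure description states.

```python
def dm_overtreated(age, has_diabetes, latest_hba1c, active_meds, hypoglycemia_risk_meds):
    """Return True or False for eligible patients, None if outside the measure population.

    latest_hba1c is the most recent value (in %) from the study period or the prior
    year, or None if no result exists (handled as not evaluable, an assumption).
    """
    on_risky_drug = any(med in hypoglycemia_risk_meds for med in active_meds)
    if age < 75 or not has_diabetes or not on_risky_drug or latest_hba1c is None:
        return None
    return latest_hba1c < 7.0
```

With an illustrative risk set such as `{"insulin", "glipizide", "repaglinide"}`, a patient on metformin alone returns `None` (excluded), while a patient on glipizide with an HbA1c of 6.5 returns `True`.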

Additional details of the measure criteria and measure development process are available in the supplemental criteria (eMethods). For each measure, a physician examined approximately 100 charts to confirm the accuracy and completeness of the measure criteria.

We obtained discrete EHR data for all participants. Characteristics included age, race and ethnicity, insurance, and marital status.

Statistical Analysis

For each measure, we examined the proportion of patients meeting the measure in the population overall and by individual clinician. We calculated the intraclass correlation coefficient (ICC) from a mixed logistic regression for each measure to assess the degree of variation at the clinician level. We constructed caterpillar plots with 95% confidence intervals to visualize the variability in performance across clinicians and to help determine which clinicians tended to differ from the rest. We also sought to quantify the odds of an individual patient receiving inappropriate testing or treatment based on the treating clinician. To accomplish this, for each measure, we divided the clinicians into quartiles based on their performance rates. Then, we applied logistic regression models for the patient-level indicator of a potentially inappropriate test or treatment, with the primary predictor of interest corresponding to their clinician’s quartile membership. We calculated odds ratios for the 2nd–4th quartiles (i.e., higher inappropriate use) compared with the 1st quartile (i.e., lowest inappropriate use). For the DM model, we adjusted for age group. For PSA and UA/UC, we adjusted for both age group and race. We examined whether clinician-level performance on one measure was correlated with performance on another by calculating Pearson correlation coefficients. SAS 9.4 (SAS Institute Inc., Cary, NC) was used for all analyses and R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) was used for the plots.
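The quartile comparison can be illustrated with a simplified, unadjusted sketch. It is not the adjusted SAS model used in the study: it pools counts within each clinician quartile and computes crude odds ratios, which correspond to what an unadjusted logistic regression on quartile membership would estimate. The function names, the rank-based quartile split, and the 0.5 continuity correction for empty cells are all assumptions made for illustration.

```python
def clinician_rates(counts, min_patients=10):
    """counts maps clinician id -> (flagged, eligible). Keep those with enough patients."""
    return {c: f / n for c, (f, n) in counts.items() if n >= min_patients}


def assign_quartiles(rates):
    """Rank clinicians by rate and split into four near-equal groups (1 = lowest rate)."""
    ordered = sorted(rates, key=lambda c: (rates[c], c))
    n = len(ordered)
    return {c: (i * 4) // n + 1 for i, c in enumerate(ordered)}


def crude_odds_ratios(counts, quartiles, continuity=0.5):
    """Unadjusted odds ratio of quartiles 2-4 versus quartile 1 from pooled counts."""
    pooled = {q: [0, 0] for q in (1, 2, 3, 4)}
    for clinician, q in quartiles.items():
        flagged, eligible = counts[clinician]
        pooled[q][0] += flagged
        pooled[q][1] += eligible - flagged

    def odds(q):
        a, b = pooled[q]
        if a == 0 or b == 0:
            # Continuity correction applied only when a cell is empty (assumption).
            a, b = a + continuity, b + continuity
        return a / b

    base = odds(1)
    return {q: odds(q) / base for q in (2, 3, 4)}
```

Because clinicians are sorted into quartiles by the same outcome that the odds ratios describe, large ratios between the extreme quartiles are expected by construction; the adjusted models in the study share this property.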

RESULTS

Table 1 shows overall rates for each overuse eCQM. The mean age of patients in each group was 81.6 years for PSA testing, 73.4 years for urine testing without specific signs/symptoms or conditions, and 81.0 years for overtreatment of DM. Patient characteristics are listed in eTable 5. The overall proportion of patients with potentially inappropriate use for each measure was 0.23 for PSA, 0.23 for UA/UC, and 0.32 for DM. Table 2 describes the estimated odds of obtaining an inappropriate test for patients attributed to the clinician in the higher overusing quartiles compared with the lowest overusing quartile. The estimated odds of receiving a PSA test against guidelines for patients attributed to a clinician in the highest overusing quartile were 99.3 times greater than for patients attributed to a clinician in the lowest quartile. The estimated odds that a UA/UC was done for non-specific reasons for patients evaluated by a clinician in the highest overusing quartile were 15.7 times greater than for patients seeing a clinician in the lowest quartile. The estimated odds of DM overtreatment for patients attributed to a clinician in the highest overusing quartile were 6.0 times greater than for patients attributed to a clinician in the lowest quartile. Performance by individual clinicians along with corresponding 95% confidence limits are shown in Figures 1, 2, and 3. Clinician performance on the PSA measure was positively correlated with performance on the diabetes measure (eTable 6). Correlation at the clinician level was highest for the PSA measure (ICC = 0.27), followed by UA/UC (ICC = 0.18), and lowest for DM (ICC = 0.024).
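To give a sense of scale for the reported ICCs, the sketch below uses the common latent-variable definition for a random-intercept logistic model, in which the residual variance is fixed at π²/3. The paper does not state which ICC definition was used, so this formula is an assumption, and the implied variances are illustrative only.

```python
import math

# Variance of the standard logistic distribution, used as the level-1 residual variance.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(clinician_variance):
    """ICC on the latent scale for a random-intercept logistic model."""
    return clinician_variance / (clinician_variance + LOGISTIC_RESIDUAL_VARIANCE)


def clinician_variance_from_icc(icc):
    """Invert latent_icc: the random-intercept variance implied by a given ICC."""
    return icc / (1.0 - icc) * LOGISTIC_RESIDUAL_VARIANCE


if __name__ == "__main__":
    for measure, icc in (("PSA", 0.27), ("UA/UC", 0.18), ("DM", 0.024)):
        print(measure, round(clinician_variance_from_icc(icc), 3))
```

Under this definition, a larger ICC means a larger share of the latent variation in the outcome lies between clinicians rather than between patients of the same clinician.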

Table 1 Rates and Within-Clinician Correlation for Three Geriatric Overuse Measures
Table 2 Rates of Geriatric Overuse Measures by Quartile of Clinician Performance
Figure 1 Proportion of men 76 and older receiving screening PSA against guidelines by primary care clinician. Note: Circles represent the observed proportion of patients with PSA result for each clinician. Vertical lines represent the 95% CIs. Bold (red) horizontal line represents the proportion of patients with a screening PSA result for the entire group of patients attributed to these primary care clinicians (n = 69). Results are limited to only clinicians with 10 or more eligible patients.

Figure 2 Proportion of UA and/or UC done for women ≥ 65 years of age without an appropriate diagnosis by clinician. Note: Circles represent the observed proportion of UA and/or UC done for women ≥ 65 years of age without an appropriate diagnosis for each clinician. Vertical lines represent the 95% CIs. Bold (red) horizontal line represents the mean proportion for all eligible cases attributed to the group of primary care and immediate care clinicians (n = 144). Results are limited to only clinicians with 10 or more eligible patients.

Figure 3 Proportion of diabetes patients 75 or older on insulin or oral hypoglycemic medication with most recent HbA1c < 7.0 by clinician. Note: Circles represent the observed proportion of patients on insulin and/or an oral hypoglycemic with an HbA1c < 7.0. Vertical lines represent the 95% CIs. Bold (red) horizontal line represents the mean proportion for all eligible patients attributed to the group of primary care clinicians (n = 42). Results are limited to only clinicians with 10 or more eligible patients.
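The per-clinician points and 95% intervals displayed in the three figures can be approximated with a short sketch. The interval method used for the published plots is not stated, so the Wilson score interval here is an assumption, and the function names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def caterpillar_rows(counts, min_patients=10):
    """Return (clinician, rate, low, high) tuples sorted by rate.

    counts maps clinician id -> (flagged, eligible). Clinicians with fewer than
    min_patients eligible patients are dropped, as in the figures.
    """
    rows = []
    for clinician, (flagged, eligible) in counts.items():
        if eligible >= min_patients:
            low, high = wilson_interval(flagged, eligible)
            rows.append((clinician, flagged / eligible, low, high))
    return sorted(rows, key=lambda row: row[1])
```

Plotting the sorted rows with vertical interval bars and a horizontal line at the overall proportion gives the caterpillar layout described in the captions.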

DISCUSSION

We examined three eCQMs to assess potential overuse for topics described in the geriatrics Choosing Wisely Campaign4 and applied them to a population of patients and clinicians within a single large health system. We observed several notable findings: (1) overuse, as assessed by these measures, was not uncommon, with measured rates ranging from 23 to 32%; (2) the performance on these measures varied considerably across clinicians; (3) the amount of variation occurring at the clinician level was much greater for the two process measures (urine testing and PSA screening) than for the intermediate outcome measure (diabetes overtreatment); and (4) PSA and DM were positively correlated, but UA/UC was not correlated with either of the other measures. For the PSA and UA/UC measures, both related to diagnostic overuse, there were clinicians whose rates were zero or close to zero and others with potential overuse more than half of the time. These results suggest that for these two quality measures there are some clinicians whose behavior is highly consistent across patients in their practice but may differ greatly from that of other clinicians in their medical group.

We are not aware of prior work that has examined multiple eCQMs related to overuse in older patients in the ambulatory care setting, but several investigators have used administrative data. A prior study that used Medicare claims data from the state of Texas in 2010 to evaluate PSA screening in men over age 75 found a screening rate from primary care clinicians of 28.8% with similarly wide variation in practice across clinicians and an intraclass correlation at the clinician level of 0.27 (identical to the ICC we observed).19 In another study of Texas Medicare patients, variation in the early imaging for low back pain among primary care clinicians was substantial (ICC 0.25) and displayed a distribution similar to what we observed for the two process measures we examined.17 Pendrith et al. examined three low-value services (specific low-value uses of advanced imaging for low back pain, repeated dual-energy X-ray absorptiometry, and cervical cancer screening) among primary care physicians in Ontario Canada and found considerable variation at both the practice and regional level.20

The positive correlation between two of the measures examined here suggests that these two measures may be assessing an underlying common clinical quality domain at the clinician level in this population. Prior work on clinician-level quality measurement that examined diabetes quality measures did show a good degree of association across some quality measures at the clinician level which provided support for aggregating measures to represent a single construct. However, these were not measures of overuse.21 Bouck and colleagues used administrative data to examine four low-value tests and found that among primary care clinicians from Ontario, there were physicians who could be identified as overusing more than one test. In particular, the same physicians tended to be over-users of low-value ECGs and chest X-rays.22 Another study analyzing data on low-value care among Medicare beneficiaries also found significant variation in use of low-value services among primary care physicians. However, in contrast to our study, Schwartz et al. found substantial overuse even among the least wasteful physicians.23 They also did not find that physician characteristics contributed significantly to the variation seen in the use of low-value care, though other studies have found physician characteristics to have important associations with overuse.22 Further work is needed to determine whether a meaningful composite measure of overuse can be constructed from individual quality measures at the clinician level. Additionally, future work should seek to identify the key drivers that lead clinicians towards over-testing and overtreatment. Insights into not only the prevalence of overuse but also its causes will be needed to inform the development of future interventions that aim to change this behavior. 
This study adds to our understanding of how some clinicians (even in the absence of an improvement intervention) are not overusing the tests we examined. Exploring the knowledge and attitudes of these low-using physicians may also help identify viable ways to reduce overuse.

LIMITATIONS

First, this study was conducted within one health system and may not be generalizable to all settings. We expect there might be more variation at the clinician level if a wider range of clinicians were examined. Second, the measures of overuse relied on discretely coded data and may not have captured all the information that justifies clinical decision making. Thus, it is possible that in some cases there was inconsistency between what was documented using discrete diagnostic codes and the clinical rationale provided in the free text of the chart. We acknowledge that some patients will always pose exceptions to quality measures and that the optimal rate for these measures is probably not zero. This is of concern particularly for the urine testing measure. Third, some variation could be due to variation in coding practices rather than true differences in clinical care. Fourth, for the urine testing measure, we used the proportion of urine tests that were inappropriate, not the proportion of patients, because of the high frequency with which urine tests are ordered in this population. Lastly, the diabetes measure we used may underestimate the overuse of hypoglycemic treatment in older adults. For some individuals with advanced age, serious comorbidities, or frequent or serious hypoglycemia, greater de-intensification of insulin or oral hypoglycemics may be warranted.24

CONCLUSIONS

We were able to design and implement several electronic clinical quality measures of overuse pertaining to ambulatory care of older adults. We found considerable room for improvement in all three measures and significant variability in process measures at the individual clinician level. Further studies should explore these measures in other populations and examine the clinical outcomes of interventions aimed both at lowering the use of these practices overall and at reducing unjustified clinician-level variability.