INTRODUCTION

Applying randomized clinical trial (RCT) results to patients in routine clinical practice presents challenges. RCTs typically focus on selecting inclusion and exclusion criteria to maximize internal validity, potentially at the expense of external validity.1 This design feature, compounded by often inadequate reporting of trial participant characteristics2 and time pressures on providers,3, 4 can make it difficult for clinicians to go deeper than general guidelines when necessary to identify which patients should receive a treatment in real-world settings. Ultimately, the multiple inclusion and exclusion criteria required to select such populations are typically presented in trial appendices or methods papers that are unlikely to be seen by the busy clinician.

Considering some of these limitations, there is increasing interest in using large real-world data sources to conduct observational research that can augment results from RCTs and guide evidence-based practice.5 It has been suggested that studies using real-world data could be reported in addition to RCTs to provide the generalizability that is often sacrificed in RCTs that seek to maximize internal validity.6, 7 However, a recent study on the feasibility of using real-world data to replicate RCTs suggests that currently available data can only support this approach for a fraction of trials.8 Furthermore, there are well-known limitations to observational studies, including the challenges of conducting research with data collected for other purposes and the difficulty of reproducing results across different settings or data sources.9,10,11,12,13 Accordingly, RCTs remain the “gold standard” for clinical evidence even though discordances between the population in which an intervention is studied and the population in which it is used can have real consequences.14 Specifically, average risks and benefits in the real-world may differ from those observed in trials, which could at the very least affect shared decision-making conversations.15

Recently, several major clinical trials have shown the effectiveness of sodium glucose co-transporter 2 (SGLT2) inhibitors for treating diabetes and cardiovascular comorbidities, resulting in the incorporation of this drug class into recent American Diabetes Association Guidelines.16 However, further study is necessary to examine how closely the trial populations match the real-world patients who may be offered SGLT2 inhibitor therapy, and so this study had one primary and one secondary exploratory aim regarding just one of these trials studying SGLT2 inhibitors. Primarily, we sought to compare the characteristics of individuals living with diabetes to the baseline characteristics of patients who underwent randomization in the EMPA-REG trial, and secondarily, we hoped to characterize what proportion of individuals actually prescribed an SGLT2 inhibitor would have been included in the EMPA-REG trial.17

METHODS

Study Population

The sample consisted of adults aged 18 years and older who took part in the National Health and Nutrition Examination Survey (NHANES). NHANES is a cross-sectional survey conducted by the Centers for Disease Control and Prevention. NHANES uses a complex, multistage, clustered probability method to sample non-institutionalized US civilians to provide nationally representative estimates.18

To address the first aim of comparing patients who underwent randomization in the EMPA-REG trial with the general population with diabetes, we used responses from individuals sampled between 2011 and 2014 to correspond with the EMPA-REG study enrollment period. Respondents were eligible for inclusion if they had been selected to complete the interview, exam, and fasting laboratory sections of the survey. Respondents were included if they had diabetes, defined as having a glycated hemoglobin (hgba1c) ≥ 6.5% or fasting glucose ≥ 126 mg/dL, and were not missing data. To address our second aim of determining the proportion of individuals prescribed SGLT2 inhibitors who would have met inclusion criteria for the EMPA-REG trial, we used responses from individuals sampled between 2015 and 2018, which are the most recent NHANES cycles available. Respondents were eligible for inclusion if they had completed the interview, exam, and laboratory sections of the survey. Respondents were included if they were prescribed an SGLT2 inhibitor and were not missing data.
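The laboratory case definition above can be sketched in a few lines. This is an illustration only, not the authors' code (the study extracted data in R with nhanesA); the function and argument names are hypothetical, and the sketch assumes that a respondent missing either laboratory value is dropped, which is one reading of "not missing data".

```python
from typing import Optional

HBA1C_CUTOFF_PCT = 6.5            # glycated hemoglobin threshold stated in the text (%)
FASTING_GLUCOSE_CUTOFF = 126.0    # fasting glucose threshold stated in the text (mg/dL)


def has_diabetes(hba1c_pct: Optional[float],
                 fasting_glucose_mgdl: Optional[float]) -> Optional[bool]:
    """Apply the laboratory definition of diabetes.

    Returns None when either value is missing (assumption: such respondents
    are excluded rather than classified), otherwise True or False.
    """
    if hba1c_pct is None or fasting_glucose_mgdl is None:
        return None
    return (hba1c_pct >= HBA1C_CUTOFF_PCT
            or fasting_glucose_mgdl >= FASTING_GLUCOSE_CUTOFF)
```

Either threshold alone is sufficient, so a respondent with an hgba1c of 6.0% and a fasting glucose of 130 mg/dL is classified as having diabetes.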

Outcome Measures

Outcomes were defined based on demographic, clinical, laboratory, and medication variables. Demographic variables included age (years), sex (male vs female), ethnicity (Hispanic vs non-Hispanic), and pregnancy status (yes vs no). Clinical variables included heart disease (defined as self-reported history of angina, coronary artery disease, stroke, myocardial infarction, or heart failure), weight (kilograms), BMI (kg/m2), and systolic and diastolic blood pressure (mmHg). Laboratory variables included hgba1c (%), low-density lipoprotein cholesterol (mg/dL), high-density lipoprotein cholesterol (mg/dL), triglycerides (mg/dL), and estimated glomerular filtration rate (GFR) as calculated by the MDRD equation (mL/min/1.73 m2).19 Medication variables included the number of individuals using prescription contraception, taking anti-hyperglycemic agents (including biguanides, sulfonylureas, dipeptidyl peptidase-4 (DPP-4) inhibitors, thiazolidinediones, SGLT2 inhibitors, and insulin), anti-hypertensive agents (including angiotensin-converting enzyme inhibitors/angiotensin II receptor blockers (ACEi/ARB), beta-blockers, diuretics, calcium-channel blockers, mineralocorticoid receptor antagonists, and renin inhibitors), and lipid-lowering therapies (including statins, fibrates, ezetimibe, and niacin). Medication usage was extracted from NHANES preferentially using Lexicomp drug class codes and with generic drug identification codes when no specific drug class existed. All data from NHANES were extracted using the nhanesA package in R while data from the EMPA-REG trial were extracted from Table S2 in the supplementary materials.17
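For readers unfamiliar with the MDRD calculation, a minimal sketch follows. It assumes the 4-variable MDRD Study equation as re-expressed for standardized (IDMS-traceable) serum creatinine, with a leading constant of 175; the text does not state which calibration the authors used, so that choice, along with the function and argument names, is an assumption made for illustration.

```python
def egfr_mdrd(serum_creatinine_mgdl: float,
              age_years: float,
              female: bool,
              black: bool) -> float:
    """Estimated GFR in mL/min/1.73 m2 from the 4-variable MDRD equation.

    Assumes creatinine in mg/dL measured with a standardized assay
    (constant 175); this calibration is an assumption, not taken from the text.
    """
    if serum_creatinine_mgdl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mgdl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # sex coefficient
    if black:
        egfr *= 1.212   # race coefficient in the re-expressed equation
    return egfr
```

Because both exponents are negative, the estimate falls as creatinine or age rises, which is the direction expected for declining kidney function.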

The primary outcome, created based on these variables, was whether a respondent would have met inclusion or exclusion criteria for EMPA-REG. Criteria were extracted from the EMPA-REG trial protocol. EMPA-REG inclusion criteria included hgba1c ≥ 7.0% and ≤ 10% for patients on background therapy or hgba1c ≥ 7.0% and ≤ 9.0% for drug-naïve patients, BMI ≤ 45 kg/m2, and history of heart disease, while exclusion criteria included pregnancy or being a woman of childbearing age not on birth control, GFR ≤ 30 mL/min/1.73 m2 (via the MDRD equation per trial protocol), and a substantially abnormal laboratory test (operationalized for this study as aspartate aminotransferase, alanine aminotransferase, or alkaline phosphatase greater than three times the upper limit of normal).20 The secondary outcome was whether individuals actually prescribed an SGLT2 inhibitor from 2015 to 2018 would have met these same criteria.
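The criteria listed above can be combined into a single eligibility flag, sketched below using only the thresholds stated in the text. The record layout and all field names are hypothetical. Two simplifications are assumed: the caller decides who counts as being of childbearing age (the text gives no age range), and liver enzymes are supplied as the largest ratio to the upper limit of normal rather than as raw values.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical record layout; not the NHANES variable names.
    hba1c_pct: float
    on_background_therapy: bool    # False means drug-naive
    bmi: float                     # kg/m2
    heart_disease: bool            # self-reported history
    egfr: float                    # mL/min/1.73 m2 (MDRD)
    pregnant: bool
    of_childbearing_age: bool      # definition left to the caller (assumption)
    on_contraception: bool
    max_liver_enzyme_x_uln: float  # largest of AST, ALT, ALP divided by its ULN


def meets_inclusion(r: Respondent) -> bool:
    # hgba1c window depends on whether the respondent is on background therapy.
    upper = 10.0 if r.on_background_therapy else 9.0
    return 7.0 <= r.hba1c_pct <= upper and r.bmi <= 45.0 and r.heart_disease


def meets_exclusion(r: Respondent) -> bool:
    unprotected = r.of_childbearing_age and not r.on_contraception
    return (r.pregnant or unprotected
            or r.egfr <= 30.0
            or r.max_liver_enzyme_x_uln > 3.0)


def eligible(r: Respondent) -> bool:
    return meets_inclusion(r) and not meets_exclusion(r)


example = Respondent(hba1c_pct=8.1, on_background_therapy=True, bmi=31.0,
                     heart_disease=True, egfr=74.0, pregnant=False,
                     of_childbearing_age=False, on_contraception=False,
                     max_liver_enzyme_x_uln=1.2)
print(eligible(example))
```

Keeping inclusion and exclusion as separate functions makes it straightforward to tabulate how many respondents fail each criterion on its own.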

Statistical Analysis

Descriptive statistics were used to describe sample means and the number of individuals who would have met inclusion criteria for the trial. One-sample t-tests were used to compare continuous NHANES sample means with the corresponding means reported in the EMPA-REG trial. Chi-square goodness of fit tests were used to compare categorical NHANES sample proportions with the corresponding proportions reported in the EMPA-REG trial. Finally, a subgroup analysis, using one-sample t-tests and chi-square goodness of fit tests as described, was also conducted among only those who reported having heart disease. All analyses were conducted in R version 4.0.2 (https://cran.r-project.org/) or SAS (Cary, NC: SAS Institute Inc.) using appropriate survey weighting with the survey package in R or the SURVEYFREQ procedure, respectively. A two-sided alpha level of 0.05 was used to assess for statistical significance.
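The two tests named above treat the published trial value as a fixed reference. The sketch below shows the unweighted form of each statistic using only the Python standard library. It deliberately ignores the NHANES survey design, which the authors handled with the R survey package and SAS SURVEYFREQ, so it illustrates the logic of the comparison rather than reproducing the analysis; function names are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, reference_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("no variation in the sample")
    standard_error = sd / math.sqrt(n)
    return (mean(values) - reference_mean) / standard_error, n - 1


def chi_square_gof_binary(successes, n, reference_proportion):
    """Goodness-of-fit test of a yes/no variable against a fixed proportion.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if not 0 < reference_proportion < 1:
        raise ValueError("reference proportion must be strictly between 0 and 1")
    expected_yes = n * reference_proportion
    expected_no = n - expected_yes
    statistic = ((successes - expected_yes) ** 2 / expected_yes
                 + ((n - successes) - expected_no) ** 2 / expected_no)
    # For 1 degree of freedom the upper tail equals P(|Z| > sqrt(statistic)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

In a design-based analysis the standard errors would come from the survey strata and sampling units rather than from the simple formulas used here.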

RESULTS

For the analysis of whether patients with diabetes would have been eligible for EMPA-REG, 655 respondents were included in the analysis, representing a weighted sample of 21,849,775 individuals with diabetes from 2011 to 2014. For the analysis of patients receiving SGLT2 inhibitors in real-world settings, 48 respondents were included in the analysis, representing a weighted sample of 1,062,573 individuals who received an SGLT2 inhibitor from 2015 to 2018.
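The relationship between the respondent counts and the weighted totals above can be illustrated with a short sketch. Each respondent carries a survey weight equal to the number of people they represent, so the weights sum to the represented population, and a weighted proportion is the share of that total held by respondents with a given characteristic. The function names and the toy weights in the usage note are hypothetical, and the sketch does not compute design-based confidence intervals.

```python
def represented_population(weights):
    """Sum of survey weights: the number of people the sample represents."""
    return sum(weights)


def weighted_proportion(flags, weights):
    """Share of the represented population for whom the flag is true."""
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for f, w in zip(flags, weights) if f) / total
```

For example, with three respondents weighted 1000, 2000, and 1000, of whom only the first meets a criterion, the weighted proportion is 0.25 even though one in three respondents met it.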

Compared with individuals in the EMPA-REG trial, NHANES respondents were younger (59.2 vs 63.1 years, p<0.001), were more likely to be female (45.6% vs 28.5%, p<0.001), weighed more (94.1 kg vs 86.3 kg, p<0.001), had a higher BMI (33.2 kg/m2 vs 30.6 kg/m2, p<0.001), and had lower systolic (131.1 mmHg vs 135.4 mmHg, p<0.001) and diastolic (70.7 mmHg vs 76.6 mmHg, p<0.001) blood pressure (Table 1). Among laboratory measures, NHANES respondents had lower hgba1c (7.55% vs 8.07%, p<0.001), higher LDL (102.8 mg/dL vs 85.7 mg/dL, p<0.001) and HDL (47.4 mg/dL vs 44.4 mg/dL, p<0.001), but lower triglycerides (152.7 mg/dL vs 170.6 mg/dL, p=0.002), and higher GFR (84.4 mL/min/1.73 m2 vs 74.0 mL/min/1.73 m2, p<0.001). Finally, NHANES respondents were less likely to be on every class of anti-hyperglycemic and anti-hypertensive medication evaluated as well as less likely to be on statins (but not less likely to be on other lipid-lowering medications, Table 2).

Table 1 Comparison of NHANES and EMPA-REG Demographic, Exam, and Laboratory Characteristics
Table 2 Comparison of NHANES and EMPA-REG Medication Characteristics

When the analysis was limited only to NHANES respondents with cardiovascular disease, patterns of differences between the NHANES population and the EMPA-REG population remained similar for sex, weight, BMI, systolic and diastolic blood pressure, hgba1c, and LDL. However, the NHANES respondents with cardiovascular disease were less likely to be Hispanic (7.9% vs 18.0%, p=0.001) and did not have significantly different levels of HDL, triglycerides, or GFR compared with EMPA-REG trial participants. Lastly, similar to patterns observed when comparing the total NHANES population with the EMPA-REG population, rates of medication use by NHANES respondents with cardiovascular disease were generally significantly lower than rates of medication use by the EMPA-REG trial participants.

Overall, 7.6% (95% CI 4.8–10.6%) of 2011–2014 NHANES participants with diabetes would have met EMPA-REG trial inclusion criteria. Most notably, only about one-third of individuals would have met the hgba1c and drug cutoff criteria (36.6%, 95% CI 32.1–41.1%) and only a quarter would have met the high cardiovascular risk criteria (25.1%, 95% CI 20.1–30.2%; Table 3).

Table 3 Description of Proportion of NHANES Population Meeting Major EMPA-REG Inclusion Criteria

In the secondary analysis of individuals who participated in NHANES from 2015 to 2018 and were prescribed an SGLT2 inhibitor, 10.6% (95% CI <1–24.7%) would have met all criteria for inclusion in the EMPA-REG trial. Most NHANES respondents prescribed an SGLT2 inhibitor failed to meet the hgba1c and drug cutoff criteria (49.2%, 95% CI 24.1–74.4%) or the high risk of heart disease criteria (32.7%, 95% CI 14.7–50.6%; Table 4).

Table 4 Description of Proportion of NHANES Population Who Received an SGLT2 Inhibitor and Met Major EMPA-REG Inclusion Criteria

DISCUSSION

This study compared individuals with diabetes from the general population with participants from the EMPA-REG trial. Overall, individuals with diabetes in the NHANES sample differed substantially from EMPA-REG trial participants across demographic, clinical, laboratory, and medication domains. When only individuals with heart disease were considered, the NHANES sample was more similar to the EMPA-REG trial population, but there were still significant differences across all domains. A minority of individuals with diabetes in the 2011–2014 NHANES sample, and a minority of individuals in a more contemporary NHANES sample who were prescribed an SGLT2 inhibitor, would have met eligibility criteria for the EMPA-REG trial.

We examined several components of external validity that clinicians might consider during their routine practice. First, they may consider how well the patient characteristics of trial populations, such as age, sex, and race, match their own clinical populations. Second, they may consider the stage of disease or extent of disease control in the trial population as compared with those parameters in their patients. Third, they may consider whether the patterns of care and treatment in a trial protocol are similar to the way that they practice; for example, study participants may have more frequent follow-up testing or medication titration than would be expected in typical practice. Assessments of the external validity of clinical trials often focus on the first component: how closely the demographic characteristics of patients enrolled in trials reflect those of the general population. Proposed solutions to mismatches between trial and general populations typically include increased recruitment of under-represented groups of patients,21, 22 which represents an important priority for researchers. However, the second and third components of external validity have been less widely studied and may require different mitigation approaches. In addition to recruiting a more diverse population, addressing these other challenges may require additional detail in the reporting of trial results that goes beyond subgroup analyses to include descriptions of how the stage of disease and patterns of care for trial participants compare with local, state, or national trends. Such detail might assist individual clinicians with decisions as well as regional and national organizations with guideline-making. Going further, trialists could consider recruitment plans with the aim of more closely matching trial populations to clinical populations on more than just demographic variables.

While prior work has identified criteria for clinicians and researchers to consider when evaluating external validity,23, 24 many trials do not present enough information for clinicians to make these assessments. For example, one study that examined nearly 200 Norwegian drug trials affecting general practice found that a majority did not report one or more important variables necessary for assessing external validity.25 Even when enough information is available, evaluations of the external validity of trials have found that inclusion and exclusion criteria often limit trial generalizability. A review of cardiology, oncology, and mental health studies that sought to compare trial populations with real-world populations concluded that over 70% of trials were not representative of real-world populations.26 Our findings are consistent with this previous literature on trial generalizability.

Recent evaluations of the generalizability of clinical trials of novel anti-diabetic medications have concluded that no trial perfectly matches real-world populations.27,28,29,30 These studies have reported the proportion of individuals from the general population who would have been included in trials of anti-diabetes medications based largely on demographic characteristics and a limited set of laboratory measurements but report only limited comparisons of other aspects of the trial versus real-world populations, and few describe how characteristics of the trial populations compare with characteristics of individuals in the general population receiving these agents. Our study builds on this previous work in two ways: first, by providing a more comprehensive framework for comparing a nationally representative population with a trial population, and, second, by reporting how many individuals receiving an SGLT2 inhibitor would have been included in one of the landmark trials.

The direction of the differences between NHANES respondents and trial participants was not consistent; for example, the NHANES population was younger, had lower blood pressures, and had higher GFR compared with the EMPA-REG trial population, but NHANES respondents also had higher average BMI. The finding that NHANES respondents were less likely to be on multiple classes of medications might suggest that the population is healthier than trial participants, though we cannot exclude the possibility that NHANES respondents were receiving less aggressive diabetes care than trial participants, especially considering that EMPA-REG investigators were encouraged to aggressively treat cardiovascular risk factors, like hypertension and hyperlipidemia, during the study period. Either of these explanations could be problematic for the generalization of the trial results, particularly if SGLT2 inhibitors interact, either positively or negatively, with other drug classes. Even when only individuals with high cardiovascular risk were included, which would more closely approximate the EMPA-REG population that was enriched for such individuals, many of these differences remained. It is plausible that these differences could influence the efficacy of SGLT2 inhibitors; for example, the EMPA-REG trial reported that SGLT2 inhibitors tended to have less effect in individuals with BMI over 30. These findings mirror similar analyses from Europe that have suggested that effect sizes seen in the data from trials of SGLT2 inhibitors may only apply to a small subset of individuals with diabetes.31

Clinical trials cannot be perfectly representative of the entire population; it is more efficient for them to select participants for whom a given therapy is most likely to have benefit.32 Our finding that a minority of NHANES respondents who received an SGLT2 inhibitor would have met trial inclusion criteria for EMPA-REG highlights another potential concern with the generalizability of these trials. While the confidence intervals for our estimates were wide, these preliminary findings point to an area where ongoing surveillance in the form of observational studies could be useful to both quantify effect sizes in real-world (vs trial) populations and possibly help redirect prescribing to more appropriate populations.

This study has several limitations. First, because of coding differences for variables in NHANES and the EMPA-REG trial, some characteristics, such as race, could not be compared. Moreover, this study could not differentiate NHANES respondents diagnosed with type 1 versus type 2 diabetes, but because type 2 diabetes is over 17 times more prevalent, misclassifications were unlikely to significantly affect results.33 Relatedly, it is possible that some individuals did not meet the hgba1c cutoff for either the primary or secondary analysis because of improved control on anti-diabetic regimens, specifically SGLT2 inhibitors in the case of the latter analysis. Additionally, cardiovascular risk in NHANES is assessed via self-report rather than the more detailed medical information available to trial recruiters, so some NHANES respondents may have been misclassified with respect to high cardiovascular risk as defined by the trial protocol, though it is unclear if the average primary care clinician makes decisions with information closer to that contained in NHANES questions or the EMPA-REG trial data. Furthermore, the sample size of NHANES respondents receiving an SGLT2 inhibitor from 2015 to 2018 was small, limiting our ability to make comparisons. While it is possible that individuals can be prescribed SGLT2 inhibitors for multiple indications, it is less likely that this was the case in this analysis as our study period predates the trials of SGLT2 inhibitors for use in kidney disease and heart failure.34, 35 Future studies should more fully characterize the demographic, clinical, laboratory, and medication differences between those prescribed SGLT2 inhibitors and those who were included in these more recent trials. Finally, this study did not assess all inclusion and exclusion criteria from the EMPA-REG protocol, including some subjective considerations, such as whether a prospective trial participant would be able to make follow-up visits, and might therefore overestimate the percent of individuals meeting inclusion criteria.

This study builds on the current literature by exploring in more detail the differences between patients included in the EMPA-REG trial and the general US population with diabetes. Our findings demonstrate how the trial population differed significantly from a nationally representative sample in multiple domains and that few individuals with diabetes or who receive SGLT2 inhibitors would have met EMPA-REG trial eligibility criteria, which highlights the more general difficulty in applying the literature to clinical decision-making. Clinical trialists might address this issue by selecting inclusion and exclusion criteria with the express purpose of closely matching the population in which the drug will be used and providing clearer comparisons of how the trial population compared with the general population in which the drug might be marketed. These results suggest that observational research comparing trial populations with relevant real-world populations across multiple classes of variables could provide useful data to support clinicians attempting to apply the results from a trial to their own patients. Finally, clinicians should be mindful of the possible limitations of trial generalizability when making prescribing decisions for individual patients.