Introduction

Systemic therapies indicated for patients with completely resected stage III or IV melanoma in the adjuvant setting include the immuno-oncology (I-O) agents nivolumab and pembrolizumab, as well as the BRAF plus MEK inhibitor combination of dabrafenib plus trametinib (for BRAF-mutant disease) [1]. Nivolumab, an anti-programmed cell death-1 (PD-1) antibody, is approved in the United States and other countries as adjuvant therapy for resected stage III or IV melanoma based on evidence from the phase 3 CheckMate 238 randomized controlled trial (RCT), which included patients with in-transit metastasis with and without nodal involvement [2]. In that trial, patients with stage IIIB, stage IIIC, or stage IV resected melanoma (per American Joint Committee on Cancer, Cancer Staging Manual, seventh edition [AJCC-7]) treated with nivolumab showed significant improvement in recurrence-free survival (RFS) compared with those treated with ipilimumab, an anti-cytotoxic T lymphocyte antigen-4 antibody (hazard ratio [HR] 0.68; 95% confidence interval [CI] 0.56–0.82; P < 0.0001; minimum follow-up, 36 months), with reduced toxicity [3]. In an updated analysis of CheckMate 238, the 5-year RFS and overall survival (OS) rates were 50% and 76%, respectively, among patients treated with nivolumab (minimum follow-up, 62.0 months) [4].

Data from real-world studies may complement results from RCTs by helping to address data gaps [5]. For example, comparing outcomes from RCTs with those from the real-world setting may provide important insights into the use of cancer treatments [6, 7]. Reported real-world evidence suggests that adjuvant nivolumab treatment provides modest benefit in patients with resected stage IIIA melanoma [8–10]. The current comparative analysis aimed to validate clinical outcomes observed in patients with resected stage III melanoma who received adjuvant nivolumab in CheckMate 238 against those in a similar population from the real-world Flatiron Health electronic health record (EHR)-derived de-identified database. Time to treatment discontinuation and use of subsequent systemic treatment were also evaluated in the real-world cohort.

Materials and methods

Study design and data sources

This comparative analysis evaluated clinical outcomes in patients with completely resected stage III melanoma who received adjuvant nivolumab in either CheckMate 238 (NCT02388906; supplementary Data Sources) [2, 11] or in the real-world setting for up to 12 months, per label. Data for patients receiving 12 months of treatment and those receiving < 12 months were not analyzed separately. This analysis only included patients with completely resected stage III melanoma because patients with stage IV melanoma having no evidence of disease after resection are not included in the Flatiron Health database. Inclusion and exclusion criteria are shown in Table 1. Patients with a diagnosis of ocular/uveal melanoma prior to index date were excluded. The index date was the date of randomization to adjuvant nivolumab treatment for the CheckMate 238 cohort and the date of adjuvant nivolumab treatment initiation for the real-world cohort. In the CheckMate 238 cohort, patients who had resected stage III melanoma per AJCC-7 were reclassified per AJCC, eighth edition (AJCC-8). Data for the CheckMate 238 cohort were derived from the 5-year dataset (database lock, March 9, 2021). The real-world cohort was derived from the nationwide Flatiron Health EHR-derived de-identified database, which represents > 280 community cancer centers and eight major academic centers in the United States and includes more than three million records for patients being actively treated for cancer and followed longitudinally (supplementary Data Sources) [12]. Patients in the real-world cohort must have met the key eligibility criteria for the CheckMate 238 trial and were diagnosed with resectable stage III melanoma (per AJCC-8) between January 1, 2011, and June 30, 2022. The primary objectives of the study were to compare baseline characteristics between the two cohorts and to compare real-world OS (rwOS) in the Flatiron Health cohort with OS in the CheckMate 238 cohort.

Table 1 Inclusion and exclusion criteria for the CheckMate 238 and real-world cohorts

Outcomes

Baseline characteristics were assessed during screening (1–28 days before randomization) or at randomization in the CheckMate 238 cohort and during the 6-month period prior to the index date in the real-world cohort. Follow-up time in the CheckMate 238 cohort was defined as the period from the index date to death or date last known to be alive. Follow-up time in the real-world cohort was defined as the period from the index date to death or date of last confirmed activity (defined as the latest of the last confirmed structured activity or the last clinically relevant abstracted date [i.e., date of disease recurrence, metastasis, any oral therapy, specimen collection, medical procedure, clinical note, or disease progression]). OS in the CheckMate 238 cohort was defined as the time between the date of randomization and the date of death from any cause or the last date known to be alive. rwOS in the Flatiron Health cohort was defined as the time between the date of nivolumab initiation and the date of death from any cause; for patients without documentation of death, rwOS was censored at the data cutoff date (June 30, 2022). The mortality variable in the Flatiron Health database was curated from the following three sources: EHRs, the Social Security Death Index (SSDI), and obituary data. The mortality variable has been benchmarked to the recognized gold standard National Death Index across the 18 cancer types represented in Flatiron Health’s Enhanced Datamarts, including advanced melanoma. The Flatiron Health mortality data have been determined to have high sensitivity (83.9%–91.5%), specificity (93.5%–99.7%), and positive predictive value (96.3%–98.3%) when benchmarked against National Death Index data, all varying by tumor type [13]. Time to treatment discontinuation and use of subsequent systemic treatment were evaluated in the real-world cohort. Time to treatment discontinuation was defined as the time between the initiation of adjuvant nivolumab and treatment discontinuation for any reason (including death). Data for RFS and distant metastasis-free survival were not analyzed. Capturing or evaluating adverse events for nivolumab was outside of the scope of the analysis because safety data are not available in the Flatiron Health database.
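The rwOS definition above (time from nivolumab initiation to death from any cause, censored at the June 30, 2022 data cutoff when no death is documented) can be illustrated with a minimal Python sketch. The function name `rwos_record`, the conversion of 30.4375 days per month, and the treatment of deaths dated after the cutoff as censored are illustrative assumptions, not details taken from the study.

```python
from datetime import date

DATA_CUTOFF = date(2022, 6, 30)   # data cutoff date stated in the Methods
DAYS_PER_MONTH = 30.4375          # assumption: average month length (365.25 / 12)


def rwos_record(index_date, death_date=None, cutoff=DATA_CUTOFF):
    """Return (time_in_months, event) for one hypothetical patient.

    event = 1 if death from any cause is documented on or before the cutoff,
    event = 0 if the patient is censored at the cutoff.
    """
    if death_date is not None and death_date <= cutoff:
        end_date, event = death_date, 1
    else:
        # No documented death: censor at the data cutoff date.
        end_date, event = cutoff, 0
    months = (end_date - index_date).days / DAYS_PER_MONTH
    return months, event


# Hypothetical patients: one with a documented death, one censored at cutoff.
print(rwos_record(date(2020, 1, 1), death_date=date(2021, 1, 1)))
print(rwos_record(date(2021, 6, 30)))
```

The resulting (time, event) pairs are the inputs a Kaplan–Meier or Cox analysis would expect.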

Statistical analysis

Baseline characteristics were compared between the two cohorts. Continuous variables for baseline characteristics were summarized using means and standard deviations (SDs) and compared using the Wald test. Categorical variables for baseline characteristics were summarized using frequency counts and percentages and compared using Chi-square tests (Fisher’s exact tests for variables with small frequency counts). Comorbidities with a prevalence rate of > 2% in the real-world cohort were evaluated.

OS in the CheckMate 238 cohort and rwOS in the Flatiron Health cohort were analyzed using the Kaplan–Meier method. rwOS was compared with OS using univariable (unadjusted) and multivariable (adjusted) Cox proportional hazards models, with calculation of hazard ratios (HRs) and associated 95% CIs. Median OS and rwOS, and their associated 95% CIs, were reported. Landmark OS and rwOS rates (e.g., at 1, 2, 3, and 4 years) were estimated. A multivariable Cox proportional hazards model was used to adjust for the following key prognostic factors: age, sex, race, disease stage, time from surgical resection to index date, Eastern Cooperative Oncology Group performance status (ECOG PS), and comorbidities of diabetes, chronic pulmonary disease, and atrial fibrillation (each with a prevalence of > 2% in the real-world cohort and known to be associated with increased mortality). Adjusted OS and rwOS Kaplan–Meier curves for the two cohorts were generated using the results of the Cox proportional hazards model, which was based on the Breslow method.
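As an illustration of the Kaplan–Meier estimation and landmark rates described above, the following is a minimal Python sketch. It is not the authors' SAS or R code; the function names and the five-patient data set are hypothetical and serve only to show the product-limit calculation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient (any consistent unit)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival, curve = 1.0, []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def landmark_rate(curve, t):
    """Estimated survival probability at landmark time t (step function)."""
    rate = 1.0
    for event_time, survival in curve:
        if event_time > t:
            break
        rate = survival
    return rate


# Hypothetical follow-up times in years; 1 = death, 0 = censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
print(landmark_rate(curve, 2))
```

Censored patients contribute to the at-risk count until their censoring time but never trigger a step down in the curve, which is why differences in follow-up between cohorts affect precision rather than the definition of the estimate.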

Inverse probability of treatment weighting (IPTW) [14] was used to reduce baseline discrepancies between the two cohorts and address residual confounding in the adjusted Cox proportional hazards model (supplementary IPTW Methods). IPTW aimed to achieve a balanced distribution of measured confounders at baseline across the cohorts, thereby simulating an RCT in which patients were randomly assigned to either study cohort. Each patient was assigned a weight, creating a hypothetical sample (“pseudo-population”) in which the distribution of measured baseline covariates was similar between the two cohorts and therefore independent of cohort membership. Propensity scores were estimated using logistic regression as the probability of belonging to the CheckMate 238 cohort (vs. the real-world cohort) given an observed set of baseline covariates (i.e., age, sex, race [White or missing vs. non-White], disease stage [IIIC/D vs. IIIA/B], time from surgical resection to index date, ECOG PS [0 or missing vs. 1], diabetes, chronic pulmonary disease, and atrial fibrillation). Patients with missing race and/or ECOG PS were grouped into the most populated category of each specific variable (i.e., White race and ECOG PS 0).

Each patient’s weight was calculated as the inverse of the propensity score-based probability of belonging to the cohort in which the patient was observed. Weights were stabilized using the marginal probability of being in their observed study cohort and truncated at the first and ninety-ninth percentiles. Stabilization of weights preserved the weighted total sample size so that it was similar to the original unweighted total sample size and increased the precision of estimates. A weighted multivariable Cox proportional hazards model was used to compare weighted rwOS with OS, adjusting for baseline characteristics. A standardized difference for a given baseline characteristic of < 0.1 was considered an inconsequential imbalance between the two cohorts [15]. If the standardized difference was > 0.1, that covariate was further adjusted for in the Cox model to address residual confounding.
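The weighting steps can be sketched in Python as follows. This is a minimal illustration that assumes the conventional stabilized IPTW form, in which the numerator is the marginal probability of a patient's observed cohort and the denominator is the propensity score-based probability of that cohort. The function names, the percentile method, and the example values are hypothetical; the study's actual implementation is described in its supplementary IPTW Methods.

```python
from statistics import quantiles


def stabilized_weights(in_trial, propensity):
    """Stabilized inverse probability weights.

    in_trial   : 1 if the patient is in the trial cohort, 0 if real-world
    propensity : estimated probability of being in the trial cohort
    """
    p_trial = sum(in_trial) / len(in_trial)  # marginal probability of trial cohort
    weights = []
    for z, ps in zip(in_trial, propensity):
        if z == 1:
            weights.append(p_trial / ps)
        else:
            weights.append((1 - p_trial) / (1 - ps))
    return weights


def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Truncate weights at the given percentiles (1st and 99th by default)."""
    # Assumption: "inclusive" percentile interpolation; other definitions exist.
    cuts = quantiles(weights, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(w, low), high) for w in weights]


# Hypothetical cohort indicators and propensity scores for four patients.
w = stabilized_weights([1, 1, 0, 0], [0.5, 0.25, 0.5, 0.2])
print(w)
print(truncate_weights(w))
```

Truncation limits the influence of patients with extreme weights, at the cost of a small amount of residual imbalance, which is one reason balance is rechecked after weighting.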

Time to treatment discontinuation in the real-world cohort was analyzed using the Kaplan–Meier method. The number of patients in the real-world cohort initiating subsequent systemic treatment after the discontinuation of adjuvant nivolumab during the follow-up period was recorded.

All statistical analyses were conducted using SAS Enterprise Guide 7.1 software and R 3.6.3.

Results

Sample selection

A total of 369 patients with resected stage III melanoma (per AJCC-8) receiving adjuvant nivolumab from the CheckMate 238 trial were included in the CheckMate 238 cohort. A total of 452 patients with resected stage III melanoma (per AJCC-8) who met key eligibility criteria for CheckMate 238 were included in the real-world cohort from the Flatiron Health database (Fig. 1).

Fig. 1 Sample selection in the real-world cohort

Baseline characteristics

The CheckMate 238 cohort, compared with the real-world cohort, had a lower median age (56.0 vs. 63.0 years; P < 0.001), lower median body weight (80.0 kg vs. 89.1 kg; P < 0.001), a lower proportion of patients with stage IIIA disease (1% [reclassified per AJCC-8] vs. 5%; P < 0.01 for differences in all disease stage categories), a longer mean time between surgical resection and index date (2.2 vs. 1.4 months; P < 0.001), and a lower proportion of patients with atrial fibrillation (1% vs. 4%; P < 0.05; Table 2). ECOG PS data were complete for all patients in the CheckMate 238 cohort but were missing for 24% of patients in the real-world cohort. A higher percentage of patients were White in the CheckMate 238 cohort than in the real-world cohort (93% vs. 76%; P < 0.001 for all race categories). Patients in the CheckMate 238 cohort received nivolumab at 3 mg/kg every 2 weeks (Q2W), and patients in the real-world cohort received nivolumab at 3 mg/kg Q2W (1%), 240 mg Q2W (43%), or 480 mg every 4 weeks (56%; based on first dosing information or, if missing, the earliest available dosing information). BRAF-mutant disease was detected in 40% of patients in the CheckMate 238 cohort and 25% of patients in the real-world cohort, although BRAF mutation status data were missing in 17% and 36% of patients in the respective cohorts.

Table 2 Baseline characteristics in the CheckMate 238 and real-world cohorts

Unadjusted OS and rwOS

Median follow-up time (defined as the period from the index date to death or the last date known to be alive) was 61.4 months (range, 0.0–70.6) and 25.5 months (range, 0.8–54.1) in the CheckMate 238 and real-world cohorts, respectively. Deaths during the follow-up period occurred in 24% of patients (n = 89) in the CheckMate 238 cohort and 17% of patients (n = 78) in the real-world cohort. In the unadjusted analysis, rwOS was not different from OS (HR 1.27; 95% CI 0.92–1.74; Fig. 2a). OS rates were slightly higher than the rwOS rates across time points. Two-year OS and rwOS rates were 89% and 84%, respectively; 4-year OS and rwOS rates were 78% and 74%, respectively. Unadjusted median OS and rwOS were not reached in either cohort. In the unadjusted analysis, baseline covariates with significantly different rwOS compared with OS were age at the index date, sex (female vs. male), disease stage at initial diagnosis (IIIC/D vs. IIIA/B), ECOG PS (1 vs. 0), and diabetes (supplementary Table 1).
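The statement that rwOS was not different from OS rests on the 95% CI for the HR including 1. The sketch below shows that check and, as an assumption-laden extra, back-calculates an approximate Wald standard error, z statistic, and two-sided P value from the reported HR and CI. The function names are hypothetical, and because the inputs are rounded published values, the derived quantities are approximations rather than results reported by the study.

```python
import math


def ci_includes_null(lower, upper, null=1.0):
    """True if the confidence interval contains the null hazard ratio."""
    return lower <= null <= upper


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE of log(HR), z statistic, and two-sided P value.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE), i.e. a
    symmetric Wald interval on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p_value


# Unadjusted comparison reported in the text: HR 1.27 (95% CI 0.92-1.74).
print(ci_includes_null(0.92, 1.74))
print(wald_from_ci(1.27, 0.92, 1.74))
```

A CI that includes 1 indicates that the data are compatible with no difference, not that equivalence has been demonstrated; the width of the interval conveys how much uncertainty remains.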

Fig. 2 Unadjusted (a) and adjusted (b) OS and rwOS in patients with resected stage III melanoma (per AJCC-8) who received adjuvant nivolumab in the CheckMate 238 and real-world cohorts, respectively. ᵃComparison of real-world cohort versus CheckMate 238 cohort. ᵇ451 of the 452 patients in the real-world cohort were included because one patient with missing comorbidity profiles was excluded. AJCC-8 American Joint Committee on Cancer, Cancer Staging Manual, eighth edition, CI Confidence interval, HR Hazard ratio, NR Not reached, OS Overall survival, rwOS Real-world overall survival

Adjusted OS and rwOS using the Cox proportional hazards model

After adjusting for key prognostic factors (i.e., age, sex, race, disease stage, time from surgical resection to index date, ECOG PS, diabetes, chronic pulmonary disease, and atrial fibrillation) in the Cox proportional hazards model, rwOS was not different from OS (HR 1.01; 95% CI 0.67–1.54; Fig. 2b). Two-year OS rates were 84% in both cohorts; 4-year OS rates were 72% in both cohorts. Adjusted median rwOS and OS were not reached. Among the independent variables used in the Cox proportional hazards model, baseline covariates with significantly different (P < 0.05) rwOS compared with OS were age at index date and disease stage at initial diagnosis (IIIC/D vs. IIIA/B) (supplementary Table 2). Given that ECOG PS data were missing in 24% of patients in the real-world cohort, compared with 0% in the CheckMate 238 cohort, a sensitivity analysis was conducted that excluded patients with missing ECOG PS data, and the results from that analysis (HR 1.08; 95% CI 0.71–1.64) were consistent with those of the initial analysis (data not shown). For other variables included in the Cox model, missing data were rare.

Adjusted OS and rwOS using the Cox proportional hazards model and IPTW

A total of 820 patients, 369 from the CheckMate 238 cohort and 451 from the real-world cohort, were included in the logistic regression model for IPTW. (One patient with a missing comorbidity profile from the real-world cohort was excluded.) Baseline characteristics that were imbalanced between the two cohorts (with a standardized difference > 0.1) before IPTW were age at index date, race, time from surgical resection to index date, ECOG PS, and atrial fibrillation (Table 3). All the evaluated baseline characteristics were balanced between the two cohorts after IPTW, with the exception of time from surgical resection to index date, which was slightly longer in the CheckMate 238 cohort than in the real-world cohort (Table 3). After IPTW using stabilized truncated weights in a weighted Cox proportional hazards model, rwOS was not different from OS, with an adjusted HR after IPTW and after adjusting for time from surgical resection to the index date of 1.07 (95% CI 0.70–1.64; Fig. 3).
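The balance check based on standardized differences can be illustrated with a short Python sketch. It uses a commonly used formula for binary covariates; the article does not state the exact formula applied, so this form and the function name are assumptions. The example uses the rounded, unweighted atrial fibrillation proportions reported in the baseline characteristics (1% vs. 4%).

```python
import math


def standardized_difference(p1, p2):
    """Standardized difference for a binary covariate.

    p1, p2 : proportion of patients with the characteristic in each cohort
    """
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


# Atrial fibrillation before weighting: 1% (CheckMate 238) vs. 4% (real-world).
d = standardized_difference(0.01, 0.04)
print(round(abs(d), 2))
print(abs(d) > 0.1)  # flagged as imbalanced under the 0.1 threshold
```

With these rounded inputs the absolute standardized difference is about 0.19, above the 0.1 threshold, which is consistent with atrial fibrillation being listed among the characteristics imbalanced before IPTW.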

Table 3 Baseline characteristics before and after IPTW in the CheckMate 238 and real-world cohorts

Fig. 3 Adjusted IPTW OS and rwOS in patients with resected stage III melanoma (per AJCC-8) who received adjuvant nivolumab in the CheckMate 238 and real-world cohorts, respectively. AJCC-8 American Joint Committee on Cancer, Cancer Staging Manual, eighth edition, CI Confidence interval, HR Hazard ratio, IPTW Inverse probability of treatment weighting, OS Overall survival, rwOS Real-world overall survival

Time to treatment discontinuation and subsequent systemic therapy in the real-world cohort

Among the 452 patients in the real-world cohort, 340 (75%) discontinued treatment during the study period. The median time to treatment discontinuation in the real-world cohort was 10.4 months (95% CI 10.2–10.8), and the estimated proportion of patients remaining on treatment at 6 months was 72% (supplementary Fig. 1).

Among the 452 patients in the real-world cohort, 123 (27%) were reported to have received subsequent systemic therapy (supplementary Table 3). Among the 123 patients who received subsequent systemic therapy, 26 patients (21%) received subsequent treatment in the adjuvant setting, and 97 patients (79%) received subsequent treatment in the post-recurrence setting. The most common subsequent systemic therapies used in the real-world cohort were nivolumab plus ipilimumab (n = 34; 8%), nivolumab (n = 28; 6%), and dabrafenib plus trametinib (n = 17; 4%).

Discussion

Results of this comparative analysis suggest that after adjustment, OS in the pivotal phase 3 CheckMate 238 trial [2] was similar to rwOS in the Flatiron Health database in patients with completely resected stage III melanoma (per AJCC-8) treated with adjuvant nivolumab, validating the results of the RCT. These findings are relevant given the limited real-world studies assessing the clinical outcomes of adjuvant treatments in patients with resected melanoma.

Baseline characteristics were generally similar between patients in the real-world Flatiron Health cohort (who met the key eligibility criteria for CheckMate 238) and those in the CheckMate 238 cohort, although there were a few notable differences. Compared with the CheckMate 238 cohort, the real-world cohort was older in age, possibly reflecting a lesser tendency to treat older patients with resected melanoma in the RCT (particularly because a high dose [10 mg/kg] of ipilimumab was used as the control treatment in CheckMate 238) and greater clinician experience with managing treatment-related toxicities in the real-world setting after regulatory approval of nivolumab. In addition, the real-world cohort had a slightly higher proportion of patients with stage IIIA disease per AJCC-8 than the CheckMate 238 cohort (5% vs. 1% [reclassified per AJCC-8]), which was due to selection criteria not allowing enrollment of patients with stage IIIA disease per AJCC-7 in CheckMate 238. Therefore, even when patients with low-risk, stage IIIB disease in CheckMate 238 were reclassified as having stage IIIA disease per AJCC-8, there were only a few patients with stage IIIA disease in the trial [16]. In addition, patients were more racially diverse in the real-world cohort than in the CheckMate 238 cohort, which may have reflected the underrepresentation of certain racial groups in the RCT. However, it is encouraging that results from a more racially diverse real-world cohort were consistent with RCT data.

The clinical benefit of adjuvant nivolumab observed in CheckMate 238 was similar to that observed in the real-world setting. Unadjusted and adjusted OS and rwOS in the CheckMate 238 and Flatiron Health cohorts, respectively, were not different, as 95% CIs for the HRs included 1. In the unadjusted analysis, the 2-year OS rate was similar to the 2-year rwOS rate (89% and 84%, respectively), as were 4-year OS and rwOS rates (78% and 74%, respectively), despite differences in baseline characteristics between the two populations. After applying similar patient selection criteria and adjusting for key prognostic factors, OS and rwOS rates remained similar between the cohorts (2-year OS and rwOS rates, 84% in both cohorts; 4-year OS and rwOS rates, 72% in both cohorts). In addition, OS and rwOS were not different after IPTW in the adjusted model, which controlled for residual differences between the two cohorts using a weighting approach. Furthermore, subsequent systemic therapy was used in similar percentages of patients in the nivolumab treatment arm in CheckMate 238 [2] and the real-world cohort (29% and 27%, respectively), suggesting that the use of subsequent systemic therapy did not influence the analysis. The results of this comparative analysis validate the OS benefit with adjuvant nivolumab observed in CheckMate 238 and suggest that those findings are generalizable beyond the RCT setting to the real-world setting.

This study had several limitations. As with any database analysis, there was the potential for errors in data entry and underreporting of clinical characteristics in the real-world database. Because disease conditions and comorbidities were defined by diagnosis codes in the real-world database, incompleteness or misclassification may have occurred. There was also the potential for incorrectly reported staging in the real-world cohort. Furthermore, there were complexities in extracting clinically relevant data for the real-world database using current EHR standards, which were largely designed for oncologists treating patients, tracking billing, and managing clinical care, even though strict quality assessment procedures served to maximize data integrity. The results may also have been influenced by unobserved prognostic factors that were not accounted for in the multivariable analysis, such as sentinel lymph node tumor burden in patients with IIIA disease, as this information was not captured in CheckMate 238. Moreover, the limited follow-up in patients with a relatively good prognosis is likely to have resulted in substantial censoring of survival outcomes due to improved outcomes in the real-world setting. The efficacy analysis may have been affected by differences in the definitions for OS in the CheckMate 238 cohort (time between randomization [index date] and death or date last known to be alive) and rwOS in the real-world cohort (time between nivolumab initiation [index date] and death or data cutoff). Given that the real-world database did not have information describing reasons for censoring, rwOS was censored at the data cutoff date. However, this methodology may have overestimated the time at risk close to data cutoff. The findings of this analysis may have also been affected by missing data in the real-world cohort. For example, ECOG PS data were missing in 24% of patients in the real-world cohort, whereas none of the patients in the CheckMate 238 cohort had missing ECOG PS data. However, the results from a sensitivity analysis that excluded patients with missing ECOG PS data were consistent with those of the initial analysis. Median follow-up time also differed substantially between the CheckMate 238 and the real-world cohorts (61.4 vs. 25.5 months). Although patients were monitored regularly for outcome assessment in CheckMate 238, it is unclear how frequently patients were monitored in the real-world setting, which is an important factor in observing recurrences. Finally, this analysis may have been affected by geographic limitations of the flow of data into the Flatiron Health database. Despite these limitations, this analysis provides insights into clinical outcomes with adjuvant nivolumab in patients with resected melanoma in routine clinical practice.

In this comparative analysis involving patients with completely resected stage III melanoma (per AJCC-8) treated with adjuvant nivolumab, OS in the phase 3 CheckMate 238 trial was similar to rwOS in the Flatiron Health database, validating results from the RCT. These findings suggest that results from CheckMate 238 are generalizable to the real-world setting and support adjuvant nivolumab as a standard of care for this patient population.