Using Negative Control Outcomes and Difference-in-Differences Analysis to Estimate Treatment Effects in an Entirely Treated Cohort: The Effect of Ivacaftor in Cystic Fibrosis

Abstract When an entire cohort of patients receives a treatment, it is difficult to estimate the treatment effect in the treated because there are no directly comparable untreated patients. Attempts can be made to find a suitable control group (e.g., historical controls), but underlying differences between the treated and untreated can result in bias. Here we show how negative control outcomes combined with difference-in-differences analysis can be used to assess bias in treatment effect estimates and obtain unbiased estimates under certain assumptions. Causal diagrams and potential outcomes are used to explain the methods and assumptions. We apply the methods to UK Cystic Fibrosis Registry data to investigate the effect of ivacaftor, introduced in 2012 for a subset of the cystic fibrosis population with a particular genotype, on lung function and annual rate (days/year) of receiving intravenous (IV) antibiotics (i.e., IV days). We consider 2 negative control outcomes: outcomes measured in the pre-ivacaftor period and outcomes among persons ineligible for ivacaftor because of their genotype. Ivacaftor was found to improve lung function in year 1 (an approximately 6.5–percentage-point increase in ppFEV1), was associated with reduced lung function decline (an approximately 0.5–percentage-point decrease in annual ppFEV1 decline, though confidence intervals included 0), and reduced the annual rate of IV days (approximately 60% over 3 years).

Randomized controlled trials are the gold standard for estimating treatment effects but are typically infeasible for estimating long-term effects. Observational data provide opportunities to estimate such effects, under strong assumptions. A key assumption is positivity, meaning individuals have a probability less than 1 of receiving (or not) the treatment, given any covariates controlled in the analysis (1).
One situation where this assumption is not met is when an entire cohort of patients receives treatment. Then it is difficult to estimate a treatment effect because we do not observe contemporary untreated individuals. It may be possible to identify a comparable group who could not receive treatment, for example, historical controls prior to its availability.
However, resultant analyses make the strong assumption of no differences between the control and treatment group that affect the outcome, except the treatment itself and its consequences. We consider estimation of causal treatment effects (CTEs) in this setting through an investigation of the effect of the disease-modifying treatment ivacaftor in people with cystic fibrosis (CF).
In the United Kingdom, approximately 10,500 persons have CF (2). The most seriously affected organ is the lung, with long-term deterioration in lung function observed. Ivacaftor has been available in the United Kingdom since 2012, and approximately 5% of the UK CF population-those with a particular CF-causing gating mutation-are eligible to receive it.
Randomized controlled trials have found that ivacaftor improves lung function and reduces the incidence of pulmonary exacerbations (3)(4)(5). Ramsey et al. (3) reported that in patients aged 12 years or older with the specified gating mutation, the mean percent predicted forced expiratory volume in 1 second (ppFEV 1 ) was 10.5 percentage points higher after 48 weeks of ivacaftor treatment versus placebo, and the ivacaftor group was 55% less likely to have a pulmonary exacerbation. Similar results have been reported in younger children (4,5). These patients will take ivacaftor for many years, and it is hoped that it will change their slope of lung function decline ("slope-change effect"), as well as improve lung function during the initial treatment phase ("step-change effect").
In most countries with high CF prevalence, almost all eligible patients now receive ivacaftor, meaning that observational data provide no contemporary controls. Four studies using national patient registry data compared ivacaftor users either with people not receiving ivacaftor because they did not have a gating mutation ("genotype comparison") or people who were eligible for ivacaftor but in the time period prior to its availability ("time-period comparison") (6)(7)(8)(9). The results were similar to randomized controlled trial findings and also suggested longer-term benefits up to 4 years later. However, even after accounting for baseline differences between the treated and untreated, these studies were prone to bias because of people with different CF-causing mutations having different disease trajectories, or because of general improvements over time in the health of the CF population (10). Lung function decline among people with different CF genotypes has previously been found to be similar (11,12), but even small differences could result in biased findings. The health of people with CF has been improving over time, which could affect time-period comparisons.
In this paper, we use directed acyclic graphs (DAGs) and single-world intervention graphs (SWIGs) (13) to illustrate assumptions made in the choice of control groups for the genotype and time-period comparisons. We describe how negative control outcomes (NCOs) can be used as a tool to detect bias in treatment effect estimates in this setting and used in combination with the difference-in-differences approach to obtain unbiased estimates of the CTE under weaker assumptions. The methods are applied using data from the UK Cystic Fibrosis Registry (14) to obtain more robust estimates of the effect of ivacaftor on lung function and annual rate (days/year) of intravenous (IV) antibiotic use (i.e., IV days).

DATA
We use UK Cystic Fibrosis Registry data from 2008-2016. The registry has been described elsewhere (14). Briefly, patients undergo an annual assessment, which captures measures of current health status, events that have occurred since the last assessment, and treatments used. Our analyses include 8,444 people: 467 with a gating mutation, and therefore eligible to receive ivacaftor after its introduction, and 7,977 with a nongating mutation (see Web Figure 1, available at https://doi.org/10.1093/aje/kwab263).
We define 2008-2012 as the pre-ivacaftor period and 2013-2016 as the post-ivacaftor period. Outcomes measured in 2012 were made prior to ivacaftor's becoming available to eligible patients. Key data for this analysis include treatment (ivacaftor), genotype (gating mutation or nongating mutation), calendar period (pre-or postivacaftor period), and outcomes (lung function, annual number of IV days). For lung function, the main analyses use ppFEV 1 . We also present results for 2 other measures of lung function (percent predicted forced vital capacity and percent predicted forced midexpiratory flow). Adjusted analyses use several demographic and clinical variables, as measured in 2008 for analyses in the pre-ivacaftor period and in 2012 for the post-ivacaftor period (see "Analysis" section).
We divide individuals into 4 groups defined by genotype and time period (Table 1). Ivacaftor is only available in group B, this being the group with an eligible genotype (G = 1) in the post-ivacaftor period (P = 1). The other 3 groups do not receive ivacaftor, either because it is not yet available (A) or because they have an ineligible genotype (D) or both (C). Some individuals appear in both groups A and B or both groups C and D.

Causal treatment effect
The CTE of interest is the effect of ivacaftor on an outcome Y in persons who actually receive the treatment (X = 1). Let Y X=0 denote the potential value of Y had, contrary to fact, a patient not received ivacaftor, and let Y X=1 denote the potential value of Y had a patient received ivacaftor. The CTE is the average treatment effect in the treated, and for a continuous outcome (ppFEV 1 ) this is measured using the mean difference: The treated cohort corresponds to individuals with the eligible genotype in the post-ivacaftor period. Hence, the CTE can be written as (2) Since all patients with the eligible genotype did receive ivacaftor in the post-ivacaftor period, Y X=1 is equal to the observed Y; therefore, The first expectation, E(Y|P = 1, G = 1), can be estimated directly as the mean outcome in individuals with (P = 1, G = 1) (group B). Assumptions are needed, however, to estimate the second expectation, since Y X=0 is entirely unobserved in group B. In groups A, C, and D, Y X=0 = Y. Hence, we can estimate E(Y X=0 |P = 0, G = 1) = E(Y|P = 0, G = 1) using group A, E(Y X=0 |P = 0, G = 0) = E(Y|P = 0, G = 0) using group C, and E(Y X=0 |P = 1, G = 0) = E(Y|P = 1, G = 0) using group D. Corresponding methods for the count outcome (IV days), where the CTE is a rate ratio, are described in Web Appendix 1.

Naive treatment effect
Previous observational studies have compared people receiving ivacaftor to people in the time period prior to its availability ("time-period comparison") or people not eligible due to their genotype ("genotype comparison"), typically with adjustment for covariates. We define such comparisons as naive treatment effects (NTEs). The (unadjusted) timeperiod NTE, comparing groups A and B, is defined as Under the genotype comparison, which compares groups B and D, the NTE is Adjusted comparisons are considered below. The above NTEs only correspond to the CTE under the assumption that E(Y X=0 |P = 1, G = 1) is equal to either E(Y|P = 0, G = 1) or E(Y|P = 1, G = 0). This assumption is strong, and considerations of its plausibility can be aided by thinking of the 3 scenarios depicted by the DAGs and corresponding SWIGs in Figure 1. We use these causal diagrams to outline the conditions under which the NTEs correspond to the CTE. When they do not, we describe how NCOs can be used in combination with a difference-in-differences analysis to estimate the CTE. Scenario 1: G and P are conditionally independent of Y given X In DAG A ( Figure 1A), receiving treatment X depends deterministically on G and P, but Y is independent of G and P conditional on X. From the corresponding SWIG ( Figure 1D), we see that Y X=0 is independent of G and P.
. It follows that NTE P and NTE G correspond to the CTE.

Scenario 2: G and P are conditionally independent of Y given X and H
In DAG B ( Figure 1B), measured covariates H (baseline health status) are included that affect Y and are also dependent on G and P. We see from the corresponding SWIG ( Figure 1E) that Y X=0 is not independent of G and P; hence, neither NTE P nor NTE G corresponds to the CTE. However, Y X=0 is conditionally independent of G and P given H, for any p, g. Using this, and standardizing to the distribution of H in group B (P = 1, G = 1), the second expectation of the CTE can be written as In all groups except B, Y X=0 = Y, allowing us to write Therefore, the following adjusted NTEs correspond to the CTE under DAG B ( Figure 1B):

NTE
These results extend to continuous and multivariable H. For these adjusted effects to be estimable requires an assumption that there is overlap in the distribution of H in groups defined by G and P. Under the assumption that the effect of H on Y is not modified by G or P, the adjusted NTEs can be expressed as conditional differences in expectations (for all h): Scenario 3: G and P are not conditionally independent of Y In DAG C ( Figure 1C), there is dependence between Y and G, P conditional on X. In an extended version of DAG C in Web Figure 2, we add covariates H. The main issue is encompassed in DAG C ( Figure 1C), and we focus on this here. The corresponding SWIG ( Figure 1F) shows that Y X=0 is not independent of G and P. Now, the unadjusted NTEs (equations 4 and 5) do not correspond to the CTE. If there remains dependence between Y and G, P after conditioning on H as well as X, the adjusted NTEs (equations 8 and 9) also do not correspond to the CTE.
NCOs are tools for detecting bias due to unmeasured confounding and other sources in observational studies, and they are defined as outcomes that are not affected by the treatment but have the same associations with other variables as the true outcome of interest (15). The difference-indifferences approach can also be used to estimate treatment effects in the presence of unobserved confounding. Sofer et al. (16) showed the link between NCOs and differencein-differences analysis, using pretreatment outcome as the NCO. Here we use these tools to detect bias in the NTEs and estimate the CTE under certain assumptions. While these tools have previously been discussed primarily in the context of addressing unmeasured confounding, we use them instead to address bias due to dependence between Y X=0 and G, P. By noting that NTE G can equivalently be written as NTE G = E(Y|P = 1, X = 1) − E(Y|P = 1, X = 0), it is clear that G is uncontrolled in this difference, and similarly that P is uncontrolled in NTE P . This can be considered a form of unmeasured (or uncontrolled) confounding, as there are backdoor paths from X to Y through G and P that cannot be blocked using a standard analysis because of lack of positivity. The uncontrolled confounder G or P takes the role of an unmeasured confounder (U in Sofer et al. (16)). We consider 2 NCOs that detect the bias in NTE G and NTE P due to uncontrolled confounding.
We begin by considering the outcome observed in period P = 0 (groups A and C) as the NCO. Any difference between E(Y|P = 0, G = 1) (equivalently E(Y|P = 0, X = 1)) and E(Y|P = 0, G = 0) (equivalently E(Y|P = 0, X = 0)) cannot be due to treatment, because the outcome measure preceded the treatment. We define the "genotype negative control effect (NCE)" as (12) A nonzero NCE G would indicate that the estimate of NTE G is due not only to treatment but also to dependence between Y X=0 and G. As in the description of Lipsitch et al. (15) NCE uses the treatment X and assesses its association with the NCO.
Under certain assumptions, the CTE can be written in terms of the NTE and NCE. The CTE can be written as the difference in differences: The first difference can be written E( Under this model, the second difference in the CTE in It follows that under the assumption that γ PG = 0 (i.e., if treatment were set to 0, there would be no product term P × G in the model for Y X=0 ), we have Under this assumption, the second difference in equation 13 is NCE G and the CTE can be written as NTE G − NCE G . We call this the negative-control-corrected treatment effect (NCCTE): An alternative NCO is the outcome observed in genotype group G = 0. This differs from using the outcome in period 0 as an NCO, as it does not involve an outcome that can be observed in all individuals. Instead of using the outcome measured in 1 time period in individuals in both genotype groups, it makes use of outcomes measured in 2 time periods in individuals with G = 0. Because the treatment is not given in either time period in the G = 0 group, we do not assess the association between the treatment and the NCO in this case. However, we show that this NCO can be used to obtain an estimator of the CTE under the same assumptions as those used above. We define the "time-period NCE" as the contrast between the expected outcomes in groups C and D: A nonzero NCE P indicates that the estimate of NTE P is due not only to treatment but also to dependence between Y X=0 and P. The CTE can be expressed in terms of another difference in differences: Assuming γ PG = 0 in equation 14, the CTE can be written as NTE P − NCE P , and we define We have shown how a difference-in-differences approach to estimating the CTE corresponds to using an NCO to detect bias in the NTE when there is a violation of the positivity assumption. The NCCTEs correspond to the CTE under weaker assumptions than the NTEs. In our situation there are 2 possible NCOs, corresponding to different differencein-differences formulae for the CTE (equations 13 and 18). Figure 2 shows reformulations of DAG C ( Figure 1C), such that the outcome is shown separately by time period or genotype group. Figure 2A corresponds to the DAG of Sofer et al. (16), with G playing the role of the unmeasured (or uncontrolled) confounder U. This illustrates that the outcome in period 0 is not affected by treatment but the 2 outcomes share the same association with G. Our second NCO is illustrated in Figures 2B and 2C, where the outcome in group G = 0 ( Figure 2B We have discussed NCCTEs in the context of DAG C (Figure 1(C)). In practice, it is not known which scenario we are in. NCOs are a way of investigating the validity of the NTE as an estimate of the CTE and, combined with the difference-in-differences analysis, of correcting for bias in the NTE. In scenarios 1 and 2, the NCE is null.
An extended scenario of interest is one in which the effects of G and P on Y are partially mediated through measured covariates H (Web Figure 2). The arguments using NCOs and difference-in-differences analysis can be extended to incorporate adjustment for H (Web Appendix 2). Adjusting for H could result in more efficient estimates of the CTE.

ANALYSIS
Our aim is to estimate the causal effect of ivacaftor on those eligible to receive it in the post-ivacaftor period using longitudinal data from the UK Cystic Fibrosis Registry. We estimate NTEs, NCEs, and NCCTEs using the time-period and genotype comparisons. We focus here on the continuous outcome ppFEV 1 . For the count outcome, IV days, the treatment effect is quantified by rate ratios after 1, 2, and 3 years of treatment (Web Appendix 1).
The outcome is measured at up to 4 annual review visits (j = 0, 1, 2, 3) per individual in a given period (P = 0, 1). Figure 2. Reformulation of the directed acyclic graph (DAG) in Figure 1C showing the negative control outcomes (NCOs). A) Use of the outcome in period 0 as the NCO. Panels B and C show the DAGs in genotype groups G = 0 and G = 1, respectively, when using the outcome in group G = 0 as the NCO. In panel A, Y P=0 and Y P=1 denote the pre-and posttreatment outcomes, respectively, and treatment X occurs after Y P=0 . In panel A, G is an uncontrolled confounder and is equivalent to the unmeasured confounder U in the work of Sofer et al. (16). In panel C, P is the uncontrolled confounder.
Let Y ij denote the outcome measured for individual i at visit j in a given period. For most individuals in the analysis data set, j = 0 corresponds to the year 2009 for P = 0 and 2013 for P = 1. X i denotes the treatment indicator. The observed treatment status is X i = 1 for group B and X i = 0 for groups A, C, and D. We estimate a treatment effect with 2 components, a step-change (St) effect and a slope-change (Sl) effect. The analysis model is where represents the slope-change effect. Each NTE and NCE comprises a step-change effect and a slope-change effect. NTE P is estimated by fitting the model in groups A and B (X corresponds to P), and NTE G is estimated by fitting the model in groups D and B (X corresponds to G). To estimate the NCEs, the treatment status in one of the groups is switched for the analysis: NCE P uses groups C and D, setting X i = 1 for group D (so X corresponds to P), and NCE G uses groups C and A, setting X ij = 1 for group A (so X corresponds to G). Each model is fitted using generalized estimating equations, assuming an independence working correlation matrix. Models are refitted with adjustment for variables H i ( Table 2) measured in the year prior to visit 0 in each period, giving adjusted NTEs, NCEs, and NCCTEs (see Web Appendix 2). Nonparametric bootstrapping with 1,000 resamples was used to obtain 95% confidence intervals (CIs) and P values. Results for ppFEV 1 are shown in Figure 3 (Web Table 1). We focus on the adjusted analysis. The unadjusted results are qualitatively similar, with wider 95% CIs. First consider the step-change effect. In the time-period comparison, the NTE (NTE Con P ) estimates a 7.27-percentage-point absolute increase in ppFEV 1 (95% CI: 5.87, 8.57) in the ivacaftor group. The corresponding NCE (NCE Con P ; see Web Appendix 2) estimates a 0.77-percentage-point increase (95% CI: 0.44, 1.08), indicating a small improvement in mean absolute ppFEV 1 in the post-ivacaftor period in the G = 0 group. The resulting NCCTE Con P estimates a 6.50-percentage-point increase (95% CI: 5.06, 7.85) in ppFEV 1 . In the genotype comparison, NTE Con G is estimated to be 6.22 (95% CI: 5.17, 7.24) and NCE Con G is estimated to be −0.37 (95% CI: −1.36, 0.65), resulting in an NCCTE Con G estimate of a 6.59percentage-point increase in ppFEV 1 (95% CI: 5.22, 7.90). The NCE indicates that in the pre-ivacaftor period, mean ppFEV 1 was slightly lower in persons with the ivacaftoreligible genotype than in those ineligible.
Results for 2 other measures of lung function (percent predicted forced vital capacity, percent predicted forced midexpiratory flow) (Web Tables 3 and 4) were similar to those for ppFEV 1 . We conducted sensitivity analyses restricting the G = 0 genotype group to patients who were either heterozygous or homozygous for the phenylalanine 508 deletion (F508del) (Web Tables 5-10) or only those patients who were homozygous for F508del (Web Tables  11-16), corresponding to 90% and 54% of the original G = 0 group, respectively. The results showed no substantial differences.

DISCUSSION
We have shown how NCOs can be used to assess whether a control group is suitable for estimating the treatment effect in a group where everyone receives treatment, and how they can be used in combination with the difference-indifferences approach to provide a more robust treatment effect estimate. Previous descriptions of NCOs have focused on unmeasured confounding bias (15)(16)(17)(18)(19). Potential bias in our situation is due to an inability to block all paths from X to Y using a standard analysis, which could be considered a form of unmeasured confounding. A key assumption of our methods is that there is no genotype × period product term in the model for the counterfactual outcome under no treatment (equation 14). This is a strong assumption that is not verifiable using the data, though it is weaker than the assumptions made when using NTEs. NCCTEs also provide unbiased treatment effect estimates in further scenarios-for example, allowing for unmeasured variables U affecting H and Y (Web Figure 3). It is of interest to investigate how the assumption made in our difference-in-differences analyses using NCOs could be relaxed. One approach could be the use of synthetic control methods, which make use of preand postintervention observations in the group receiving the intervention, and observations in multiple time periods for groups that have not received the intervention (20,21).
Our NTE estimates for the effect of ivacaftor are similar to results from previous studies, which found that ivacaftor results in a step-change absolute improvement in ppFEV 1 of 3.2-8.2 percentage points and a decrease in the rate of annual ppFEV 1 decline of approximately 0.8 percentage points (6,9). However, these are only unbiased estimates of the treatment effect if the assumptions that G and P are conditionally independent of Y given X (or X and H) are valid.
The time-period comparison NCE showed a 0.77percentage-point absolute increase in ppFEV 1 , indicating a small, non-clinically significant improvement in population average lung function since the introduction of ivacaftor. This means that the NTE slightly overestimates the ivacaftor effect. In the genotype comparison, the NCE was negative, indicating slightly lower ppFEV 1 in the eligible genotype group versus the ineligible group in the pre-ivacaftor period, but the NCE was also small, as it was in the timeperiod comparison. This resulted in NCCTE estimates of  Figure 3. Estimated naive treatment effect (NTE), negative control effect (NCE), and negative-control-corrected treatment effect (NCCTE) of ivacaftor on percent predicted forced expiratory volume in 1 second (ppFEV 1 ) among persons with cystic fibrosis, using the time-period comparison (unadjusted: NTE P , NCE P , and NCCTE P ; adjusted: NTE Con P , NCE Con P , and NCCTE Con P ) and the genotype comparison (unadjusted: NTE G , NCE G , and NCCTE G ; adjusted: NTE Con G , NCE Con G , and NCCTE Con G ), UK Cystic Fibrosis Registry, 2008-2016. A) Absolute step change in ppFEV 1 (β St ); B) absolute change in the annual ppFEV 1 slope (β Sl ). In the adjusted analysis, results were adjusted for the baseline variables: sex, age, ethnicity, smoking status, cystic-fibrosis-related diabetes, ppFEV 1 , annual number of days of intravenous (IV) antibiotic use (i.e., IV days) (including an indicator of a nonzero count and a linear term for the nonzero counts), use of mucolytic treatment, and bacterial infection. Bars, 95% confidence intervals (CIs).
step-change improvement in ppFEV 1 of 6.50% and 6.59%, which are similar to the NTE estimates (7.27% to 6.22%) but have wider CIs, correctly reflecting uncertainty in the comparability of the groups.
When considering ivacaftor's effect on the slope change of lung function, the NCE suggested that some of the NTE estimate was due not to ivacaftor but to general improvements in lung function decline over time. The NCCTE suggests  Figure 4. Estimated naive treatment effect (NTE), negative control effect (NCE), and negative-control-corrected treatment effect (NCCTE) of ivacaftor on annual number of days of intravenous (IV) antibiotic use (i.e., IV days) among persons with cystic fibrosis, using the time-period comparison (unadjusted: NTE P , NCE P , and NCCTE P ; adjusted: NTE Con P , NCE Con P , and NCCTE Con P ) and the genotype comparison (unadjusted: NTE G , NCE G , and NCCTE G ; adjusted: NTE Con G , NCE Con G , and NCCTE Con G ), UK Cystic Fibrosis Registry, 2008-2016. A) Year 1 (exp(γ X1 )); B) year 2 (exp(γ X2 )); C) year 3 (exp(γ X3 )). The adjusted analysis adjusted for the following baseline variables: sex, age, ethnicity, smoking status, cystic-fibrosis-related diabetes, percent predicted forced expiratory volume in 1 second, IV days (including an indicator of a nonzero count and a linear term for the nonzero counts), use of mucolytic treatment, and bacterial infection. Bars, 95% confidence intervals (CIs). a beneficial effect of ivacaftor, with an estimated absolute improvement in annual rate of decline of 0.49%, though with 95% CIs including 0.
Findings for the ivacaftor effect on rate of IV days were similar to those from previous studies, indicating a treatment benefit that persists at least up to 3 years. The NCE results estimated that in the absence of treatment, the rate of IV days was slightly lower in the G = 1 group versus the G = 0 group and in the later time period. NCCTE estimates were therefore slightly attenuated in comparison with the NTEs. Our results support evidence of a long-term clinical benefit of ivacaftor.