Bridging disconnected networks of first and second lines of biologic therapies in rheumatoid arthritis with registry data: Bayesian evidence synthesis with target trial emulation

Objective: We aim to utilise real world data in evidence synthesis to optimise an evidence base for the effectiveness of biologic therapies in rheumatoid arthritis, allowing evidence on first-line therapies to inform second-line effectiveness estimates. Study design and setting: We use data from the British Society for Rheumatology Biologics Register for Rheumatoid Arthritis (BSRBR-RA) to supplement RCT evidence obtained from the literature, by emulating target trials of treatment sequences to estimate treatment effects in each line of therapy. Treatment effect estimates from the target trials inform a bivariate network meta-analysis (NMA) of first- and second-line treatments. Results: Summary data were obtained from 20 trials of biologic therapies, including two for second-line treatment, and results from six emulated target trials of both treatment lines. Bivariate NMA resulted in a decrease in uncertainty around the effectiveness estimates of the second-line therapies, when compared to the results of univariate NMA, and allowed for predictions of treatment effects not evaluated in second-line RCTs. Conclusion: Bivariate NMA provides effectiveness estimates for all treatments in first and second line, including predicted effects in second line where these estimates did not exist in the data. This novel methodology may have further applications, for example for bridging networks of trials in children and adults.


Introduction
The evidence base for health care decision making has traditionally consisted of data from randomised controlled trials (RCTs), considered the gold standard in the evaluation of health technologies. In recent years, there has been growing interest in the use of real world data (RWD) from observational studies in health care evaluation. Routinely collected data, from electronic health records or patient registries, can provide useful information about the effectiveness of treatments where data from RCTs are sparse or not available at all for some treatment comparisons. Considerable methodological research has focussed on the inclusion of RWD in evidence synthesis with the aim of overcoming some limitations of RCT data [1][2][3]. Such research has focussed particularly on circumstances where RCT evidence was sparse and combining RCT data with RWD aimed to increase the evidence base, to improve the precision of effectiveness estimates [4] and sometimes to bridge disconnected networks.
Whilst research to date has largely focussed on exploitation of RWD to mimic or replicate RCT data [5][6][7], we take a step further to explore use of RWD in a scenario of data generation not typical for the RCT setting. In this paper, we explored how RWD can be used to optimise an evidence base by using evidence on first-line therapies to inform second-line effectiveness estimates in evidence synthesis. When data from RCTs are available on effectiveness of a particular treatment, but only in the first line of therapy, a costly trial needs to be undertaken to also evaluate the effectiveness of the new therapy used in patients as a second line treatment (or vice versa). We investigated the added value of registry data, which provides evidence on both first and second lines in each individual, when amalgamating these data in a network of RCTs for both lines of therapies. We developed this approach for incorporating RWD into clinical and HTA decision-making using a case study in rheumatoid arthritis (RA).
We made use of data from the British Society for Rheumatology Biologics Register for Rheumatoid Arthritis (BSRBR-RA) to supplement the RCT evidence, which was available only for either the first or the second line of therapy. We did so by emulating target trials using the approach developed by Hernán and Robins [7]. We estimated treatment effects of biologic therapies based on the data in emulated target trials, which we then used to inform a bivariate network meta-analysis (NMA) model of first- and second-line treatments. The estimates from the registry data were used to "bridge" disconnected networks for the two lines of therapy. Response according to the American College of Rheumatology criteria (ACR20) was used as the outcome measure.
The remainder of this paper is structured as follows. Data sources and statistical methods are described in Section 2. The results are presented in Section 3, followed by discussion and conclusions in Section 4.

Data sources
Summary data from randomised controlled trials
Summary data from a literature review of RCTs of biologic therapies in patients with rheumatoid arthritis were obtained for the effectiveness of adalimumab, etanercept, infliximab, golimumab, abatacept and rituximab used as first-line biologic therapies (in biologic-naive patients) and the effectiveness of golimumab and rituximab used as second-line biologic therapies in patients who switched from a previous biologic treatment. Data were obtained from 20 trials, including 18 trials of first-line treatments and two trials of second-line treatments. When constructing the network, placebo arms with methotrexate as concomitant therapy and arms including a combination of methotrexate and placebo were treated as the same treatment arm. Methotrexate used as part of the combination therapy in the biologic arm, as was the case in many trials, was ignored. In some studies methotrexate was a concomitant therapy and the percentage of patients receiving it varied across studies, as it did in the BSRBR-RA target trials.

Registry data
We made use of data from the British Society for Rheumatology Biologics Register for Rheumatoid Arthritis (BSRBR-RA) to supplement randomised trial evidence. Whilst each RCT included only either first- or second-line therapy, registry data provided evidence on both lines of therapy for each patient. BSRBR-RA data consisted of 19410 individuals, 15636 of whom had data recorded on biologic treatment. The data were used to emulate trials of both lines of therapy.

Emulation of target trials
We used the BSRBR-RA data to emulate a series of trials of first- and second-line treatments for a range of biologic therapies using a target trial approach [7]. In the first instance we specified the key components of the target trial protocol, which (following the recommendation by Hernán and Robins [7]) included: eligibility criteria, treatment strategies, assignment procedures, the follow-up period, outcome, causal contrast, and statistical analysis. Note that no RCT included both lines of therapy in sequence, whilst the proposed protocol of the target trial did include the treatment sequence. Therefore, we did not aim for the emulated target trials to resemble any existing RCT (an approach previously used in target trial emulation).

Eligibility criteria
Study participants were patients aged 18 years or older with a diagnosis of RA. Patients who had been treated with a biologic disease-modifying antirheumatic drug (DMARD) prior to registration with the BSRBR-RA were excluded.

Treatment strategies
Patients had to have received at least two lines of therapy, which could be any of the biologic DMARDs or methotrexate, which is a synthetic DMARD often used as a combination therapy and/or control treatment in trials of biologic therapies in RA patients. Data from patients who switched from first line biologic therapy to no therapy (or to therapies that are neither biologic DMARDs nor methotrexate) were not included.

Assignment procedures
Patients were grouped into treatment arms according to the sequence of treatments in the two lines of therapy. These groups of patients (sequence treatment arms) were matched to form experimental and control treatment groups. Matching was conducted based on the size of the trial, ensuring well-balanced treatment contrasts, with methotrexate always taken as the control treatment and rituximab as an experimental treatment. Other biologic therapies could be used as either experimental or control treatment. The matching procedure had to ensure that the treatments in the experimental and control arms differed in each line. The process is described schematically in Figure 1. This procedure resulted in target trials of two lines of therapy recorded on the same patients, who switched treatment in both treatment arms. For example, patients in the experimental arm receiving first-line adalimumab switched to infliximab and those in the control arm receiving first-line etanercept switched to methotrexate, resulting in a first-line comparison of adalimumab versus etanercept and a second-line comparison of infliximab versus methotrexate. Since patients were not randomly allocated, we assumed no unmeasured confounding at baseline conditional on a number of prognostic factors, measured at baseline or at initiation of each treatment, that could influence the response. The prognostic factors included age, gender, duration of the disease and a number of clinical measures related to disease activity, including the number of tender and swollen joints, serology (being positive for rheumatoid factor), acute-phase reactants (CRP and ESR) and the 28-joint count disease activity score (DAS-28).
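The way a pair of sequence arms yields one comparison per line of therapy can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the matching code used for the BSRBR-RA analysis: the names `SequenceArm` and `emulated_contrasts` are hypothetical, and matching on trial size and covariate balance is not represented.

```python
from typing import NamedTuple


class SequenceArm(NamedTuple):
    """A group of patients defined by their first- and second-line treatments."""
    first_line: str
    second_line: str


def emulated_contrasts(experimental: SequenceArm, control: SequenceArm) -> dict:
    """Return the (experimental, control) comparison for each line of therapy.

    Enforces the rules described in the text: treatments must differ between
    the arms in each line, methotrexate may only be a control treatment and
    rituximab may only be an experimental treatment.
    """
    contrasts = {}
    for line, (exp_trt, ctl_trt) in enumerate(zip(experimental, control), start=1):
        if exp_trt == ctl_trt:
            raise ValueError(f"line {line}: both arms receive {exp_trt}")
        if exp_trt == "methotrexate":
            raise ValueError(f"line {line}: methotrexate must be the control")
        if ctl_trt == "rituximab":
            raise ValueError(f"line {line}: rituximab must be experimental")
        contrasts[line] = (exp_trt, ctl_trt)
    return contrasts


# The example from the text: adalimumab -> infliximab versus
# etanercept -> methotrexate.
contrasts = emulated_contrasts(
    SequenceArm("adalimumab", "infliximab"),
    SequenceArm("etanercept", "methotrexate"),
)
```

Here `contrasts[1]` is the first-line comparison (adalimumab versus etanercept) and `contrasts[2]` the second-line comparison (infliximab versus methotrexate), both measured on the same patients.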

The follow-up period
The minimum follow-up time had to ensure that data were collected 24 weeks after initiation of each line of therapy. The start of the second-line therapy varied depending on when patients needed to switch to second-line treatment, which was typically due to either lack of response or adverse reactions.

Outcome
Patients were assessed according to ACR20 response criteria, which classified them as responders if they had at least 20% improvement according to ACR criteria. Due to a large number of missing values on some of the components of ACR within BSRBR-RA data (the register did not capture patient pain or physician global score), the definition of response was relaxed allowing patients to be classed as responders if they had at least 20% improvement in at least one of the joint count components (tender or swollen joint count) and at least one of the remaining five components of the ACR measure (physician global assessment, patient global assessment, pain, HAQ, ESR (or CRP)) [8].
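The relaxed response definition can be written as a small function. The sketch below is illustrative only: the component names, the dictionary layout and the decision to treat a missing component as "not improved" are assumptions made for the example, not details taken from the BSRBR-RA analysis. For all ACR components lower values indicate less active disease, so improvement is a percentage reduction from baseline.

```python
JOINT_COMPONENTS = ("tender_joints", "swollen_joints")
OTHER_COMPONENTS = ("physician_global", "patient_global", "pain", "haq", "esr_or_crp")


def pct_improvement(baseline, follow_up):
    """Percentage reduction from baseline, or None if it cannot be computed."""
    if baseline is None or follow_up is None or baseline <= 0:
        return None
    return 100.0 * (baseline - follow_up) / baseline


def relaxed_acr20(baseline: dict, follow_up: dict, threshold: float = 20.0) -> bool:
    """Relaxed ACR20: >= 20% improvement in at least one joint count AND in
    at least one of the remaining five components.

    Assumption for this sketch: a component that is missing at either time
    point counts as not improved.
    """
    def improved(name):
        change = pct_improvement(baseline.get(name), follow_up.get(name))
        return change is not None and change >= threshold

    return (any(improved(c) for c in JOINT_COMPONENTS)
            and any(improved(c) for c in OTHER_COMPONENTS))


# Hypothetical patient: tender joints 10 -> 7 (30%), HAQ 2.0 -> 1.5 (25%),
# other components not recorded.
responder = relaxed_acr20(
    {"tender_joints": 10, "haq": 2.0},
    {"tender_joints": 7, "haq": 1.5},
)
```

By contrast, the full ACR20 definition requires improvement in both joint counts and in at least three of the five remaining components, which could not be evaluated with the components recorded in the register.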

Causal contrast and statistical analysis
Baseline characteristics for each group were summarised to ensure that the covariates were similarly distributed across the treatment arms. The numbers of responders were then adjusted for covariates using regression adjustment in each line of therapy. Logistic regression was applied to the data with baseline characteristics and treatment as covariates. We estimated the per-protocol effect in all emulated trials.
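One common way to turn a fitted logistic regression into covariate-adjusted response proportions is standardisation: predict each patient's response probability under both treatments and average. The pure-Python sketch below illustrates that idea under stated assumptions. The text does not specify how adjusted responder numbers were derived, so standardisation is an assumption here, the toy data are invented, and the function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, iters=25):
    """Maximum likelihood logistic regression by Newton-Raphson.

    Each row of X must already contain the intercept term.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = [expit(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        info = [[sum(mi * (1 - mi) * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def standardised_risks(treat, covs, y):
    """Average predicted response under treatment and under control."""
    X = [[1.0, float(t)] + list(c) for t, c in zip(treat, covs)]
    beta = fit_logistic(X, y)

    def mean_risk(t):
        rows = ([1.0, float(t)] + list(c) for c in covs)
        return sum(expit(sum(b * x for b, x in zip(beta, row))) for row in rows) / len(covs)

    return mean_risk(1), mean_risk(0)


# Invented toy data: one centred baseline covariate (e.g. disease activity),
# 8 patients per arm with identical covariate values in both arms.
cov_values = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5]
treat = [1] * 8 + [0] * 8
covs = [[v] for v in cov_values * 2]
y = [1, 1, 1, 1, 1, 0, 1, 0] + [1, 0, 1, 0, 1, 0, 0, 0]
risk_t, risk_c = standardised_risks(treat, covs, y)
```

Multiplying each standardised risk by the arm size gives an adjusted number of responders that could be passed to a binomial likelihood. In real registry data the covariate distributions differ between arms, which is when the adjusted and crude proportions diverge.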

Bivariate network meta-analysis
We used bivariate NMA to model jointly the treatment effects on ACR20 for the first and second lines of therapy. A standard approach to any multivariate meta-analysis is to use a hierarchical model with a multivariate normal distribution describing variability at two levels: within-study (where the correlation occurs due to the modelled multivariate quantities, such as treatment effects on multiple outcomes, being measured in the same individuals) and between-studies (where the correlation is a result of heterogeneity, with the average effects measured on each outcome varying across studies due to, for example, differences in population or treatment doses). Accounting for the within-study correlation is important in such analysis [9]. However, jointly modelling non-normal outcomes, such as binomial responses, would require transforming the data, which can lead to biased results [10]. Papanikos et al. carried out a simulation study showing that, when the within-study correlation is weak, a multivariate meta-analysis model with independent binomial likelihoods is preferable [11]. Exploratory analysis of the BSRBR-RA data set, estimating the within-study correlation using a bootstrapping approach [12,13], showed that the within-study correlation between the treatment effects for the two lines of therapy on the log odds ratio (OR) scale was close to zero. We therefore adapted the approaches to multivariate/bivariate NMA by Achana et al. [14] and Bujkiewicz et al. [15] by assuming independent binomial likelihoods at the within-study level, as in Papanikos et al. [11], to model the number of responders $r_{ijk}$ to treatment $k$ in line of therapy $j = 1, 2$ out of the number of participants $n_{ik}$ in study $i$, with the probability of response denoted by $p_{ijk}$:

$$
\begin{aligned}
r_{ijk} &\sim \mathrm{Binomial}(n_{ik}, p_{ijk}), \\
\mathrm{logit}(p_{ijk}) &= \theta_{ijk} =
\begin{cases}
\mu_{ijb} & \text{if } k = b, \\
\mu_{ijb} + \delta_{ij,bk} & \text{if } k \neq b,
\end{cases} \\
\begin{pmatrix} \delta_{i1,bk} \\ \delta_{i2,bk} \end{pmatrix}
&\sim N\left( \begin{pmatrix} d_{1,bk} \\ d_{2,bk} \end{pmatrix}, T \right),
\qquad
T = \begin{pmatrix} \tau_1^2 & \rho\,\tau_1\tau_2 \\ \rho\,\tau_1\tau_2 & \tau_2^2 \end{pmatrix}.
\end{aligned}
\tag{1}
$$

In this hierarchical model, $\theta_{ijk}$ denotes the log odds of response to treatment $k$ in line $j$ and study $i$, $\mu_{ijb}$ is the baseline treatment effect in study $i$ and line $j$, and $\delta_{ij,bk}$ are the true (random) treatment effects in each study $i$ comparing treatment $k$ with the baseline treatment $b$ in line $j$. We assume that the true effects follow a bivariate normal distribution that is common to the studies investigating the same treatment contrast $bk$, with mean effects $d_{j,bk}$ and the between-studies covariance matrix $T$ (here assumed homogeneous across the treatment contrasts). The network structure of the data is taken into account by assuming that the pooled effects $d_{j,bk}$ for each contrast $bk$ and treatment line $j = 1, 2$ satisfy the consistency assumption

$$
d_{j,bk} = d_{j,1k} - d_{j,1b}, \tag{2}
$$

where $d_{j,1k}$ denote the basic parameters (average treatment effects of each treatment $k = 1, \ldots, n_t$ in the network relative to the reference treatment 1 in treatment line $j = 1, 2$), with $n_t$ denoting the number of treatments in the network. Prior distributions are placed on the between-studies heterogeneity parameters, $\tau_j \sim U(0, 2)$, the between-studies correlation, $\rho = 2r - 1$ with $r \sim \mathrm{Beta}(1.5, 1.5)$, the baseline effects, $\mu_{ijb} \sim N(0, 10^3)$, and the basic parameters, $d_{j,1k} \sim N(0, 10^3)$. When the between-studies correlation is zero, the model reduces to two univariate models for the two outcomes modelled independently, with $\delta_{ij,bk} \sim N(d_{j,1k} - d_{j,1b}, \tau_j^2)$, $j = 1, 2$.
To predict the treatment effect in the second line when data are only available for the therapy in the first line, additional assumptions of exchangeability need to be made: instead of placing prior distributions on the basic parameters, we add another level of hierarchy to the model, as in Bujkiewicz et al. [15]. For each treatment arm $k$, ancillary parameters $\vartheta_{jk}$ for the two treatment lines $j = 1, 2$, such that $d_{j,1k} = \vartheta_{jk} - \vartheta_{j1}$, are assumed exchangeable and correlated across the biologic therapies only, that is for $k = 2, \ldots, n_t$, following a common bivariate normal distribution (3) with standard deviations $\omega_j$ and correlation $\rho_t$. Prior distributions are placed on the parameters $\omega_j \sim U(0, 2)$ and $\rho_t = 2r_t - 1$ with $r_t \sim \mathrm{Beta}(1.5, 1.5)$.
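Two building blocks of this model are easy to check numerically: the mapping from a Beta-distributed draw to a correlation on (-1, 1), which also determines the between-studies covariance matrix, and the derivation of any pairwise contrast from the basic parameters. The Python sketch below is illustrative only. The function names and numerical values are hypothetical, and the consistency relation is taken in its usual NMA form, d_{j,bk} = d_{j,1k} - d_{j,1b}, which matches the convention d_{j,1k} = ϑ_{jk} - ϑ_{j1} used for the ancillary parameters.

```python
def correlation_from_beta_draw(r: float) -> float:
    """Map r in (0, 1), e.g. a Beta(1.5, 1.5) draw, to a correlation in (-1, 1)."""
    return 2.0 * r - 1.0


def between_studies_covariance(tau1: float, tau2: float, rho: float) -> list:
    """2x2 covariance matrix of the true effects in lines 1 and 2."""
    off_diag = rho * tau1 * tau2
    return [[tau1 ** 2, off_diag], [off_diag, tau2 ** 2]]


def pooled_contrast(basic: dict, b: str, k: str) -> float:
    """Pooled effect of k versus b in one line, from effects versus the reference.

    `basic` maps each treatment to its effect relative to the reference
    treatment (log odds ratio); the reference itself has effect 0.
    """
    return basic[k] - basic[b]


# Hypothetical values for one line of therapy (log odds ratios versus methotrexate).
basic_line2 = {"methotrexate": 0.0, "etanercept": 1.0, "adalimumab": 1.5}
d_ada_vs_eta = pooled_contrast(basic_line2, b="etanercept", k="adalimumab")

rho = correlation_from_beta_draw(0.5)           # the midpoint of (0, 1) maps to 0
T = between_studies_covariance(0.5, 1.0, rho)   # diagonal when rho is 0
```

With a zero correlation the off-diagonal terms vanish, which is the case in which the bivariate model reduces to two independent univariate NMAs.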

Summary of data and the network structure
Summary data were obtained from 20 RCTs of biologic therapies, with 18 trials of first-line treatment (including adalimumab, etanercept, infliximab, golimumab, abatacept and rituximab) and two trials of second-line treatment (including golimumab and rituximab). BSRBR-RA data included 12657 individuals given a first-line biologic at the time of registration. Follow-up data included 112983 observations, an average of 8.93 follow-ups per individual. For a large proportion of the visits, methotrexate was recorded as a concomitant therapy to a biologic treatment. Target trial emulation using the BSRBR-RA data led to the generation of six trials of biologic therapies in two lines. Figure 2a shows the network structure of RCT data for the first and second lines of therapy and Figure 2b illustrates the network structure of target trials emulated from BSRBR-RA data for both lines of therapy. For the target trials, both treatment lines correspond to the same trial, in contrast to the RCTs, each of which reports only either the first or the second line of treatment. To emphasise this in Figure 2, we used the same colour of network edges for both treatment lines for the target trials, in contrast to the RCTs, where different colours of edges for different treatment lines represent different trials. In this paper, we aimed to demonstrate the value of the registry data in estimating the effect of the biologic therapies when used as second-line treatments. The network of RCT data for the second-line therapy was particularly sparse, including only two trials, one of golimumab and one of rituximab. BSRBR-RA data gave additional information about adalimumab, etanercept and infliximab, as well as rituximab, used as second-line therapies. The network structure for RCT and BSRBR-RA data combined is shown in Figure 2c.

Results of network meta-analyses
Results of two univariate NMAs of biologic therapies used as second-line treatments are shown in Table 2, where the lower triangle includes the results from the NMA using data from RCTs alone and the upper triangle shows the results of the NMA combining data from RCTs and the BSRBR-RA register. Including the registry data allowed for estimation of treatment effects for second-line biologic therapies which were not included in the RCT network. There was also some improvement in the precision of treatment effect estimates for those already included in the RCT data.
Table 2: Results of a univariate NMA of biologic therapies used as second-line treatments using data from RCTs alone (lower triangle) and combining data from RCTs and BSRBR-RA registry (upper triangle).
The results of a bivariate NMA combining data from RCTs and BSRBR-RA of biologic therapies in both lines of therapy are shown in Table 3, with the results using the "standard" bivariate NMA model in the upper triangle and the results from the analysis assuming exchangeability of biologic therapies in the lower triangle. Combining data from the first and second lines of therapy through the use of the bivariate NMA led to a further decrease in uncertainty for most of the individual treatments when compared to the results of the univariate NMA of second-line therapy alone. For example, ACR20 response to adalimumab vs methotrexate from the bivariate NMA was OR=4.36 (0.67, 15.5) compared to OR=4.69 (0.6, 17.5) from the univariate NMA.
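Because odds ratios are estimated on the log scale, one simple way to compare the uncertainty of the two adalimumab versus methotrexate estimates quoted above is the width of each interval on the log OR scale. The short Python sketch below does this arithmetic. The helper name `log_interval_width` is hypothetical and the comparison is an illustration, not part of the reported analysis.

```python
import math


def log_interval_width(lower: float, upper: float) -> float:
    """Width of an interval for an odds ratio, measured on the log OR scale."""
    return math.log(upper) - math.log(lower)


# Intervals quoted in the text for adalimumab vs methotrexate in second line.
bivariate_width = log_interval_width(0.67, 15.5)   # bivariate NMA
univariate_width = log_interval_width(0.6, 17.5)   # univariate NMA
relative_reduction = 1.0 - bivariate_width / univariate_width
```

The widths are roughly 3.14 and 3.37 on the log scale, so the interval from the bivariate analysis is about 7% narrower, a modest gain in precision for this particular contrast.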
The bivariate NMA approach assuming the additional exchangeability of the absolute effects of the biologic therapies allowed for predictions of treatment effects that had not been evaluated in trials in a second-line setting. In this case, it produced effectiveness estimates for abatacept in the second line of therapy against all other treatments in the network. Moreover, this additional exchangeability led to a noticeable reduction in uncertainty around the remaining estimates of effect for other therapies. This was a result of additional borrowing of information. However, there may have been some degree of smoothing of the effects across the biologic therapies, which was difficult to judge due to the large uncertainty. A sensitivity analysis was carried out using a t-distribution in place of the normal distribution in model (3), which largely produced very similar results but inflated the uncertainty around the effectiveness estimate for abatacept in the second line.

Discussion and conclusions
In this paper, we provide a conceptual approach for using RWD, such as data from registries or electronic health records, to generate estimates of the effectiveness of treatments in first and second lines of therapy. These estimates are combined with RCT data to enhance the evidence base and to provide effectiveness estimates of therapies in the second line where such data are not available from RCTs. In such circumstances, producing these estimates would otherwise require conducting expensive and time-consuming additional clinical trials. The proposed approach can be used to carry out feasibility analysis, to provide inputs to trial design, or even for evidence-based decision-making where the evidence is sufficiently robust. When carrying out this research, we came across a number of limitations. Some of them were related to the data. In particular, the RCT data were relatively sparse, with a star-shaped network for the first-line treatments and only two trials reporting the effectiveness of biologic therapies in the second line. The data set was simplified by combining the control arms (including methotrexate as either combination therapy or concomitant therapy with placebo) into the same control arm, denoted as methotrexate. This was done to strengthen the network structure and so better illustrate the methodological aspect of this work. Most of the biologic arms also included methotrexate. Considering that for a large proportion of visits in the BSRBR-RA data methotrexate was recorded as a concomitant therapy to a biologic treatment, an assumption was made that a large proportion of patients receiving biologic therapy, across all studies, also received methotrexate. The registry data set contained a substantial amount of missing data, in particular for some of the components of the ACR20 response criteria. This was not due to issues with the quality of the data but to the fact that some of the components are not routinely collected by the register.
In order to estimate the response to the biologic therapies, we chose to relax the definition of the response. In addition, the register only captures 28 joint counts, which may be different from some of the trials. Considering these potentially strong assumptions around the data sources, the results of our analysis should not be used for clinical interpretation, but only as an illustration of the proposed methodology.
There were only six target trials generated from the registry data, which resulted in substantial uncertainty around the between-studies correlation, as these were the only studies contributing data to the estimation of the correlation. The combined network was still limited, lacking data on each contrast and line across study designs. Target trial data were incorporated in the NMAs at face value, assuming they were equivalent to RCT data. Extensions of the analysis could include a power prior approach [16], allowing for down-weighting of RWD, or hierarchical modelling to differentiate between the two study designs [4]. Further investigation into data scenarios and model assumptions needs to be carried out to understand when this framework can be most efficient.

Conclusions
Registry data can be used to bridge networks of first and second lines of therapy which are disconnected when using RCT data alone. Bivariate NMA of combined data from RCTs and RWD can be used to predict the effectiveness of a treatment in second-line use when the therapy has only been investigated in an RCT as first line (or vice versa). The approach can be applied to other settings where RCT data are available for disjoint subsets of the population, for example children and adults, and where registries may provide data covering the follow-up period from childhood to adulthood for each individual.

Contributions
SB and KRA conceived the research idea. SB conceptualised the study. SB, JS, LW, DJ and KH curated the data. SB and JS undertook formal analyses. KH provided clinical input. SB and KRA provided supervision for JS, LW and DJ. SB contributed the original draft of the manuscript and visualisation. All authors contributed to manuscript revisions.