Evidence Synthesis Methods to Estimate Subgroup Effects for Distributional Cost-effectiveness Analysis of New Treatments

Distributional cost-effectiveness analysis (DCEA) is an extension of conventional cost-effectiveness analysis to quantify health equity impacts. Although health disparities are recognized as an important concern, the typical analyses conducted to inform health technology assessment of a new intervention do not include a DCEA. One of the reasons brought forward is the relative sparseness of the available evidence for a new intervention. The objective of this paper is to review advanced evidence synthesis methods to estimate subgroup-specific treatment effects relevant for a DCEA of new interventions. The paper will outline the evidence needs and gaps, present alternative evidence synthesis methods followed by an illustrative example, and conclude with some practical recommendations. Evidence challenges for estimating relative treatment effects are due to lack of inclusion of relevant subgroups in the randomized controlled trials (RCTs), lack of access to individual patient data, small subgroups resulting in uncertain effects, and reporting gaps. Evidence synthesis methods can help overcome evidence gaps by considering all relevant direct, indirect, and external evidence simultaneously. Methods of potential relevance include (network) meta-analysis with shrinkage estimation, conventional (network) meta-regression analysis, multi-level (network) meta-regression analysis, and generalized evidence synthesis. For a new intervention for which only RCT evidence is available and no real-world data, estimates can be improved if the assumption of exchangeable subgroup effects or the shared or exchangeable effect-modifier assumption among competing interventions can be defended. Future research is needed to assess the pros and cons of different methods for different data gap scenarios.


INTRODUCTION
Health technology assessment (HTA) can help decision-makers create policies that ensure appropriate and efficient use of health technologies to achieve optimal health outcomes. Health disparities are an important concern. However, the typical analyses conducted to inform HTA do not include a formal evaluation of the impact a new intervention will have on health equity, despite the availability of a quantitative framework to do so: distributional cost-effectiveness analysis (DCEA) [1][2][3][4][5][6][7][8][9][10].
A DCEA of a new intervention may be challenging given the relative sparseness of the available evidence at the time of market introduction. Clinical trials, which are primarily designed for regulatory approval, frequently do not have the power or are not designed to estimate the required treatment effects for the subgroups that are of interest for a DCEA. In the last decade, evidence synthesis techniques have been developed to combine direct, indirect, and external evidence, as well as to combine randomized controlled trial (RCT) evidence with observational evidence. These methods can potentially be used to improve subgroup-specific treatment effect estimates for new interventions in the face of evidence challenges, thereby facilitating model-based DCEA [11][12][13][14][15][16][17][18][19][20][21][22][23][24].
The objective of this paper is to review advanced evidence synthesis methods to estimate subgroup-specific treatment effects relevant for DCEA of new interventions. The paper will first outline the evidence needs and challenges for a model-based DCEA. Next, the alternative evidence synthesis methods will be summarized, followed by an illustrative example. The paper will conclude with some practical recommendations.

EVIDENCE NEEDS AND CHALLENGES FOR MODEL-BASED DISTRIBUTIONAL COST-EFFECTIVENESS ANALYSIS
DCEA aims to provide decision-makers with information to make trade-offs between improving total health and reducing health inequality [1][2][3][4]. For each competing intervention for a given condition, we estimate the expected health outcomes (e.g. quality-adjusted life years (QALYs)), costs, and the net health benefit (NHB) as a measure of cost-effectiveness factoring in opportunity costs (NHB = QALYs − costs/willingness-to-pay), as well as the distribution of NHBs across the social subgroups across which health equity is to be analyzed (Figure 1). For the remainder of this paper we use the term equity relevant subgroups to refer to these social subgroups, which can be defined according to sex, gender, race and ethnicity, education, socioeconomic status, or geographic location (urban or remote). With information on how averse society (or a decision-maker) is to health inequality (expressed with, for example, the Atkinson inequality aversion parameter), we can quantify the health equity impact (expressed with an index measure) of an increase (or decrease) in inequality in the distribution of NHB across the equity relevant subgroups due to implementation of the interventions. The equity impact metric can be combined with the total NHB (the cost-effectiveness estimate) in an overall equity-weighted measure or social welfare index, e.g. equally distributed equivalent QALYs, that combines concern for both equity and cost-effectiveness [2,4].
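The equity-weighted summary described above can be sketched numerically. The following is a minimal illustration of how equally distributed equivalent (EDE) health is computed from a distribution of NHBs with the Atkinson social welfare function; all NHB values and the inequality-aversion level are invented for illustration.

```python
import numpy as np

def atkinson_ede(nhb, epsilon, weights):
    """Equally distributed equivalent (EDE) level of a distribution of
    net health benefits under the Atkinson social welfare function.
    epsilon is the inequality-aversion parameter (epsilon = 0 means no
    aversion, so the EDE equals the weighted mean)."""
    nhb = np.asarray(nhb, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if epsilon == 1.0:  # limiting case: weighted geometric mean
        return float(np.exp(np.sum(weights * np.log(nhb))))
    return float(np.sum(weights * nhb ** (1.0 - epsilon)) ** (1.0 / (1.0 - epsilon)))

# Invented NHBs (QALYs) across five equity-relevant subgroups with equal
# population shares, before and after adopting a new intervention.
shares = np.array([0.2] * 5)
nhb_old = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
nhb_new = np.array([10.2, 11.3, 12.5, 13.9, 15.4])  # larger gains in better-off groups

eps = 1.5  # assumed inequality-aversion level, for illustration only
ede_gain = atkinson_ede(nhb_new, eps, shares) - atkinson_ede(nhb_old, eps, shares)
mean_gain = float(np.sum(shares * (nhb_new - nhb_old)))
# mean_gain exceeds ede_gain here: total health improves, but the gain is
# discounted because it widens the gap between subgroups.
```

Because the invented intervention delivers larger gains to the subgroups that are already better off, the equity-weighted gain (ede_gain) is smaller than the unweighted mean gain, which is exactly the trade-off a DCEA makes visible.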
[Insert Figure 1] For a DCEA we need credible estimates of the expected outcomes and costs of interest for the equity relevant subgroups for the alternative interventions being compared. In the absence of a single empirical study that provides all this information directly, we need to use health economic models in which we combine multiple sources of relevant evidence on the natural course of disease or outcomes with a reference treatment, relative treatment effects for alternative interventions, resource use, costs, and utility estimates for the different disease states. In Figure 2, a simple influence diagram of a fictitious, but representative, health economic model for a DCEA of cancer treatment is presented. It depicts the elements of the model (boxes) and assumptions about which elements influence each other directly (arrows). With such a figure in mind, the equity-relevant subgroup-specific evidence needed for a DCEA can be identified more clearly. Under the simplest structural model assumption, we only need equity-relevant subgroup-specific evidence for the baseline and relative treatment effects, and not for the functional parameters utilities and costs, to facilitate the DCEA. However, this assumption must be relaxed when we have evidence that utility and other costs as a function of health status or outcomes vary by the equity-relevant subgroups.
[Insert Figure 2] For a conventional model-based cost-effectiveness analysis, standard practice is to estimate outcomes associated with the natural course of the disease or reference treatment, i.e. the baseline arm of the model, as well as relative treatment effects of the alternative interventions of interest versus no treatment or the reference intervention. The relative treatment effects are applied to the baseline arm to obtain estimates of the expected outcomes for each of the alternative interventions. For a DCEA we need these estimates to be representative of the equity relevant subgroups. An intervention will have an impact (positive or negative) on inequality in health outcomes when its relative treatment effects vary between the equity relevant subgroups (i.e. heterogeneous treatment effects) or when there are differences in baseline event rates across the equity relevant subgroups that are affected by treatment. To clarify: absolute differences in the event rates between the subgroups will be affected if a relative treatment effect expressed as a ratio measure (e.g. odds ratio, relative risk, or hazard ratio) is applied to the baseline risk, even if this relative treatment effect is homogeneous across these subgroups. Heterogeneous baseline and relative treatment effects are represented in Figure 2 with the green-colored arrows. Inequality in health outcomes is further impacted if there is unequal access or uptake across the equity relevant subgroups of the target patient population or if opportunity costs are not evenly divided over the social subgroups of interest. For this paper, however, we focus on estimating the equity-relevant baseline and relative treatment effects required for a model-based DCEA.
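To make the point about baseline risks concrete, here is a small sketch showing that applying a single, homogeneous odds ratio to subgroups with different baseline risks changes the absolute risk gap between those subgroups; the event rates and odds ratio are invented.

```python
def apply_odds_ratio(baseline_risk, odds_ratio):
    """Risk under treatment implied by a baseline risk and an odds ratio."""
    odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return odds / (1.0 + odds)

# Invented baseline event risks in two equity-relevant subgroups and a
# single homogeneous odds ratio applied to both.
risk_low, risk_high = 0.10, 0.30
or_common = 0.5

treated_low = apply_odds_ratio(risk_low, or_common)    # about 0.053
treated_high = apply_odds_ratio(risk_high, or_common)  # about 0.176

gap_untreated = risk_high - risk_low       # 0.20
gap_treated = treated_high - treated_low   # about 0.12: absolute gap narrows
```

Even though the relative effect is identical in both subgroups, the absolute difference in event rates shrinks from 0.20 to roughly 0.12, so the intervention has an equity impact driven entirely by the baseline risks.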
Randomized controlled trials (RCTs) are the preferred source of evidence for relative treatment effects. Since relative effects remain relatively stable from one study population to the next (in contrast to absolute effects of a treatment), the low internal bias (i.e. absence of confounding or selection bias) in RCTs relative to observational studies outweighs any concerns about the external bias (i.e. applicability of estimates to different populations) of relative treatment effect estimates. A covariate is more likely to be a prognostic factor of an outcome than a relative treatment effect modifier because the impact of a covariate on outcomes cancels out for relative treatment effects. As such, we likely do not need to estimate relative treatment effects stratified to the same level of detail as the social subgroups of interest across which we want to analyze health equity, because not all variables that define these subgroups are modifiers of the relative treatment effects.
The available RCTs for biopharmaceutical interventions have typically been designed to detect a relative treatment effect for the overall study population of interest to support regulatory approval. This poses challenges for a DCEA when subgroup effects have not been reported or subgroup data are not available for the levels of effect-modifiers relevant to the equity relevant subgroups of interest. Even if the available RCTs do provide information on relative treatment effects relevant for the subgroups of interest, the studies may not have been powered to detect these subgroup effects, and the relative treatment effect estimates may be characterized by substantial uncertainty due to small sample sizes. Another potential evidence gap is that the RCT study population excludes certain equity relevant subgroups of interest. An additional evidence challenge, which applies to studies of both new and existing interventions, is that there is frequently no access to the individual patient data (IPD) of the RCTs; there is only access to aggregate-level information from study reports or publications. This means that even if patient characteristics related to social and biological constructs have been recorded for individual patients in the data set, reported study results may not have been stratified by these characteristics. If there were access to IPD, relative treatment effects for the equity-relevant subgroups of interest could be estimated [25]. In this discussion, it is important to note that race is increasingly recognized as a poor surrogate of the underlying social and biological constructs responsible for disparities in health outcomes. However, it may be all the information available in the RCTs [26][27][28].
Since absolute effects are likely to vary with the study population, the sources of evidence to estimate the baseline arm of the model should match the equity relevant subgroups of interest as much as possible. More specifically, the available evidence needs to match the subgroups of interest regarding prognostic factors if the baseline arm of the model reflects the natural course of disease in the absence of treatment, and regarding both prognostic factors and effect-modifiers if the baseline arm represents the reference treatment. Preferred evidence for the baseline arm comes from real-world cohort studies because of their more diverse and representative patient populations and possible access to IPD.
To summarize, evidence gaps and challenges to estimate baseline and relative treatment effects for the equity relevant subgroups can be characterized as follows, as depicted in Figure 3: 1) no evidence for some or all of the subgroups of interest due to exclusion of representative individuals from the studies; 2) lack of access to IPD, while aggregate-level information is not stratified by the subgroups of interest (e.g. results are provided for the combination of subgroups A and B, but not for A and B separately); 3) subgroup effects are uncertain due to small sample sizes; and 4) a combination of any of these factors. All this with the notion that relative treatment effects need to be estimated for the different levels of the equity relevant effect modifiers, and the baseline effects need to be estimated for the equity relevant prognostic factors (and effect modifiers if the baseline arm of the model reflects the reference treatment).

EVIDENCE SYNTHESIS METHODS TO ESTIMATE TREATMENT EFFECTS ACROSS EQUITY RELEVANT SUBGROUPS
The fundamental premise of evidence synthesis is that each empirical study is a piece of a larger evidence base and its findings are interpreted as such. Each study evaluates a subset of all information of interest, and by considering the findings of all relevant studies simultaneously we have much more information to estimate the parameters of interest. For example, under the assumption of consistency, studies 2 and 3 in Figure 3 in combination inform the estimates for subgroup A. When we add study 4 to the synthesis, we also gain information for the other subgroups. Adding study 5 results in more precision for the estimates for subgroups A, B, and C. In this scenario, study 2 provides direct evidence, and studies 3, 4, and 5 provide relevant indirect evidence.

Estimation of relative treatment effects
In the following sections, we discuss evidence synthesis methods that can be relevant to estimate relative treatment effects for new and existing interventions for equity relevant subgroups. These methods are modifications of the standard network meta-analysis (NMA) approach and include shrinkage estimation, network meta-regression, IPD-level network meta-regression analysis, and multi-level network meta-regression-based methods [11,19,23,37-41].
We will first provide a summary of the standard NMA methodology as a foundation for these modified methods. These methods apply not only to the synthesis of networks of trials, but also to pairwise meta-analysis involving two competing interventions. We take a Bayesian approach to statistical inference.

Standard network meta-analysis
With an NMA we aim to estimate relative treatment effects for a specific target population based on a connected network of multiple RCTs, each comparing a subset of all the competing interventions [13,17,18,29-36]. For a credible and relevant NMA we want to avoid systematic differences in known and unknown effect-modifiers between studies, as well as differences in patient- and context-related effect-modifiers relative to the target population and setting of interest [18,22,29,32,42]. In principle, a standard NMA can be performed by equity relevant subgroups, evidence permitting.
The general random-effects NMA model can be described as follows:

g(θ_ik) = μ_i + δ_i,bk
δ_i,bk ~ Normal(d_bk, τ²), with d_bk = d_Ak − d_Ab and d_AA = 0    (1)

where g is an appropriate link function (e.g. the logit link for binary outcomes) and θ_ik is the linear predictor of the expected outcome with intervention k in trial i (e.g. the log odds). μ_i is the study i specific outcome with comparator intervention b. δ_i,bk reflects the study-specific relative treatment effects with intervention k relative to comparator b; these are drawn from a normal distribution with the pooled relative treatment effect estimates expressed relative to the overall reference intervention A: d_bk = d_Ak − d_Ab (with d_AA = 0). Estimates of d_Ak reflect the relative treatment effect of each intervention k relative to overall reference intervention A based on direct and/or indirect evidence. Variance parameter τ² reflects the heterogeneity across studies. With a fixed effect NMA, δ_i,bk ~ Normal(d_Ak − d_Ab, τ²) is replaced with δ_i,bk = d_Ak − d_Ab because τ² is assumed to be 0. The model applies to many types of data, by specifying an appropriate likelihood describing the data generating process and corresponding link function [31]. With the NMA performed in a Bayesian framework, we need to define prior distributions for the parameters to be estimated: μ_i, d_Ak, and τ².
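As a simple non-Bayesian counterpart to the fixed effect version of model (1), pooled relative effects d_Ak can be computed from contrast-level data by weighted least squares on a treatment design matrix that encodes the consistency relations. The network below (two A-B trials, one A-C trial, one B-C trial) and all numbers are invented.

```python
import numpy as np

# Each row is one two-arm trial contrast. Columns of X are the basic
# parameters d_AB and d_AC; by consistency a B-C contrast equals
# d_AC - d_AB. y: observed log odds ratios; v: their variances.
X = np.array([[1.0, 0.0],    # A vs B trial, estimates d_AB
              [1.0, 0.0],    # second A vs B trial
              [0.0, 1.0],    # A vs C trial, estimates d_AC
              [-1.0, 1.0]])  # B vs C trial, estimates d_AC - d_AB
y = np.array([-0.50, -0.30, -0.90, -0.45])
v = np.array([0.04, 0.09, 0.05, 0.08])

W = np.diag(1.0 / v)                      # inverse-variance weights
cov = np.linalg.inv(X.T @ W @ X)          # covariance of the pooled estimates
d_hat = cov @ X.T @ W @ y                 # pooled d_AB, d_AC (direct + indirect)
se = np.sqrt(np.diag(cov))
```

The B-C trial contributes indirect information to both basic parameters, which is the mechanism by which an NMA "borrows" evidence across the network; the Bayesian random-effects model (1) adds the heterogeneity variance τ² and prior distributions on top of this structure.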

Network meta-analysis with shrinkage estimation
If we have an evidence network of RCTs for which results are reported for the equity relevant subgroups but are uncertain due to a limited number of studies and small sample sizes in each of the subgroups, borrowing strength from other interventions or subgroups by deriving a shrinkage estimate may be useful [16,23,43,44,45]. This approach can be implemented by grouping the multiple interventions in the network into a smaller set of classes with the underlying assumption that the intervention-specific relative effects within a class of interventions are exchangeable. Interventions assigned to the same class, for example based on mechanism of action, are deemed more alike regarding relative treatment effects for a specific subgroup than interventions from different classes [38]. The model expressed with equation 1 can be modified accordingly by defining that the relative treatment effect parameters d_Ak come from a distribution with a common mean and variance if they belong to the same class:

d_Ak ~ Normal(m_{C_k}, σ²_{C_k})    (2)

where C_k is defined as the class to which intervention k belongs, m_{C_k} is the mean class effect in class C_k, and σ²_{C_k} are the within-class variances. The key benefit of the exchangeability assumption is that unstable estimates for d_Ak of interventions within a class due to limited subgroup data will be shrunken towards the class mean effect and become more precise than obtained with model 1, where the d_Ak are assumed to be independent. Informative prior distributions and sensitivity analysis may be needed for σ²_{C_k} if the number of interventions per class is limited [38]. With this approach we perform a separate NMA for each of the equity relevant subgroups, and highly uncertain relative treatment effects are stabilized by borrowing information from the data from other interventions for the same subgroup.
Another approach to implementing shrinkage estimation for NMA is by assuming that the subgroup-specific relative treatment effects are exchangeable within interventions. All the mutually exclusive subgroups are incorporated in the NMA of the competing interventions simultaneously according to:

g(θ_is,k) = μ_is + δ_is,bk
δ_is,bk ~ Normal(d_s,bk, τ²), with d_s,bk = d_s,Ak − d_s,Ab
d_s,Ak ~ Normal(m_Ak, σ²_k)    (3)

where θ_is,k is the linear predictor for the expected outcome with intervention k in subgroup s of trial i, and μ_is is the expected outcome with comparator intervention b in subgroup s of study i. δ_is,bk reflects the relative treatment effect with intervention k relative to comparator b in subgroup s of trial i; these are drawn from a normal distribution with the pooled estimates expressed in terms of the overall relative treatment effects versus intervention A in that subgroup, d_s,Ak. With this model we make the additional assumption that the subgroup-specific relative treatment effects d_s,Ak are drawn from a common normal distribution with mean m_Ak and intervention-specific variance σ²_k. As a result, highly uncertain relative treatment effects for each subgroup are stabilized by borrowing information from the data from other subgroups for that intervention [16].
With these shrinkage models we improve estimation for both new and existing interventions by assuming exchangeability between interventions or between subgroups. The first approach may be difficult to defend if a new intervention has a very different mechanism of action and efficacy than its competing interventions. The second approach does not rely on this assumption. The assumption of exchangeable subgroup-specific relative treatment effects for a given intervention is in line with the long tradition in meta-analysis and epidemiology of considering relative treatment effects relatively stable across subgroups.
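The effect of the exchangeability assumption can be illustrated outside a full Bayesian fit with a simple normal-normal calculation: each noisy subgroup estimate is pulled toward the common mean in proportion to its own imprecision. The sketch below is an empirical-Bayes analogue (not the full hierarchical models 2 and 3), and all numbers are invented.

```python
import numpy as np

def shrink(estimates, variances):
    """Normal-normal shrinkage with a method-of-moments between-unit
    variance: imprecise estimates are pulled strongly toward the common
    mean, precise estimates much less."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    w = 1.0 / var
    mu = np.sum(w * est) / np.sum(w)                     # precision-weighted mean
    tau2 = max(0.0, np.var(est, ddof=1) - np.mean(var))  # crude between-unit variance
    if tau2 == 0.0:
        return np.full_like(est, mu), mu, tau2
    b = var / (var + tau2)                               # shrinkage factor per estimate
    return (1.0 - b) * est + b * mu, mu, tau2

# Invented log odds ratios for one intervention in four equity-relevant
# subgroups; the last subgroup is small and therefore very imprecise.
est = [-0.60, -0.45, -0.55, 0.20]
var = [0.02, 0.03, 0.02, 0.40]
shrunk, mu, tau2 = shrink(est, var)
# The outlying, imprecise +0.20 estimate is pulled close to the pooled
# mean, while the three precise estimates move comparatively little.
```

This mirrors what happens inside models (2) and (3): the estimate that could have flipped the sign of a subgroup effect purely through sampling noise is stabilized by the other subgroups' data.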

Network meta-regression
When there are observed differences between the equity relevant subgroups of interest and the study populations of the individual RCTs regarding effect-modifiers, a meta-regression can potentially be used to adjust for this external bias and provide relevant relative treatment effect estimates [22,37,38]. When the available evidence base only consists of aggregate-level data, the model presented in equation 1 can be extended with a covariate according to [37,38,46-49]:

g(θ_ik) = μ_i + δ_i,bk + (β_Ak − β_Ab)(x_i − x_target)    (4)

where x_i is the study-level covariate value of the effect-modifier of interest for trial i, β_Ak represents the covariate effects with intervention k relative to the overall reference intervention A, and x_target is the centered covariate value representing the target subgroup of interest. d_Ak represents the relative effect of intervention k compared to intervention A for the target subgroup of interest. With this model we do not only assume consistency regarding relative treatment effects, but also regarding the parameters reflecting the impact of the covariates (β_bk = β_Ak − β_Ab). In model 4 the impact of the covariate on the relative treatment effects is assumed to be independent for each intervention k relative to A. However, we can also simplify the model by assuming the impact of the covariate is the same for every intervention k relative to A, β_Ak = β, or assume these effects to be exchangeable, β_Ak ~ Normal(β, σ²_β) [37,38,46]. This shared or exchangeable effect-modifier assumption is useful when the number of studies is limited.
It is important to emphasize the limitations of meta-regression analysis involving patient characteristics based on trial-level information extracted from summary reports or publications [37,38]. When the number of studies is small and the contrast in the study-level covariate between studies is sufficiently large, a spurious relationship between the relative treatment effect and the covariate may be statistically significant [38]. With continuously distributed patient characteristics, the within-study variation is typically much larger than the variation in the aggregated means used for the meta-regression analysis, resulting in a lack of power to detect a true relationship [37,38].
Using aggregated information regarding patient characteristics in a network meta-regression is vulnerable to ecological bias. Due to study-level confounding, the estimated relationship between a study-level patient characteristic and the relative treatment effect based on between-study comparisons may be very different from the within-study relationship [39,50-54]. Such ecological bias can occur in non-linear models even in the absence of study-level confounding [53-57].
With meta-regression analyses, we can improve the estimation of subgroup-specific results for the new intervention when the shared or exchangeable effect-modifier assumption holds and the trials of the new intervention include the overall reference intervention (i.e. intervention A) as a control group.
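The adjustment in model (4) amounts to fitting a weighted regression of the study-level relative effects on the study-level covariate and projecting to the covariate value of the target subgroup. A minimal frequentist sketch; the log odds ratios, variances, and covariate values are all invented.

```python
import numpy as np

# Study-level log odds ratios for one A-vs-k comparison, their variances,
# and a study-level covariate value (e.g. the proportion of the trial
# population in an equity-relevant subgroup). All numbers are invented.
y = np.array([-0.70, -0.55, -0.35, -0.20])
v = np.array([0.04, 0.05, 0.04, 0.06])
x = np.array([0.10, 0.30, 0.50, 0.70])

X = np.column_stack([np.ones_like(x), x - x.mean()])   # centred covariate
W = np.diag(1.0 / v)
coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)       # [d at mean x, beta]

# Project the relative effect to a target subgroup's covariate value,
# in the spirit of model (4): d_target = d(mean) + beta * (x_target - x_mean)
x_target = 0.80
d_target = coef[0] + coef[1] * (x_target - x.mean())
```

Note that the slope here rests entirely on between-study contrasts, which is precisely why this aggregate-level regression is exposed to the ecological bias discussed above.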

Individual participant data (IPD)-level network meta-regression analysis
The limitations of estimating relative treatment effects for equity relevant subgroups of interest through network meta-regression based on aggregate-level data can be avoided with IPD [38,39,58-60]. If IPD is available for all the RCTs in the evidence network, one evidence synthesis model we can use is the following:

g(θ_ijk) = μ_i + β_0i x_ij + δ_i,bk + (β_Ak − β_Ab) x_ij    (5)

where j reflects the individual in study i, β_0i is the main (prognostic) effect of covariate x on the outcome of interest in study i, and x_ij is the value of the covariate for individual j in study i. Here we assume the interaction effect β_Ak is fixed across studies. We can also separate the within- and between-study interaction between intervention and covariate and use a model with a covariate for the study-level mean value of the patient characteristic and a covariate for the individual patient value of this effect-modifier minus the mean value in that study to describe the within-study variation [54,61]:

g(θ_ijk) = μ_i + β_0i x_ij + δ_i,bk + (β_Ak^b − β_Ab^b) x̄_i + (β_Ak^w − β_Ab^w)(x_ij − x̄_i)    (6)

β_Ak^b represents the between-study coefficient for the covariate effects with intervention k relative to the overall reference intervention A, and β_Ak^w represents the corresponding within-study coefficient. If the within-study and between-study interactions are different, then ecological bias may be present and inferences regarding relative treatment effects for specific target subgroups should be based on the within-study interactions [38]. Again, models (5) and (6) can be modified by assuming that the impact of the effect-modifier is the same for every intervention k relative to A.
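With IPD, the within- and between-study interactions in model (6) can be separated by regressing on the study-mean covariate and the individual deviation from that mean. A compact sketch with simulated continuous-outcome data (so plain least squares suffices instead of a GLM); all generating parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

rows = []
for xbar_true in [0.2, 0.4, 0.6, 0.8]:      # four A-vs-k trials
    for _ in range(250):
        x = rng.binomial(1, xbar_true)      # dichotomous effect-modifier
        t = rng.integers(0, 2)              # randomised treatment arm
        # True model: treatment effect -0.8, treatment-by-x interaction 0.5
        y = 1.0 + 0.3 * x + t * (-0.8 + 0.5 * x) + rng.normal(0.0, 0.5)
        rows.append((y, x, t, xbar_true))

y, x, t, xbar = map(np.array, zip(*rows))

# Design matrix following model (6): intercept, prognostic x, treatment,
# treatment * study-mean covariate (between), treatment * deviation (within)
X = np.column_stack([np.ones_like(y), x, t, t * xbar, t * (x - xbar)])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
# beta[4] recovers the within-study interaction (~0.5). Because no
# study-level confounding was simulated, beta[3] is also ~0.5; in real
# data a gap between the two would signal ecological bias.
```

The same decomposition carries over to a logistic model for binary outcomes; the linear outcome here just keeps the estimation step to a single least-squares call.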

Network meta-regression with participant-level data and aggregate-level data
In reality, there is hardly ever access to IPD for all trials in an NMA. At best we have IPD only for a subset of studies. We can perform the evidence synthesis based on a combination of IPD studies and aggregate-level studies with the following model [61-64]:

IPD studies: as in model (6)
Aggregate-level data studies: g(θ_ik) = μ_i + δ_i,bk + (β_Ak^b − β_Ab^b) x̄_i    (7)

With this model both IPD studies and aggregate-level data studies contribute to the estimation of the treatment-by-covariate interaction effects. If these are believed to be the same for each intervention k relative to A, then, depending on how IPD and aggregate-level data are distributed over the available direct comparisons in the network, we may be able to "transfer" the within-trial interaction estimate for an A-k comparison for which IPD is available to the A-k comparisons for which we only have aggregate-level data. Unfortunately, this only works for specific evidence structures. We can potentially improve the precision of the interaction effects for studies with only aggregate-level data for any network structure based on the available IPD if we simplify the model with a single treatment-by-covariate interaction parameter for the within- and between-trial comparisons [39,61,65]. However, as mentioned, this will bias the estimates when there is study-level confounding or when we have non-linear models.
It is not uncommon that an analyst only has access to IPD for the trials of the new intervention. Although this facilitates subgroup analysis for the new intervention, the question is how much the aggregate-level information for the competing interventions contributes to the estimation of a shared effect-modifier parameter (β_Ak = β), even if we are not worried about ecological bias.

Multilevel network meta-regression with participant-level data and aggregate-level data
A promising new method relevant for the estimation of relative treatment effects for equity relevant subgroups is multilevel network meta-regression (ML-NMR) [39,41]. Unlike a network meta-regression model with a shared interaction-effect parameter for the IPD studies and aggregate-level studies, which is required to "transfer" information from studies with IPD to studies involving other comparisons for which there is only aggregate-level data, ML-NMR avoids such aggregation or ecological bias. A simple ML-NMR model for a dichotomous patient-related effect-modifier can be described as follows:

IPD studies: g(θ_ijk) = μ_i + β_0 x_ij + δ_i,bk + (β_Ak − β_Ab) x_ij
Aggregate-level data studies: θ̄_ik = (1 − x̄_i) g⁻¹(μ_i + δ_i,bk) + x̄_i g⁻¹(μ_i + β_0 + δ_i,bk + β_Ak − β_Ab)    (8)

The part of the model relevant for IPD studies is the same as used in model 5, with the exception that the coefficient for the prognostic effect of the covariate, β_0, is fixed across studies. For the aggregate-level studies, the individual-level model is averaged over the study's covariate distribution, here the proportion x̄_i with the effect-modifier present. ML-NMR has advantages over other evidence synthesis methods when IPD is available for only a subset of studies, including synthesizing networks of any size and, important for decision-making, producing estimates in any target population of interest [41]. Although concerns about ecological bias are mitigated with ML-NMR, the question remains whether aggregate-level information from existing interventions can contribute much when IPD is only available for the new intervention.
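The core ML-NMR idea, averaging the individual-level model over a study's covariate distribution rather than plugging in the covariate mean, can be shown in a few lines. With a logit link the two are not the same, and the difference is exactly the aggregation bias the method avoids; all parameter values are invented.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Individual-level model on the logit scale for intervention k vs A, with
# a dichotomous effect-modifier x (all parameter values invented):
# logit(p) = mu + beta0 * x + d_Ak + beta_Ak * x
mu, beta0, d_Ak, beta_Ak = -1.0, 1.5, -0.7, 0.4
xbar = 0.6  # proportion with x = 1 in an aggregate-level study

# ML-NMR-style aggregate outcome: average the individual-level model
# over the study's covariate distribution
p_integrated = (1 - xbar) * expit(mu + d_Ak) + xbar * expit(mu + beta0 + d_Ak + beta_Ak)

# Naive plug-in of the mean covariate gives a different answer because
# the inverse logit is non-linear; this gap is the aggregation bias
p_plugin = expit(mu + beta0 * xbar + d_Ak + beta_Ak * xbar)
```

For a continuous effect-modifier the two-point average is replaced by numerical integration (or Monte Carlo averaging) over the covariate distribution in each aggregate-level study population.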

Network meta-regression with subgroup aggregate-level data integrated over covariate distributions
Using the principles of ML-NMR, we can also imagine a synthesis approach suitable for an evidence base where some studies provide direct evidence for the subgroups of interest and other studies provide indirect evidence via information for the combination of subgroups (see representative studies 2, 3, 4, and 5 in Figure 3). An appropriate model when we have only aggregate-level data can be described as follows:

Studies providing direct evidence for a subgroup: g(θ_iks) = μ_i + β_0s + δ_i,bk + (β_Ak,s − β_Ab,s)
Studies providing evidence for a combination of subgroups: θ̄_ik = Σ_s w_is g⁻¹(μ_i + β_0s + δ_i,bk + β_Ak,s − β_Ab,s)    (9)

Parameters of the part of the model relevant for studies providing subgroup-specific evidence are defined as follows: θ_iks is the expected outcome in study i with intervention k for subgroup s; μ_i is the study i specific outcome with comparator intervention b in subgroup s = 0; β_0s is the fixed difference in outcomes with intervention b in subgroup s relative to the reference subgroup s = 0; and δ_i,bk reflects the study-specific relative treatment effects with intervention k relative to comparator b. For studies providing evidence for a combination of subgroups, the subgroup-level model is averaged over the subgroup proportions w_is in study i. With this method, we can improve the estimation of equity relevant subgroup effects for the new intervention if we assume the impact of the effect-modifiers that define the subgroups of interest is the same for all interventions compared (β_Ak,s = β_s for all k), and the trial(s) of the new intervention have the overall reference intervention (i.e. intervention A) included as a control group.

Generalized evidence synthesis
The evidence synthesis models discussed in the previous sections aim to estimate relative treatment effects for the equity relevant subgroups of interest based on RCT evidence. When the RCT evidence is too limited to obtain relevant and stable estimates for the subgroups of interest, we may want to consider relevant real-world data for the alternative interventions to supplement the RCT evidence [66]. Real-world data sources likely have more information about the effect of the intervention in heterogeneous populations than what is available in RCTs and can therefore be very useful. Of course, real-world data is not available for a new intervention, but it will be available for established interventions, for which it can provide indirect evidence regarding the differences between subgroups or the exchangeability between trial and real-world effects, potentially applicable to the new intervention as well. Since relative treatment effect estimates derived from comparative observational studies are typically at greater risk of bias than those obtained from RCTs, we do not want to replace RCT evidence with observational evidence to inform relative treatment effect estimates, but rather use both sources of information wisely in the evidence synthesis [66].
One approach to consider RCT and observational evidence simultaneously in a Bayesian evidence synthesis is to use the relative treatment effect estimates for the equity relevant subgroups obtained from observational data to define informative prior distributions for the relative treatment effect and interaction effect parameters in models 1-9. For example, a typical non-informative prior distribution for d_Ak in model 1 for a specific subgroup analysis is d_Ak ~ Normal(0, 10⁴), but this can be replaced with an informative distribution d_Ak ~ Normal(d_Ak^OBS, V_Ak^OBS) based on the point estimate and variance obtained from the observational evidence. If we have some information or informed belief about the extent of bias in observational studies relative to RCTs, we can define the prior for the relative treatment effects according to [66]:

d_Ak ~ Normal(d_Ak^OBS*, V_Ak^OBS*), with d_Ak^OBS* = d_Ak^OBS − b, b ~ Normal(0, σ²_bias), and V_Ak^OBS* = V_Ak^OBS + σ²_bias

where b represents the bias in observational evidence, which can be obtained from external evidence, and σ²_bias is the variance of this bias estimate. In line with the approach described by Welton et al [66], the expected value for the bias is set at zero to indicate that we do not know the direction of the bias, but by incorporating σ²_bias in the prior distribution for the relative treatment effects, the observational evidence is downweighted according to concerns about bias. This approach can be applied not only to the standard NMA (equation 1) but to models 2-9 as well. For example, with the meta-regression-based synthesis (equation 4) we can define informative prior distributions for d_Ak and β_Ak. When the impact of effect-modifiers is assumed to be exchangeable or shared between interventions, the observational evidence can contribute to subgroup effects for the new intervention as well, even though no real-world data is (yet) available for it.
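A quick numerical sketch of this prior construction: the observational estimate stays centred where it is, its variance is inflated by the bias variance, and the resulting prior combines with sparse RCT data through the usual normal-normal conjugate update. All inputs are hypothetical.

```python
# Bias-adjusted informative prior from observational evidence, combined
# with a sparse RCT subgroup estimate through a conjugate normal-normal
# update (all numbers invented; log odds ratio scale).
d_obs, v_obs = -0.60, 0.04    # observational estimate and its variance
var_bias = 0.08               # assumed bias variance (direction unknown)

prior_mean = d_obs            # bias has expectation zero
prior_var = v_obs + var_bias  # downweighting via variance inflation

d_rct, v_rct = -0.30, 0.10    # uncertain RCT estimate for the subgroup

post_var = 1.0 / (1.0 / prior_var + 1.0 / v_rct)
post_mean = post_var * (prior_mean / prior_var + d_rct / v_rct)
# The posterior sits between the two sources and is more precise than
# either; a larger var_bias pulls it toward the RCT estimate.
```

Setting var_bias to a very large value effectively discards the observational evidence, so the analyst's belief about bias directly controls how much the real-world data are allowed to influence the subgroup estimate.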
Another commonly used approach to combine RCT evidence and observational evidence is with hierarchical models [66]. With such a model we get different estimates of the relative treatment effects for RCTs and observational studies, but these are related given the hierarchical structure of the model (similar to shrinkage estimation). As such, unstable RCT-based estimates gain precision from the additional information in observational studies due to the assumption of exchangeability across study designs. In such a hierarchical model we can also include factors to downweight the impact of observational studies given their potential bias.
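A minimal, non-Bayesian analogue of this hierarchical idea can be sketched with a method-of-moments random-effects model over design-specific estimates, with the observational variance inflated by a downweighting factor. All numbers below are hypothetical; a full analysis would use MCMC:

```python
def dl_tau2(estimates, variances):
    # DerSimonian-Laird method-of-moments estimate of the between-design
    # heterogeneity variance.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi * wi for wi in w) / sw
    return max(0.0, (q - (len(estimates) - 1)) / c)

def shrunken_effect(y, v, estimates, variances):
    # Empirical-Bayes estimate for one design: its estimate y (variance v)
    # is shrunk toward the precision-weighted common mean across designs.
    tau2 = dl_tau2(estimates, variances)
    w = [1.0 / (vi + tau2) for vi in variances]
    mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    b = tau2 / (v + tau2)                 # weight on the design's own estimate
    return b * y + (1.0 - b) * mu, b * v  # approximate conditional variance

alpha = 0.5                        # downweighting factor for observational data
y_rct, v_rct = 0.10, 0.06          # hypothetical RCT log-odds ratio
y_obs, v_obs = 0.60, 0.02 / alpha  # observational variance inflated by 1/alpha
est, var = shrunken_effect(y_rct, v_rct, [y_rct, y_obs], [v_rct, v_obs])
```

The shrunken RCT-based estimate lands between the two design-specific estimates and carries a smaller variance than the RCT estimate alone, illustrating the precision gain from exchangeability.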
Relative treatment effects obtained from observational evidence may differ from RCT estimates not only due to internal bias and differences in the study populations but also due to suboptimal adherence, which may be relevant for the DCEA. Depending on the extent to which the observational study has been adjusted for internal and external bias relative to the corresponding RCTs, any remaining difference in relative treatment effect estimates may reflect the impact of suboptimal adherence. If this modifying effect is assumed to be the same for all interventions, it can be used to predict how RCT-based relative treatment effects for a new intervention will translate to a routine practice setting.

Estimation of baseline effects
Evidence for the absolute effects with the reference intervention, or for outcomes associated with the natural course of the disease in the absence of treatment, required for the DCEA is more prone to external bias than evidence for relative treatment effects. Absolute effects do not reflect the target equity relevant subgroups if there are differences in prognostic factors and effect-modifiers between the study populations of the available studies and the target populations. Representative evidence for the outcomes of interest in routine practice can be expected to be available for the reference intervention, since it is likely to represent a standard of care. With access to IPD, which is more likely to be obtainable for real-world data sources than for RCTs, we can estimate baseline effects representative of the subgroups of interest, assuming these data sources have collected the relevant patient characteristics. If multiple IPD data sets are available, we can use generalized linear mixed models to obtain "pooled" estimates for the outcomes by subgroup of interest. If there is no access to IPD and we need to estimate the absolute effects based on aggregate-level data from publications of observational cohort studies or registries, models similar to equations 1, 3, 4, and 9 can be modified such that they reflect absolute effects and the impact of prognostic factors, rather than relative treatment effects and effect-modifiers.
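When IPD from several real-world sources is available, the pooling step can be approximated much more simply than with a full GLMM: an inverse-variance (fixed-effect) combination of subgroup-specific log-odds per data source. The counts below are invented for illustration:

```python
import math

def log_odds(events, n):
    # Log-odds of response with a 0.5 continuity correction; the variance
    # uses the usual 1/a + 1/b approximation.
    a, b = events + 0.5, n - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b

def pooled_baseline(datasets):
    # Fixed-effect (inverse-variance) pooled response probability per
    # subgroup; `datasets` maps subgroup -> list of (events, n) pairs,
    # one pair per IPD source.
    pooled = {}
    for s, arms in datasets.items():
        stats = [log_odds(e, n) for e, n in arms]
        w = [1.0 / v for _, v in stats]
        lo = sum(wi * yi for wi, (yi, _) in zip(w, stats)) / sum(w)
        pooled[s] = 1.0 / (1.0 + math.exp(-lo))  # back to a probability
    return pooled

# Hypothetical counts for a reference intervention in three subgroups,
# observed in two real-world data sources.
base = pooled_baseline({
    1: [(55, 220), (40, 160)],
    2: [(42, 210), (30, 150)],
    3: [(27, 180), (18, 120)],
})
```

This yields one baseline response probability per subgroup, which is the form of baseline input the DCEA model needs.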

EXAMPLE
In this section, the different evidence synthesis methods for aggregate-level data are illustrated with a hypothetical, yet realistic example. The analyses were performed in a Bayesian framework with non-informative prior distributions unless otherwise stated.
The nine available RCTs for this hypothetical example are presented in Table 1. There are three AB trials comparing intervention B versus A, two AC trials, two BC trials, an AD trial, and a three-arm ABD trial. Five trials included a heterogeneous population of subgroups 1, 2, and 3.
Two trials included a heterogeneous population of subgroups 1 and 2. Another two trials included a heterogeneous population of subgroups 2 and 3. Only three out of the nine trials reported subgroup data for each of the equity relevant subgroups of interest. The sample size and number of responders for each study arm stratified by subgroup are listed where available. The proportion of each subgroup in each trial, as reported, is listed as well. This hypothetical evidence base can be considered representative of the information that is typically available for aggregate-level data evidence synthesis.
[Insert Table 1]

Results of the analyses with the different models (1, 2, 3, 4, 9, and 10) are presented in Table 2. In the first row of the table, we see the estimated relative treatment effects of interventions B, C, and D relative to A for each of the three subgroups obtained with an NMA by subgroup if all studies had reported subgroup results for each of the subgroups: the "benchmark estimates". The efficacy of intervention B is consistent across subgroups. The efficacy of interventions C and D is heterogeneous across subgroups. Given the heterogeneous relative treatment effects, applying the all-comers average relative treatment effects to these subgroups in a DCEA would not be appropriate.
In rows 2-6 of Table 2 we see the relative treatment effects obtained with the alternative methods based on the RCT data as reported in Table 1. With the standard NMA by subgroup (model 1) we get results similar to the benchmark results, but the 95% credible intervals (95%CrI; the Bayesian equivalent of 95% confidence intervals) are wider because we can only use the subset of studies for which subgroup results were reported (see Table 1). For subgroup 3 we do not have an estimate for intervention C because no subgroup results were reported.
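The gap for intervention C in subgroup 3 illustrates what the network structure can still deliver: wherever a subgroup has direct evidence for B versus A and for C versus B, an anchored (Bucher-style) indirect estimate of C versus A follows mechanically. A sketch with invented subgroup-level numbers:

```python
import math

def indirect(d_ab, v_ab, d_bc, v_bc):
    # Anchored indirect comparison on the log-odds ratio scale:
    # d_AC = d_AB + d_BC; variances add because the trials are independent.
    d_ac = d_ab + d_bc
    v_ac = v_ab + v_bc
    half = 1.96 * math.sqrt(v_ac)
    return d_ac, v_ac, (d_ac - half, d_ac + half)

# Hypothetical subgroup-level estimates: B vs A from the AB trials and
# C vs B from the BC trials reporting that subgroup.
d_ac, v_ac, ci = indirect(d_ab=0.35, v_ab=0.03, d_bc=0.45, v_bc=0.04)
```

The widened interval shows why indirect estimates for sparsely reported subgroups are so uncertain, motivating the shrinkage and regression models that follow.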
If we use an evidence synthesis model with the assumption of exchangeable treatment effects (model 2), we see that the contrast between interventions regarding relative treatment effects for each of the subgroups is reduced. In comparison to the results obtained with model 1, the estimates for intervention B have shifted slightly upwards and the estimates for intervention D slightly downwards. The treatment-specific estimates have shrunken towards the average effects across interventions. These changes are not statistically significant but do result in smaller 95%CrIs, closer to the benchmark results.
When the analyses are performed under the assumption of exchangeable subgroup effects (model 3), the contrast in relative treatment effects between interventions is not noticeably reduced, but the differences between subgroups are somewhat. The benefit is that we get more precise estimates (i.e. smaller 95%CrIs), closer to the benchmark results. An additional benefit is that we do get an estimate for intervention C in subgroup 3, albeit a very uncertain one.
When we perform a conventional meta-regression analysis (model 4) using subgroup-specific data where available, and otherwise the mixed-population data, we get the results presented in row 5 of Table 2. The meta-regression model has two covariates representing the difference in the log-odds ratio between subgroup 2 and subgroup 1 and the difference between subgroup 3 and subgroup 1. Overall, relatively precise estimates are obtained given the data reported.
However, the contrast in relative treatment effect estimates between interventions and between subgroups is reduced because we had to assume that the impact of the effect-modifiers associated with the subgroups was the same for interventions B, C, and D relative to A, due to the limited number of studies available. The benchmark results show that the trend in relative treatment effects for intervention C versus A is the opposite of the trend for intervention B versus A; these cancel each other out in the meta-regression analysis, resulting in relatively similar estimates across the subgroups.
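The shared effect-modifier constraint of the conventional meta-regression can be made explicit in a small weighted-least-squares sketch: study-level log-odds ratios are regressed on the subgroup composition of each trial population, with a single shared covariate effect across interventions. All numbers are invented:

```python
import numpy as np

def meta_regression(y, v, X):
    # Weighted least squares meta-regression: y are study-level log-odds
    # ratios, v their variances, X the design matrix (intercept plus
    # subgroup-composition covariates). Returns the coefficients and their
    # covariance matrix under a fixed-effect assumption.
    W = np.diag(1.0 / np.asarray(v, dtype=float))
    X = np.asarray(X, dtype=float)
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ np.asarray(y, dtype=float)
    return beta, cov

# Hypothetical studies: covariates are the proportions of subgroup 2 and
# subgroup 3 in each study population (subgroup 1 is the reference).
y = [0.42, 0.35, 0.55, 0.30, 0.48]
v = [0.04, 0.05, 0.03, 0.06, 0.04]
X = [[1, 0.30, 0.20],
     [1, 0.50, 0.10],
     [1, 0.20, 0.30],
     [1, 0.60, 0.20],
     [1, 0.25, 0.25]]
beta, cov = meta_regression(y, v, X)
# beta[1] and beta[2] estimate the shift in the log-odds ratio associated
# with subgroup 2 and subgroup 3 relative to subgroup 1 — one shared shift
# applied to every intervention, which is exactly the limiting assumption.
```

Because the subgroup coefficients are shared across interventions, opposing subgroup trends for different interventions average out, which is the cancellation described above.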
With an analysis using the same aggregate-level data as the conventional meta-regression analysis but according to the structural assumptions of model 9, in line with the principles of ML-NMR, we see results closer to the benchmark results than with the conventional meta-regression analysis. Again, the analyses are limited by the structural assumption that the difference in relative treatment effects between subgroups (on the log-odds ratio scale) is the same for interventions B, C, and D.
Finally, a network meta-regression based on RCT and observational real-world evidence according to the principles expressed in equation 10 was performed. The relative treatment effect estimates of interventions B and C relative to A were strengthened with evidence about the treatment effects of these interventions in routine practice. Observational studies showed odds ratios of response that were 25% smaller than those observed in RCTs. The variance was reduced by about 40% due to the larger sample sizes of the real-world data studies. Given the use of external observational evidence, treatment-specific interaction effects could be used (which was not feasible with the meta-regression based on the nine RCTs). To help improve the estimates for new intervention D across subgroups, it was assumed that the effect-modification for intervention D would not vary more than 2 times the effect-modification seen with intervention B. (Let us say this information was obtained through a formal expert elicitation exercise.) A study by Ioannidis et al. [74] of 19 meta-analyses comparing RCT with observational study results provided an estimate for the variance of the bias (σ²_bias) in observational studies, which was used to downweight the observational evidence according to equation 10 [66]. Results of this generalized evidence synthesis are presented in the last row of Table 2 and show estimates closer to the benchmark analysis than any of the other approaches.

SOME PRACTICAL RECOMMENDATIONS
A few practical recommendations are outlined here for estimating treatment effects from existing data to parameterize a health economic model for a DCEA.
Once equity relevant subgroups of interest have been defined, the first step is to determine how this translates into the relevant (levels of the) effect-modifiers and prognostic factors to consider in the evidence synthesis. In general, we want to use IPD as much as possible. With access to IPD, we can define the effect-modifiers and prognostic factors more specifically. With aggregate-level data, we have limited ability to adjust for differences between the characteristics of the study populations of the relevant individual studies and the target subgroups of interest for the DCEA. As mentioned, a distinction needs to be made between estimating relative treatment effects and baseline effects for the equity relevant subgroups. For the former, we only need to worry about differences in effect-modifiers between groups; for the latter, we need to worry about differences in prognostic factors as well. The final (levels of the) effect-modifiers and prognostic factors will be a trade-off between relevance for the decision problem and data availability.
Given the greater risk of external bias in absolute effects than in relative treatment effects, it is recommended to attempt to obtain access to real-world IPD to estimate the baseline effects with the reference intervention or standard of care for the subgroups of interest. If this is not feasible, sufficiently detailed aggregate-level data need to be obtained from published studies.
Given the potential challenges in estimating subgroup-specific relative treatment effects, it makes sense to first assess whether it is even worthwhile to do so given the baseline effects for the equity relevant subgroups of interest. When the health economic model is parameterized with subgroup-specific baseline effect estimates, we can assess whether using relative treatment effects that differ by subgroup results in meaningfully different estimates of cost-effectiveness and health equity impact in comparison to using the same relative treatment effect estimate for all subgroups (based on the all-comers trial populations). If the differences in estimates do impact the conclusion about which intervention to prefer, it is worthwhile to proceed with estimating equity relevant subgroup-specific model input parameters for the relative treatment effects.
It is recommended to use multiple evidence synthesis methods to estimate the relative treatment effects of interest given the potential sensitivity of the estimates to the method of choice and the structure of the statistical model. If IPD is available for a subset of the RCTs, it is recommended to use ML-NMR-based approaches. Simulation studies have shown that estimates for the target populations of interest obtained with ML-NMR are greatly improved over other methods to adjust for external bias [41]. When multiple competing models are considered for the same RCT-based data set, model fit criteria, such as the deviance information criterion, can be used to help identify the most parsimonious models for the data at hand.
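Computing the deviance information criterion from MCMC output only requires the posterior deviance samples and the deviance evaluated at the posterior mean of the parameters. A sketch with simulated deviance draws (all numbers are arbitrary placeholders, not real model output):

```python
import numpy as np

def dic(deviance_samples, deviance_at_mean):
    # Dbar is the posterior mean deviance, pD = Dbar - D(theta_bar) the
    # effective number of parameters, and DIC = Dbar + pD.
    dbar = float(np.mean(deviance_samples))
    pd = dbar - deviance_at_mean
    return dbar + pd, pd

# Hypothetical posterior deviance samples for two competing models.
rng = np.random.default_rng(1)
dic_a, pd_a = dic(rng.normal(210.0, 3.0, 5000), 204.0)
dic_b, pd_b = dic(rng.normal(208.0, 3.0, 5000), 199.0)
# Model B fits slightly better (lower mean deviance) but spends more
# effective parameters; the lower-DIC model is preferred, keeping in mind
# that small DIC differences are within Monte Carlo noise.
```

This makes explicit the fit-versus-complexity trade-off that underlies the recommendation to prefer the most parsimonious adequate model.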
For the final set of evidence synthesis models considered most appropriate for the information available from RCTs, and for the resulting subgroup-specific estimates of relative treatment effects, we want to evaluate whether using observational evidence can improve the precision of the estimates while explicitly acknowledging its potential internal bias in the synthesis model.
For a new intervention, typically only RCT evidence is available. Its subgroup-specific relative treatment effect estimates can potentially be improved if the assumption of exchangeable subgroup effects is defensible, or if there are certain similarities with (some of) the alternative interventions that impact the relative treatment effects. If this is the case, we can use shrinkage estimation assuming exchangeability across (some) interventions, or rely on the shared or exchangeable effect-modifier assumption. With these approaches, any beneficial information from real-world evidence for the competing established interventions will also "transfer" to the new intervention. When considering these approaches, a trade-off needs to be made between potentially biasing the subgroup-specific estimates for the new intervention and the precision gains.
If the new intervention is deemed too different, then these structural assumptions of the evidence synthesis models are inappropriate. As a last resort, formal expert elicitation can be considered to improve subgroup-specific estimates for the new health technology [68][69][70]. A well-established approach for expert elicitation in the context of health economics is SHELF [71][72][73]. In essence, expert judgment is combined with empirical treatment effect data from RCTs in a formal and reproducible manner to improve the estimation of equity relevant treatment effects.

Finally, δ_ik ~ Normal(d_k − d_b, σ²). For the aggregate-level data part of the model, θ_ik is the overall expected outcome in study i with intervention k and is determined by integrating the individual-level model over the joint within-study distribution of the binary covariate that defines the two subgroups of interest. θ_ik equals the sum of the proportion of subjects with covariate x=1 in each aggregate-level data study (φ_i) multiplied by θ_ik¹ and the proportion of subjects with covariate x=0 (1−φ_i) multiplied by θ_ik⁰. θ_ik¹ represents the marginal expected outcome with intervention k for a subject with covariate x=1 in study i; similarly, θ_ik⁰ is the equivalent for a subject with x=0. The key feature of ML-NMR is that an individual-level model is averaged over the population in study i to obtain the aggregate-level model for that study. A generalization of model 8 has been described by Phillippo et al. [41]. ML-NMR addresses several limitations of other proposed evidence synthesis methods.

μ_ib represents the outcome with comparator b in study i for the overall reference subgroup s=0. d_k reflects the pooled relative treatment effect of each intervention k relative to the overall reference intervention A for the overall reference subgroup s=0. β_k^s represents the difference in the relative treatment effect with k versus A in subgroup s relative to the reference subgroup s=0. For the part of the model describing evidence for a combination of subgroups, the additional parameters are defined as follows: θ_ik is the overall expected outcome in study i with intervention k and is determined by integrating the subgroup-level model over the joint within-study distribution of the categorical covariate that defines the multiple subgroups of interest. θ_ik equals the sum of the proportions of subjects in each of the subgroups s in study i, φ_i^s, multiplied by the corresponding expected outcome with intervention k, θ_ik^s.
d_k^OBS is the relative treatment effect estimate obtained from observational data, V_k^OBS its variance, and α a factor defining the weight the observational evidence has in the synthesis [66,67]. If α = 1, RCTs and observational studies carry the same weight in the overall estimate of d_k. If 0 < α < 1, the observational studies are given less weight than the RCTs to accommodate concerns about greater bias in relative treatment effect estimates based on observational data. Sensitivity analyses regarding α are recommended.
Imagine we want to perform a DCEA of four alternative interventions A, B, C, and D, indicated for a certain condition, with D the new intervention. These four interventions have been evaluated in different RCTs. Real-world observational evidence is available for A, B, and C. For the DCEA we are interested in three equity relevant subgroups: population 1, population 2, and population 3. The outcome of interest is treatment response, a dichotomous endpoint. The baseline effects in routine practice with intervention A are about 25%, 20%, and 15% in subgroups 1, 2, and 3 respectively. The efficacy of intervention B relative to A is consistent across the equity relevant subgroups. Intervention C is more efficacious than B, with relative treatment effects greater in subgroup 1 than in subgroups 2 and 3. New intervention D is the most efficacious, with the greatest efficacy in subgroup 3.

Figure 2: Influence diagram for a fictitious health economic model to perform a distributional cost-effectiveness analysis. Arrows in green represent relative treatment effects and outcomes in the baseline arm of the model representative of the equity relevant subgroups of interest.

Figure 3: Evidence challenges for baseline and relative treatment effects for the equity relevant subgroups. Lighter color bars reflect uncertain evidence due to small sample sizes. Black/grey and green represent evidence for two different treatments. Bars crossing multiple subgroup columns (e.g. study 3) reflect studies that only report results for a combined population and not for the specific subgroups of interest. (In contrast, study 6 provides results for all subgroups of interest.)

Foundation of DCEA: estimation of the net health benefits for the equity relevant subgroups
Finally, with subgroup-specific relative treatment effect estimates for all the competing interventions of interest available, based on what have been deemed the most appropriate methods for the evidence at hand, it is recommended to perform multiple DCEAs as sensitivity analyses to understand their impact on cost-effectiveness and health equity impact estimates.