survHE : Survival Analysis for Health Economic Evaluation and Cost-Eﬀectiveness Modeling

Survival analysis features heavily as an important part of health economic evaluation, an increasingly important component of medical research. In this setting, it is important to estimate the mean time to the survival endpoint using limited information (typically from randomized trials) and thus it is useful to consider parametric survival models. In this paper, we review the features of the R package survHE , speciﬁcally designed to wrap several tools to perform survival analysis for economic evaluation. In particular, survHE embeds both a standard, frequentist analysis (through the R package ﬂexsurv ) and a Bayesian approach, based on Hamiltonian Monte Carlo (via the R package rstan ) or integrated nested Laplace approximation (with the R package INLA ). Using this composite approach, we obtain maximum ﬂexibility and are able to pre-compile a wide range of parametric models, with a view of simplifying the modelers’ work and allowing them to move away from non-optimal work ﬂows, including spreadsheets (e.g., Microsoft Excel ).


Introduction
Broadly speaking, the objective of publicly funded health care systems -e.g., those in the United Kingdom (UK), Canada, Australia and many other countries around the worldis to maximize health gains across the general population, given finite monetary resources and a limited budget. Bodies such as the National Institute for Health and Care Excellence (NICE) in the UK provide guidance on decision-making on the basis of health technology assessment (HTA) and cost-effectiveness analysis (NICE 2013). These cover a suite of analytical approaches for combining costs and consequences of intervention(s) compared to a control or status quo, the purpose of which is to aid decision-making associated with resource allocation. To this aim, much of the recent research has been oriented towards building the health economic evaluation on sound and advanced statistical decision-theoretic foundations, arguably making it a branch of applied statistics (Willan and Briggs 2006;Briggs, Sculpher, and Claxton 2006).
Interventions that impact upon survival form a high proportion of the treatments appraised by NICE (Latimer 2011). Interestingly, in order to quantify accurately the economic benefits of a new intervention, it is necessary to estimate the mean survival time (rather than usual summaries, such as the median time). Thus, it is necessary to extrapolate the observed survival curve (which often only covers a limited time frame and is subject to high degree of censoring) to a longer time horizon. Consequently, a parametric approach to the survival analysis is usually followed and it is recommended by a highly influential NICE Decision Support Unit (DSU) technical document (Latimer 2011).
In general terms, a Bayesian approach to statistical inference is particularly helpful in economic evaluation, because it allows to fully characterize uncertainty in the model parameters. This in turn is a fundamental component of the cost-effectiveness analysis, which are often made by several potentially correlated "modules", which may be informed by different and diverse sources of evidence. For this reason, it is crucial to assess the impact of this uncertainty on the decision-making process Baio and Dawid 2011;Baio 2012), a process termed probabilistic sensitivity analysis (PSA).
In addition, the clinical evidence used to inform the survival analysis for HTA can be limited, thus emphasizing the problem of uncertainty in the extrapolation of the survival curves. The inclusion of prior information (e.g., to "anchor" cancer patients' survival well below that of the healthy population, or from evidence on drugs with similar therapeutic mechanisms), which is instrumental to a Bayesian analysis, would again improve the model performance. However, in addition to the need of specifying suitable prior distributions that are consistent with the information available for the case at hand, Bayesian models for survival analysis fitted using standard Markov chain Monte Carlo (MCMC) algorithms can be computationally intensive and sometimes have problems with convergence. Moreover, existing Bayesian software, notably BUGS (Lunn, Spiegelhalter, Thomas, and Best 2009) or JAGS (Plummer 2003), require the full specification of the modeling assumptions, which often proves an insurmountable barrier to implementation for inexperienced researchers/modelers. Perhaps for this reason, frequentist methods to perform survival analysis with an emphasis in economic evaluations represent the industry standard. In particular, survival analysis is often embedded in health economic evaluations using a multi-step/multi-software approach. Firstly, estimates from a survival model are obtained. These are sometimes obtained by published clinical studies presenting point and interval estimates for the model parameters; alternatively, survival models are fitted to (pseudo)individual level data (see Section 9.1). This part is by necessity based on proper statistical software (e.g., Stata, StataCorp 2019; SAS, SAS Institute Inc. 2013; or R, R Core Team 2020). Secondly, modelers produce a set of simulations for the model parameters using tools such as Cholesky decomposition and Monte Carlo (MC) procedures . Finally, these are used to produce a large number of survival curves, which are fed to the economic model (for example to determine the benefits of a given intervention). The last two steps are typically performed using a spreadsheet, almost invariably Microsoft Excel.
From both the scientific and the practical point of view, this process is less than ideal. Firstly, in the MC simulations, potential correlation among the model parameters is only approximated and not fully accounted for, potentially introducing bias in the estimate for the survival curves and thus in the outcome of the decision-making process. A full Bayesian approach, e.g., based on MCMC, would eliminate this issue because the inference would be produced directly on the full joint distribution of the model parameters. While it is in theory possible to build a frequentist model by constructing a multivariate joint likelihood of all the model parameters, this is in practice not very common. Incidentally, this exercise would be likely to increase the theoretical and computational complexity, thus de facto eliminating the classical objections to a Bayesian approach, in favor of a "simpler" one based on maximum likelihood (ML).
The limitations of spreadsheets calculators such as Microsoft Excel, in terms of statistical modeling (and particularly in survival analysis), are increasingly often recognized in the health economics literature (Baio and Heath 2016;Williams, Lewsey, Briggs, and Mackay 2017;Krijkampet, Alarid-Escudero, Enns, Jalal, Hunink, and Pechlivanoglou 2018;Incerti, Thom, Baio, and Jansen 2019). In fact, it is no accident that a number of R packages have recently been developed to use in cost-effectiveness analysis, notably including among others BCEA (Baio, Berardi, and Heath 2017), SAVI (Strong, Oakley, and Brennan 2014), heemod (Filipović-Pierucci, Zarca, and Durand-Zaleski 2017), hesim (Incerti and Jansen 2020), EVSI (Heath, Manolopoulou, and Baio 2017) and missingHE (Gabrio, Mason, and Baio 2019). With this in mind, the objective of this work is to develop a suite of functions and tools for the freely available statistical software R, specifically designed for the needs of modelers using survival analysis results to build extensive models for health economic evaluation.

The R package survHE
survHE is an R package specifically designed to aid in the process of survival analysis for health economic evaluation and cost-effectiveness analysis. In fact, survHE can be actually considered as a wrapper for three other R packages; the first one, flexsurv (Jackson 2016), is in turn a general-purpose tool for performing several types of survival analysis using maximum likelihood estimates (MLEs). The second one, rstan (Carpenter et al. 2015), is a relatively new R package that can be used to perform Bayesian analysis using Hamiltonian Monte Carlo (HMC). This is a form of MCMC algorithm, which can be used to produce samples from a joint posterior distribution of a set of model parameters and unobserved quantities. Finally, the third one, INLA (Martins, Simpson, Lindgren, and Rue 2013), can be used to perform fast Bayesian computations (on a limited set of survival models) using integrated nested Laplace approximation (Rue, Martino, and Chopin 2009). In a sense, thus, survHE is a very simple tool that specializes functions from other relevant packages and builds some other specific commands to simplify and standardize the process of using survival data in health economic evaluation.
Package survHE (Baio 2020) is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=survHE and there is also a development version maintained under GitHub (https://github.com/giabaio/survHE). The design for survHE is modular; sets of function have specific objectives, which can be broadly categorized as "Data preparation", "Model fitting and assessment" and "PSA and extrapolation", as shown schematically in Figure 1. These are described in detail in the following sections.
Depending on the user's instruction, survHE maps internally to different code, which calls either flexsurv, rstan or INLA in the background to produce the relevant estimates. Once  Figure 1: A schematic of the design for survHE. The rounded boxes represent the "modules", a set of functions used to perform a specific task. For example, the functions digitise and make.ipd can be used to process digitized data from published papers and recreate an individual patient dataset. This can in turn be fed to fit.models, which estimates the parameters for a given survival model. The output of this function can be processed to assess model fit, or produce relevant summaries and plots. In addition, it can be fed to the function make.surv, which simulates the survival curves and effectively performs PSA. The output of this process can be plotted using the function psa.plot, or exported to a spreadsheet via the function write.surv.
these are obtained, the output is standardized and returned in a form that is comparable across the inferential methods. In other words, for the set of models that are implemented in survHE, practitioners can take full advantage of the modeling capabilities of the three packages and analyze survival data under both a frequentist or a computationally efficient Bayesian approach, using the same syntax. More importantly, survHE also includes specific built-in functions to simplify and standardize the process of using survival data in health economic evaluation.

Modeling framework
The general modeling framework considered in survHE can be described as follows. The observed data are at least made by the pair (t i , d i ), for i = 1, . . . , n, where t i > 0 is the observed time at which the event under study occurs and d i (for "dummy" variable) is an event indicator, taking value 1 if the event actually happens, or 0 when the ith individual is "censored". If d i = 1, then t i is indeed observed; conversely, if d i = 0, we do not know whether the event actually occurs -it may in the future, but we just do not have this information. Consequently, when d i = 0, then the observed t i does not represent the true "survival time". Notice here that we consider for the sake of simplicity "right censoring", which is the most common for applications in health economics.
The observed data t i are modeled using a suitable probability distribution characterized by a density f (t i | θ), as a function of a vector of relevant parameters θ. This can be linked to the survival function indicating the probability of an individual surviving up to time t, as well as to the hazard function which quantifies the instantaneous risk of experiencing the event. In the presence of censoring, the resulting log-likelihood function is modified to account for the possibility of partially observed data (in correspondence with censoring) and is expressed as This basically models the risk of experiencing the event at any time point t, conditionally on the fact that the ith unit has in fact survived up to that time point; if they have not, then the probability of experiencing the event again is 0.
When formulating a parametric survival model, we need to specify the form of the probability distribution assumed to describe the underlying uncertainty in the observed times. As mentioned above, it is good practice to test a set of (more or less) plausible parametric models for the survival data. This is the procedure recommended by the NICE-DSU technical document (Latimer 2011).
In general terms, we can specify the vector of relevant parameters as θ = (µ(x), α(x)). In this notation, consistent with flexsurv, we consider: a vector of potential covariates x (e.g., age, sex, trial arm, etc.); a location parameter µ(x), which indicates the mean or the scale of the probability distribution; and a (set of) ancillary parameters α(x), which describe the shape or variance of the distribution. While it is possible for both µ and α to explicitly depend on the covariates x, usually the formulation is simplified to assume that these only affect directly the location parameter.
In addition, since t > 0, we typically use a generalized linear formulation to model the location parameter. The function g(·) is typically the logarithm; notice that here we slightly abuse the notation and omit the dependence of µ i on x. Generally speaking, (2) can be extended to include additional terms -for instance, we may want to include random effects to account for repeated measurements or clustering. We indicate this possibility using the [+ . . .] notation and highlight the fact that this is rather straightforward in a Bayesian context (particularly for INLA, but also for rstan).
The objective of the statistical analysis is the estimation of the parameters θ, which can then be used to obtain estimates for all the relevant quantities (e.g., the survival function), which are in turn used in the economic modeling, typically through a state-transition structure (Siebert et al. 2012). In a frequentist setting, the estimation procedure concerns some relevant statistics, i.e., functions of the observed data and is performed via ML estimation. Conversely, in a full Bayesian setting, the parameters are directly modeled using a prior probability distribution, which is updated by the observed data into a posterior. It is this posterior distribution that is the object of the inferential process. Thus, when using a Bayesian framework, the model needs to be completed by specifying suitable prior distributions for the parameters θ.
Assuming that the location parameter is specified using a linear predictor form, on the scale determined by g(·) and as a function of J covariates, we can model β = (β 0 , β 1 , . . . , β J ) iid ∼ Normal(µ β , σ β ), where, in general, we use the notation σ to indicate a standard deviation. Note that survHE expands any categorical covariates to a set of dummy variables: so if a covariate has four categories, in line with R notation, survHE considers three binary indicators. Thus the profile (0, 0, 0) indicates the first (reference) category, while the profiles (1, 0, 0), (0, 1, 0) and (0, 0, 1) indicate the second, third and fourth category, respectively. In survHE, the number of covariates J depends on this full expansion of the design matrix.
In both its Bayesian versions, survHE assumes by default µ β = 0 and σ β = 5 for the models in which the linear predictor is defined on the log scale and σ β = 100 for those in which the linear predictor is defined on the natural scale and thus g(·) is the identity function. This amounts to specifying a "minimally informative" prior on the regression coefficients that determine the location parameter -in other words, we are not including strong prior information in this aspect of our model. The observed data (and the censoring structure) will be mainly driving the update to the posterior distribution. When genuine prior knowledge is present, e.g., about the likely size of a treatment effect, it is possible to modify these priors to encode the information in the model formulation.
Notice that, particularly in case where the sample size is small or censoring is large, even these minimally informative priors generally contribute to stabilize the inference, by placing most of the probability mass for the regression coefficients on a relatively small interval. For instance, if β ∼ Normal(0, 5), we are then implying a negligible prior probability that β > 10 (recall that in this case, β is on the log scale). If the data suggested that the effect of a particular covariate is in fact large, the model would respond by updating the prior distribution accordingly. However, because large effects are not very likely in the prior, the model would respond in a "skeptical" way, thus counter-acting the potential bias implied by taking at face value the signal from the limited amount of information present in the data.
As for the ancillary parameter, the choice of prior depends on the specification of the probability distribution selected to model the observed data. Table 1 shows a summary of the models directly implemented in survHE. In each, by default we specify minimally informative priors on the relevant parameters; for example, in the Weibull model, we define α ∼ Gamma(a, b), for given values of a, b.
All the models presented in Table 1 are available using MLE and HMC as inferential engines. On the other hand, INLA currently only handles Exponential, Weibull, log-Normal and log-Logistic models. The names presented in the rightmost column can be used when calling survHE. We mostly follow the notation of flexsurv, but also allow for some specific differences in the INLA notation: for example, the log-Logistic distribution can be referred to using flexsurv (llogis), or INLA notation (loglogistic). survHE will internally map the different strings of text and select the correct routine.
Notice also that, in line with flexsurv and INLA, survHE allows the two versions of the Weibull Data model Location parameter model, i.e., using an accelerated time failure (AFT, weibull or wei) or a proportional hazard (PH, weibullPH or wph) parameterization. When the interest is in estimating the effect of some covariates on the survival time, these two versions yield of course different estimates. However, in the case of health economic evaluation, the interest is really in producing an estimate of the distribution of the survival curves. survHE internally maps the estimated coefficients to the correct transformation so as to estimate S(t) correctly, irrespective of the original parameterization. For example, when the AFT parameterization is used, then The "original" parameterization of the Generalized F and Generalized Gamma distributions (respectively genf.orig and gengamma.orig) are currently available only when the estimation is performed using MLE in flexsurv (this is mostly for backward compatibility).

Example
In the following, we use a running example to present the features of survHE. Suppose that the user has a suitable dataset, perhaps obtained from a trial, in which data are recorded for the time at which observations are made, an "event" indicator taking value 1 if the clinical outcome of interest (e.g., progression to a cancerous state, or death) has actually occurred at that time and 0 if the individual has been censored (e.g., we have not observed any event at the end of follow up), as well as an arm indicator, specifying whether the individual to whom the data refer belongs in the control or the active treatment arm of the trial. We also have information about the individuals' age, sex, socio-economic circumstances (as measured by an "Index of Multiple Deprivation", typically referred to as IMD; Noble, Wright, Smith, and Dibben 2006) and ethnic background. Other variables may be observed, e.g., relevant co-morbidity.
For the moment we consider the simple case in which the data are available in the R workspace as a data-frame (say, data) that can be partially visualized using the following command: R> rbind(head(data), tail(data)) to show the first 6 and the last 6 rows: The dataset consists of 367 individuals in total, grouped in two arms (here arm = 0 indicates the controls and arm = 1 indicates the active treatment; sex = 0 or 1 indicates Male and Female, respectively; imd takes values 1, . . . , 5 to indicate increasing levels of small area deprivation; and ethnic = 1, . . . , 5 groups various ethnic backgrounds, for instance in line with census data).

Modeling survival data:
To be or not to be (Bayesian) . . . ?
By relying on either flexsurv, INLA or rstan, survHE can fit models under the frequentist or Bayesian framework.
As already discussed, a Bayesian approach can be beneficial to propagate the underlying uncertainty in all the model parameters in a principled way. In the specific case of survival analysis, this is often helpful because it allows the formal inclusion of external information, e.g., in the form of data coming from population registries or, at the very least, expert knowledge. This is not a major benefit when the underlying time-to-event data are mature (e.g., the follow up and the underlying event rate of the study are such that most individuals are fully observed -and censoring does not affect a large proportion of the subjects). However, particularly with the recent development of immuno-therapies, the case where the observed data do not even reach the median survival time is increasingly frequent. This is clearly a major limitation, when the main objective is to extrapolate the survival curve way beyond the observed follow up. As mentioned earlier, in such cases, it is fundamental to anchor the extrapolation of the survival curves, using external information -for example by combining the trial data with life-tables or registries to ensure that the extrapolation does not artificially favor a given treatment. A Bayesian approach naturally lends itself to this aim, by allowing the modeler to include genuine prior information, e.g., on the probability that the average survival time does not exceed some set threshold (Benaglia, Jackson, and Sharples 2015;Jackson et al. 2017;Guyot, Ades, Beasley, Lueza, Pignon, and Welton 2017).
On the other hand, in addition to the need of specifying suitable prior distributions that are consistent with the information available for the case at hand, Bayesian models for survival analysis fitted using MCMC can be computationally intensive and sometimes have problems with convergence; perhaps for this reason, often practitioners use MLE-based routines to obtain relevant estimates from the survival component of the wider economic model. These are usually simple to obtain. In order to deal with PSA, flexsurv uses bootstrap (based on multivariate Normal distributions), which may be a good approximation of the underlying full joint posterior distribution of the survival parameters.
A good compromise between frequentist and fully Bayesian models is provided by INLA, which is effectively an alternative method of performing Bayesian inference. By using a specific (albeit rather general), clever model specification and an approximation algorithm, INLA typically requires a computational time that is very close to that of MLE-based routines, while also estimating an approximation to the full joint posterior distribution for the model parameters. However, in general terms, it is a bit more complex to embed INLA within a more complex model; in addition, currently, INLA can only fit a limited number of survival models.
For these reasons, we chose to design survHE around these three approaches: ideally, we would build the whole economic model under a Bayesian framework and take full advantage of the flexibility provided by MCMC estimation -this would be naturally obtained by using rstan. Note that we choose rstan over other software such as OpenBUGS or JAGS; the reason for this is that HMC often proves a superior mode of inference compared to Gibbs sampling (the MCMC algorithm upon which OpenBUGS and JAGS are based). In particular, for the specific set of models that we consider here, HMC proves faster and more reliable in terms of convergence than Gibbs sampling. In addition, rstan models can be "pre-compiled" and thus the computational time required is totally devoted to sampling from the relevant posterior distributions, which makes a full Bayesian approach more competitive, in comparison to MLE-based methods.
However, despite these very useful features of rstan, we acknowledge that at times Bayesian survival models can still be challenging from the point of view of computation (especially with large datasets and for particularly complex model structures). In these cases, INLA is very helpful, for the range of distributions it supports and going forward, to expand the modeling to flexible structures (e.g., including smooth random effects); similarly, particularly in specific settings, some practitioners may want to use a standard approach to statistical inference and MLEs.
The combined use of survHE and its "dependencies" allows all of these options in a unified framework. More importantly, irrespective of the way in which the inference on the survival model is performed, survHE has a set of built-in functions that can be used to produce a standardized post-processing of the results, for their inclusion in the economic model and PSA.

MLE via flexsurv
survHE allows the user to define in R a vector of model names (in the format that flexsurv or INLA can recognize). We could for instance decide that we want to consider the Exponential, Weibull, Gamma, log-Normal, log-Logistic and Generalized Gamma models for our analysis.
We can do this in R by using the following commands.
At this point, we are almost ready to actually perform the survival analysis using the 6 models specified above; before we can do this, however, we need to specify the model "formula", for example as the following.

R> formula <-Surv(time, event)~as.factor(arm)
This creates an object instructing R to analyze the data using a survival model in which the only covariate is the treatment arm, interpreted as an R "factor" (i.e., a categorical variable). Of course, it is possible to use an even less structured version of the linear predictor by simply defining formula <-Surv(time, event)~1, which would fit a model with no covariates (i.e., intercept only). Other covariates can also be added to the formula.
The survHE function fit.models can be used to actually perform this analysis in batches, e.g., by typing the command: R> m1 <-fit.models(formula = formula, data = data, distr = mods) The function fit.models takes as mandatory inputs the analysis formula, the name of the dataset to be used and the type of distribution(s) to be fitted. Just like in this case, this may be a vector, in which case fit.models will store all the different analyses in the resulting object.
Executing the command above creates an object m1 of class 'survHE', in which the results of the survival analyses are stored for each parametric model considered. The usual R command R> names (m1) returns the names of the several elements in the list.
[1] "models" "model.fitting" "method" "misc" The object models is itself a list, in this case containing 6 elements (one for each of the parametric models fitted). The command R> names(m1$models) returns the string of names used to label each, in this case [1] "Exponential" "Weibull (AFT)" "Gamma" "log-Normal" [5] "log-Logistic" "Gen. Gamma" For example, the first model can be accessed using the standard R notation m1$models[[1]] by position (i.e., using "double square brackets"), or alternatively by using its name and $ via m1$models$Exponential and can be inspected typing the commands
The other elements of the object m1 are: • model.fitting: a list storing some suitable statistics that can be used to assess, well . . . model fitting. These are the Akaike, Bayesian and deviance information criteria (AIC, BIC and DIC, respectively). The former two can be estimated using both the Bayesian and frequentist approach, while the latter is specific to Bayesian modeling. Note that because we are storing the results obtained from fitting 6 models in the same object, the elements $aic, $bic and $dic are vectors. In general, the model associated with a lower information criterion value tends to be associated with a better fit, as we discuss in Section 6.4.
• method: a string describing the method used to fit the model(s). In this case the code R> m1$method returns the output • misc: a list containing some miscellanea -these are mainly used internally by the other functions in survHE to do plots and tables or other calculations. Specifically, the elements of this objects are -time2run: the time used to run the model(s) (in seconds); -formula: the R object containing the formula used to define the model(s); -data: the data-frame containing the original data used to fit the model(s); -model_name: a (vector of) abbreviation(s) associated with the model(s) fitted; -km: the Kaplan-Meier estimate produced automatically by survHE (using the function npsurv from the R package rms; Harrell Jr 2020).

Integrated nested Laplace approximation
Integrated nested Laplace approximation (INLA; Rue et al. 2009) can be used to perform direct numerical calculation of posterior densities in a wide sub-class of Bayesian hierarchical models (called latent Gaussian models, LGMs), avoiding time-consuming Markov chain Monte Carlo simulations. The INLA implementation covers models of the form where y i is the observed variable, φ is a set of main parameters (which may and often has a very large dimension) and ψ is a set of "hyperparameters", so that the full set of parameters is θ = (φ, ψ). The main restrictions of the INLA formulation are the fact that the number of hyperparameters needs to be small (for computational convenience) and the form of the prior imposed on φ. This is a multivariate Normal distribution where the precision (i.e., inverse variance) matrix Q −1 (ψ) depends on the hyperparameters and exploits conditional independences across the parameters. This general structure is called a Gaussian Markov random field (GMRF; Rue and Held 2005).
The basic principle is to approximate the posterior density for φ and ψ using a series of nested Normal approximations. The algorithm uses numerical optimization to find the mode of the posterior, while the marginal posterior distributions are computed using numerical integration over the hyperparameters. INLA is a very fast method of inference and can be applied to many models that can be written in the form of LGMs -for example generalized linear models (including structured components, such as simple random effects, as well as spatial or temporal effects).
On the other hand, not all models can be easily framed within the LGM formulation. In addition, INLA's estimates are, by definition, approximations to the full joint posterior distribution of the model parameters. Thus, there is a trade-off between the computational complexity and the accuracy of the estimation. In many general cases, INLA produces a good compromise by allowing a good level of accuracy (often comparable with simulation methods such as MCMC that, if run for long enough, are guaranteed to give the "exact" value) and running time, often in the same order as standard ML algorithms.

Using survHE to fit models with INLA
When fitting models using a Bayesian approach via INLA, survHE allows the user to select a vector of distributions; as mentioned above, currently, INLA and its R implementation allow four survival models: Exponential, Weibull, log-Normal and log-Logistic. The user can specify a vector distr <-c("exp", "weibull", "lognormal", "loglogistic"), or select only a subset of those models, or maybe run the fit.models command separately for each of them. If a distribution is specified that is not allowed in INLA, then survHE will fall back on the MLE specification and use flexsurv instead.
One important distinction is in the way in which flexsurv and INLA handle the names of the distributions to be fitted and the formula specifying the model. As for the latter, the correct notation is the following: "exponential", "weibull", "lognormal" and "loglogistic"these do not directly match the flexsurv notation. As mentioned above, to avoid issues, survHE recodes internally the names given to the distributions. Thus, if the user specifies the additional option method = "inla" in the call to fit.models, then the string "exp" (which would be accepted by flexsurv) will be recoded to "exponential", which is required by INLA. Similarly if a distribution that is accepted by flexsurv is given by the user in INLA terminology but the method is either unspecified or specifically set to "mle" in the call to fit.models, then survHE will recode the name of the distribution in flexsurv terms.
With regards to the model formula, INLA requires that this is specified using the notation formula <-inla.surv(time, event)~...
where time and event are the variables in which the times and the event indicator are stored and ... is a suitable form for the combination of covariates to be used. Again, survHE tries to simplify the modeler's life by fixing the code provided for the formula, depending on the value specified for the option method. So if method = "inla" but the formula is specified using the flexsurv terminology, survHE will recode this to make it acceptable to INLA.
A suitable call to fit.models using INLA is the following.
m2 <-fit.models(formula = formula, data = data, distr = distr, method = "inla", ...) where distr is a vector of strings containing names from the four models available in INLA and ... indicates optional arguments. In particular, when method is set to "inla", it is possible to add the following arguments.
• dz: defines the step length for the grid search over the hyperparameters space (default = 0.1). As mentioned above, INLA estimates the value of the hyperparameters in the model (e.g., the shape of a Weibull distribution), using a grid search. The finer this grid, the more accurate (but more computationally expensive!) the resulting estimates for all the parameters, e.g., for both the shape and scale of the Weibull distribution.
• diff.logdens: defines the difference in the log-density for the hyperparameters to stop the numerical integration used to obtain the marginal posterior distributions (default = 5). Again, this is related to how the hyperparameters are estimated in the first stage of the nested algorithm. Decreasing this difference is likely to increase the computational time, since the estimation of the hyperparameters will become more accurate.
• control.fixed: defines the default for the prior distributions, unless specified by the user. By default, INLA assumes that "fixed effects" associated with covariates are modeled using a Normal with mean 0 and variance 1000, while the overall intercept is modeled using a Normal with 0 mean and even smaller precision. survHE overrules this and sets the precision of the covariates in the linear predictor to 1/5 2 = 0.04 -this is consistent with the default setting used when HMC is selected as the inferential engine.
• control.family: a list of options controlling the model for the observed data. If distr.inla is a vector, containing all or a subset of the models supported by INLA, this can be provided as a named list of options; for example: R> m2 <-fit.models(formula = formula, data = data, distr = distr.inla, + method = "inla", control.family = list(weibull = list(prior = + "gamma", param = c(.1, .1)), lognormal = list(initial = 0.1))) would instruct INLA to assume a Gamma(0.1, 0.1) prior distribution for the shape parameter of the Weibull model and to use an initial value of 0.1 for the approximation routine of the log-Normal model. Notice that the names of the elements of the control.family list must adhere with the survHE notation shown in the right-most column of Table 1.
Using INLA advanced controls is very powerful and allows much versatility in fitting the models. However, some knowledge and understanding of the INLA syntax and philosophy is required. Guidance is provided in the INLA help functions as well as, for example, in Rue et al. (2009);Blangiardo, Cameletti, Baio, and Rue (2013); Blangiardo and Cameletti (2015) and Wang, Ryan, and Faraway (2018).
Back to the running example, we may fit the models in INLA using the following code.
R> m2 <-fit.models(formula, data, c("exp", "weibull", "llogis", "lnorm"), + method = "inla", control.family = list(exponential = list(), + weibull = list(prior = "gamma", param = c(.1, .1)), + loglogistic = list(), lognormal = list(initial = 1))) Note that it may be necessary to fiddle with the control.family option to successfully fit some of these models. This highlights the fact that, because the process of economic evaluation is embedded within a fundamentally statistical analysis, the modeler needs a thorough understanding of the underlying statistical tools -perhaps even more so when using a Bayesian approach, but, it is our strong belief, under any modeling circumstances.

Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC; Radford 2011; Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin 2013; Kruschke 2015; Betancourt 2012) is one of the algorithms belonging to the general class of MCMC methods. In a nutshell, HMC is based on the physical concept of Hamiltonian dynamics, which can be used to model the idealized situation of a frictionless particle sliding over a surface of varying slope. Basically, the movements of the particle depend on: (i) the potential energy, a function of its current location l, which is proportional to the height of the surface at the current position; and (ii) the kinetic energy, a function of its momentum m, depending on the mass of the particle. The way in which these movements happen can be described by a set of ordinary differential equations: this means that if we are able to compute the derivatives of these two functions and given a set of initial conditions specifying the starting location l 0 and momentum m 0 , at time t 0 , then we can predict the location and momentum of the particle at any point in time, by simulating these dynamics for a given duration.
Leaving aside all the technical difficulties, the basic intuition behind HMC is the following: the surface of interest is the unnormalized posterior log-density for the parameters in the model In general, we are not in a position of knowing the target distribution logp(θ | t) exactly and in closed form 1 . Moreover, even if we were, this would only be proportional to the actual posterior density for the parameters (because the expression above is computed without rescaling by the marginal log-density for the observed data t; for this reason, we use the notationp).
However, both log-densities on the right-hand side of (3) are known because they are part of the model specification. If we can compute the derivatives of logp(θ | t), given initial values for the location of the parameters θ and their momentum, we can simulate Hamiltonian dynamics. As it turns out, this is extremely efficient at exploring the (negative) posterior logdensity, by proposing a move to a new position that is determined by letting the "particle" slide over the density.
This means that if the current position is far away from the portion of the parametric space where most of the probability mass lies, the potential energy is large and thus the "particle" will have higher speed when sliding over a very steep surface. More specifically, unlike simpler (and far less efficient methods) such as the Metropolis algorithm, the proposed moves will not necessarily be characterized by symmetrical distributions and will tend to be pulled towards the mode of the joint posterior distribution more quickly.
Especially for models where the size of θ is very large, in comparison to other MCMC algorithms such as Gibbs sampling, HMC often proves to be very efficient in computational terms. This is mainly due to the fact that it updates the joint distribution of all the model parameters at once, instead of sequentially looping through each conditional distribution for one parameter given the observed data and all the other parameters. In addition, in comparison to other acceptance-rejection methods (e.g., the simpler Metropolis-Hastings), HMC is capable of adapting the proposal to non-symmetrical distributions, which translates to a faster rate of convergence to the target posterior, as well as faster decay in autocorrelation. These properties mean that Bayesian inference can be performed on models of arbitrary complexity (thus extending the limitations of INLA), at a reduced computational time and improved convergence, with respect to Gibbs samplers.
In practice, HMC can be very complex, because in addition to the specific computation of possibly complex derivatives, it requires fine tuning of several parameters. However, rstan provides a very clever system in which most of the adaptation is automatic. The user can still specify some of the basic inputs (and at times this is crucial to improve, or even reach convergence to the target posterior distributions), but rstan is a very general system to perform HMC estimation on a very wide range of models.

Using survHE to fit models with HMC
Much of the work performed by rstan consists in determining a set of derivatives from the model structure, that are used to apply the Hamiltonian dynamics and explore the parametric space in an efficient way. This requires a preparatory step, which rstan does by compiling the model in C++ (via R). This step can be quite lengthy, but interestingly it is possible to pre-compile a model -if all that changes is the data (but not the structure and the distributional assumptions), then the pre-compiled model can be used directly, thus saving substantial computation time.
This is another attractive feature of rstan, because it means that survHE can pre-compile all the standard models presented in Table 1. Thus, it is possible to estimate them by using a command such as the following.
R> m3 <-fit.models(formula, data, distr = mods, method = "hmc") In this case, survHE will perform the following steps: 1. Format the original data contained in the R object data in a way that can be used by rstan;  2. Select a pre-compiled model code, depending on the distributional assumptions; 3. Call rstan in the background to sample from the posterior distribution of the model parameters.
As with any MCMC estimation, it is important to thoroughly assess convergence -we return to this point in Section 6. would produce respectively a graph with the "traceplots" for the relevant variables, as shown in Figure 2(a) and an autocorrelation plot, shown in Figure 2(b). In this case, we can be confident that convergence is satisfactorily reached for all the variables monitored, since the traceplots show good mixing in the two chains; autocorrelation does not seem a major problem either, as the level of dependence in consecutive iterations wanes down relatively quickly. Additional model checking tools are also available in the package shinystan (Stan Development Team 2016), an add-on to rstan, which creates a web-app that the user can access locally through the default web browser. Ultimately, we reiterate that it is the utmost responsibility of the modeler to ensure that the procedure has reached convergence and the output can be used safely.
There are several additional options that can be used when the inferential method is specified to "hmc", which we describe in the following.
• chains: the number of chains to run in the HMC (default = 2); • iter: the total number of iterations (default = 2000); • warmup: the number of "warm up" iterations (default = iter/2). The warm up is the adaptive phase in which the basic inputs of the HMC procedure are tuned. rstan does that automatically and once this stage is completed, the procedure is ready to immediately sample from the posterior distributions of interest; • thin: the number of thinning (default = 1). For example, setting thinning to some value h, consists in selecting for inference every one in h simulations from the posteriors and can sometimes reduce the level of autocorrelation (for an equally large number of iterations used for the final estimation); • control: a list specifying rstan-related options, e.g., control = list(adapt_delta = 0.85), which can be used to tune more finely the acceptance rate in the HMC procedure (the closer this rate is to the upper limit of 1, the less likely there are to be numerically unstable simulations); • seed: the random seed (to make the analysis replicable); • pars: a string vector with the names of the relevant parameters. By default, survHE selects the location and ancillary parameters, as well as the coefficients associated with the covariates included in the model; • include: a logical indicator (if set to FALSE, then the parameters specified in pars are not saved); • priors: a list (of lists) specifying the values for the parameters of the prior distributions in the models (see the example below); • cores: the number of CPU (cores) used to run the sampling procedure using rstan (default = 1); • save.stan: a logical indicator (default = FALSE). If TRUE, then saves the model text file(s) to the user's working directory. These can be used as templates to modify the basic model structure.
In practice, the user should not need to fiddle much with these optional arguments -certainly not without a clear understanding of the underlying modeling assumptions and the implications of any change to the default structure. Perhaps the default number of chains or iterations may be increased; or may be specific numeric values for the parameters of the prior distributions could be defined.
For instance, the default prior for the linear predictor coefficients is β = (β 0 , . . . , β J ) ∼ Normal(µ β , σ β I J+1 ), where: µ β and σ β are vectors of size (J + 1); and I J+1 is the (J + 1) × (J + 1) identity matrix (see Table 2). Suppose the user wanted to select a smaller standard deviation for the Generalized Gamma model; this can be done using the following command -the list containing the custom values for the priors needs to be named, using the survHE terminology highlighted in Table 1. 3 R> m4 <-fit.models(formula, data, distr = "gengamma", method = "hmc", + priors = list(gengamma = list(sigma_beta = rep(5, 2)))) Note that we need to specify the values for the standard deviation for all the J = 2 covariates (the intercept and the treatment arm) and so in this case we define sigma_beta = rep(5, 2), i.e., a vector of two elements, each equal to 5. Of course, there is nothing special about the value 5 and we could also select different values for the intercept and the treatment arm, e.g., sigma_beta = rep(10, 2).
It is also possible to specify multiple values to modify the priors, for example R> priors <-list(exp = list(sigma_beta = rep(4, 2)), + wei = list(mu_beta = rep(2, 2))) R> m4 <-fit.models(formula, data, distr = mods, method = "hmc", + priors = priors) would instruct survHE that the user wants to set: (a) the value for the standard deviation of the parameters β to 4, in the first model to be considered (Exponential); and (b) the value for the mean of the parameters β to 2, for the second model (Weibull).
Because the number of models in mods is 6, then survHE will complete the list priors by adding 4 more empty lists and survHE will use the default values for the remaining models. Consequently, it is important that the user specifies the required values in the correct order. For instance, if we wanted to specify σ ∼ Gamma(2, 4) for the Generalized Gamma model (the sixth in the list mods), we would need to make sure that this information is contained in the sixth element of the list priors. This could be done by using the following code R> priors <-vector("list", 6) R> names(priors) <-mods R> priors[[6]] <-list(a_sigma = 2, b_sigma = 4) which creates 5 empty lists to be associated with the first five models and the required list of values for the sixth one. Note that we still need to provide names for the elements of the list prior, so that survHE can figure out which element should be used for a given model among those specified in the call to fit.models. Even more succinctly, the same goal can be achieved by typing the following code R> priors <-replicate(5, list()) R> priors[[6]] <-list(a_sigma = 2, b_sigma = 4) R> names(priors) <-mods -although we note that, in general terms, when it is necessary to specify complex options (such as the definition of the priors), it is perhaps a better idea to use one single distribution in the call to fit.models.
Then we can run survHE with the same command as before.

Model
Location parameters

Ancillary parameters
Natural parameters Shape (1) = q a 1 = a_sigma = 0.1, b 1 = b_sigma = 0.1 Shape (2) = p a 2 = mu_P = 0, b 2 = sigma_P = 0.5 a 3 = mu_Q = 0, b 3 = sigma_Q = 2.5 Table 2: A summary of the default assumptions used for the models defined by survHE using rstan. Table 2 shows a summary of the distributional assumptions used to define the default priors in the models implemented by survHE using rstan, together with the names assigned to the parameters of these distributions. For instance, if we wanted to specify a Normal(1, 4) prior for the ancillary parameter of the Gompertz model (the fourth in the vector mods), we would need to specify the following command.
R> priors <-replicate(6, list()) R> names(priors) <-mods R> priors [[4]] <-list(mu_alpha = 1, sigma_alpha = 4) The use of informative priors is one of the examples in which external information can be included in the model, to stabilize the inference and complement the limited evidence provided by the data -although, again, it is the responsibility of the modeler to understand the specification and the implications of these assumptions into the model.
If the option save.stan is set to TRUE, then survHE will also save the model code as a text file (with the extension .stan) in the current directory. The data list formatted in a way that rstan can use is also automatically stored in the element $misc$data.stan inside the output of fit.models. The user can then modify the model structure starting from this templatefor example it is possible to change the distributional assumptions and use, e.g., a Uniform prior for the scale σ of a Generalized F model. This will require a new compilation and, at present, the new model has to be run using rstan commands directly. Again, this illustrates the process of adapting relatively standard models to more complex situations, in which there is the need to bring external information to bear, although this may mean that the user needs to have a more thorough understanding of the underlying statistical problem.

Summarizing the results from survHE
Objects of class 'survHE' (such as m1, m2, m3 and m4 above) have associated methods such as print, summary and plot that can be used to summarize and visually inspect the results of the models analyzed. We describe them in the following.

Tabular form
When the models have been estimated, we usually want to summarize the estimates using a tabular format. survHE has a specialized function print that can do this, e.g., by typing R> print(m1) which returns the following In this case, the object m1 contains many models; but unless the user specifies which one to print, survHE will assume that the first one should be used. If, for example, we wanted to visualize the estimates for the log-Logistic model (the fifth element of the string vector distr), then we would need to type R> print(m1, mod = 5) which would return the following output. In both cases, survHE standardizes the format of the output, so that the results are reported for the "basic" parameters (e.g., rate, shape or scale) as well as the covariate effects. Notice that the "basic" parameters are always reported on the natural scale, while the covariate effects are in the scale defined by the linear predictor (as presented in Table 1). Thus, in the cases presented above, the value of the coefficient as.factor(arm)1 represents the impact of the treatment arm on the log scale, because both for the Exponential and the log-Logistic the location parameter is modeled using a log link -and thus in Table 1 we write µ i = exp(. . .).
The print method for 'survHE' objects has an additional option, which allows the user to visualize the summary of the model results in the original notation used by the relevant package used to perform the estimation. Thus, in this case typing R> print(m1, mod = 6, original = TRUE, digits = 3) would show the results for the Generalized Gamma model (the sixth in the string vector mods) using the original flexsurv formatting and using only 3 significant decimal places. For HMC models, even more importantly, when using the option original = TRUE we are able to look at helpful convergence statistics, which should be used to assess whether the MCMC procedure has been successful in exploring the relevant posterior distributions. For example, we could use the code R> print(m3, mod = 2, original = TRUE, digits = 6) which returns the following output. Inference for Stan model: WeibullAF. 2 chains, each with iter=2000; warmup=1000; thin=1; post-warmup draws per chain=1000, total post-warmup draws=2000. The effective sample size gives an indication of the underlying autocorrelation in the MCMC samples -values close to the total number of iterations, or at any rate not too low, indicate a low level of autocorrelation (which is what we want). The PSR is an analysis-of-variance-type of statistics, indicating for each variables whether convergence is reached. If the procedure is ran on more than one parallel chain, then Rhat is computed as a function of the ratio of the variance within to the variance between chains -if this is close to 1 and definitely less than 1.1, convergence can be satisfactorily declared. If not, it may be necessary for example to increase the number of iterations.

Call
A further optional inputs to the print methods is digits (default = 6), which determines the number of digits printed in the table.

Visual interpretation
Once an object of the class 'survHE' has been created using the command fit.models, it is possible to visualize the resulting survival curve(s) by simply using the plot method. survHE uses ggplot2 (Wickham 2016) as the graphical engine. This means that the graphs produced by survHE are highly customizable. However, survHE simplifies the modeler's work as most of the code needed to produce the graphs is already implemented in the plot method.
For example, the command R> plot(m1, add.km = TRUE) displays the graph shown in Figure 3(a). The option add.kM = TRUE instructs survHE to add a Kaplan-Meier estimate to the plot (in this case, one for each treatment arm is produced as the grey areas undelying the colored curves).
The plot method allows to display the output from different models. For example, we may be interested in comparing the survival curves from some of the models obtained using either MLE or HMC. We could do this by using the following command.  R> plot(m1, m3, mods = c (2,3,8,9), colour = c("blue", "green", "red", + "yellow"), lab.model = c("Weibull (MLE)", "Gamma (MLE)", + "Weibull (HMC)", "Gamma (HMC)"), lab.profile = c("Control","Treatment"), + add.km = TRUE) This command instructs survHE to stack together the objects m1 and m3, which basically produces a single 'survHE' object containing 12 models (6 from m1 and 6 from m3). The option mods = c (2,3,8,9) selects the models in positions 2, 3, 8 and 9 (i.e., the second and third from m1 and then the second and third from m3). The option colour can be used to select the colors with which to plot each curve on the graph. The options lab.model and lab.profile can be used to customize the graph legends. The former specifies the labels to be associated with the several models plotted, while the latter customizes the "profile" of covariates depicted. The resulting graph is shown in Figure 3(b). 4 Notice that, in this particular instance, the differences in the frequentist and Bayesian analysis are not striking. This is due to the fact that we have considered relatively vague priors in combination with mature data (notice that the Kaplan-Meier estimates are fairly close to 0 for both arms, indicating that, in this study, most individuals have experienced the events and the number of censored subjects is not large). This means that the observed data are in fact driving the overall inferential process, thus limiting the impact of the priors and consequently the differences between the MLE and the Bayesian analysis. We note again however that, particularly when data are not mature and are subject to a large amount of censoring, the inclusion of genuine prior information does contribute more substantially to the final inference.
The full list of the options for the method plot in survHE is given below.
• add.km: a logical variable. If TRUE (the default value), then also add the Kaplan-Meier estimates of the data to the plot; • newdata: a list (of lists) providing the values for the relevant covariates to stratify the survival curves. If NULL, then survHE will use the mean value for all the covariates if at least one is a continuous variable, or the full combination of the categorical covariates, in line with flexsurv; • xlab: a string with the label for the x-axis (default = "Time"); • ylab: a string with the label for the y-axis (default = "Survival"); • lab.profile: a (vector of) string(s) indicating the labels associated with the strata defining the different survival curves to plot. This defaults to the value used by the Kaplan Meier estimate given in fit.models; • t: a vector specifying the time horizon over which the survival curves should be computed and plotted. It defaults to the observed time range in the data, but for instance, the user can specify the extrapolation to time t = 60 using the code plot(m1, t = seq(0,60)). The range of the input t determines also the x axis limits in the plot; • colors: a vector of characters defining the colours in which to plot the different survival curves; • annotate: a logical indicator (defaults to FALSE), describing whether survHE should also add text to highlight the observed vs extrapolated data in the survival curves; • legend.position: a vector of proportions to place the legend. Default to c(.75,.9), which means 75% across the x-axis and 90% across the y-axis; • legend.title: suitable instructions to format the title of the legend. This defaults to element_text(size = 15, face = "bold"). Thus, for example, the command plot(m1, legend.title = element_text(size = 20)) would increase the font size for the title of the legends (i.e., the strings "Models" and "Profile") up to 20 points. Further customization is possible, but requires knowledge of ggplot2; • legend.text: suitable instructions to format the text of the legend; It defaults to element_text(colour = "black", size = 14, face = "plain"), but again can be further customized using ggplot2 commands; • nsim: if this input is set to a value greater than 1, then survHE will automatically construct simulations for the entire distribution of the underlying survival curves (see Section 7 for more details). By default, survHE sets this to 1 and only plots the average of the distribution for the survival curves.
Most of these options are actually trivial; perhaps only newdata deserves a more detailed explanation. survHE follows the philosophy of flexsurv to construct the survival curves. Consequently, when the model contains categorical covariates, a single survival curve is estimated for each combination of their modalities. Consider for example the following command.  As shown in Figure 4(a), this simple model already becomes complex to visualize, because there are 4 strata identified by the two covariates, each of which is binary. Things become even more complicated if we mix a continuous and a categorical covariate, e.g., treatment and age. Thus, it may be helpful then to plot the results using a different strategy. For example we can define a list in which we specify the value for the covariates that we want to use to compute the survival curves, e.g., R> m5 <-fit.models(Surv(time, event)~as.factor(arm) + age, + data = data, distr = "exp") R> newdata <-list(list(arm = 0, age = mean(data$age)), + list(arm = 1, age = mean(data$age))) which would create two "profiles" by varying the treatment arm and keeping the value for sex to the observed average in the data. The resulting plot can be obtained using the following code.
R> plot(m5, newdata = newdata) which produces the graph in Figure 4(b). When the option newdata is used, it is probably best to plot a single model in a graph -even if the 'survHE' object contains many (just like m1, m2 or m3), the option mods can be used to select only one of them, as shown above. When selecting the "profile" to be specified in the list newdata, the user needs to specify a value for all the covariates that appear in the model formula, although the order in which they are entered is not important.
As all survHE graphs are ggplot2 objects, they can also be manipulated using ggplot2 functions and commands. So for instance, the user can save a survHE plot to a named object and then add ggplot2 customization, such as in the following example R> p <-plot(m1) R> p + theme_classic() which would draw the graph in a template similar to the one used by base R. Again, knowledge of ggplot2 is required in order to take full advantage of its functionalities -but we reiterate here that survHE also takes care of many of the basic needs of the modelers.

Estimation of the mean survival time
As mentioned earlier, in a health economic evaluation it is of interest to estimate the mean survival time -this is the quantity that is relevant to determine for example the effectiveness of a given intervention. For some of the parametric models described above, the mean survival time can be computed analytically. For example, for a Weibull model, this is where µ is the scale, α is the shape and Γ(·) is the Gamma function.
More generally, it is possible to approximate this quantity using the "trapezium rule" over a large enough time horizon as where T = {0, (t + τ ), (t + 2τ ), . . . , (t + Kτ )} is a discrete set of (K + 1) time points and τ is an arbitrarily small increment. Notice that in order to approximate the mean sufficiently well, it is important to extend the range T long enough so that all the survival curves actually fade out to 0. Also, the smaller the increment τ , the more trapezoids are fitted under the survival curve and thus the better the approximation. In practice, there is a trade-off between the level of approximation and the computational time required for the calculation.
survHE automatically performs this calculation by means of the method summary; so for example, the R code R> summary(m1) produces the following output.
Estimated average survival time distribution* mean sd 2.5% median 97.5% as.factor(arm)=0 9.929582 0.6185205 8.778254 9.90262 11.11727 as.factor(arm)=1 12.753808 0.6510119 11.390461 12.79599 13.94223 *Computed over the range: [ 0.030-20.920] using 1000 simulations. NB: Check that the survival curves tend to 0 over this range! Because the user has not specified a time range over which to compute the mean survival, survHE assumes the observed range of times (in this case [0.03-20.92]). As is obvious from Figure 3, in this range the survival curves have not reached 0 and thus the estimated mean survival is certainly biased downwardly. To correct this, it is sufficient to use the code R> summary(m1, t = seq(0, 60)) which instructs survHE to compute the means over a range of times between 0 and 60, with default unit increments. The resulting values are substantially different. for the control and the treatment arm, respectively (with β 0 the intercept and β 1 the treatment effect as estimated in m1). The estimate produced by the summary command can be improved by selecting a longer time horizon and/or a lower value for the increment τ , e.g., t = seq(0, 100, 0.1). It is also possible to increase the number of simulations used to characterized uncertainty in the underlying parameters (and hence survival curves -we return to this point in Section 7). The default value of 1000 can be modified by specifying the optional argument nsim to a different value. Similarly, it is possible to specify a specific "profile", in terms of the covariates included in the model by using the optional argument newdata. So for example, the command R> summary(m1, t = seq(0, 60), newdata = list(list(arm = 1))) produces an estimate of the distribution of the mean survival time for the treated. Finally, the user can also include a vector labs in the call to summary, which contains a suitable number of text strings that can be used to label the values in the resulting table, for example labs = c("Controls", "Treated").

Model assessment
Health economics guidelines suggest that the assessment of the models is done by complementing the visual inspection with some more formal testing. The suggested way is through the use of a specific information criterion (IC), such as Akaike IC (AIC) or the Bayesian IC (BIC). These statistics are based on the model deviance −2 log L(θ) and a penalty function, which typically depends on the number of parameters and the complexity of the model -the rationale being that models that are too complex tend to "overfit" the data; this means that they may do very well at estimating the data at hand, but usually have poor predictive ability of other data.
An additional IC specific to Bayesian models is the deviance IC (DIC), proposed by Spiegelhalter, Best, Carlin, and Van Der Linde (2002)  In panel (c) of Figure 5, we specify that the bar plots should be "stacked", which means that models using the same distribution from different objects are grouped together. The objects can be named (as for the main plot function), which automatically creates a label to associate with the colors with which the bars are plotted. If nothing is specified, then survHE uses the generic labels Object1, . . . , ObjectN (assuming there are N survHE objects to be plotted). The option name_legend specifies the title of the legend. If nothing is specified, then survHE uses the default label survHE object.
In panel (d) of Figure 5 for some of the models the bar is not plotted: this is because for model m1 the DIC cannot be computed, as it is fitted using MLE.
5 In fact, for Bayesian models obtained via HMC, survHE also computes the DIC using the slightly different definition suggested by Gelman et al. (2013); the two versions of the DIC may differ in the presence of extreme asymmetry or multi-modality in the posterior distributions. The alternative estimates for the DIC are stored in the element $model.fitting$dic2 of a 'survHE' object, but are only present if the inferential method is set to "hmc".
6 Notice that rstan does not provide measures of model fit based on information criteria. There are several arguments to prefer other methods to assess the performance of a statistical model, for example the posterior predictive check, as discussed in Gelman et al. (2013). However, because NICE guidelines suggest using AIC and BIC (and, by extension, DIC), survHE computes the relevant model fit statistics and reports them using the print method and the model.fit.plot function.

Model comparison based on AIC
(c) Stacked model fit plots for selected models in m1 and m3, using the AIC.  There are many optional arguments to the model.fit.plot function: • type: a string specifying the statistic to be used (possible values are "aic", "bic" or "dic"); • scale: if scale = "absolute" (default), then plot the absolute value of the relevant IC. If scale = "relative" then plot a rescaled version taking the percentage increase in the *IC in comparison with the best-fitting model; • colour: specifies a vector of strings to assign colors to the bars. In fact, the user can use the code col or color interchangeably; • stacked: a logical indicator, defining whether the bars should be stacked and grouped by 'survHE' objects (default = FALSE); • name_legend: a string defining the title of the legend that is added to the bar plot when stacked is set to TRUE; • main: a string specifying the title of the plot; • models: a (vector of) string(s), specifying the label to associate with the models being plotted.
As a final note, we stress here that while model fitting statistics are useful to rank the plausibility of the distributional assumptions tested, they cannot inform the quality of the extrapolation of the survival curves. This is because any IC is based on the observed data and quantifies how well a specific model fits to them. Thus, there is no measure as to how a model performs on data that are as yet unobserved, based on such criteria. For this reason, it is recommended that plausibility of the extrapolation is judged using a combination of visual inspection, formal testing and the effectively unavoidable inclusion of external information and clinical opinions -again suggesting the usefulness of the Bayesian approach.

Probabilistic sensitivity analysis
Unlike standard epidemiological analysis where the objective is often to estimate the effect of some relevant covariates on the survival time, in health economic evaluation the goal is rather to produce an estimate of the entire survival curve over a long period of time (or at any rate, a longer period than the observed follow up). This estimate is then used to populate the economic model, e.g., by obtaining estimates of the transition probabilities between states in a Markov model or, perhaps even more commonly, in a partitioned survival analysis (Woods, Sideris, Palmer, Latimer, and Soares 2017). More importantly, we need to quantify the impact of uncertainty in the model parameters on the decision-making process and thus we typically repeat this exercise for a large number of times, upon varying the value of the parameters that determine the survival curves. In a fully Bayesian approach this uncertainty is induced by the full joint posterior distribution of the parameters.
survHE is designed to perform this task directly in R, through the function make.surv. A typical call is as follows R> psa <-make.surv(fit = m3, nsim = 1000, t = seq(.1, 63)) which generates an object psa containing, among other things, nsim = 1000 simulations for the survival curves. In the above case, m3 contains in fact 6 different models (upon varying the distributional assumptions), but because the user has not specified a value for the input mod (in this case a number between 1 and 6), make.surv uses the first one, i.e., the Exponential, by default. Adding the option mod = 6 to the call to make.surv would consider the sixth model (Generalized Gamma) instead.
The resulting output psa can be accessed directly by the user. The command R> names(psa) shows that it comprises several elements [1] "S" "sim" "nsim" "mat" "des.mat" "times" each of which can be accessed using either the "dollar" or the "double bracket" notation in R e.g., psa$S or psa [[3]]. A brief description of each of these elements is given in the following.
• S: a list -for each simulated value of the parameters, a list with the survival curves associated with the configuration of the covariates; • sim: simulated values for the main parameters (e.g., scale, shape, rate, mean, sd) for each configuration of the covariates; • nsim: the number of simulations saved; • mat: a list -for each configuration of covariates a matrix with nsim rows and as many columns as time points with the survival curves (to be read row-wise); • des.mat: a design matrix with the combination of the covariates used (each represents an element in the lists S and mat).
• times: the vector of times used in the computation. By default, this is the set of times in the data passed as argument to fit.models, but the user can specify a different set of times.
The reason why make.surv creates so many outputs is mainly for internal convenience. In fact, this is a central function in how survHE works and it is called internally by other utility functions (e.g., plot and print). By and large, the user does not need to manipulate the output directly.
The list of inputs for the make.surv function is as follows.
• fit: the result of the call to the fit.models function, containing the model fitting (and other relevant information), e.g., m3 from above; • mod: the index of the model. The default value is 1, but the user can choose which model to visualize, if the call to fit.models has a vector argument for the input distr; • t: the vector of times to be used to create the survival curves. By default, survHE uses the times observed in the original data, but the user can specify a longer follow up (this is actually a useful feature for the purpose of performing a health economic evaluation and PSA); • newdata: a list (of lists), specifying the values of the covariates at which the computation is performed. For example list(list(arm = 0), list(arm = 1)) will create two survival curves, one obtained by setting the covariate arm to the value 0 and the other by setting it to the value 1. In line with flexsurv notation, the user needs to either specify the value for all the covariates or for none (in which case, newdata is set to NULL, which is the default). If some value is specified and at least one of the covariates is continuous, then a single survival curve will be computed in correspondence of the average values of all the covariates (including the factors, which in this case are expanded into indicators); • nsim: the number of simulations from the distribution of the survival curves. The default is at nsim = 1, in which case the point estimate for the relevant distributional parameters is used and the resulting "average" survival curve computed.
To visualize the results of the PSA, survHE has a specialized function called psa.plot, which can be used, for example as follows.

R> psa.plot(psa)
This command produces the graph in Figure 6(a). The graph shows the average survival curve and 95% interval estimates around them. If the method is set to "mle", then the intervals are obtained by multivariate Normal bootstrap, while for the Bayesian models they are obtained using samples from the relevant posterior distributions.
The user can specify the labels for the x-and y-axis (respectively by including the additional arguments xlab = "..." and ylab = "..."). In addition, if no color specification is given by the user, e.g., in the form col = c("blue", "red"), then survHE will randomly choose a coloring scheme. Finally, the parameter alpha (default at 0.1) determines the transparency of the curves; values for alpha close to 0 imply greater transparency, while values closer to 1 create a solid plot (with no transparency). A fully customized call to psa.plot is as follows and produces the graph in Figure 6(b).

Exporting the results to Microsoft Excel
Once the PSA samples are available for the survival curves, it may be easy to continue building the economic model using R. In fact, this is the strategy that we advocate (Baio and Heath 2016). However, we acknowledge that practitioners use Microsoft Excel to produce the economic assessment. Thus, survHE has a specialized function, called write.surv, that allows the user to export the simulations for the survival curves to a .xls(x) file 7 , so that they can be easily used when constructing a Markov model or a partitioned survival analysis in Microsoft Excel.
As mentioned above, the user can actually manipulate the output of the call to make.surv independently and so in a way bypass write.surv entirely. However, survHE tries to simplify the work process and can be used as follows.  R> write.surv(psa, file = "temp.xlsx") which produces the following output in R 1000 simulation(s) for the survival curve written to file: temp.xlsx Profile(s): as.factor(arm)1=0 as.factor(arm)1=1 and creates a file containing the relevant simulations. The R output clarifies that the resulting file contains two spreadsheets: the first one is for the survival curves considering the treatment arm set to 0 (e.g., controls), while the second one is for the intervention arm (set to 1). Different model specifications would create a different number of matrices (with nsim rows and as many time points for columns) depending on the covariate combinations.

Advanced models
While the models presented in Table 1 are likely to produce at least one "good" candidate in most situations, it is possible that more complex model structures may be needed to accommodate a particular dataset or analysis. The main advantage of using these more complicated models is that they are able to define hazard functions that can better approximate the lifetime risk of the specific population under study, potentially including competing causes or external knowledge that can be used to anchor the survival probabilities below a certain threshold (e.g., the healthy population or a subset of the population affected by similar diseases or treated with drugs with similar action).
In particular, we focus here on two "flexible" models: the first one (which is already implemented in flexsurv) is based on cubic splines, while the second is an extension of the standard Weibull model.

Royston-Parmar splines
Splines are numeric functions typically defined as a collection of local polynomials; the main idea is to partition the x-axis into a set of intervals defined by "knots" and then within each interval use a different polynomial function to approximate the underlying "true" function on the y-axis. This construction provides great flexibility and effectively an arbitrary number of parameters, depending on how many knots (and hence on the density with which the x-axis is partitioned) are selected.
In the context of survival analysis, splines can be used to model flexibly (a suitable transformation of) one of the basic functions, e.g., the survival or the hazard (Royston and Lambert 2011). An increasingly popular model is the one developed by Royston and Parmar (2002). Basically, this defines a probability distribution to model the observed and censored times as a function of an "augmented" set of parameters θ = (β, γ) and covariates (X, B). Here, β are the coefficients associated with the observed covariates X, exactly as in (2). In addition, for each individual (and, hence, observed time) in the dataset, we consider a set of (M + 2) "basis" function The vector of knots is defined as k min = 0 < k 1 < . . . < k max = ∞; typically, the M "internal" knots are set up in terms of the quantiles of the observed distribution of the times. For example, if we set M = 3, survHE would automatically consider the three quartiles (q 1 , q 2 , q 3 ), representing the 25%-, 50%-and 75%-percentiles of the observed times distribution.
The Royston-Parmar (RP) model defines a modified linear predictor which is used to model directly log (− log S(t i )) = log H (t i ) = η i (notice that we use the [+ . . .] notation here to highlight the fact that the model may not include any covariate X and thus only rely on the splines structure) 8 .
Given this set up, it is possible to prove that and with η i = M +2 m=1 γ m B im -the first derivatives B im are easy to compute, recalling that ∂x 3 ∂x = 3x 2 . Substituting (4) and (5) into (1), we can completely characterize the resulting likelihood.
The RP model is directly available in flexsurv and is also implemented under a pre-compiled HMC-based Bayesian framework in survHE. The two versions can be obtained via a specialized call to fit.models.
R> formula <-Surv(time, event)~as.factor(arm) R> m6 <-fit.models(formula = formula, data = data, distr = "rps", k = 2) R> m7 <-fit.models(formula = formula, data = data, distr = "rps", k = 2, + method = "hmc") survHE accepts the string "rps" to indicate the "Royston-Parmar splines" distribution and also requires the input k to specify the number M of internal knots to be used (if this is not provided by the user, survHE assumes that k = 0, which reduces the RP model to a Weibull PH formulation). The formula is used to specify the "fixed" component of η i , i.e., the set of covariates in X. In this case, we use only the treatment arm.
All the usual methods are available for the resulting 'survHE' objects m6 and m7. The Bayesian version of the RP model is much more computationally intensive, although HMC does a very good job and keeps the time to generally acceptable levels; also, it helps in this case to take advantage of the multi-processing capability of rstan: adding the option cores = 2 reduces the time from 57.851 to 32.777 seconds, for 2 chains of 2000 iterations each.
In this case, there is very good agreement in the point and interval estimates for the two versions of the model, but in general the MLE may underestimate the underlying level of correlation among the γ coefficients in particular.

Poly-Weibull
In a nutshell, the Poly-Weibull model (Berger and Sun 1993;Demiris, Lunn, and Sharples 2015) extends the basic set up of a Weibull survival model by accounting for the possibility that in fact the observed times are the result of a mixed data generating process, depending on several independent Weibull components. For example, we may consider that the occurrence of the event under study depends on M independent causes and that we are willing to model each using a suitable Weibull distribution. In line with (1), the resulting density is where θ = (θ 1 , . . . , θ M ) and θ m = (α m , µ im ) are the shape and scale for the mth component of the mixture.
survHE implements this density as an add-on model to be estimated using HMC and rstan (i.e., there is only a Bayesian version for this model). While it is in principle possible to model both the shape and the scale as functions of a set of covariates, survHE considers the simpler version where only the location parameter is allowed to depend on X. It is fairly easy to modify this structure and implement a version of the rstan model in which also α m depends on the covariates.
Practically, survHE has a specific function to run the Poly-Weibull model. A typical call is as in the following.
R> formula.pw <-list(Surv(time, event)~1, + Surv(time, event)~as.factor(arm)) R> m8 <-poly.weibull(formula.pw, data, cores = 4) The main difference with respect to the standard call to fit.models is that the formula input now needs to be made by a list of objects of class 'formula'. This is because we need to specify a formula for each of the components that we want to fit to the mixture model identified by the Poly-Weibull distribution. For instance, in the above example, the object formula.pw is a list with two elements. This instructs R to assume a model with M = 2 components and the following specification for the linear predictors µ i1 = exp (β 01 ) and µ i2 = exp (β 02 + β 12 Arm i ) .
By default, survHE places minimally informative priors on the parameters β m by using β m iid ∼ Normal(0, 10); the values for the mean (mu_beta) and the standard deviation (sigma_beta) can be modified using the option prior, as shown earlier. In addition, we need to impose an identifiability constraint on the shape parameters α = (α 1 , . . . , α M ) so that they are ordered (i.e., α 1 < . . . < α M ) -see Demiris et al. (2015) for a discussion of this issue. The components of the vector α are then given a vague prior over their entire domain.
As is often the case for mixture models, convergence to the posterior distributions may be a crucial issue. This is essentially due to the fact that it may be very difficult (or even impossible) for the model to distinguish two or more of the components. For example, the above call to poly.weibull returns the following warning variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess indicating that the HMC algorithm has failed to fully explore the posterior density of the parameters. Possible solutions are to either include more information in the priors (e.g., by reducing the range of variation of the coefficients in the β m ), or to increase the value of the acceptance rate (see Section 5.2). For example, the command R> m9 <-poly.weibull(formula.pw, data, cores = 4, control = list( + adapt_delta = .95, stepsize = .005, max_treedepth = 100), iter = 4000) modifies the standard rstan settings to have a denser discrete approximation of the continuous Hamiltonian dynamics 9 . This increases the running time from 79.608 to 225.776 seconds, but successfully estimates the posterior distributions. The results can be analyzed by using the print method. The resulting table identifies the parameters of each component by using a suffix "_m", so for instance (Intercept)_1 is the intercept for the first component.

R> print(m9)
Notice that in this particular case, using a more complex model does not seem to improve things substantially: for example, the DIC for the simpler Weibull model we fitted in the 9 In brief, a single iteration of HMC proceeds to update the value of the parameters for max_treedepth steps before deciding whether to accept or reject the current value. Each step is scaled by a factor stepsize, which determines the level of the discrete approximation to the underlying continuous Hamiltonian dynamics. Thus, increasing the target acceptance rate adapt_delta effectively forces smaller step sizes, meaning that the algorithm will be in principle more thorough in visiting the posterior density (at the expense of computational time). Similarly, increasing max_treedepth means that the algorithm will take more samples within each step, thus potentially being more accurate. In practice, it is important to strike a balance and avoid too fine a discretization of the Hamiltonian dynamics, while guaranteeing sufficient coverage of the posterior density. More technical details can be found in Hoffman and Gelman (2014) and Carpenter et al. (2015). second element of the object m3 is 1203.152 (cfr. Figure 5c) -essentially the same as for the Poly-Weibull model, indicating that a simpler version is perhaps to be preferred to this specification of the more complex one. This is consistent with the potential issues in convergence and identifiability, which again indicate perhaps that the Poly-Weibull model may not be appropriate for this particular set of data.

Digitizing Kaplan-Meier curves from published studies
Often, the individual level data from, e.g., a clinical trial measuring the survival times are not directly available for the health economic evaluation. Perhaps for one of the treatments being considered, the sponsor of the trial is able to make the data available, but this could only cover one of the relevant interventions/drugs. To overcome this limitation, usually modelers try and use systematic reviews of the available literature to gather information on the possible comparators.
One clever way of doing this is by "digitizing" Kaplan-Meier data available from published papers. This is done by using specific software (e.g., DigitizeIt; Bormann 2013); the user needs to click on several points on the survival curves and the values are digitized and exported to some output files, describing the input survival times from graph reading and reported number of individuals at risk at several time points in the follow up.
Once these two files are available, survHE takes them as input and following the algorithm developed by Guyot, Ades, Ouwens, and Welton (2012), which can be used to map from the digitized curves back to the unobserved Kaplan-Meier data by numerical approximation. Assuming that the DigitizeIt outcome is saved in the two files survival.txt and nrisk.txt in the current working directory, a typical survHE call to perform this task is the following.
Kaplan Meier data written to file: KMdata.txt IPD data written to file: IPDdata.txt These can be then used as input data to fit the survival models and then perform the PSA.
The final feature worth mentioning is that often we will be in a position of creating several individual level datasets to mimic data originally used to produce Kaplan-Meier estimates for the survival curves, e.g., for many different treatments, or for the same treatment observed in several papers. survHE has another specialized function that can be used to stack the different files with the individual level data into a single dataset. This function is called make.ipd and it takes as inputs: a list of all the names of the individual level data files created as output of digitise; the index of the file associated with the control arm (by default, this is the first file) and the control arm will be coded as 0; and a vector of labels for the column of the resulting data matrix. These should match the arguments to the formula specified for the function fit.models and should be 3 elements (each representing the time variable, the event variable and the treatment arm). Using the output of the digitization process shown above, a call to this function would look like the following.
R> data <-make.ipd("IPDdata.txt", ctr = 1, var.labs = c("time", "event", + "arm")) This generates a R 'data.frame', which can then be fed to the fit.models function. Naturally, make.ipd assumes that there are no other covariates in addition to the treatment arm, because it is unlikely that digitized data are recorded for different strata, or values of additional variables.
Note also that we can use a collection of digitized data and stack them to create a single dataset made by several pseudo-data obtained in this way. For instance, if we had available three digitized datasets, say IPD1.txt, IPD2.txt and IPD3.txt, we could create a single dataset using a command similar to the following.
R> ipd_files <-list("IPD1.txt", "IPD2.txt", "IPD3.txt") R> data <-make.ipd(ipd_files, ctr = 1, var.labs = c("time", "event", "arm")) This generates a R 'data.frame', which can then be fed to the fit.models function. Naturally, make.ipd assumes that there are no other covariates in addition to the treatment arm, because it is unlikely that digitized data are recorded for different strata, or values of additional variables.

Limitations and current/future developments of survHE
In this paper we have presented a comprehensive tutorial into how survHE can help modelers in developing survival analyses, with the specific aim of conducting a health economic evaluation. The main aim of survHE is to provide a suite of standardized commands, allowing the user to perform the analysis both within a frequentist (via flexsurv) and a Bayesian approach (through INLA or rstan). One of the main potential advantages to using survHE is that much of the "entry cost" in using a Bayesian model may be reduced by how survHE guides the user in forming the model, using commands that are extremely similar to those applied to frequentist analyses. We note, however, that this is and should not be a substitute for a thorough understanding of the theory and assumptions underpinning either of these methods.
The current version of survHE is robust, comprehensive and fully tested and it already allows for a wide range of model specifications. However, there are other venues that we wish to explore.
At present, the main limitations of survHE are related to improvements to the range of models that can be fitted within a Bayesian approach. For example, we are currently working to expand the range of INLA survival models to cover all the parametric distributions described in Table 1. Even more importantly, some of the advanced tools such as RPS models are particularly suitable to an INLA implementation. We expect to develop all versions of RPS within INLA, which will allow modelers to use them under a Bayesian approach and in a computationally efficient manner. At the same time, we also plan to expand the current HMC implementation to include other versions of the RPS model, as well as different forms of censoring (including left and interval). We will also explore the functionality of alternative Bayesian approximation inferential methods, e.g., "variational Bayes" (Kucukelbir, Ranganath, Gelman, and Blei 2015), which are already implemented in rstan.
In terms of rstan, it is interesting to note the recent surge of interest for survival modeling, within the HMC community, which has also been stimulated by our work with survHE, as evidenced by the discussion in the rstan forum 10 . Related to this, we aim at also expanding the Poly-Weibull model by implementing a version that allows the inclusion of covariates on the ancillary parameters as well as on the location. This effort is also part of a wider plan to bring into survHE more models based on mixtures, e.g., in the form of cure models (Amico and Van Keilegom 2018). This may result in an efficient way of accounting for the underlying complexity of the data generating process, as well as extrapolating the survival curves, potentially including external data.
As a general principle, we see the development of survHE as continuously ongoing -in fact this is the reason why we have established the GitHub repository; our aim is to expand the community of users and continuously liaise with them, so as to implement more and more models to keep the package up-to-date with the latest methodological developments.