Key Points for Decision Makers

Cost-effectiveness models that evaluate diagnostic tests mostly rely on linking outcomes of a treatment that is, in turn, assumed to be given on the basis of a test result. Understanding the data and assumptions used to link interventions and their outcomes to test results is key to assessing the validity of the cost-effectiveness results. Good practice recommendations can provide guidance for developing and appraising such cost-effectiveness models.

1 Introduction

As with treatments, the cost-effectiveness of diagnostic technologies is evaluated to ensure that the benefits to patients warrant any additional associated cost. The impact that a test has on patient outcomes is typically indirect; the mechanism of benefit is through a change in patient management and the effectiveness of that patient management in improving patient health. Ideally, the data to inform a cost-effectiveness analysis of a diagnostic technology would come from an “end-to-end” study, i.e., a study which follows patients from the point of testing, through any patient management or treatment given, to the measurement of clinically relevant final outcomes [1].

While such end-to-end studies may be possible in some situations, they may not be feasible, ethical, or advisable in others. For example, consider a new diagnostic test used to identify whether a patient with atrial fibrillation should receive an oral anticoagulant for stroke prevention. Arguably, this diagnostic test should not be assessed in an end-to-end study with stroke as an endpoint, given the very extensive body of evidence demonstrating the effectiveness of anticoagulation in this indication, from both randomised and observational studies, as well as mixed treatment comparisons and meta-analyses [2, 3]. Another possibility is that an end-to-end study is feasible, but could only represent some of the many different possible diagnostic strategies. This is common where a sequence of diagnostic tests is being evaluated, such as in the cost-effectiveness analysis by Faria et al., where there were 32 clinically feasible combinations of tests for prostate cancer [4]. In such complex scenarios, decision analytic models can provide a more useful framework to evaluate the cost-effectiveness of diagnostic tests [5, 6].

Most decision-analytic models of diagnostic tests require linking diagnostic accuracy data to treatment efficacy data to estimate the impact that a test will have on patient outcomes and costs [6]. Currently, there is no specific methodological guidance on how this should be done [7, 8]. Modellers may be tempted to assume a ‘uniform’ action in response to a test result, for example, that all patients who test positive for a certain infectious disease receive treatment. However, in clinical practice there will be a probability distribution of how many individuals who test positive actually receive treatment, and treatment strategies may differ according to other clinical factors. Assuming completely test-directed decision making is likely to be appropriate only in very specific situations, such as for companion diagnostics.
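
To make this distinction concrete, the minimal Python sketch below contrasts a fully test-directed model (all test positives treated) with one that applies an assumed treatment-uptake probability among test positives. All parameter values are hypothetical placeholders, not estimates from any study cited in this paper.

```python
# Minimal sketch, not a full cost-effectiveness model: all values below are
# hypothetical placeholders for evidence identified in a real evaluation.

prevalence = 0.20             # assumed prevalence of the target condition
sensitivity = 0.90            # assumed test sensitivity
p_treat_if_positive = 0.75    # assumed probability a test-positive patient is actually treated
p_event_untreated = 0.30      # assumed probability of the adverse outcome without treatment
relative_risk_treated = 0.60  # assumed treatment effect (relative risk of the outcome)

def expected_event_rate(uptake: float) -> float:
    """Expected adverse-event rate per tested patient for a given treatment
    uptake among test positives; false negatives remain untreated."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    events_treated = true_pos * uptake * p_event_untreated * relative_risk_treated
    events_untreated = true_pos * (1 - uptake) * p_event_untreated
    events_missed = false_neg * p_event_untreated
    return events_treated + events_untreated + events_missed

print(expected_event_rate(1.0))                  # fully test-directed decision making
print(expected_event_rate(p_treat_if_positive))  # partial uptake observed in practice
```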

A review of Health Technology Assessments (HTAs) in the UK noted that the rigour with which the evidence on treatment efficacy was identified, quality assessed, and synthesised within model-based economic evaluations of diagnostic tests was poor, and that evidence synthesis efforts were largely focused on diagnostic accuracy [7, 9]. An earlier review of 149 HTAs from eight countries found that intermediate outcomes, such as the impact of test results on patient management, are frequently assessed in medical test HTAs, but interpretation of this evidence is inconsistently reported [10]. It was recommended that evaluators explain the rationale for using intermediate outcomes, identify the assumptions required to link intermediate outcomes to patient health outcomes, and assess the quality of included studies [10].

This paper will build on and expand these recommendations by reviewing the guidance from selected HTA bodies to summarise current recommendations on evidence synthesis and linkage of treatment effectiveness evidence within economic evaluations of diagnostic tests. We will then explore a case study focused on a specific decision problem to better understand current practice. Based on the findings, we derive a set of proposed preliminary good practice recommendations with the aim of advancing the methodological rigour of future cost-effectiveness analyses of diagnostic tests (note: there are likely additional considerations when evaluating screening, monitoring or prognostic tests which are not covered in this paper).

2 Methods

To understand current recommended best practice in terms of evidence synthesis and linkage of treatment-effectiveness data, we reviewed modelling guidance from HTA bodies. We focused on two specific questions: (i) how are test results linked to patient management decisions, and (ii) what evidence is used to translate patient management into clinical outcomes (Fig. 1). We focused on HTA bodies with well-established cost-effectiveness requirements, such as, but not limited to, Australia (Medical Services Advisory Committee, MSAC), Canada (Canadian Agency for Drugs and Technologies in Health, CADTH), and the UK (e.g., the National Institute for Health and Care Excellence, NICE). Ten guidelines were reviewed in total (three for the UK, one for the EU, one for Canada, one for the United States, two for Australia, one for Sweden, and one for the Netherlands). The guideline documents were accessed in September 2022. Our aim was to identify any specific guidance on how to identify and synthesise appropriate evidence on treatment effectiveness for inclusion in a diagnostic cost-effectiveness model, and how to link these data to diagnostic test results.

Fig. 1

Schematic of a cost-effectiveness model for comparing two diagnostic tests, showing two modelling questions on linked evidence. ICER incremental cost-effectiveness ratio, QALYs quality-adjusted life years

The second part of this paper focuses on reviewing different cost-effectiveness models for a specific decision problem. We chose the evaluation of the biomarker troponin for the diagnosis of myocardial infarction (MI), as this test is well established, and the recommended actions and treatments are well documented [11], as are suitable study endpoints, such as 30-day mortality in clinical studies [12]. The search was conducted in PubMed, EMBASE, The Cochrane Library, the International HTA Database and EconLit on July 13, 2022. The search was restricted to the English language, to publications from 2012 onwards, and to the following countries: UK, Canada, US, Australia, Sweden and the Netherlands. Search terms included: modelling studies, troponin, myocardial infarction, and diagnostics. The search resulted in four unique cost-effectiveness models, with additional publications using variations of these unique models. Our aim was to provide a snapshot of current methodological practice in terms of evidence synthesis of treatment effectiveness and linkage of this evidence within the identified models. The focus of data extraction was therefore, apart from some standard information (e.g., type of model, setting and perspective), the information that was used to link test results with clinical actions, and to link clinical actions with short- and long-term outcomes. We were not specifically interested in the treatment that was undertaken (e.g., in case of a positive test result), but rather in the clinical outcomes that were modelled, and how they were linked to the patient management decision.

Last, based on information from the HTA guidelines and the patterns observed in our example, we created initial proposed recommendations for linking test results to patient management and for translating patient management to clinical outcomes in model-based economic evaluations of diagnostic tests. For each recommendation, we provide a brief description of the problem to be considered and specific actions on how it should be dealt with when developing a model.

3 Results

3.1 HTA Body Recommendations on Linked Evidence for Diagnostic Models

The Australian HTA Guidelines from the MSAC had the most extensive guidance for evidence linkage, containing several specific sections on linked evidence in diagnostic technology assessments (Technical Guidance [TG] 12 and 13) [13]. These sections sit within the clinical evaluation guidance because, where end-to-end studies are not feasible, a linked-evidence approach to clinical evaluation may be adopted instead. They provide a very thorough overview of the evidence requirements in this context, and thus are very informative when thinking through how to link this evidence within a decision-analytic model for a diagnostic test.

Particularly useful for our first question is that the guidelines separate the actions that may follow a test result into three components, each of which will impact the resulting adoption rate of an action (TG 12): “change in diagnostic thinking, change in recommended management and actual management” [13]. Furthermore, it is stated that consideration should be given to whether the tests under comparison should have different actions following the same results (i.e., a positive result leads to a different action in one test vs the other) and to provide justification either way.

With regard to our second question, TG 13 details the thought process for identifying the most suitable linked evidence by working through four considerations: (a) availability of the management strategy/treatment, (b) effectiveness of the management strategy/treatment, (c) what happens to wrongly classified patients (false positives and false negatives), and (d) whether the available evidence (i.e., generated under current tests) is likely to be applicable to the population selected with the new test [13].

The NICE Health Technology Evaluations Manual in the UK provides guidance throughout the document on the importance of evidence linkage for cost-effectiveness models of diagnostic interventions, as direct evidence leading from test result to relevant clinical outcomes is mostly not available [1]. As potential data sources, NICE recommends using study data or clinical guidelines, or relying on expert clinical input if needed. We could not identify specific recommendations relating to our two questions in the Canadian guidelines [14].

3.2 Example of Troponin for Diagnosis of Myocardial Infarction

Tables 1 and 2 summarise the publications reviewed for data extraction (base-case models for each) and provide details on the data linkages performed for each model [15,16,17,18]. Table 1 provides a summary of key model characteristics and patient utility data, while Table 2 shows details with regard to linked clinical data.

Table 1 Overview of cost-effectiveness models for early diagnosis of myocardial infarction
Table 2 Cost-effectiveness models for early diagnosis of myocardial infarction: extracted data on patient management and clinical outcome(s)

The model structure from Thokala et al. [15] was applied to other decision contexts [19, 20]. Westwood et al. [16] expanded the model structure of Thokala et al. [15] by including, among other changes, an additional Markov model with more health states and using outcomes from larger studies [21, 22]. In a recent publication, Westwood et al. [23] utilised the same model structure as in their 2015 publication [16].

None of the described models included actual details of the treatment, but Vaidya et al. assumed that, in the Dutch setting, percutaneous coronary intervention (PCI) would be used for all patients who tested positive [17]. In the CADTH model, treatment was implicitly included by using a weighted cost based on the observed mixture of codes for MIs treated with bypass surgery, PCI or non-invasive management [18]. Overall, the models worked by directly linking outcomes to test results.
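
As a simple illustration of such a weighted-cost approach, the sketch below averages hypothetical procedure costs over an assumed treatment mix; the proportions and unit costs are invented for illustration and are not the CADTH model inputs.

```python
# Hypothetical treatment mix and unit costs (illustrative only; not the
# CADTH model inputs).
treatment_mix = {"bypass_surgery": 0.10, "pci": 0.55, "non_invasive": 0.35}
unit_cost = {"bypass_surgery": 25_000, "pci": 9_000, "non_invasive": 2_500}

# One weighted cost is applied to every modelled patient managed as an MI,
# in place of modelling each treatment pathway explicitly.
weighted_cost = sum(share * unit_cost[t] for t, share in treatment_mix.items())
print(f"Weighted cost per treated MI: {weighted_cost:,.0f}")
```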

The Dutch model [17] linked outcomes from patients who had undergone PCI [24, 25], given the assumption that all MI patients would undergo PCI; these outcome data came from studies in US hospitals. Others aimed to link outcomes data on mortality and re-infarction mostly from the country where the model was based [15, 16]. One US study [21] on the outcomes of patients with false negative results who were discharged from hospital untreated was used in two models [16, 17]. In all models where the outcomes (mainly mortality) were tested in one-way sensitivity analyses, they had a significant impact on the findings of the model [16,17,18].

Patient utility data for the four investigated models predominantly came from the UK (Table 1). In summary, model inputs on how treatment was implemented based on test results were assumption based, with all patients uniformly receiving the action (e.g., all patients with a positive test result received the treatment; all patients with a negative test result were discharged).

For the linked clinical outcomes, a variety of sources have been utilised, including clinical and observational studies (e.g., disease and procedure registries), meta-analyses, inputs and outputs from other published cost-effectiveness models, national statistics (for overall life expectancy) and expert opinion. The data may have come from the country where the model was based, or from another country. The models used published data, or re-analysed patient-level data to fit the model population [15]. For some of the linked outcomes, there was a significant time period between the collection of the outcomes data and the publication of the model. For example, clinical data collected in 1993 [21] were used in a model published in 2015 [16], a time lag of more than 20 years.

For false positive patients, it was mostly assumed that no harm was done and normal life expectancy was modelled, with the exception of the Dutch model, which accounted for the risk of an invasive procedure. There was wider variation in the modelling approach for false negative patients. Data from clinical studies were used to model short-term outcomes, while assumptions and previous model inputs and outputs were used to extrapolate beyond the initial period.

4 Proposed Recommendations

Given the highly relevant details provided in the Australian guidelines [13], we aimed to translate this information into practical considerations for use in economic modelling.

Our proposed recommendations regarding research question 1 (how is the action based on the test result modelled, see Fig. 1) build on the Australian framework (TG 12) [13], which describes three consequences that may follow a diagnostic test result: (a) change in diagnostic thinking, i.e., how a test is interpreted, (b) change in recommended management, i.e., what recommendations are made in response to the test results, and (c) change in actual management, i.e., what patient management, if any, is adopted.
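
One way to operationalise these three components in a model, not prescribed by the Australian guidelines but consistent with them, is to treat each as a conditional probability whose product gives the overall probability that a test result leads to the modelled action. The sketch below uses invented values that would, in practice, come from ‘change in management’ studies or expert elicitation.

```python
# Minimal sketch: each TG 12 component treated as a conditional probability.
# All values are hypothetical placeholders.
p_changes_diagnostic_thinking = 0.90  # clinician accepts and acts on the result
p_management_recommended = 0.85       # result translates into the recommended management
p_management_adopted = 0.80           # recommended management is actually delivered

p_action_given_positive = (
    p_changes_diagnostic_thinking
    * p_management_recommended
    * p_management_adopted
)
print(f"Probability a positive result leads to the modelled action: {p_action_given_positive:.2f}")
```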

The recommendations are therefore focused on linking diagnostic test results to actions and should be applied for each testing strategy. If the same linkage assumptions are made across different testing strategies, then this should be explicitly stated. Details of the six proposed recommendations can be found in Box 1. Fundamental to many of these recommendations is the identification and synthesis of ‘change in management’ studies. These types of studies are well described in the Australian Framework (see TG 12.2). Given the importance of these studies to justify the linking of evidence in decision-analytic models for diagnostic tests, the search and screening of studies, risk of bias assessment, presentation of results and meta-analysis (if appropriate) should follow the same methodological rigour as when synthesising evidence on diagnostic accuracy or treatment effectiveness.

With regard to our second question (what evidence is used to translate the patient management into clinical outcomes, see Fig. 1), TG 13 in the Australian guidelines [13] is centred around four relevant aspects: (a) availability of treatment, (b) treatment effectiveness, (c) outcomes of wrongly classified patients, and (d) the applicability of the evidence for treatment effectiveness. Again, given the high relevance of this information, we have converted these into good practice recommendations to support the translation of patient management to clinical outcomes when developing model-based economic evaluations of diagnostic tests (see Box 2). The first aspect, availability of treatment, is already captured in the final recommendation in Box 1, and therefore is not included in Box 2.

There were a number of observations from the case study which may be useful to note here. First, with regard to selecting the treatment and its effect, there may be a need to trade precision for accuracy when synthesising data on outcomes. For example, Westwood et al. used a patient-level re-analysis, which meant reducing the original sample size from 2092 to 170 [16]. Furthermore, modellers should ensure that the outcomes used as model inputs indeed link to the selected treatment. For example, Vaidya et al. assumed that all patients with a positive test undergo PCI; mortality was then derived from two PCI registries [17]. Additionally, in all of the models where linked outcomes were assessed in a one-way sensitivity analysis, the outcomes had a significant impact on model results [16,17,18]. We therefore recommend that this be done as standard practice. Another consideration is to assess the recency of evidence for linked outcomes and consider whether practice patterns have changed to the extent that the evidence will be outdated. Typically, neither benefits nor harms are assumed for true negative patients, although this may not be a reasonable assumption, depending on the risks of the diagnostic test itself. If harm from the diagnostic procedure is a concern, this should be considered in the model (e.g., if invasive procedures are needed to obtain samples for testing), especially if it differs among the modelled strategies.
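
Mechanically, a one-way sensitivity analysis of a linked outcome amounts to re-running the model across a plausible range for that input and recording the resulting ICER. The sketch below illustrates this with a hypothetical stand-in model function and an invented range for false negative mortality; it is not based on the inputs of the reviewed models.

```python
# Illustrative one-way sensitivity analysis of a linked mortality input.
# 'run_model' is a hypothetical stand-in for a full cost-effectiveness model;
# its coefficients and the tested range are invented for illustration.

def run_model(p_mortality_false_negative: float) -> float:
    """Return a toy ICER (cost per QALY gained by the more accurate test),
    assuming the new test reduces false negatives."""
    incremental_cost = 150.0
    incremental_qalys = 0.01 + 0.4 * p_mortality_false_negative
    return incremental_cost / incremental_qalys

for p in (0.02, 0.04, 0.06, 0.08, 0.10):
    print(f"FN mortality {p:.2f} -> ICER {run_model(p):,.0f} per QALY")
```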

Second, modelling outcomes of wrongly classified patients can be challenging, as data may be lacking. In our case study, Vaidya et al. used the PCI procedure risk for false positive patients [17], and two publications used, for false negative patients, outcomes from a clinical study of patients discharged from hospital despite having an MI [16, 17]. If evidence is missing and assumptions need to be made, clinical validation by experts becomes important, as do robust sensitivity analyses of those assumptions.

More generally, when reporting diagnostic models with linked evidence, it would increase transparency if model inputs were presented by test result category (true positives, false positives, etc.), as this would make it easier to assess what evidence has been used for outcomes and whether linkage assumptions are reasonable.

Similar to models investigating therapeutic interventions, model calibration may be required, as using linked evidence may lead to overly optimistic or pessimistic cumulative model outcomes. For example, model outcomes in the standard of care arm, such as projected life expectancy, could be compared with available evidence not used as model input, such as disease-specific national statistics.
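
A minimal calibration check along these lines could compare the standard of care arm's projected life expectancy against an external estimate and flag material divergence. The values and tolerance in the sketch below are hypothetical and serve only to illustrate the idea.

```python
# Illustrative calibration check (hypothetical values and tolerance).
model_projected_life_expectancy = 11.8  # years, standard of care arm output
external_life_expectancy = 12.6         # years, e.g. from disease-specific national statistics
tolerance = 0.05                        # accept up to a 5% relative difference

relative_difference = (
    abs(model_projected_life_expectancy - external_life_expectancy)
    / external_life_expectancy
)
if relative_difference > tolerance:
    print(f"Calibration flag: {relative_difference:.1%} divergence from external evidence")
else:
    print("Standard of care arm is consistent with external evidence")
```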

5 Discussion

Economic models typically utilise evidence from a variety of sources with differing grades of evidence [26]. Our case example illustrates this, with evidence across a spectrum of grades being used, from meta-analyses to clinical studies, real-world data studies and clinical expert advice.

In cost-effectiveness models of new treatments, the typical challenges in assessing clinical outcomes are (a) the extrapolation of evidence beyond observed data in order to relate efficacy to real-world effectiveness, and (b) evidence selection and synthesis for comparators. Economic models for diagnostic tests face additional specific challenges that need to be addressed. We aimed to identify these challenges and formulate a preliminary set of recommendations to support model development in the field.

Modelling for diagnostic tests with dichotomous outcomes allows representation of an explicit set of outcomes flowing from the different diagnostic test classifications, i.e., true and false positives, true and false negatives. However, while the modeller can assign actions to each of these classifications, the health care provider, when presented with a test result, will not know which patients are correctly classified (true positives and true negatives) and which are incorrectly classified (false negatives and false positives). Other information, such as that from repeated testing or other diagnostic tests, will eventually identify incorrectly classified patients. However, when there is no other existing method to confirm or exclude the diagnosis, such as for first-in-class tests, uncertainty about the correct classification of an individual patient will remain.
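
This asymmetry can be stated with the standard 2x2 quantities: the model distributes patients across the four classifications, but the provider only observes a positive or negative result, i.e., the predictive values. The sketch below uses hypothetical prevalence and accuracy values purely for illustration.

```python
# Hypothetical prevalence and accuracy values, for illustration only.
prevalence, sensitivity, specificity = 0.15, 0.92, 0.88

# The model 'knows' the full classification...
tp = prevalence * sensitivity
fn = prevalence * (1 - sensitivity)
fp = (1 - prevalence) * (1 - specificity)
tn = (1 - prevalence) * specificity

# ...but the clinician only sees a positive or negative result, i.e. the
# predictive values, not which individual patient is misclassified.
ppv = tp / (tp + fp)  # probability that a test-positive patient truly has the disease
npv = tn / (tn + fn)  # probability that a test-negative patient truly does not
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
```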

In our case study, evidence was available on the outcomes of patients sent home with an MI (false negatives) and of those who underwent an invasive procedure while not having an MI (false positives). This may become much more complex in chronic diseases, where the wrong course of action may only be corrected months or years later and where this delay significantly affects outcomes, such as in oncology.

Real-world data studies may be useful to identify the proportion of patients actually receiving treatment based on a test result. “Purpose-built” datasets (in contrast to datasets re-purposed for research, such as claims data), particularly disease registries, are likely to be the most suited. Such purpose-built cohorts may then also be used to determine true and false positives (for an example see Fernandes et al. [27]).

In addition, a common assumption (highlighted in our case study) within economic modelling for diagnostic tests is that all patients receiving a diagnosis will receive the same course of action, and that all patients with a diagnosis ruled out will receive another common course of action. We know this is unlikely to happen in clinical practice, although clinical practice guidelines and local hospital care protocols aim to foster such consistency in clinical decision making and behaviour. For example, prescription of antibiotics in patients with ventilator-associated pneumonia remained high in a randomised, controlled trial comparing a biomarker-guided approach to antibiotic prescription with standard of care, despite the high negative predictive value of the biomarker-based approach [28].

A further consideration when implementing the methodological recommendations in this paper is to ensure that model development and results are transparently reported. The AGREEDT (‘AliGnment in the Reporting of Economic Evaluations of Diagnostic Tests and biomarkers’) reporting checklist is a comprehensive reporting tool, which encourages explicit reporting of (1) the impact of a test on patient management strategies, and (2) the impact of patient management strategies on health outcomes and costs [41]. Future research should focus on developing a standardised appraisal and reporting tool for cost-effectiveness models of diagnostic tests where a linked-evidence approach is used, incorporating and expanding on our proposed preliminary good practice recommendations.

Our paper has several limitations. First, the case study serving as an example to derive recommendations came from a single clinical decision problem (identification of MI with a blood test), and we did not systematically and critically appraise the models reviewed to see how our research questions had been addressed. More case studies could potentially have identified additional issues, such as dealing with complex test strategies involving parallel and serial testing. Despite this, the major issues were identified, given that we worked from a general model structure (see Fig. 1) and a defined sequence of linkage (from test to treatment to outcomes). Second, the HTA guidelines and the case study came from only a few countries (Australia, Canada, the Netherlands, and the UK) with established cost-effectiveness hurdles and hence more advanced methods guidance. However, there is no reason to believe that the challenges and recommendations would not be applicable to other countries. Last, our case study did not investigate the appropriateness of the framing of the decision problem and whether the test should be used in this context; rather, this was assumed to be a given. In practice, however, this may pose an additional set of challenges, as mentioned in the introduction and outlined by the Australian guidelines [13], such as whether the available evidence (i.e., generated with current tests) is likely to be applicable to the population selected with the new test. As with any innovation, by design, this evidence may only become available with coverage of the new test and as real-world evidence is generated as a consequence of its use.

In conclusion, there are several challenges unique to cost-effectiveness modelling of diagnostic tests that need forethought in the design of an economic evaluation. Selected evidence and assumptions need to be justified, particularly for the link between test results and treatment. Upfront consideration of how a test and its results will likely be incorporated into patient diagnostic pathways is key to exploring the optimal design of such models. We propose several preliminary good practice recommendations to aid in these tasks.