Health economic studies of colorectal cancer and the contribution of administrative data: A systematic review

Introduction: Several forces are contributing to an increase in the number of people living with and surviving colorectal cancer (CRC). However, due to the lack of available data, little is known about the implications of these forces. In recent years, the use of administrative records to inform research has been increasing. The aim of this paper is to investigate the potential contribution that administrative data could have on the health economic research of CRC. Methods: To achieve this aim, we conducted a systematic review of the health economic CRC literature published in the United Kingdom and Europe within the last decade (2009–2019). Results: Thirty-seven relevant studies were identified and divided into economic evaluations, cost of illness studies and cost consequence analyses. Conclusions: The use of administrative data, including cancer registry, screening and hospital records, within the health economic research of CRC is commonplace. However, we found that these data often come from regional databases, which reduces the generalisability of results. Further, administrative data appear less able to contribute towards understanding the wider and indirect costs associated with the disease. We explore several ways in which various sources of administrative data could enhance future research in this area.

CRC, but also there are indirect implications for patients, their families and wider society in terms of the impact of CRC on labour force participation and on both physical and mental well-being. It is crucial that we can measure these implications in order to assess the impact of CRC and to help inform policymakers' decisions on how best to allocate a finite health budget.
The current availability of data to inform this understanding is somewhat limited and, more often than not, data from clinical trials are used to make assumptions about the possible impact of an intervention on the entire population and ultimately inform decisions about resource allocation. Unfortunately, the generalisability of efficacy and cost-effectiveness measures from clinical trials to real-life populations can be limited by sample selection, size and attrition (He et al., 2020; Leon et al., 2006). Furthermore, clinical trials can be expensive to implement and run and often have short follow-up periods, meaning that longer term outcomes cannot be observed (Fitzpatrick et al., 2018).
One potential solution to the issue of the generalisability of trial data to whole populations lies in the use of administrative data, that is, data that are collected routinely 'by government departments and other organisations for the purposes of registration, transaction and record keeping, usually during the delivery of a service' (Woollard, 2014). Examples include hospital admissions data, education records and tax records. The routine collection of administrative data presents an exciting opportunity to conduct population-level research that offers insights into healthcare resource use, costs and outcomes across a variety of domains such as education, income and retirement, through the linkage of these records to other data sets including clinical trials (Card et al., 2010; Einav & Levin, 2014; Fitzpatrick et al., 2018). Moreover, administrative data can overcome the short follow-up period inherent in trials by tracking individuals over time, for example as they move in and out of hospital, into long-term care and even up to the end of their lives.
Despite these advantages, since administrative data are not generated for research purposes, they often lack the usual auxiliary measures that are used in social research to draw causal inference from a data set (Connelly et al., 2016). Thus, one of the central prospects for administrative data is for its use as a complementary source of information alongside clinical trials and survey data. The benefits of linking administrative records to observational data are documented elsewhere (Doiron et al., 2013).
Over the years, the potential of administrative data in research has been recognised worldwide and efforts have been made to harness that potential (Card et al., 2010; Einav & Levin, 2014). In the Nordic countries in particular, robust data sharing infrastructures have been developed to help researchers make use of administrative data sets (Connelly et al., 2016). Moreover, the linkage aspect of administrative data has led to large data repositories emerging, where data sets are linked together and researchers can apply to access specific data sets and cohorts to carry out their analysis (Doiron et al., 2013). Further, data repositories enhance research transparency because their indefinite storage allows for the replication of results. The success of such repositories has been made clear: for example, the Western Australia Data Linkage System (WADLS) repository includes over 30 population-based data sets and has produced over 250 journal publications (Doiron et al., 2013).
Of course, the creation of such repositories is not without its challenges. In particular, any research project that uses personal health data where informed consent is not obtained from patients may pose a risk to individual privacy. Therefore, central to the creation of a research repository is striking the appropriate balance between public benefit and patient privacy. That means being clear and transparent about the purposes of the research and its potential to generate patient or public benefit, while at the same time taking measures to minimise the risk to patient privacy, for example through the pseudonymisation or anonymisation of data.
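As a minimal sketch of what pseudonymisation can look like in practice, the Python snippet below replaces a patient identifier with a keyed hash (HMAC-SHA256), so that records belonging to the same person can still be matched without the identifier itself being exposed. The key, the identifier format and the function name are hypothetical illustrations and do not describe the procedure of any particular repository.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held by a data custodian and never
# shared with researchers.
SECRET_KEY = b"example-key-held-by-data-custodian"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash of the identifier (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative records; the identifiers are made up.
records = [
    {"patient_id": "0000000001", "diagnosis": "CRC"},
    {"patient_id": "0000000002", "diagnosis": "CRC"},
]

# Research extract: the direct identifier is dropped and replaced by the pseudonym.
pseudonymised = [
    {"pseudo_id": pseudonymise(r["patient_id"]), "diagnosis": r["diagnosis"]}
    for r in records
]
```

Because the same identifier always maps to the same pseudonym under a given key, linkage across data sets remains possible, while reversing the mapping requires access to the key.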
We have identified that Scotland is in a unique position to demonstrate the potential contribution of administrative data, as well as an administrative data repository, within the health economic research of CRC. This is primarily due to the current data sharing and linkage infrastructure. Specifically, all Scottish residents have a unique Community Health Index (CHI) number that permits the linkage of their administrative health records to one another and to other data sets.
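To illustrate how a shared unique identifier permits linkage, the following Python sketch joins two made-up extracts on a common key. The field names, identifier values and the `link_records` helper are assumptions for illustration only; real CHI-based linkage is carried out under governance arrangements that are not shown here.

```python
# Hypothetical extracts from two administrative data sets, keyed on a
# made-up identifier standing in for the CHI number.
cancer_registry = [
    {"chi": "A001", "diagnosis_year": 2015, "stage": "II"},
    {"chi": "A002", "diagnosis_year": 2016, "stage": "IV"},
]
hospital_admissions = [
    {"chi": "A001", "length_of_stay_days": 5},
    {"chi": "A001", "length_of_stay_days": 2},
    {"chi": "A003", "length_of_stay_days": 1},  # not in the registry extract
]


def link_records(registry, admissions):
    """Attach each person's admissions to their registry record (exact match on chi)."""
    admissions_by_chi = {}
    for admission in admissions:
        admissions_by_chi.setdefault(admission["chi"], []).append(admission)
    return [
        {**person, "admissions": admissions_by_chi.get(person["chi"], [])}
        for person in registry
    ]


linked = link_records(cancer_registry, hospital_admissions)

# Total bed days per registry patient, derived from the linked records.
total_bed_days = {
    person["chi"]: sum(a["length_of_stay_days"] for a in person["admissions"])
    for person in linked
}
```

The registry defines the cohort here, so admissions for people outside it are simply ignored and patients with no admissions receive zero bed days.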
The overarching aim of this paper is to investigate the potential contribution that administrative data could have on health economic research of CRC. To achieve this aim, the objectives were as follows: 3. To explore the benefits and limitations of using administrative data in this research; 4. To discuss the ways in which administrative data, using Scotland as an exemplar, could contribute to this research in the future.
In what follows we outline the methods employed for the systematic review. Section 3 presents the results, and Section 4 discusses the findings and concludes.

| Selection criteria
Full-text publications of health economic studies were included when available in the English language. The definitions of health economic studies are outlined in Table 1. Articles that were not carried out in Europe or the UK were excluded, as were review articles.

| Data extraction
The articles were grouped into the study groups as outlined in Table 1. A proforma was used to extract the relevant data from each article within these groups. For all types of studies, the country, perspective taken, method employed, data sources used (including administrative data), types of costs included, costs data sources used and the part of the CRC pathway under study were extracted. For the EEs, the type of evaluation was also noted.

TABLE 1 Definition of health economic studies included in final review

| Study | Description |
| --- | --- |
| Budget Impact Analysis (BIA) | Budget impact analyses assess the affordability of a novel healthcare intervention or policy change applied to a specific healthcare budget, at an aggregate population level. |
| Cost Comparison (Cost Minimisation) (CC) | CC is a method of comparing the costs of two or more interventions when the health outcomes of the interventions are assumed to be the same. |
| Cost of Illness (Burden of Illness) | COI studies attempt to quantify the costs of a specific disease. This might be for the entire disease pathway or for parts of it. Unlike EEs, they do not attempt to compare costs for competing interventions; rather, they provide an estimate of the cost given the existing provision of care. |
| Economic Evaluation | Economic evaluation aims to calculate the costs and benefits of an intervention or treatment, in order to establish whether it is cost effective and thus inform investment in services. There are four main types of economic evaluation, which differ in terms of how they measure outcomes: |
| Cost Benefit Analysis (CBA) | In CBA, health outcomes are measured in monetary units. |

| Literature search results
The articles were almost equally split between EEs (n = 19, 51%) and costing studies (n = 18). The perspective taken influences the types of costs that are included. As a result, the vast majority (89%) of studies only include direct costs associated with the delivery of care. The two studies which take a societal perspective, Lansdorp-Vogelaar et al. (2018) and Pil et al. (2016), also incorporate indirect costs, that is, additional costs encountered by the patients such as loss of earnings. In terms of which part of the CRC pathway is investigated, the most common evaluations are conducted on screening programmes. In particular, 45% (n = 9) of the included studies evaluate the cost-effectiveness of different CRC screening programmes. A further 40% (n = 7) look at the cost-effectiveness of treatment for CRC, including curative treatment and treatment for metastatic disease. A smaller proportion of the EEs, 10% (n = 2), look at diagnosis and 5% (n = 1) at surveillance of adenomas. As most studies use Markov and microsimulation models, they tend to model outcomes and costs beyond the initial pathway starting point, either until the end of life or an alternative long-term end point, for example a 50-year follow-up. In addition to conducting a CUA to assess the value of a healthcare intervention, three of the EEs conducted a budget impact analysis (BIA) to assess the affordability of the intervention for a specific healthcare budget (Arrospide et al., 2018; Murphy et al., 2017; Pil et al., 2016).
In contrast to the evaluations, only 31% (n = 4) of the COI articles identified were UK based (England only). Ireland accounted for almost a quarter of the studies (n = 3), followed by Italy (n = 2), France (n = 2) and Spain (n = 2). The majority of COI articles conducted retrospective cohort analyses (77%). This involves looking at historical data to identify a cohort of patients, for example those with metastatic CRC, and costing their use of healthcare resources.
In addition, COI studies were less likely to mention which perspective the analysis is conducted from. However, like EEs, the COI studies tended to focus on direct costs. Only two COI papers looked solely at indirect costs (Hanly et al., 2013; ÓCéilleachair et al., 2017) and one included both direct and indirect costs (Lejeune et al., 2009) (Bending et al., 2010; Corral et al., 2016; Francisci et al., 2013). Others focussed on diagnosis, but only looked at costs from diagnosis up to a pre-specified time point, for example 12 months post-diagnosis or within 12 months of initial diagnosis. Two studies looked at all hospital care throughout the care pathway (Laudicella et al., 2016; Macafee & Whynes, 2009).
One study focussed on treatment of metastatic and non-metastatic disease up until the end of life (Mar et al., 2017), whilst another focussed on costs of treating metastatic disease alone (Giuliani et al., 2012). Similarly, one study looked at the cost of surgery alone (Jean-Claude et al., 2012) and another at the costs from surgery up to three years post-surgery (Lejeune et al., 2009). One COI study also conducted a BIA for patients who underwent CRC surgery in French nonprofit hospitals (Jean-Claude et al., 2012).
Of the CC studies identified in Table 4, two are from Italy and the remaining three are from Greece, Sweden and Germany. As with the COI studies, the predominant methodology applied in the CCs is a retrospective cohort approach. In terms of the perspective, the majority of the CC studies conduct their analyses from the perspective of the health system in which they are based.
One of the studies takes the perspective of the healthcare payer, and another takes both a health policymaker and a societal perspective.
Once again, the focus on costs is mainly on direct costs; however, two CC papers also incorporate indirect costs. The majority of CC studies focus on the treatment part of the CRC pathway. One paper focusses on surgery and another on screening up until the end of life.

| Types of administrative data used within the health economic research of CRC in the UK and Europe
The papers identified use a mixture of data sources including administrative data, national statistics, previous studies including randomised controlled trials (RCTs), expert opinion and in some cases primary data collection. Table 5 below outlines the administrative data sources that appeared most frequently in the studies. In addition, Tables 6, 7 and 8 in the appendix provide a broader overview of the various data sources used to inform the main groups of parameters included in the literature.
Within the EEs, 68% (n = 13) utilise administrative data. The most common use of administrative data within these studies is via the use of administrative costs databases to calculate direct costs (those which relate directly to patient care such as a hospital stay). Further, some studies use administrative data, for example cancer registry data, to inform particular patient and clinical parameters in the decision trees, Markov and simulation models.
Of the EEs which use administrative data, five evaluate screening programmes and use administrative screening programme data in their analysis. This clearly reflects the effort in many European countries in recent years to detect cancer as early as possible for those at the highest risk by rolling out national screening programmes for CRC. As a result, a multitude of administrative screening data sets have been created and researchers have capitalised on this opportunity.
It is also common for those studies to combine the screening data with other administrative data sets. In particular, in the studies by Arrospide et al. (2018) and Pil et al. (2016), who undertook BIAs, location-specific estimates of population size, age-specific disease incidence, resource use and location-specific costs were acquired from various administrative data sources to permit analyses that were relevant and useful to the budget holder in question.

| Data source | Description |
| --- | --- |
| Cancer registry data | Cancer registries contain a record of all cases of new cancer diagnoses in one centralised system. They tend to include information on cancer diagnoses and treatment, allowing a country to monitor cancer incidence and survival, and any emerging trends, over a long period of time. Registries also include patient level demographics, permitting analyses of diagnoses by age, gender and stage distribution. They can also include information on cancer related mortality. |
| Screening programme data | Screening programme data sets provide a wealth of information including participation and compliance rates, adenoma and CRC detection rates, specificity and sensitivity, as well as information on surveillance. In some cases, follow-up data are also available, for example on colonoscopies and flexible sigmoidoscopy. Follow-up data provide information on participation, detection and complications. |
| Routine hospital records | Routine hospital records provide information on any acute hospital admission experienced by a patient, including length of stay and procedure codes. Moreover, hospital records often include additional information about an individual's primary and secondary diagnoses, allowing the researcher to gather more information about patient co-morbidity and other procedures and medications related to or unrelated to their cancer diagnosis. |
| Costs databases | Administrative costs data are collected in the form of national tariffs for the reimbursement of the provision of hospital services and in hospital accounting systems. These systems are usually updated annually and therefore provide robust and up-to-date estimates of unit costs for economic analyses. |
At the same time, the extent to which administrative data are used within the EEs varies considerably and no one study relies exclusively on routine data. For example, Atkin et al. (2017) use routine hospital records linked to cancer registry data to inform many of the parameters in their patient level simulation model, whilst Rao et al. (2018) use routine hospital records solely for the purposes of informing their parameter on postoperative mortality. Furthermore, several EEs use administrative costs data only (Asseburg et al., 2011; Bullement et al., 2018; Murphy et al., 2017; Robles-Zurita et al., 2018). In every EE, the existing literature or previous RCTs are also used to inform specific model parameters.
Where this is the case, it is possible that the prior research also used administrative data.
The COI studies utilise administrative data more often than the EEs. In particular, 11 of the 12 COI papers identified in Table 3 use administrative data. Overall, compared to EEs, COI studies are more likely to rely exclusively on administrative data (Corral et al., 2016; Francisci et al., 2013; Giuliani et al., 2012; Laudicella et al., 2016; Lejeune et al., 2009; Macafee & Whynes, 2009; Mar et al., 2017) and are far less likely to use previous studies to inform parameters.
Finally, almost all of the CC studies use administrative data of some sort. As with the EEs, some use administrative data in the form of costs databases only and, like the COI studies, some use administrative hospital records.

| DISCUSSION AND CONCLUSION
Clearly, one area in which administrative data have been particularly powerful is in evidence on the cost-effectiveness of various screening strategies for CRC, which has resulted from the evolution of national screening programmes throughout Europe. Data from these programmes have been used to inform and update many of the crucial parameters used in the models that accompany EEs of screening programmes. This evidence base invariably demonstrates the feasibility and potential of collecting administrative data on this scale to inform other parts of the treatment pathway for CRC.
At the same time, administrative cancer registry data have proved to be useful in terms of defining and identifying cohorts for costing studies and again for informing vital parameters such as disease prevalence, treatment and outcomes. Many EEs have also taken advantage of the power of data linkage by linking administrative records to data from participants in RCTs.
Furthermore, since providing estimates of costs is central to conducting both EEs and costing analyses, the emergence of costs databases has proved to be a valuable source of information on costs for all areas of economic research into CRC. Specifically, 43% (n = 16) of the studies identified used administrative costs databases.
The administrative costs databases have proved particularly powerful in the studies that include direct costs. In particular, the costing approaches implemented in those papers are consistent with the existence of European Diagnosis Related Group (DRG) type systems for reimbursing hospitals for their services. Therefore, unsurprisingly, many of them implement a 'top-down' costing approach by using national tariffs based on DRGs to attach monetary values to patients' resource utilisation (Špacírová et al., 2020). This highlights the potential for administrative data to contribute to understanding the costs of delivering CRC care.
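The 'top-down' approach described above can be sketched in a few lines of Python: each recorded hospital episode is assigned the tariff for its DRG code, and the tariffs are summed per patient. The DRG codes, tariff values and function name below are hypothetical and chosen purely for illustration; they are not drawn from any national tariff.

```python
# Hypothetical national tariffs per DRG code; values are illustrative only.
TARIFFS = {"DRG_A": 8000.0, "DRG_B": 1500.0}

# Hypothetical hospital episodes for a small cohort, as might be found in
# routine hospital records.
episodes = [
    {"patient": "P1", "drg": "DRG_A"},
    {"patient": "P1", "drg": "DRG_B"},
    {"patient": "P2", "drg": "DRG_B"},
]


def top_down_costs(episodes, tariffs):
    """Sum the tariff attached to each episode's DRG code, per patient."""
    costs = {}
    for episode in episodes:
        patient = episode["patient"]
        costs[patient] = costs.get(patient, 0.0) + tariffs[episode["drg"]]
    return costs


patient_costs = top_down_costs(episodes, TARIFFS)
mean_cost = sum(patient_costs.values()) / len(patient_costs)
```

The design choice is that costs are driven entirely by the tariff attached to each episode rather than by itemised resource use, which is what distinguishes a top-down from a bottom-up (micro-costing) approach.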
Finally, the merit of using administrative data for the purposes of BIA is clear. In an era of increasing austerity and budget cuts, using administrative data within BIAs to more accurately predict the affordability of introducing novel interventions into a fixed budget healthcare system will ensure more efficient allocation of resources.
Using locally or nationally collected administrative data for the purposes of BIA is particularly useful because this will make any analysis more relevant and useful to the budget holder in question.
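As a simplified illustration of the arithmetic behind a BIA, the Python sketch below computes the yearly net budget impact as the number of people receiving the new intervention multiplied by the difference in unit cost relative to current care. The population size, uptake rates and unit costs are hypothetical placeholders for the locally collected administrative inputs discussed above, and the single-formula structure is an assumption; published BIAs are typically more detailed.

```python
def budget_impact(eligible_population, uptake_by_year, cost_new, cost_current):
    """Yearly net budget impact: people switching x (new cost - current cost)."""
    return [
        eligible_population * uptake * (cost_new - cost_current)
        for uptake in uptake_by_year
    ]


# Hypothetical inputs, standing in for location-specific estimates.
yearly_impact = budget_impact(
    eligible_population=10_000,       # people eligible in the budget holder's area
    uptake_by_year=[0.25, 0.5, 0.75],  # assumed share receiving the new intervention
    cost_new=120.0,                    # assumed unit cost of the new intervention
    cost_current=100.0,                # assumed unit cost of current care
)
total_impact = sum(yearly_impact)
```

With these made-up inputs the net impact rises from 50,000 in year one to 150,000 in year three, which shows why location-specific population and cost estimates matter to the budget holder.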
Having said that, we have identified some areas where the use of administrative data has been limited. For example, although one of the main advantages of using routine records in research is their ability to capture large populations over long periods of time, we find little evidence that this is the case for the health economics literature on CRC. Specifically, only one costing study used routine records to capture an entire population over a long period of time (Laudicella et al., 2016). Excluding this example, the maximum sample size identified is less than a few thousand and in most cases, the populations under study come from a single hospital or administrative area. At the same time, many of the costing studies identified look at one specific part of the disease pathway with a limited follow-up period. Overall, it appears that the power of administrative data to provide evidence for whole populations, spanning the entire disease pathway and follow-up for survivors, is yet to be harnessed.
Related to this, we found a lack of evidence on the wider costs associated with CRC, particularly with respect to social care and indirect costs such as unpaid care. For example, although evidence shows that many cancer patients need social care as a direct consequence of their condition and the consequences of its treatment, none of the papers identified look at the use of social care services by CRC patients (MacMillan Cancer Support, 2015).
Furthermore, few papers explored indirect costs. In particular, only two EEs explicitly take a societal perspective and therefore include both direct and indirect costs of care (Lansdorp-Vogelaar et al., 2018; Pil et al., 2016). Within the COI studies, Hanly et al. (2013) and ÓCéilleachair et al. (2017) focus exclusively on indirect costs, whilst Lejeune et al. (2009) include direct and indirect costs. Further, the CCs carried out by Maniadakis et al. (2009) and Tscheulin and Drevs (2010) also include both direct and indirect costs. The lack of inclusion of indirect costs overall is not surprising given that they are notoriously difficult to measure. However, among the studies that did include them, the use of administrative data was even less likely. Clearly, measuring indirect costs is challenging in itself, but in addition to this, the administrative data appear less able to contribute to studies which include the indirect costs of CRC. This highlights a key limitation of administrative records in their ability to capture indirect costs.
Finally, it appears that administrative data are less able to con-

In Scotland, local authorities are required to routinely collect information on all social care services delivered to people within their area. These data could be used to provide evidence on other non-health-related direct costs associated with CRC, again both during treatment and beyond. In addition, as part of the social care data collection, an indicator of the presence of an unpaid carer is collected for social care clients. This information could be useful for understanding the indirect costs associated with a CRC diagnosis, in terms of the reliance on unpaid carers to provide additional care and support.
Going forward, it is important to recognise that there are questions administrative data cannot answer alone. In such cases, trial and survey data may fill the gaps, and vice versa. Specifically, clinical trial data offer the opportunity for randomisation, blinding and/or stratification, which allow assessment of the efficacy of new or existing treatments in a highly selected group of patients. In contrast, administrative data can play a unique role in testing the effectiveness, in a real-world setting, of treatments that have already been tested within trials. This sequence of events means that the types of questions that trial data set out to answer are likely to be different to those using administrative data. Overall, administrative data do not remove the need for trial or other data sources; instead, these sources of data are complementary.
That being said, we have found that the use of administrative data is common within the UK and EU health economic research on CRC. In particular, cancer registry, screening and routine hospital records were commonly used. In the EEs, administrative data tended to be supplemented with data from the clinical trial under study and/or from the existing literature. Costing studies were more likely to rely heavily on administrative records. Overall, we find that although administrative data are present, they do not appear to be used to their full potential, and administrative data, including data repositories, within the UK and Europe could have a significant impact on research in this area. Scotland, in particular, may provide a valuable exemplar to unlock this potential.

CONFLICT OF INTEREST
The authors have no conflict of interest to report.

DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no data sets were generated or analysed during the current study.
APPENDIX

TABLE 6 Economic Evaluations: Parameters and sources