Which indicators of early cancer diagnosis from population-based data sources are associated with short-term mortality and survival?

Highlights
• Each difference in tumour stage (I–IV) predicts lower five-year survival, so prognostic information is lost in binary indicators.
• Emergency presentation is associated with lower survival, independently of stage.
• A high proportion of patients whose stage is not recorded die immediately after diagnosis.
• Interval from first symptoms to diagnosis is not consistently associated with survival.


Introduction
Increasing early-stage diagnosis is a common component of regional and national strategies to reduce the burden of cancer [1][2][3][4][5]. 'Early diagnosis' is often used as a shorthand for 'early-stage diagnosis', which has historically been the outcome in early diagnosis studies. However, alternative indicators based on electronic health records are increasingly being used in early diagnosis studies. Some of these indicators relate to the promptness of diagnosis following clinical presentation, or the health services patients accessed first [6].
In England, cancer surveillance statistics are published on Public Health England's 'CancerData' dashboard for each of the 209 local healthcare commissioning bodies (Clinical Commissioning Groups, CCGs) [7]. These include the percentage of patients diagnosed with localised tumours (TNM Stage I/II), the percentage diagnosed following emergency admission or referral, and statistics on adherence to targets for patient waiting times.
Surveillance in England was only initiated in 2016, and optimal implementation of the different possible indicators in surveillance has not been extensively researched. Information on the association between each indicator and short-term mortality and survival will help analysts interpret the indicators, and identify those which are timely measures of progress in raising survival from cancer. In this study, we report a systematic literature review and data analysis. Our aim is to identify early diagnosis indicators and evaluate the association between each indicator and short-term mortality and survival. We then discuss the implications of our findings for surveillance.

Patient cohort
Cancer registrations were obtained from the Office for National Statistics (ONS) for adults aged 15-99 years, diagnosed with colorectal cancer, non-small cell lung cancer (NSCLC) or ovarian cancer in England in 2006-2013 (ICD-10 codes C18-20, C21.8, C33-34 and C56-C57.7 [10]). Follow-up was complete up to 31 December 2014. Registrations were linked to datasets from the National Bowel and Lung Cancer Audits, and to the Routes to Diagnosis dataset [11] using the patient's NHS number and postcode. These datasets were used to complete information on stage at diagnosis [12].

Data analysis
Thirty-day mortality and one- and five-year net survival were estimated by age group (15-59, 60-79, 80-99 years). Net survival estimates were obtained using Pohar Perme's unbiased estimator [13] and the period approach applied to follow-up data during 2009-2013 for patients diagnosed during 2006-2013 [14] (details in Appendix B in Supplementary data).
To avoid unstable sub-group estimates, one-year survival was only estimated if, in the period 2009-2013, at least 25 patients were diagnosed and five deaths occurred within the first year after diagnosis.
Five-year survival was estimated if at least 15 patients were alive at one year after diagnosis and five deaths occurred in the second to fifth years. Missing data were either included in a separate category, or excluded (complete-case analysis).
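The minimum-count rules above can be written as two simple checks. This is a minimal sketch rather than the code used in the analysis: the function names are hypothetical, and "five deaths" is read here as "at least five deaths", which is an assumption.

```python
def can_estimate_one_year(n_diagnosed: int, deaths_first_year: int) -> bool:
    """One-year rule: at least 25 patients diagnosed in the period and
    at least five deaths within the first year after diagnosis
    (the 'at least' reading of the death count is an assumption)."""
    return n_diagnosed >= 25 and deaths_first_year >= 5


def can_estimate_five_year(n_alive_at_one_year: int, deaths_years_2_to_5: int) -> bool:
    """Five-year rule: at least 15 patients alive at one year and
    at least five deaths in the second to fifth years."""
    return n_alive_at_one_year >= 15 and deaths_years_2_to_5 >= 5


# A sub-group with 24 patients fails the one-year rule however many deaths occur.
small_group_ok = can_estimate_one_year(n_diagnosed=24, deaths_first_year=10)
```

Sub-groups failing a check would simply have no estimate reported for that horizon.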

Literature search
The PubMed search returned 154 articles (Fig. 1), 19 of which presented new statistics or methods for generating statistics for an early diagnosis indicator. The Google search returned five reports and six articles also meeting that criterion.
Three early diagnosis indicators were identified in these 30 documents: stage at diagnosis (21 documents), emergency admission or emergency presentation (five), and interval from first symptom to diagnosis (eight).
Four documents contained information on survival and two on mortality.
3.1.1. Indicator 1: Stage at diagnosis
3.1.1.1. Definition and description. Stage was the sole indicator used in 18 documents (Table 1), and was one of several indicators in a further three. Typically the TNM classification system was used, directly or using ordinal stage (I-IV). Dukes' stage for colorectal cancer [16] and tumour thickness for melanoma [23] were also used. Stage was frequently dichotomised into 'early' (localised, stage I or II, non-metastatic) vs. 'late' (advanced, stage III or IV, metastatic) [16,22]. The CancerData dashboard uses a binary indicator for whether the patient has a record of stage I/II disease, and presents this as a percentage of all patients (including those without a recorded stage) [37]. Other studies imputed stage information [15,19], or analysed missing stage as a separate group [35]. Average stage at diagnosis varied by tumour site [38], histological type for ovarian cancer [20], and by subsite for colorectal cancer [33].
(Table 1). This is defined as an admission coded as 'emergency' and/or 'accident & emergency' [16], or a route to diagnosis via the Accident & Emergency department or via an emergency referral or transfer [11].
3.1.2.2. Association with short-term mortality and survival. Elliss-Brookes et al. [11] found an association between emergency presentation and 1-year relative survival, noting "the substantially lower relative survival in emergency compared to non-emergency routes indicates that this distinction is of high clinical significance".
3.1.3. Indicator 3: Interval from first symptoms to diagnosis
3.1.3.1. Definition and description. Time from first cancer-relevant symptom to referral or diagnosis was used in eight documents (Table 1).
The interval start was the time the patient first noticed symptoms [34,39,40] or time of presentation with symptoms to the GP [17,18,42]. The relevance of a symptom to cancer was decided by the GP [42], specialist clinician review [18], or by reference to external standards [40].
The end-point was cancer diagnosis or referral to secondary care. One study defined two intervals: the patient interval (time from symptom onset to first clinical presentation), and the primary care interval (time from first clinical presentation to specialist referral) [39].
3.1.3.2. Association with short-term mortality and survival. The association between the intervals and survival varied by cancer. One study of childhood acute lymphoblastic leukaemia (ALL) found a prolonged interval from presentation to diagnosis was associated with longer event-free survival, although this was attributed to confounding from disease biology [18]. Another study found a 'U-shaped' curve between interval and odds of death within five years for five common cancers, with higher odds for patients with the shortest and longest intervals [42]. The high odds of death amongst patients with short intervals were attributed to confounding, arising because of GPs expediting diagnosis for patients with high-risk symptoms.
In one survey of expert judgement for 21 common cancers [43,44] there was consensus that expedited diagnosis brings mortality reductions for 11 cancers. The experts were undecided for seven cancers, and disagreed that expedited diagnosis conferred any mortality benefit for three.

Data analysis
We analysed the association of stage at diagnosis and emergency presentation with 30-day mortality and survival for 160,617 colorectal, 170,425 non-small cell lung (NSCLC), and 24,450 ovarian cancer patients (Table 2). Data were not available to examine the interval from first symptoms to diagnosis.

Stage at diagnosis: association with 30-day mortality and net survival
Stage was missing for a large proportion of patients in the linked datasets we analysed (16.1-36.4% of patients for the three cancers). Colorectal and NSCLC patients aged 60-79 were most likely to be diagnosed at stages I or II (43% and 22% respectively), but there were not substantial differences between age groups (Table 2). By contrast, ovarian cancer patients aged 15-59 were considerably more likely to be diagnosed at stage I/II than those aged 80-99 (48% vs 22%).
Risk of 30-day mortality was higher at higher stages of disease, for all cancers and age groups, with the exception of colorectal cancer where mortality risk plateaued at stages II-III (Table 2). Thirty-day mortality was considerably higher for stage IV patients than stage III patients: six times higher for colorectal cancer and three times higher for NSCLC and ovarian cancer patients. NSCLC and ovarian cancer patients with missing stage had even higher mortality than stage IV patients (37.2% vs. 22.8% for NSCLC; 16.9% vs. 12% for ovarian cancer) whereas mortality for colorectal cancer patients with missing stage was between that of patients diagnosed at stages III and IV.
One-year colorectal cancer survival was similar for patients diagnosed at stages I-III (9.6% difference between stages I and III) but markedly lower for stage IV patients (Table 3, Fig. 2). There was no such plateau in five-year survival (32.1% difference between stages I and III). For NSCLC and ovarian cancer, incremental differences in stage category (I vs II, II vs III, III vs IV) were associated with significantly lower one-year and five-year survival; no plateau was evident. Patients missing stage had low survival, typically between the survival of patients with stage III and stage IV disease (Tables 3 and 4).

Emergency presentation: association with 30-day mortality and net survival
Twenty-three percent of colorectal, 35.6% of NSCLC, and 30.7% of ovarian cancer patients were diagnosed following emergency presentation (Appendix C in Supplementary data). Emergency presentation risk was greater for older patients diagnosed with NSCLC and ovarian cancer, whilst for colorectal cancer it was most common for patients aged 15-59 and 80-99 (Table 2).
Emergency presentation was associated with 1.9-2.9 times higher 30-day mortality (Table 2) and lower one-year net survival (Table 3): 50.7% for colorectal cancer compared with 75.9% survival for all routes combined; 14.1% compared to 32.6% for NSCLC; and 43.7% compared to 68.2% for ovarian cancer. Differences were greater for older patients. Similar patterns were observed for five-year survival (Table 4).
A small proportion (1.8-2.4%) of patients could not be assigned a route to diagnosis. Applying the assumption that these were all non-emergency presentations resulted in small changes in net survival (typically < 1% and never > 2%), indicating that these results are not sensitive to missing data (Appendix D in Supplementary data).

Fig. 1. Search strategy with number of reports and journal articles which reported new early diagnosis statistics or methods for generating these (inclusion criterion (1)), and number reporting the association between an indicator and a measure related to survival (inclusion criterion (2)).

Table 1. Cancers investigated, early diagnosis indicators and survival measures used in the documents read in full.

Stage at diagnosis and emergency presentation: joint association with 30-day mortality and net survival
Patients diagnosed following emergency presentation were more likely to be diagnosed at stages III or IV or have stage missing (Table 2). Emergency presentation was associated with higher 30-day mortality and lower one- and five-year net survival for patients at each stage (Tables 2-4). Survival differences between emergency and non-emergency colorectal cancer patients increased after the first year of follow up (Appendix E.1 in Supplementary data). By contrast, survival for emergency and non-emergency NSCLC patients converged to a 'floor' by the fifth year of follow up (Appendix E.2 in Supplementary data). Patients diagnosed following emergency presentation with missing stage had extremely high mortality: 26.4-51.0% died within 30 days following diagnosis (Table 2).

Discussion
Stage at diagnosis, emergency admission or presentation, and interval from first symptoms to diagnosis are commonly used indicators of early diagnosis. However, in the literature only stage and emergency diagnosis have a straightforward relationship with patient survival.
Our data analysis showed that emergency presentation and stage are independently associated with higher 30-day mortality and lower survival from colorectal, NSCLC and ovarian cancer in England. Patients without a recorded stage in population-based datasets had extremely high 30-day mortality and lower five-year survival.

Association between stage and survival
One-year survival from colorectal and breast cancers plateaued at stages I-III and was markedly lower at stage IV, whereas NSCLC and ovarian cancer displayed no such plateau. Five-year survival did not plateau at any stage for any cancer: each incremental increase in stage was associated with substantially lower survival. Granular information on stage at diagnosis, as opposed to binary groupings, is therefore useful for monitoring progress in efforts to raise medium-term survival, although certain binary stage groupings may produce statistics which are strongly associated with short-term survival.

Association between emergency presentation and short-term mortality and survival
We found that emergency presentation was associated with higher 30-day mortality and lower medium-term survival for patients at every age and stage of disease, consistent with other studies [44,45]. This indicator is therefore a proxy for other factors which independently determine survival, and is a valuable complementary prognostic indicator to stage. However, more work is needed to understand why it is independently associated with survival.

Association between interval from first symptoms to diagnosis and short-term mortality and survival
Shorter intervals from first symptoms to diagnosis were not consistently associated with improved survival in the literature we examined. Other reviews concur. Neal et al. found instances of contradiction between studies on a given cancer on whether reducing the diagnostic, referral, or treatment interval was associated with higher survival or reduced mortality [46]. Hamilton et al. found there was no consensus between experts that expediting symptomatic diagnosis conferred a mortality benefit for many common cancers [43].
These inconsistent findings may be partly explained by confounding by tumour aggressiveness and stage. The 'waiting times paradox' of the shortest intervals being associated with poor survival [42,47] is also likely to be partially attributable to confounding by these tumour factors [48]: stage and aggressiveness may determine both type of first presenting symptoms (in turn determining interval length) and patient survival. We found evidence suggesting this in the literature: type of first symptoms was associated with the length of interval for childhood CNS [17], lung cancer [40], and ovarian cancer [20].

Monitoring performance using early diagnosis indicators
Monitoring of early-stage diagnosis in England is currently conducted using the percentage of patients diagnosed at stage I or II. However, we have shown that binary groupings of stage lose information which is predictive of medium-term survival. Numerical average stage (1-4) might provide a simple alternative measure that is more strongly associated with medium-term survival. Within a modelling framework, ordered logistic regression with 4-category stage could be used instead of logistic regression with a binary stage indicator.
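The information lost by a binary grouping can be illustrated with a minimal sketch. The function names and the two patient lists are hypothetical; stage is coded 1-4 with None for a missing record, and the average is taken over recorded stages only, which is a complete-case assumption made here for simplicity.

```python
from typing import Optional, Sequence


def percent_early_stage(stages: Sequence[Optional[int]]) -> float:
    """Percentage with recorded stage I/II among all patients,
    including those with no recorded stage (None) in the denominator."""
    if not stages:
        raise ValueError("no patients")
    early = sum(1 for s in stages if s in (1, 2))
    return 100.0 * early / len(stages)


def mean_stage(stages: Sequence[Optional[int]]) -> float:
    """Numerical average stage (1-4) among patients with a recorded stage."""
    recorded = [s for s in stages if s is not None]
    if not recorded:
        raise ValueError("no recorded stages")
    return sum(recorded) / len(recorded)


# Two hypothetical areas with the same binary indicator but different stage mix.
area_a = [1, 1, 3, 3]
area_b = [2, 2, 4, 4]
summary = {
    "a": (percent_early_stage(area_a), mean_stage(area_a)),
    "b": (percent_early_stage(area_b), mean_stage(area_b)),
}
```

Both areas have half of their patients at stage I/II, yet the average stage differs by a full category, which the binary percentage cannot show.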
Emergency presentation is associated with advanced stage, and higher mortality and lower survival for patients at each stage. Data on emergency presentation could therefore be combined with stage information to generate a more informative prognostic index. Patients newly diagnosed through a given route at a given stage could be assigned a score which is the average survival of patients previously diagnosed with the same combination of indicators. For example, if a patient were diagnosed via emergency presentation at stage II disease, and previous 1-year survival for patients with these attributes was 80%, then that patient would be assigned a score of 80. The average score of the patient population could then be used for monitoring and comparisons.
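The scoring scheme described above can be sketched as a lookup followed by an average. This is an illustration only: the function name, the route labels, and every reference survival value other than the 80% figure from the example above are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (route to diagnosis, stage 1-4)


def average_prognostic_score(patients: List[Key],
                             reference_survival: Dict[Key, float]) -> float:
    """Assign each patient the historical 1-year survival (%) of patients
    with the same route and stage, then average over the population."""
    if not patients:
        raise ValueError("no patients")
    scores = [reference_survival[p] for p in patients]
    return sum(scores) / len(scores)


# Hypothetical reference table; only the (emergency, 2) value of 80 comes
# from the worked example in the text.
reference = {
    ("emergency", 2): 80.0,
    ("non-emergency", 2): 90.0,
    ("emergency", 4): 20.0,
}

population = [("emergency", 2), ("non-emergency", 2)]
score = average_prognostic_score(population, reference)
```

A combination absent from the reference table raises a KeyError in this sketch; a real implementation would need a rule for such cases, including patients with missing stage or route.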
Our results do not support the use of 'average diagnostic interval length' statistics for benchmarking and performance management. This is because the very shortest intervals are associated with poorer survival (due to confounding), so short intervals are not necessarily indicative of success in early diagnosis. However, it would be worthwhile to monitor whether reductions in average diagnostic intervals in response to an intervention in an area are associated with changes in the stage distribution or survival, to evaluate the effectiveness of the intervention. Further work is also needed to identify alternative statistics based on the interval or similar quantities which are useful for surveillance. Alternative measures could include statistics on 'missed opportunities' for prompt symptomatic diagnosis: Lyratzopoulos et al. have described how these can occur [49], and Renzi et al. identified instances of these for colorectal cancer [50].
Table 3. One-year net survival (and 95% confidence interval) by age group, route, cancer, and stage at diagnosis, England 2009-2013.
We found that anatomical site of origin was strongly associated with probability of early diagnosis, regardless of the indicator used. The distribution of cancers should therefore be accounted for in performance comparisons, either through standardisation or a modelling approach, to reduce bias from case-mix differences.
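One way to account for case-mix is direct standardisation, sketched below under stated assumptions: site-specific indicator values for an area are averaged using a common set of weights. The function name, the site labels, the weights, and the percentages are all hypothetical, and this is not a method taken from the analysis reported here.

```python
from typing import Dict


def standardised_percentage(site_percent: Dict[str, float],
                            standard_weights: Dict[str, float]) -> float:
    """Weighted average of site-specific percentages using a standard
    case-mix, so that areas are compared on the same mix of cancers."""
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(site_percent[site] * w for site, w in standard_weights.items())
    return weighted / total_weight


# Hypothetical area: percentage diagnosed at stage I/II by cancer site.
area = {"colorectal": 40.0, "lung": 20.0}
# Hypothetical standard case-mix: three colorectal cases per lung case.
weights = {"colorectal": 3.0, "lung": 1.0}

adjusted = standardised_percentage(area, weights)
```

Applying the same weights to every area removes differences that arise only because one area sees proportionally more cancers that tend to be diagnosed late.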

Interpreting missing stage information
In the English datasets we examined, 16-36% of patients had no recorded stage. Compared to patients with a recorded stage, these patients had very high risk of death shortly following diagnosis and lower medium-term survival.
There are likely to be two different reasons why patients do not have a recorded stage. For most patients missing stage, it may be for administrative (non-clinical) reasons. These patients would have a similar stage distribution and survival to those with recorded stage. For a minority, it may have not been recorded because the patient was acutely unwell or had very poor prognosis at the time of first presentation. These patients would have more advanced disease, and many would die shortly after diagnosis. This hypothesis, that the majority of patients missing stage have typical stage and survival, and a minority have more advanced disease and very poor survival, explains the heterogeneity in 30-day mortality and survival we observed. It is also consistent with results from the study by Barclay et al. suggesting that the stage distribution of these patients is slightly skewed towards later stages [51].
Our findings suggest that patients missing stage should be included in surveillance to avoid bias. This could be done using multiple imputation for missing data [52], using a 'missing stage and died shortly following diagnosis' percentage, or by applying expected survival statistics or model-based scores.
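A 'missing stage and died shortly following diagnosis' percentage could be computed along the following lines. This is a minimal sketch: the function name and the record layout are hypothetical, and the 30-day window is an assumption chosen to match the 30-day mortality measure used in this study.

```python
from typing import List, Optional, Tuple

# (stage 1-4 or None if not recorded, days from diagnosis to death or None if alive)
Record = Tuple[Optional[int], Optional[int]]


def percent_missing_stage_early_death(patients: List[Record],
                                      window_days: int = 30) -> float:
    """Percentage of all patients who have no recorded stage and died
    within `window_days` of diagnosis (window length is an assumption)."""
    if not patients:
        raise ValueError("no patients")
    count = sum(
        1
        for stage, days_to_death in patients
        if stage is None and days_to_death is not None and days_to_death <= window_days
    )
    return 100.0 * count / len(patients)


# Hypothetical records: one of four patients has missing stage and an early death.
cohort = [(None, 10), (None, None), (2, 5), (4, 400)]
indicator = percent_missing_stage_early_death(cohort)
```

Using all patients as the denominator keeps the statistic comparable with a stage I/II percentage that also counts patients without a recorded stage.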

Strengths and limitations
We conducted a comprehensive joint analysis of the association between stage and emergency presentation with survival using 355,502 patient records, and compared results from this to the published literature. We also analysed the survival of patients missing a recorded stage, who comprise a substantial proportion of patients.
We restricted the literature search to documents explicitly mentioning 'early diagnosis'. This approach gave us insight into what people consider 'early diagnosis' to encompass. However, it excluded studies where explicit mention of 'early diagnosis' was absent, so some data on the association between an indicator and patient survival may have been omitted.

Conclusion
In this study we identified the different indicators used to measure early diagnosis, and examined the association of each of the indicators with short-term mortality and survival. We recommend several changes to early diagnosis surveillance in England based on our findings: that granular stage information should be used in stage statistics to improve their prognostic value; that patients without a recorded stage should be included in surveillance to minimise bias; and that data on patients' stage and route to diagnosis could be combined to create a composite early diagnosis indicator.
Shorter diagnostic intervals can be a result of late-stage disease and of patients being acutely unwell, and therefore we conclude that the average length of diagnostic interval is not an informative measure for performance management. More work is needed to examine the association between reductions in diagnostic interval length and survival improvements in an area, and to develop informative statistics based on the diagnostic interval for use in surveillance.

Authorship contribution
PM and LW contributed to the study design. PM conducted the literature review, data analysis, and wrote a first draft of the manuscript under the guidance of LW. All authors contributed to interpretation of the study results and to editing the manuscript. All authors approved the final draft for submission.

Conflicts of interest
None.