Disclosing progress in cancer survival with less delay

Cancer registration plays a key role in monitoring the burden of cancer. However, cancer registry (CR) data are usually made available with substantial delay to ensure the best possible completeness of case ascertainment. Here, we investigate empirically with routinely available data whether such a delay is mandatory for survival analyses or whether data can be used earlier to provide more up‐to‐date survival estimates. We compared distributions of prognostic factors and period relative survival estimates for three population‐based CRs in Germany (Schleswig‐Holstein (SH), Rhineland‐Palatinate (RP), Saarland (SA)) computed on datasets extracted 1 (DY+1) to 5 years after the year of diagnosis (DY+5; reference). Analyses were conducted for seven cancer sites and various survival analysis scenarios. The proportion of patients registered in the datasets at a given time varied strongly across registries with 57% (SH), 2% (RP) and 26% (SA) registered in DY+1 and >93% in all registries in DY+3. Five‐year survival estimates for the most recent three‐year period were comparable to estimates from the reference dataset already in DY+1 (mean absolute deviations = 0.2–0.6% units). Deviations >1% units were only observed for pancreatic and lung cancer in RP and leukemia in SA (all ≤1.5% units). For estimates of 1‐year survival based on the most recent 1‐year period only, slightly longer delays were required, but reasonable estimates were still obtained after 1–2 years, depending on the CR and cancer site. Thus, progress in cancer survival could be disclosed in a more timely manner than commonly practiced despite delays in completeness of registration.


Introduction
High-quality population-based cancer registry (CR) data play a key role in monitoring the burden of cancer in a population over time. Population-based survival estimates derived from these data are an important tool to assess the overall effectiveness of health services in the management of patients with cancer. Given their implications for health policy and health care planning, as well as their value as prognostic information for clinicians and patients, such monitoring should be as up-to-date as possible. However, there is often a substantial delay in data registration and release. For example, in the United States (US), the dataset of the Surveillance, Epidemiology and End Results (SEER) Program includes only patients with cancer diagnosed at least 3 years prior to publication of the dataset. 1 This delay in the use and publication of data is introduced to guarantee sufficient completeness in the ascertainment of cancer cases.
While completeness of registration is a prerequisite for deriving valid estimates of cancer incidence and mortality, valid survival estimates might be obtained even before an almost complete ascertainment of cancer patients has been reached (albeit with increased standard errors due to the smaller sample size) as long as completeness of registration is unrelated to the prognosis of patients. Brenner and Hakulinen 2 have investigated the impact of selective underreporting of cancer patients on period and cohort survival estimates in a simulation study. While these simulations allow estimating the bias expected for survival estimates for specific scenarios of under-reporting, the extent to which delayed reporting is in fact selective and requires delay in the use of CR data for derivation of valid survival estimates in real-life settings of cancer registration has not been investigated.
Earlier use of data for survival analyses is particularly important for period analyses, as this method does not require long follow-up to estimate long-term survival. With period analysis, reliable estimates of short- as well as long-term survival can be obtained 1 year after the year of diagnosis, as only 1-year follow-up information is needed for the most recently diagnosed patients. It has been extensively shown that the period estimate for a specific year closely predicts the survival later observed for patients diagnosed during this year. 3 Therefore, with period analysis, early use of data not only allows estimation of short-term survival but also provides more up-to-date estimates of long-term survival.
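To make the period window concrete, the following Python sketch (not the registry macros used in this study) computes, for a single calendar period, the person-years at risk and deaths in each 1-year follow-up interval. It assumes exact dates stored as decimal calendar years and a constant hazard within each interval; the names `Patient`, `period_counts` and `cumulative_survival` are hypothetical. Each patient's follow-up is left-truncated at the start and right-censored at the end of the window, so a patient diagnosed 4 years before the window contributes only to the fifth follow-up year, while a recently diagnosed patient contributes only to the first interval.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    dx: float     # date of diagnosis as a decimal calendar year (e.g. 2007.25)
    exit: float   # date of death or end of follow-up, decimal calendar year
    died: bool    # True if `exit` is the date of death


def period_counts(patients, period_start, period_end, n_intervals=5):
    """Person-years and deaths per 1-year follow-up interval, counting only
    follow-up time that falls between period_start and period_end."""
    person_years = [0.0] * n_intervals
    deaths = [0] * n_intervals
    for p in patients:
        for k in range(n_intervals):
            # Calendar span of follow-up interval k (k to k+1 years after diagnosis),
            # restricted to the period window.
            start = max(p.dx + k, period_start)   # left truncation at period start
            end = min(p.dx + k + 1, period_end)   # right censoring at period end
            stop = min(end, p.exit)               # stop at death / end of follow-up
            if stop > start:
                person_years[k] += stop - start
                # Count a death only if it occurs inside this truncated span
                if p.died and start < p.exit <= end:
                    deaths[k] += 1
    return person_years, deaths


def cumulative_survival(person_years, deaths):
    """Cumulative observed survival as the product of interval-specific
    conditional probabilities, assuming a constant hazard per interval."""
    s = 1.0
    for py, d in zip(person_years, deaths):
        if py <= 0:
            raise ValueError("no person-time in an interval; estimate undefined")
        s *= math.exp(-d / py)
    return s


# Hypothetical example: period 2009-2011, i.e. window from 2009.0 to 2012.0
cohort = [Patient(2005.0, 2012.0, False), Patient(2010.0, 2010.5, True)]
py, d = period_counts(cohort, 2009.0, 2012.0)
# py == [0.5, 0.0, 0.0, 0.0, 1.0], d == [1, 0, 0, 0, 0]
```

Relative survival would additionally divide by expected survival from population life tables; this sketch stops at observed survival to keep the period window itself in focus.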
Here, we provide an approach to determine the required delay using routinely available CR data. We used annual CR database backups to investigate the impact of the length of delay on changes in patient characteristics and in period survival estimates. This approach was used to determine the required length of delay for various survival analyses scenarios for three population-based German cancer registries.

Data sources
In Germany, cancer registration has been implemented at the federal state level. While the "Federal Cancer Registry Data Act" (Bundeskrebsregisterdatengesetz) enacted in 1995 regulated the implementation, it lacked standards or specifications for certain structures. 4 Consequently, each state developed its own registration system and CR law. To improve comparability across German cancer registries, the Association of Population-based Cancer Registries in Germany was founded to develop methodological standards. Nevertheless, key differences across registries remain. 5,6 Here, we use data from three population-based cancer registries in Germany (Schleswig-Holstein, Rhineland-Palatinate and Saarland), which were chosen based on data availability. The cancer registries started registration in 1967 (Saarland), 1997 (Rhineland-Palatinate) and 1998 (Schleswig-Holstein) and together covered a population of 8.0 million people (9.6% of the total German population) in 2017. A key difference across these registries is that the CR of Schleswig-Holstein is the only one that has used an electronic notification system since its implementation, whereas Rhineland-Palatinate received approximately 20% of notifications and pathology reports electronically and the remaining notifications on paper. The registries of Saarland and Rhineland-Palatinate have only recently fully implemented electronic notification. 4 With respect to follow-up, a linkage with all deceased persons in the federal state has been conducted in Saarland and Schleswig-Holstein. In Rhineland-Palatinate, it was implemented in autumn 2009 retrospectively for all years (of death) since 1998. Furthermore, in Rhineland-Palatinate, linkage could only be made with death certificates with an indication of cancer until the 2014 dataset. While traceback of death certificate notified cases is routinely conducted in Saarland, in the other two cancer registries it has only been done for specific years, depending on staff capacity.
For our analyses, each CR provided copies of their annually stored data backups. The data included all cancer patients registered since 1997 (Saarland) and 1998 (Rhineland-Palatinate, Schleswig-Holstein). The seven cancer sites analyzed were selected to include the most common tumors, tumors with short and long survival, and solid tumors as well as hematological cancers. According to common practice in population-based cancer survival analyses, cases notified by death certificate only (DCO) were excluded.
What's new? Data reporting on cancer incidence and survival lags significantly behind the actual time of data collection. This lag is due mainly to efforts to ensure data completeness that result in notification delays to cancer registries. This study shows, however, that at least in the case of survival data, reliable up-to-date estimates can be obtained one year after cancer diagnosis, significantly sooner than the usual three-year delay. Valid estimates can be derived even from preliminary, currently incomplete registry data. The findings indicate that relatively prompt data reporting on cancer survival is feasible, with possible implications for cancer screening and treatment.
The characteristics age at diagnosis, stage (for solid tumors) and subsite (for hematological cancers) were explored. Stage was grouped according to the recommendations of the European Network of Cancer Registries into localized/local spread, regional spread and advanced. 8 In Saarland, cancer diagnoses were originally coded according to the 2nd edition of the International Classification of Diseases for Oncology (ICD-O-2), 9 and codes were subsequently converted to ICD-10 codes.

Statistical analyses
Analyses were conducted separately for each CR and each cancer site. For hematological cancers, survival scenarios were investigated separately for lymphoma and leukemia.
For each registry, five datasets were created, one for each length of delay between diagnosis and dataset extraction (DY+1 to DY+5). The dataset DY+5 was used as the reference, assuming that sufficient completeness was achieved 5 years after the year of diagnosis. Using these datasets, changes in the number of patients and in the distribution of age at diagnosis, stage (for solid tumors) and subsite (for hematological cancers) between the datasets DY+1 to DY+5 were explored to determine when the distributions of these prognostic factors become comparable to the reference distribution (DY+5).
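The comparison of each extract with the reference can be sketched as follows. This minimal Python sketch assumes that patients carry a stable identifier across the annual database backups, that DCO cases have already been excluded, and that prognostic factors are available as categorical values (e.g. stage group); all function names are hypothetical. Differences are expressed in "% units", i.e. percentage points.

```python
from collections import Counter


def proportion_registered(extract_ids, reference_ids):
    """Percentage of patients in the reference extract (DY+5) who are
    already present in an earlier extract (DY+k)."""
    reference = set(reference_ids)
    return 100.0 * len(reference & set(extract_ids)) / len(reference)


def distribution_pct(values):
    """Category distribution in percent, e.g. of stage or age group."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def max_abs_difference(early_values, reference_values):
    """Largest absolute difference, in % units, between the category
    distributions of an earlier extract and the reference extract."""
    early = distribution_pct(early_values)
    ref = distribution_pct(reference_values)
    # Categories missing from one extract count as 0%
    return max(abs(early.get(c, 0.0) - ref.get(c, 0.0))
               for c in early.keys() | ref.keys())


# Hypothetical toy data: patient IDs and stage categories
print(proportion_registered([1, 2], [1, 2, 3, 4]))  # 50.0
print(max_abs_difference(["local", "local", "advanced", "regional"],
                         ["local", "advanced", "advanced", "regional"]))  # 25.0
```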
In addition, 5-year age-standardized relative cohort survival was compared across dataset versions to assess whether prognosis is related to the timing of registration. For these analyses, the datasets had to be restricted to patients with linked follow-up information and to years of diagnosis with at
Relative survival was computed as the ratio of the observed survival of the group of cancer patients to the expected survival of a comparable group in the general population. Expected survival was estimated by the Ederer II method 10 using life tables stratified by age, sex and calendar period as obtained from the German Federal Statistical Office. 11 Age standardization was conducted according to the International Cancer Survival Standard using the weights for cancer sites with a steeply increasing age distribution (15-44 years: 7%, 45-54 years: 12%, 55-64 years: 23%, 65-74 years: 29%, 75+ years: 29%). 12
As earlier use of data is particularly relevant for period analyses, we investigated various period survival analysis scenarios. Cancer survival statistics are typically reported for several recent calendar years, and 5-year relative survival is the most frequently used survival estimate. Therefore, we estimated age-standardized 5-year period survival for 3-year calendar periods on datasets extracted 1 to 5 years after the period. For all cancer registries, person-times at risk, observed deaths and expected deaths in the absence of cancer were computed for 2006-2008 and 2009-2011 separately on the datasets extracted 1-5 years after the last year of the 3-year period (Supporting Information Table S1). Then, estimates were pooled over the two periods to obtain one estimate for each CR, cancer site and delay. The estimates based on the datasets extracted 5 years after the year of diagnosis were used as the reference.
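The two computational steps named above, Ederer II relative survival and age standardization, can be sketched in Python. The sketch assumes that the interval-specific observed conditional survival probabilities and each at-risk patient's expected 1-year survival probability (looked up from population life tables by age, sex and calendar year) have already been computed; the function names are hypothetical. Under Ederer II, expected survival for an interval is averaged over the patients still at risk at the start of that interval. The weights are those of ICSS standard 1 (7, 12, 23, 29 and 29%), which sum to 100%.

```python
import math

# ICSS weights for cancer sites whose incidence increases steeply with age
# (age group -> weight); they sum to 1.
ICSS_STEEP = {"15-44": 0.07, "45-54": 0.12, "55-64": 0.23, "65-74": 0.29, "75+": 0.29}


def ederer2_relative_survival(observed_conditional, expected_at_risk):
    """Cumulative relative survival (Ederer II).

    observed_conditional[k]: observed probability of surviving interval k,
        given survival to its start.
    expected_at_risk[k]: expected 1-year survival probabilities from population
        life tables for the patients still at risk at the start of interval k.
    """
    relative = 1.0
    for p_obs, expected in zip(observed_conditional, expected_at_risk):
        p_exp = sum(expected) / len(expected)  # mean over those still at risk
        relative *= p_obs / p_exp              # interval-specific relative survival
    return relative


def age_standardize(rs_by_age, weights=ICSS_STEEP):
    """Weighted average of age-specific relative survival estimates."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("standard weights must sum to 1")
    return sum(w * rs_by_age[group] for group, w in weights.items())


# Hypothetical two-interval example
rs = ederer2_relative_survival([0.9, 0.8], [[1.0, 0.9], [0.8]])  # approx. 0.947
```

Because the weights are validated to sum to 1, a mistyped standard (for example, weights summing to 94%) is rejected instead of silently biasing the standardized estimate.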
Analyses were additionally conducted for 1-year calendar periods, which are less commonly used but might be relevant for investigating recent trends in cancer survival. Here, we investigated … (Supporting Information Table S2).
All calculations were carried out with SAS software (version 9.3). Cohort and period survival estimates were computed using macros developed for registry analyses. 13

Data availability
The data that support the findings of our study are available from the cancer registries Schleswig-Holstein, Rhineland-Palatinate, and Saarland. Restrictions apply to the availability of these data, which were used under license for our study. Data are available from the authors with the permission of the cancer registries.

Results
In Schleswig-Holstein, 41,105 patients were registered and included in the dataset extracted 5 years after the year of diagnosis (DY+5). As shown in Table 1, the proportion of DCO cases increased from 7% in DY+1 to 20% in DY+3 (mean over all cancer sites) and was stable or decreased afterward. Figure 1 shows the proportion of patients registered in DY+1 to DY+4 relative to the reference dataset (DY+5) after the exclusion of DCO cases. The overall proportion increased strongly from 57% in DY+1 to 89% in DY+2 and then slightly to 95% (DY+3) and 98% (DY+4). Notably, patients with hematological cancers were registered later, with only 44% registered in DY+1. A similar tendency was observed for cancers with worse prognosis (pancreatic cancer: 51% and lung cancer: 54%).
The reference dataset of Rhineland-Palatinate included 63,874 patients. DCO cases were first included in DY+2 (12%), increased to 20% in DY+3 and decreased afterward, especially for the cancer sites with poor prognosis (lung and pancreatic cancer; Table 1). Only 2% of the patients were already registered in DY+1 and even in DY+2, only 58% were included in the dataset (Fig. 1). However, the proportion increased strongly afterward to 93% in DY+3. Again, especially cancer sites with worse prognosis (lung and pancreatic cancer) as well as hematological cancers were registered later.
In Saarland, the reference dataset included 23,398 patients. The mean proportion of DCO cases was close to 0% in DY+1, increased up to DY+3 (8%) and was stable or decreased afterward (Table 1). The proportion of patients registered increased from 26% in DY+1 to 90% in DY+2 and 97% in DY+3 (Fig. 1). Again, cancer sites with worse prognosis and hematological cancers were registered later.
The distribution of patient characteristics and 5-year relative cohort survival in the reference dataset compared to the previous datasets are shown in Supporting Information Tables S3 to S5 for Schleswig-Holstein, Rhineland-Palatinate and Saarland, respectively. Results are described in detail in the Supporting Information. Patterns differed across cancer registries and across cancer sites. In general, age distributions were mostly already comparable to the reference dataset in DY+1.
Among hematological cancers, in all cancer registries, lymphoma cases were overrepresented and leukemia cases underrepresented in earlier datasets. Overall, deviations were mostly stronger or more long-lasting for pancreatic cancer and hematological cancers than for other cancers. In Schleswig-Holstein, absolute differences for all factors were <3% units in DY+2 compared to DY+5, except for a lower proportion of advanced pancreatic cancer cases (−3.5% units). In Rhineland-Palatinate, in DY+2, absolute differences were larger than 3% units only for pancreatic cancer stage (+4.1% units more advanced cancers) and 5-year cohort survival for hematological cancers (+4.7% units). In Saarland, all absolute differences were <3% units in DY+2.
Table 2 shows age-standardized 5-year period survival estimates for 3-year periods computed on datasets extracted 1 (DY+1) to 5 (DY+5) years after the last year included in the period. For all cancer registries and all cancer sites, absolute differences to DY+5 were already smaller than 1.6% units in DY+1, resulting in mean absolute differences of 0.4, 0.6 and 0.2% units.
Table 2. Period estimates of 5-year age-standardized relative survival for 3-year periods for the most complete dataset (DY+5) and differences to these estimates for the following years after the year of diagnosis for various cancer sites and registries
When the period of analysis was shortened to 1 year, deviations generally increased (Table 3). Mean absolute differences to DY+5 fell below 3% units in DY+1 in Schleswig-Holstein and Saarland and in DY+2 in Rhineland-Palatinate, and below 1% unit 1 year later in each registry. Overall, differences were largest for pancreatic cancer, lung cancer and leukemia. In Schleswig-Holstein, survival was underestimated for these cancer sites in DY+1 (−2.9, −1.5 and −2.7% units). These deviations fell below 1% unit in DY+2 (lung), DY+3 (pancreatic) and DY+4 (leukemia).
In Rhineland-Palatinate, strong differences to DY+5 were observed in DY+1. In DY+2, absolute differences were <3% units for all cancer sites except leukemia (4.0% units) and further decreased in DY+3 and DY+4. In Saarland, in DY+1, absolute differences >1% unit were only observed for colorectal cancer (+2.4% units), prostate cancer (+1.7% units) and leukemia (+3.7% units). These differences resolved in DY+2, but in DY+3, survival was slightly underestimated for pancreatic cancer (−1.2% units) and leukemia (−1.3% units).

Discussion
Our study shows that up-to-date period survival estimates can be obtained much earlier than under current standard practice. Standard survival estimates, such as 5-year relative period survival for the most recent 3-year period, could be reliably estimated in the investigated cancer registries 1 year after the most recent year of diagnosis instead of after the usual 3-year delay. For estimates of short-term (1-year) survival based on the most recent 1-year period only, a slightly longer delay was required, but reasonable estimates were still obtained after 1-2 years, depending on the CR and cancer site.
Timeliness of CR data varies across countries, and the required delay is usually derived from completeness estimates. For example, within the Californian CR, the median time from the date of diagnosis until availability for research was 382 days. 14 In the Norwegian CR, incidence is reported 1 year after the close of a given year of diagnosis. An underreporting of 2.2% of cases was estimated at this time point, which was considered acceptable to allow up-to-date use of the data. 15 In Northern Ireland, first cancer statistics are published approximately 15 months after the latest incidence year. 16 Within the SEER program, member registries should report their data within 22-24 months after the diagnosis year. 17,18 In Germany, CR data are usually made available 3 years after the year of diagnosis. Our study shows that standard period survival estimates can already be obtained with preliminary, incomplete CR data. For example, in Schleswig-Holstein, although only 57% of cases were registered in the first year after the year of diagnosis, the mean absolute difference between the 5-year period survival estimates (for the most recent 3-year calendar period) obtained from the datasets with 1 year and 5 years of delay was 0.4% units, with a maximum deviation of 0.6% units.
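Therefore, the choice of the required delay need not be tied to completeness estimates; instead, it should be based on measures that directly assess the stability of the survival estimates themselves, such as the deviations from a reference dataset examined here, to avoid unnecessary delays in data usage.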
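Such a stability-based choice of delay can be sketched as follows. This hypothetical Python helper takes survival estimates (in %) per cancer site from each extract DY+1 to DY+4 together with the DY+5 reference, reports the mean absolute deviation, and returns the shortest delay from which all sites stay within a chosen tolerance in every later extract. The 1% unit tolerance is an illustrative assumption, not a criterion defined in this study, and the data in the example are invented.

```python
def mean_abs_deviation(estimates, reference):
    """Mean absolute difference (in % units) to the reference, over cancer sites."""
    return sum(abs(estimates[site] - reference[site]) for site in reference) / len(reference)


def max_abs_deviation(estimates, reference):
    """Largest absolute difference (in % units) to the reference, over cancer sites."""
    return max(abs(estimates[site] - reference[site]) for site in reference)


def required_delay(estimates_by_delay, reference, tolerance=1.0):
    """Shortest delay k such that extract DY+k and all later extracts deviate
    from the reference by less than `tolerance` % units for every site.
    Returns None if no delay qualifies."""
    delays = sorted(estimates_by_delay)
    for i, k in enumerate(delays):
        if all(max_abs_deviation(estimates_by_delay[j], reference) < tolerance
               for j in delays[i:]):
            return k
    return None


# Hypothetical 5-year period survival estimates (%) for two sites
reference = {"lung": 15.0, "colorectal": 60.0}  # DY+5
by_delay = {
    1: {"lung": 13.0, "colorectal": 60.5},
    2: {"lung": 14.5, "colorectal": 60.2},
    3: {"lung": 15.1, "colorectal": 59.9},
    4: {"lung": 15.0, "colorectal": 60.0},
}
print(mean_abs_deviation(by_delay[1], reference))  # 1.25
print(required_delay(by_delay, reference))         # 2
```

Requiring stability in all later extracts, rather than only in DY+k, guards against deviations that reappear after having resolved, as observed for Saarland with 1-year periods.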
To the best of our knowledge, only one previous study has examined the timing of registration and its impact on cohort survival. 16 In that study from the Northern Ireland CR, late ascertainment was associated with worse prognosis for all cancers combined and with older age, late or unknown stage of disease and specific cancer sites such as small intestine, bone, leukemia and multiple myeloma. In our study, we also observed later registration for hematological cancer sites in all three registries, but we did not find a consistent pattern between the length of delay and survival. Results can be expected to differ across countries and even across registries within the same country, as registration processes differ. The Northern Ireland CR primarily uses electronic methods for case ascertainment, while among the included German CRs an electronic notification system was used fully only in Schleswig-Holstein during the time of the study. Furthermore, the Northern Ireland CR routinely conducts a traceback of death certificate notified cases, which was routinely conducted only by the Saarland CR. With a DCO proportion of 1% in 1993-2010, case ascertainment is overall more complete in the Northern Ireland CR than in the investigated German CRs. 16 These differences illustrate that results from our study should not be generalized to other CRs and that reporting patterns, and consequently the delay required to obtain accurate survival estimates, should be investigated for each CR separately. However, our study exemplifies an easy way to estimate the required delay with routinely available data, which could easily be applied in cancer registries to avoid unnecessary delays in reporting survival statistics.
Table 4. Period estimates of 1-year age-standardized relative survival for 1-year periods for the most complete dataset (DY+5) and differences to these estimates for the following years after the year of diagnosis for various cancer sites and registries
Even within Germany, registration practices differ across federal states, which might cause differences in registration timing and, thus, might explain the inter-registry differences observed in our analyses. One main difference is that an electronic notification system was used fully only in Schleswig-Holstein during the time of the study. Such a system helps to accelerate registration in general, as paper forms do not have to be entered into a database manually and several linkages may become easier and faster. Another CR in Germany that used an electronic notification system is the CR of North Rhine-Westphalia, which is able to report important epidemiological information for the state within 2 years. 19,20 Our results support the notion that modernization to electronic systems might improve timeliness, as cases were registered earlier and deviations with shorter delays were generally smaller in Schleswig-Holstein than in the other two cancer registries.
Despite differences in registration practice, some patterns were consistently found in all three registries. The proportion of DCO cases increased in all registries with the length of the delay during the first 3 years and decreased afterward. As information on the date of diagnosis is usually not available for DCO cases and is substituted by the year of death, the number of DCO cases cannot increase with the length of the delay as long as death certificates are immediately available and linked to the CR data. Thus, the pattern we observed can only be explained by delayed availability of death certificates and/or delayed linkage with this information. Regarding survival, estimates computed on earlier datasets showed a stronger or more long-lasting bias for pancreatic, lung and hematological cancers than for the other cancer sites. Given the consistency across registries, it can be hypothesized that these differences arise not from processes within registries but from delayed notification by reporting sources. Pancreatic and lung cancer have the worst prognosis and, thus, a lower probability of being registered during the patient's lifetime. 21,22 Delayed reporting for patients with leukemia has previously been observed in the US 23 and was explained by the dependence on reports from licensed practitioners, as these patients are often treated by practitioners rather than admitted to clinics for surgery. However, it is not directly evident why the reporting patterns might differ for these sites, so further insights into the data, especially with respect to the notification sources, are needed.
Our study has specific strengths and limitations. We provide an easy-to-implement approach to determine the required delay for up-to-date registry-based monitoring of cancer survival and illustrate its application for cancer registries with various practices of cancer registration and various delays in completeness of registration. Although this delay was consistently shorter than the commonly practiced delay across registries with strongly varying registration practices, these patterns should be confirmed by similar analyses in other countries and in other registries within Germany, as registration practices differ across cancer registries. Furthermore, these analyses should be repeated when registration processes change, as is currently the case in Germany due to the implementation of nationwide clinical cancer registration. 4 A further strength of our study is the investigation of various years of diagnosis and many cancer sites, including solid as well as hematological tumors and tumors with high and low survival rates. However, we did not investigate rare tumors, which might show a different notification pattern, and we only investigated the timeliness of reporting of the prognostic factors age and stage. In particular, the evaluation of clinical CR data will require an assessment of further tumor-related factors and treatment information.
In conclusion, our study shows that CR data could be used much earlier for standard survival analyses than is currently done: for standard period survival analyses in Germany, the delay could be shortened from 3 years to 1 year, with some caution for pancreatic cancer, lung cancer and leukemia when estimating short-term survival or using short calendar periods. Routine implementation of the presented method for assessing the validity of the most recent survival estimates, rather than relying exclusively on estimates of delays in completeness of cancer registration, could help to avoid unnecessary delays in the monitoring of cancer survival.