Indicators of Data Quality at the Cancer Registry Zurich and Zug in Switzerland

Data quality is an important issue in cancer registration. This paper provides a comprehensive overview of the four main data quality indicators (comparability, validity, timeliness, and completeness) for the Cancer Registry Zurich and Zug (Switzerland). We extracted all malignant cancer cases (excluding non-melanoma skin cancer) diagnosed between 1980 and 2014 in the canton of Zurich. Methods included the proportion of morphologically verified cases (MV%), the proportion of DCN and DCO cases (2009–2014), cases with primary site uncertain (PSU%), the stability of incidence rates over time, age-specific incidence rates for childhood cancer, and mortality:incidence (M:I) ratios. The DCO rate decreased from 6.4% in 1997 to 0.8% in 2014 and has been <5% since 2000. MV% was 95.5% in 2014. PSU% was <3% over the whole period. The incidence rate of all tumours increased over time with site-specific fluctuations. The overall M:I ratio decreased from 0.58 in 1980 to 0.37 in 2014. Overall, data quality of the Cancer Registry Zurich and Zug was acceptable according to the methods presented in this review. Most indicators improved over time with low DCO rates, high MV%, low PSU%, relatively low M:I ratios and age-specific incidence of childhood cancer within reference ranges.


Introduction
The Cancer Registry Zurich and Zug in Switzerland was established in 1980 and covers roughly 20% of the Swiss population (1.56 million of 8.19 million in 2014). Reporting of cancer data has been compulsory in the canton of Zurich since the introduction of the cantonal law on cancer registration in 2017. Before that, several legal bases and approvals ensured a high level of data reporting from pathology institutes, hospitals, and general practitioners. The purposes of population-based cancer registration are monitoring, epidemiological research, and health policy making. Cancer statistics depend on the quality of the data collected in cancer registries. Therefore, good data quality is essential.
Cancer registries are encouraged to assess the quality of their data [1][2][3]. Several methods to report data quality have been proposed including qualitative and quantitative methods. These address the following indicators: comparability, validity, timeliness, and completeness. Comparability is the extent to which coding and classification procedures as well as definitions of recording and reporting specific items adhere to agreed international guidelines [1]. Validity (accuracy) is defined as the proportion of cases in a dataset with a given characteristic (e.g., site and age) that truly have the attribute [1]. Timeliness relates to the rapidity at which a registry can collect, process, and report sufficiently reliable and complete cancer data [1]. However, there is a trade-off between timely data and the extent to which it is complete and accurate [1]. Finally, completeness is the extent to which all of the incident cancers occurring in the population are included in the registry database [2]. Completeness is a prerequisite to present incidence rates and survival proportions [2].

BioMed Research International
The present study aims to provide a comprehensive overview of the four main data quality indicators for the cancer registry of Zurich and Zug for the period of 1980 until 2014 including different cancer types and using a variety of methods. The cancer registry Zurich and Zug receives notifications from pathology and haematology laboratories, hospitals, and physicians as well as death certificates from the Swiss Federal Statistical Office. Data include personal information and tumour characteristics. Vital status follow-up was not conducted annually, because we did not have access to vital status information from the citizen service departments. The death statistics (and death certificates) that we receive from the Swiss Federal Statistical Office once a year are anonymous and linkage with the cancer registry database is not straightforward. Since incidence year 2009, a one-year and a five-year follow-up have been carried out.

Methods
Only malignant cancer cases (C00-C99, excluding nonmelanoma skin cancer (C44)) were included. For some indicators/methods, data are presented for the whole period 1980-2014 (e.g., incidence data); for others, data are only presented for the periods 1997-2014 (death-certificate-only (DCO) cases) or 2009-2014 (death-certificate notification (DCN) cases). Distinguishing between DCN and DCO cases has only been possible since 2009 due to a change in the database software. Furthermore, DCO cases were not systematically marked before 1997. For the whole period 1980-2014, 197,493 incident tumours were available. For the analyses including only the periods 1997-2014 and 2009-2014, the respective numbers were 115,947 and 43,719 tumours. For specific parameters, only the most common cancer localisations are presented (Tables 1, 2, and 4).
The Swiss Federal Statistical Office provided population and mortality data. The coding of the mortality data is based on the International Statistical Classification of Diseases and Related Health Problems (ICD), 10th revision, and has been conducted according to the rules defined by the WHO since 1995. Permanent resident population data, which include Swiss citizens with main place of residence in Switzerland and foreign citizens with a residence permit for at least 12 months, were used at midyear.

Comparability, Validity, Timeliness, and Completeness.
Regarding comparability, a general description of adherence to international guidelines, standards for classification and coding of neoplasms, definition of incidence date, and rules for multiple primaries is given. Validity is represented by the proportion of morphologically verified cases (MV%, 1997-2014), the proportion of DCO (1997-2014) and DCN (2009-2014) cases, and the proportion of cases with primary site uncertain (C80 according to ICD-10; PSU%, 1980-2014). Furthermore, the procedures regarding internal consistency checks are presented. A general description of timeliness is provided in addition to the comparison of incident cases published in annual reports for specific years, indicating the proportion of cases that were registered "too late." Regarding completeness, the following semiquantitative methods were used: the stability of incidence rates over time was investigated for specific tumour groups. Incidence rates were age-standardised using the 1976 European Standard Population [18]. Furthermore, the mortality:incidence (M:I) ratios and the age-specific incidence rates per 100,000 for childhood cancer were assessed. Age-specific incidence rates for childhood cancer were calculated over the whole period 1981-2014 for the age strata 0-4, 5-9, and 10-14 years including all types of cancer. As suggested by Parkin & Bray [2], we used the reference intervals based on deciles for childhood cancer published in CI5 Volume VIII [17].
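As an illustration of the age-standardisation step, the following is a minimal Python sketch of direct standardisation with the 1976 European Standard Population weights. It is not the registry's own R code; the function name and the input counts are hypothetical, and the 18 five-year age groups (0-4 up to 85+) are an assumption about how the data are stratified.

```python
# 1976 European Standard Population: weights per 100,000 for the 18 age
# groups 0-4, 5-9, ..., 80-84, 85+.
ESP_1976 = [8000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000,
            7000, 7000, 6000, 5000, 4000, 3000, 2000, 1000, 1000]


def age_standardised_rate(cases, person_years, weights=ESP_1976):
    """Directly age-standardised incidence rate per 100,000 person-years.

    cases and person_years are sequences with one entry per age group,
    in the same order as the weights.
    """
    if not (len(cases) == len(person_years) == len(weights)):
        raise ValueError("cases, person_years and weights must align")
    weighted_sum = 0.0
    for n_cases, py, weight in zip(cases, person_years, weights):
        if py <= 0:
            raise ValueError("person-years must be positive")
        # Age-specific rate, weighted by the standard population share.
        weighted_sum += (n_cases / py) * weight
    return weighted_sum / sum(weights) * 100_000


# Hypothetical input: the same crude rate (100 per 100,000) in every age
# group, so the standardised rate equals that crude rate.
example_cases = [10] * 18
example_person_years = [10_000] * 18
print(age_standardised_rate(example_cases, example_person_years))
```

Because the weights are fixed, rates standardised this way are comparable across calendar years even when the age structure of the resident population changes.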
Due to limitations in updated vital status information, methods including survival were not applied. All statistical analyses were performed using R Version 3.4.0. The curves in Figures 1(c) and 2 were smoothed using LOESS regression (Local Polynomial regression fitting) and the shaded areas present the 95% confidence intervals. Incidence dates are defined according to the recommendations of the European Network of Cancer Registries (ENCR). The date of histological confirmation or the date of the first pathology report confirming a cancer has the highest priority. If the clinical confirmation of the diagnosis was more than three months before the histological confirmation, the clinical date is considered the date of diagnosis.
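The incidence date rule described above can be sketched in Python as follows. This is a simplified reading of the rule as stated in this section, not the full ENCR recommendation; the function names are hypothetical, and "more than three months" is interpreted here in calendar months, which is an assumption.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def incidence_date(histological: Optional[date],
                   clinical: Optional[date]) -> Optional[date]:
    """Pick the date of diagnosis from the two candidate dates.

    The histological confirmation has priority unless the clinical
    confirmation precedes it by more than three months.
    """
    if histological is None:
        return clinical
    if clinical is not None and add_months(clinical, 3) < histological:
        return clinical
    return histological


# Clinical diagnosis more than three months earlier: clinical date is used.
print(incidence_date(date(2014, 6, 1), date(2014, 1, 15)))
# Clinical diagnosis two months earlier: histological date is used.
print(incidence_date(date(2014, 6, 1), date(2014, 4, 1)))
```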

Results
The most valid basis of diagnosis is selected according to the recommendations of the International Agency for Research on Cancer (IARC) and the International Association of Cancer Registries (IACR) [19]. The recording of multiple primary tumours follows the recommendations of ENCR [20]. Topography codes considered as a single site, as well as systemic and multicentric cancers, were counted only once. If a new tumour (e.g., diagnosed simultaneously in the same site) has a different morphological code (e.g., the first four digits denote a different cell type), it is considered a new cancer case.
The DCO rate decreased from 6.4% in 1997 to 0.8% in 2014 (Table 1). DCN cases decreased from 3.6% in 2009 to 1.5% in 2014. In 1997, the DCO rate was highest for pancreatic cancer (16.9%) and carcinoma of the liver and intrahepatic bile ducts (12.2%) and lowest for skin melanoma (1.0%) and thyroid cancer (1.5%, Table 1). In 2014, the DCO rate was highest for leukaemia (2.0%) and pancreatic cancer (1.7%) and lowest for skin melanoma, tumours of the oral cavity and pharynx, and thyroid and brain cancer (0.0%). For carcinoma of the liver and intrahepatic bile ducts, the DCO rate decreased to 1.4% in 2014. The DCN rate in 2014 was highest for carcinoma of the liver and intrahepatic bile ducts (5.5%) and stomach cancer (4.8%).
Figure 1(c) displays a smoothed curve; the shaded area presents the 95% confidence intervals.
MV% increased from 89.7% in 1997 to 95.5% in 2014 (Figure 1, Table 2). In 1997, the proportion was lowest for pancreatic cancer (62.2%) and highest for skin melanoma (99.0%, Table 2). In 2014, the proportion was lowest for carcinoma of the liver and intrahepatic bile ducts (68.3%) and highest for skin melanoma, tumours of the oral cavity and pharynx, and thyroid cancer (100.0%). For pancreatic cancer, the proportion increased to 81.3% in 2014.
After one incidence year has been completed, IARC checks as well as ENCR checks (since incidence year 2014) are performed. Any errors are checked and corrected, if applicable.
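The validity indicators reported above are simple proportions of all registered cases. The Python sketch below shows one way they could be computed from case records. The record layout (the `Case` fields and their values) is hypothetical and not the registry's database schema; it also assumes the usual convention that DCN cases are those first notified by a death certificate, and that DCO cases are the subset for which no further information could be traced.

```python
from dataclasses import dataclass


@dataclass
class Case:
    basis: str   # hypothetical values: "morphology", "clinical", "dco"
    dcn: bool    # True if first notified via a death certificate
    icd10: str   # ICD-10 topography code, e.g. "C50.9"


def validity_indicators(cases):
    """Return MV%, DCO%, DCN% and PSU% for a list of Case records."""
    n = len(cases)
    if n == 0:
        raise ValueError("no cases")

    def pct(count):
        return 100.0 * count / n

    return {
        "MV%": pct(sum(c.basis == "morphology" for c in cases)),
        "DCO%": pct(sum(c.basis == "dco" for c in cases)),
        "DCN%": pct(sum(c.dcn for c in cases)),
        # Primary site uncertain corresponds to ICD-10 code C80.
        "PSU%": pct(sum(c.icd10.startswith("C80") for c in cases)),
    }


# Hypothetical records for illustration only.
sample = [
    Case("morphology", False, "C50.9"),
    Case("morphology", False, "C80.9"),
    Case("clinical", True, "C25.0"),
    Case("dco", True, "C34.1"),
]
print(validity_indicators(sample))
```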

Timeliness.
Currently, the Cancer Registry Zurich and Zug completes the incident cases with a two-year delay. That is, at the end of 2017, incident cases of 2015 are completely registered and coded. The Cancer Registry Zurich and Zug has published annual reports since 2009. This requires "freezing" the database at a certain point in time (usually in December). For example, in December 2017, the data of incidence year 2015 are exported; these are published in spring 2018. The advantage is that most information on these cancer cases is available by the time the cases are coded. However, by the time the data are published, they are already somewhat outdated (three-year delay).
Based on the annual reports 2014-2016, Supplementary Material Table 1 presents the number of incident cases for specific localisations for the incidence years 2012 and 2013. For most localisations, the difference between the number of cases registered within two years and within three years after diagnosis was less than 5% but tended to be somewhat higher for leukaemia and liver cancer. About 2.5% of all cases were registered one year later than intended.

Completeness.
Figure 2 shows the stability of incidence rates between 1981 and 2014. Overall, the incidence of all tumours combined increased both for men and for women. For men, the most pronounced increase was between 1981 and 1990, while for women a linear increase was observed over the whole period. The annual trends for some cancer sites fluctuate, but there does not seem to be any pattern. Increasing incidence trends were observed for breast cancer and lung cancer in women and for skin melanoma in both sexes, while lung cancer in men showed a decreasing trend. Prostate cancer increased up to 2005 and decreased thereafter. The incidence of stomach and bladder cancer decreased while lymphomas increased slightly in both sexes. Table 3 presents the age-specific incidence rates per 100,000 for childhood cancer for 1981-2014 (all sites). All values are within the reference values (upper and lower deciles for childhood cancer incidence rates published in volume VIII of CI5) [17].
The M:I ratio is displayed in Table 4. Overall, the M:I ratio decreased from 0.58 in 1980 to 0.37 in 2014. Cancers with poor survival rates (e.g., pancreas, lung, stomach, liver) had M:I ratios close to one, whereas skin melanoma had low M:I ratios of about 0.1 or 0.2 over the whole observation period.
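For readers unfamiliar with the M:I ratio, the following minimal Python sketch shows the calculation: cancer deaths in a calendar year divided by incident cases in the same year, per site. The function name and all counts are hypothetical and are not taken from Table 4.

```python
def mi_ratios(deaths_by_site, incidence_by_site):
    """Mortality:incidence ratio per cancer site for one calendar year."""
    ratios = {}
    for site, incident in incidence_by_site.items():
        if incident <= 0:
            continue  # ratio undefined without incident cases
        ratios[site] = deaths_by_site.get(site, 0) / incident
    return ratios


# Hypothetical counts for illustration only.
incidence = {"pancreas": 200, "skin melanoma": 500}
deaths = {"pancreas": 190, "skin melanoma": 60}
for site, ratio in mi_ratios(deaths, incidence).items():
    print(f"{site}: {ratio:.2f}")
```

Deaths and incident cases counted in the same year do not refer to the same patients, so a ratio computed this way can exceed one for cancers with poor survival.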

Discussion
The present study gives an overview of the four main indicators of data quality in cancer registration (comparability, validity, timeliness, and completeness) for the Cancer Registry Zurich and Zug in Switzerland, which has registered data since 1980. In general, the data quality in the Cancer Registry Zurich and Zug is acceptable according to the methods presented in this study.

Comparability.
The Cancer Registry Zurich and Zug generally follows international standards of coding, definition of incidence date, and rules regarding multiple primaries.

Validity.
The DCO rate decreased from over 6% in 1997 to less than 1% in 2014. Internationally, DCO rates of <5% are regarded as satisfactory. The DCO rate for the Cancer Registry Zurich and Zug decreased below 5% around the year 2000 and below 1% in 2005. The increase to 3% in 2009 was triggered by limited access to data from two pathology laboratories at that time (of which one delivered the reports at a later stage). However, in general, DCO rates for the Cancer Registry Zurich and Zug were in an acceptable range and have been below 5% since the year 2000. A general increase in MV% was observed for most cancer sites in the Cancer Registry Zurich and Zug between 1997 and 2014. This indicates that a higher proportion of cancer cases was based on histology reports, as MV% generally reflects the diagnostic process. PSU% was low overall (<3%) with a peak in 1997. The increase between 1980 and 1997 may be due to increased awareness of PSU and cancer diagnostics. The subsequent decrease is likely due to improved diagnostic techniques that allowed the primary site to be found in a higher percentage of new cancer diagnoses (Binder et al., manuscript in preparation).

Timeliness.
There is no formal definition of timeliness in a cancer registration context [1]. However, some standards have been set. The Centers for Disease Control and Prevention/National Program of Cancer Registries request that, within 24 months of the close of the diagnosis year, 95% of expected unduplicated cases are available to be counted as incident cases [1]. Similarly, the North American Association of Central Cancer Registries defines this time span to be 23 months [1]. In the Cancer Registry Zurich and Zug, the difference in the number of cases reported for the incidence years 2012 and 2013 in the annual reports 2014, 2015, and 2016 is mostly smaller than 5%, indicating that at least 95% of cases were registered within 24 months. The proportion of late registrations is particularly low for melanoma (about 1%). For leukaemia, the proportion is up to 10%, indicating that these cancers were more frequently missed within two years and that notifications for them arrive later. This is in line with other research indicating that (lymphoid) leukaemia was systematically underregistered [16]. One reason could be that chronic types of leukaemia are often diagnosed in outpatient settings where the notification procedures for cancer registries are not yet well established.
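The comparison between annual reports can be sketched in Python as follows. This is a minimal illustration, not the registry's procedure: the function names and counts are hypothetical, and it assumes that the count available three years after diagnosis stands in for the "expected" number of cases in the 95% criterion.

```python
def late_share(count_two_years, count_three_years):
    """Share of cases that were registered only in the third year."""
    if count_three_years <= 0:
        raise ValueError("count after three years must be positive")
    return (count_three_years - count_two_years) / count_three_years


def meets_timeliness_standard(count_two_years, count_three_years,
                              threshold=0.95):
    """True if at least `threshold` of cases were available after two years."""
    return 1.0 - late_share(count_two_years, count_three_years) >= threshold


# Hypothetical counts: 975 cases after two years, 1000 after three years.
print(late_share(975, 1000))                  # share registered late
print(meets_timeliness_standard(975, 1000))   # within the 95% criterion
print(meets_timeliness_standard(900, 1000))   # 10% late, criterion not met
```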
A national law on cancer registration will presumably come into force in 2020. This law aims to accelerate the process of cancer registration all over Switzerland, such that, at the end of one year, the incident cases of the previous year should be completed.

Completeness.
Overall, cancer incidence rates of men and women in the Canton of Zurich increased. Similar trends were reported for Norway [5], Iceland [6], and Finland [7]. The age-specific childhood cancer incidence rates were within the limits of the reference values, although close to the upper limit especially for boys aged 0-4 years. Other (mostly Nordic) countries have reported high rates of childhood cancer [7], which can be attributed to true variations in underlying risks [21].
The most common cancers were prostate cancer in men and breast cancer in women. While breast cancer is still increasing, prostate cancer increased up to 2005 and decreased thereafter, which probably reflects the introduction of PSA testing. An increase in the number of breast cancer cases was also observed in other countries such as Bulgaria [4], Iceland [6], Norway [5], and Finland [7]. The lung cancer trends reflect changing smoking patterns in the population, with a decreasing proportion of smokers in men and an increasing proportion in women in the last decades [22]. While the incidence of colon and rectum cancer was relatively stable over time, the incidence of skin melanoma more than doubled in the observation period. Increases have been observed in other European countries such as Italy [23], Finland [24], and Lithuania [25]. Furthermore, compared to other European countries, the incidence rate of skin melanoma is high in Zurich and Switzerland in general [26]. Comparably high rates were observed in Northern European countries such as Denmark, Norway, the Netherlands, and Sweden, whereas Central, Eastern, and Southern European countries had mostly much lower rates [26]. Reasons for the high rates in Switzerland are assumed to be increased sun exposure (due to travel behaviour favouring sunny destinations, more frequent outdoor activities, and increased use of sunbeds) and increased dermatological consultations leading to greater awareness [27].
The site-specific M:I ratios are comparable to other European countries with values closer to one for pancreas, liver, brain, and lung cancer [5,6]. M:I ratios >1 (e.g., for pancreatic cancer) are probably due to incorrect coding on the death certificate or can occur because the incident and the mortality cases in one calendar year are not necessarily referring to the same patients.

Strengths and Limitations.
A strength of the study is the presentation of a variety of methods that address the four main indicators of data quality. Furthermore, because cancer registration in the Canton of Zurich dates back to 1980, data quality indicators were presented over time, demonstrating that the indicators improved over time and were overall in an acceptable range. A limitation is that only semiquantitative methods, and none of the quantitative methods suggested by Parkin & Bray, were applied to assess completeness [2]. Moreover, due to limited access to vital status data, no methods that are based on survival were used, such as the Flow method [3]. However, with the new cantonal law, access to vital status data will be improved. The national law that is expected to come into force in about 2020 foresees an annual matching of the cancer registry data with mortality data of the Central Compensation Office based on the unique personal insurance number, which will considerably improve survival data in the Cancer Registry Zurich and Zug.

Conclusions
The Cancer Registry Zurich and Zug has long experience in cancer registration, starting in 1980. Overall, the access to data is relatively good and is likely to improve with the new cantonal law on cancer registration that came into force in early 2017. The adherence to international standards is good. According to the methods presented in this review, the data quality of the Cancer Registry Zurich and Zug was acceptable. Most indicators improved over time with low DCO rates, high MV%, low PSU%, and relatively low M:I ratios. In addition, age-specific incidence rates of childhood cancer were within the reference limits. A drawback is the limited access to vital status information, which poses a problem for survival analyses. However, the new cantonal law and the national law expected to come into force in 2020 should improve this situation. Good data quality is a prerequisite for using cancer registry data for monitoring, research, and health policy making.

Data Availability
In general, cancer registry data are not publicly available. Anonymised cancer incidence data for Switzerland by cancer site, sex, period, and canton are available at http://www.nicer.org/NicerReportFiles2017/EN/report/atlas.html?&geog=0.