An assessment of GLOBOCAN methods for deriving national estimates of cancer incidence

Abstract
Objective To assess the validity of the GLOBOCAN methods for deriving national estimates of cancer incidence.
Methods We obtained incidence and mortality data from Norway by region, year of diagnosis, cancer site, sex and 5-year age group for the period 1983–2012 from the NORDCAN database. Estimates for the year 2010 were derived using nine different methods from GLOBOCAN. These included the projection of national historical rates, the use of regional proxies and the combination of national mortality data with mortality to incidence ratios or relative survival proportions. We then compared the national estimates with recorded cancer incidence data.
Findings Differences between the estimates derived using different methods varied by cancer site and sex. Methods based on projections performed better where major changes in recent trends were absent. Methods based on mortality data performed less well for cancers associated with small numbers of deaths and for cancers detectable by screening. In countries with longstanding cancer registries of high quality, region-based or trend-based incidence estimates perform reasonably well in comparison with recorded incidence.
Conclusion Although the performance of the GLOBOCAN methods varies by cancer site and sex in this study, the results emphasize a need for more high-quality population-based cancer registries – either regional or, where practical and feasible, national registries – to describe cancer patterns and trends for planning cancer control priorities.


Introduction
Cancer is among the most common causes of morbidity and mortality worldwide, with an estimated 14 million new cases and 8 million deaths in 2012, projected to rise by at least 70% by 2030. 1 Timely and accurate cancer statistics are crucial to identify priorities for cancer control strategies at the national level. Yet, only 34 of 194 World Health Organization (WHO) Member States presently report high-quality national mortality data, 2 while 68 countries provided high-quality incidence data for the last volume of Cancer incidence in five continents. 3 As a result, many policy-makers rely on national cancer incidence and mortality estimates of variable precision to inform cancer control priorities.
GLOBOCAN, a project of the International Agency for Research on Cancer (IARC), provides estimates by cancer site and sex using the best available data in each country and several methods of estimation. 1 Producing high-quality estimates therefore requires a dual approach: improving the reported data (by developing cancer registries and civil/vital registration systems) and continually assessing the validity of the estimation procedures to improve the methods used.
This study focuses on the validity of the methods used in GLOBOCAN to derive national cancer incidence estimates, based on a retrospective comparison of these estimates with the observed national data in a setting with high-quality cancer registry data. Although we focused on the methods most commonly used in high-income countries, we also aimed to provide insights into the validity of the methods more broadly, including methods used predominantly in low- and middle-income countries.

Recorded data
To validate the nine methods used in GLOBOCAN to estimate national incidence in 2012 (GLOBOCAN 2012), long-term national and regional incidence and mortality data as well as 5-year relative survival estimates are required. Of the few countries with such data available, we selected Norway because of the consistently high quality of its cancer registry data, available nationally and by region. Cancer reporting is a legal requirement in Norway and data linkage procedures with the cause of death registry further increase the completeness of the information. For the period 2001-2005, data completeness was estimated at 98.8%, while 93.8% of the cases had been verified by examining biopsy samples under a microscope. 4
From the Nordic cancer database NORDCAN, we extracted Norwegian incidence and mortality data by region, year of diagnosis, cancer site, sex and 5-year age group (starting at 0-4 and ending at 85+) for the period 1983-2012. 5 We also extracted Norwegian 5-year relative survival proportions for each cancer site as well as incidence and mortality data from the neighbouring countries Denmark, Finland, Iceland and Sweden. 5 As with GLOBOCAN 2012, national population data were obtained from the United Nations 6 while regional population data were extracted from NORDCAN. 5 Cancer sites of the recorded cases and deaths were grouped by the codes in the International statistical classification of diseases and related health problems, 10th revision (ICD-10) to correspond to the sites used in GLOBOCAN. Unspecified neoplasms of the uterus (ICD-10 code C55) were reallocated to the cervix (C53) and corpus uteri (C54) according to the respective proportions of these two sites in the different datasets. 7 To define a gold standard for comparisons, we computed the number of cases by sex and cancer site in Norway in 2010 as the average of the recorded cancer cases between 2009 and 2011.
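The two data preparation steps described above (reallocating C55 and averaging 2009-2011) can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function names and all counts are hypothetical, and the sketch assumes the reallocation is done within a single stratum using the recorded C53 and C54 counts of that stratum.

```python
def reallocate_c55(c53, c54, c55):
    """Split unspecified uterine cases (C55) between cervix (C53) and
    corpus uteri (C54) in proportion to the recorded C53 and C54 counts."""
    specified = c53 + c54
    if specified == 0:
        raise ValueError("no C53/C54 cases to derive proportions from")
    share_cervix = c53 / specified
    return c53 + c55 * share_cervix, c54 + c55 * (1 - share_cervix)


def gold_standard(cases_by_year, years=(2009, 2010, 2011)):
    """Reference count for 2010: mean of the recorded cases over the given years."""
    return sum(cases_by_year[y] for y in years) / len(years)


# Hypothetical counts, for illustration only.
cervix, corpus = reallocate_c55(c53=300, c54=700, c55=50)
reference_2010 = gold_standard({2009: 980, 2010: 1010, 2011: 1040})
print(cervix, corpus, reference_2010)
```

Averaging three years around the target year reduces the influence of random year-to-year fluctuation, which matters most for rare cancer sites.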
We then applied each of the nine methods used in GLOBOCAN 2012 to estimate the number of cancer cases in Norway in 2010, by sex and cancer site, and compared these estimates with the gold standard.

Research
National estimates of cancer incidence Sebastien Antoni et al.

Estimation methods
The GLOBOCAN methods are summarized in Fig. 1, together with the algorithm used to select them in GLOBOCAN based on the availability of data in each country. More details can be found elsewhere. 1,8 Fig. 2 illustrates which method was used for each country within the GLOBOCAN 2012 project.
The data required for each of the nine methods are summarized in Table 1. The methods used may produce under- or overestimates at different cancer sites. Presenting an overall number of cases based on the sum of site-specific numbers could therefore be misleading, because overestimates and underestimates would cancel each other out when aggregated. We thus report separately the total number of cases underestimated and overestimated for each method. These were then aggregated to assess the differences between the results and the Norwegian recorded data.
All analyses were performed using the R software package (The R Project for Statistical Computing, Vienna, Austria).
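The reason for reporting under- and overestimates separately can be illustrated with a short sketch. It is written in Python for illustration (the study itself used R); the function name and the site counts are hypothetical. The percentage is taken relative to the total observed cases, which is an assumption, although it is consistent with the figures reported in the Results (for example, 834 of 14 507 cases is about 5.7%).

```python
def summarize_differences(estimated, observed):
    """Aggregate site-specific differences without letting signs cancel."""
    under = 0.0  # total cases missed across sites where estimate < observed
    over = 0.0   # total excess cases across sites where estimate > observed
    for site, obs in observed.items():
        diff = estimated[site] - obs
        if diff < 0:
            under += -diff
        else:
            over += diff
    total_observed = sum(observed.values())
    return {
        "under": under,
        "over": over,
        "net": over - under,            # naive sum, where errors cancel out
        "absolute": under + over,       # aggregated difference
        "percent": 100 * (under + over) / total_observed,
    }


# Hypothetical counts, for illustration only.
observed = {"lung": 1000, "colorectum": 800, "prostate": 1200}
estimated = {"lung": 950, "colorectum": 840, "prostate": 1200}
summary = summarize_differences(estimated, observed)
print(summary)
```

In this example the net difference is only 10 cases, while the aggregated difference is 90 cases (3.0% of the observed total), which is the quantity that reflects site-specific accuracy.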

Methods 2 to 7
For methods 2 to 7, we used incidence and/or mortality data from 2003-2007 to simulate a real-life situation where data from the latest volume of Cancer incidence in five continents (Vol. X) would be used. 3 The 2010 Norwegian mortality data used in methods 3 to 5 were estimated as in GLOBOCAN 2012 by projecting rates for the period 1988-2007 to 2008-2012.
[Fig. 1: flowchart summarizing the GLOBOCAN methods and the algorithm used to select them; only the text labels are recoverable and are listed below]

Selection of a method of estimation

Decision label: National mortality data available

Country groups:
- Some high-HDI countries and countries with short-term incidence statistics (e.g. Belgium, Namibia, Uruguay)
- European, Latin American countries and some Asian countries with regional but no national cancer registry (e.g. Brazil, China, France)
- The Caribbean, Latin American and some western Asian countries with no incidence data but national mortality data (e.g. Jamaica, Kazakhstan, Mexico)
- African and Asian countries with regional incidence data but no national mortality data (e.g. Algeria, India, Malawi, Uganda)
- Eastern Asian and African countries with no incidence or mortality data but site-specific frequency data available (e.g. Bangladesh, Ghana)
- Eastern Asian and African countries with no incidence, mortality or frequency data available (e.g. Angola, Cambodia, Lao People's Democratic Republic)

Methods:
- Method 2: Most recent national incidence rates applied to 2010 population
- Methods 3-4: Derived from national mortality and M:I ratios
- Method 5: Derived from national mortality and modelled survival
- Methods 6-7: Rates from one or more regional registries
- Method 9: Simple average of rates from neighbouring countries

Notes: For Method 1, two projections were done. Method 1A used NORDPRED (5-year intervals, > 15 years of data) and method 1B used DEPPRED (annual, < 10 years of data). Method 2 also used for cancer sites with stabilizing rates following large temporal variations (for example due to screening). Method 3 used M:I ratios from regional registries and method 4 used data from neighbouring countries. Method 6 used rates from one registry and method 7 used weighted rates from multiple registries.

In method 3, mortality:incidence (M:I) ratios from regional registries are used as a proxy for national case-fatality rates. National incidence rates can then be inferred from national mortality data along with the M:I ratio. This is useful where regional registries are numerous but not necessarily nationally representative, as in Italy 11 or Japan. 12 Where no such regional population-based data are available, data from neighbouring countries can be used (method 4). To generate the M:I ratios used in method 3, we included recorded cancer cases and deaths from all regions of Norway except for the south-eastern region (which includes Oslo). In some high-income countries (e.g. France or Japan), national estimates are derived from regional cancer registry data that do not cover the capital city, which is usually highly populated. We also included recorded cases and deaths from other Nordic countries for cancer sites with fewer than 100 deaths in Norway (e.g. cancers of the larynx, testis and thyroid and Hodgkin lymphoma).
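The logic shared by methods 3 and 4 can be sketched as follows. This is a minimal illustration in Python (the study itself used R), assuming that raw age-specific M:I ratios from the proxy registries are applied directly to national deaths; the actual GLOBOCAN procedure may model or smooth these ratios. The function name and all counts are hypothetical.

```python
def incidence_from_mortality(national_deaths, proxy_cases, proxy_deaths):
    """Estimate national cases per age group as national deaths divided by
    the proxy M:I ratio (proxy deaths / proxy cases) of that age group."""
    estimates = {}
    for age, deaths in national_deaths.items():
        if proxy_deaths[age] == 0:
            # M:I ratio is zero, so incidence cannot be inferred from deaths.
            estimates[age] = None
            continue
        mi_ratio = proxy_deaths[age] / proxy_cases[age]
        estimates[age] = deaths / mi_ratio
    return estimates


# Hypothetical counts, for illustration only.
national_deaths = {"15-19": 0, "50-54": 20, "70-74": 90}
proxy_cases = {"15-19": 8, "50-54": 50, "70-74": 120}
proxy_deaths = {"15-19": 1, "50-54": 10, "70-74": 60}

estimates = incidence_from_mortality(national_deaths, proxy_cases, proxy_deaths)
print(estimates)
```

The youngest age group shows a limitation of mortality-based estimation: with no national deaths recorded, the estimated number of cases is zero even though the proxy registries record cases in that age group.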
Method 5 estimates national cancer incidence by using national mortality and 5-year relative cancer survival data, using the equation:

I = M / (1 − S)    (1)

where M is the mortality rate, I is the incidence rate and S is the 5-year relative survival proportion.
Method 6 was based on incidence data from the northern and western regions of Norway, while we selected the south-eastern region (including Oslo) for method 7. For GLOBOCAN estimations, regional incidence data are often only available from large cities, particularly in low- and middle-income countries (e.g. Uganda, Zimbabwe).
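For method 5, the sensitivity of the estimate to the survival proportion can be illustrated with a short sketch. It is written in Python for illustration and assumes the relation I = M / (1 − S), that is, that the complement of 5-year relative survival approximates the M:I ratio. The function name and the mortality rate are hypothetical.

```python
def incidence_from_survival(mortality_rate, survival_5y):
    """Assumed relation: incidence = mortality / (1 - 5-year relative survival)."""
    if not 0 <= survival_5y < 1:
        raise ValueError("survival proportion must be in [0, 1)")
    return mortality_rate / (1 - survival_5y)


# Hypothetical mortality rate of 10 per 100 000, for illustration only.
mortality = 10.0
by_survival = {s: incidence_from_survival(mortality, s) for s in (0.10, 0.50, 0.80, 0.90)}
for survival, incidence in by_survival.items():
    print(f"S = {survival:.2f} -> estimated incidence {incidence:.1f} per 100 000")
```

Under this assumption, moving from S = 0.80 to S = 0.90 doubles the estimated incidence for the same mortality rate, so small errors in the survival proportion have a large effect for cancers with a good prognosis.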

Methods 8 and 9
The incidence rates from neighbouring countries used in methods 8 and 9 were computed using data from Nordic countries for the period 2009-2011.
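A minimal sketch of the neighbouring-country approach (method 9) is given below, in Python for illustration. It assumes an unweighted mean of age-specific rates across neighbouring countries, applied to the national population by age group. The function name, country labels, rates and population figures are all hypothetical.

```python
def cases_from_neighbour_rates(neighbour_rates, population):
    """neighbour_rates: {country: {age_group: rate per 100 000 person-years}}
    population: {age_group: persons}. Returns the estimated number of cases."""
    n_countries = len(neighbour_rates)
    total_cases = 0.0
    for age, persons in population.items():
        # Simple (unweighted) average of the neighbours' age-specific rates.
        mean_rate = sum(rates[age] for rates in neighbour_rates.values()) / n_countries
        total_cases += mean_rate * persons / 100_000
    return total_cases


# Hypothetical rates and population, for illustration only.
neighbour_rates = {
    "Country A": {"0-64": 10.0, "65+": 100.0},
    "Country B": {"0-64": 20.0, "65+": 140.0},
}
population = {"0-64": 4_000_000, "65+": 700_000}
estimated_cases = cases_from_neighbour_rates(neighbour_rates, population)
print(estimated_cases)
```

Because the average is unweighted, a small neighbouring country influences the estimate as much as a large one.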

Results
In 2010, 14 507 new cancer cases were recorded in Norwegian men and 12 466 in women. Our corresponding estimates, based on GLOBOCAN methods, differed from the observed data by 5.7-18.8% (834 to 2341 cases), excluding method 5. Fig. 3 summarizes the sex-specific numerical differences for each method, with under- and overestimates reported separately, as well as the overall difference as a percentage of the observed data.
Comparing incidence estimates with observed data across cancer sites by sex, estimates based on data from one regional cancer registry (method 7) performed best in men (5.7% difference; 834 cases), while projection of medium-term historical rates (method 1B) performed best in women (6.1% difference; 763 cases). When considering both sexes together, and among the methods usually used in high-income countries (methods 1 to 4), the most recent recorded rates applied to the 2010 population (method 2) performed well, with a 6.4% (1726 cases) difference between observed and estimated cases. However, when prostate and breast cancers were excluded, projection of rates (methods 1A and 1B) produced overall estimates very similar to those from method 2, with differences of at most 5.0% (723 cases; method 1B) and 7.7% (958 cases; method 1A) (Fig. 3). Apart from methods 1A, 1B and 5, all methods tended to underestimate the total number of cases.
Our estimates by cancer site show variability in the performance of the different methods (Fig. 4). Overall, methods commonly used in high-income countries performed quite well in estimating recent cancer incidence in Norway. Method 1A produced the […] (Fig. 6). Methods 3 and 4 generally produced underestimates at major cancer sites, except for melanoma of skin in women (+17.4%; +139 cases and +34.0%; +271 cases using methods 3 and 4, respectively). These two methods performed less well for rare cancers (e.g. gallbladder cancer or Hodgkin lymphoma) or those with a good prognosis (e.g. testis or thyroid cancers; Table 2 and Table 3).
Among the methods commonly used in low- and middle-income countries (methods 5 to 9), the method using mortality combined with the 5-year relative survival proportion (method 5) produced quite large overestimates for cancers associated with good survival, including melanoma of skin in women (+99.6%; +794 cases), prostate (+92.8%; +4208 cases) and breast (+57.0%; +1649 cases), and underestimates for cancers with small numbers of deaths, including testicular (−40.3%; −116 cases) or gallbladder (−61.6%; −45 cases in men, −59.5%; −50 cases in women) cancers. Estimates for lung and pancreatic cancers were similar to, or more accurate than, those obtained from methods 3 and 4 (Table 2 and Table 3).

Table 1. Data required for each of the nine methods
Method 1: Historical national incidence data
  - Availability of robust data on cases/population size
  - Recent incidence trends continue into near future
Method 2: Recent national incidence data
  - Availability of robust data on cases/population size
  - Stable incidence rates in near future
Method 3: National mortality data and M:I ratios from regional registries within the country
  - Availability of robust data on cases/deaths
  - Trends in incidence, mortality and survival are relatively stable over time
  - Case fatality in combined regions representative nationally
Method 4: National mortality data and M:I ratios from registries in neighbouring countries
  - Availability of robust data on cases/deaths
  - Trends in incidence, mortality and survival are relatively stable over time
  - Case fatality in combined neighbouring countries representative nationally
Method 5: National mortality and 5-year relative survival data
  - Availability of robust data on deaths and survival
  - Trends in incidence, mortality and survival are relatively stable over time
  - Five-year survival proportion a reasonable proxy for clinical cure
Method 6: Rates from one regional registry within the country
  - Availability of robust data on cases/population size
  - Incidence rates in single region representative nationally
Method 7: Rates from multiple regional registries within the country
  - Availability of robust data on cases/population size
  - Incidence rates in combined regions representative nationally
Method 8: Data from all sites by age and sex and frequency data by cancer site
  - Availability of robust data on total cancer cases
  - Total cases and cancer-specific frequencies representative nationally
Method 9: Data from neighbouring countries
  - Availability of robust data on cases/population size
  - Incidence rates in combined neighbouring countries representative nationally
M:I: mortality:incidence.

[Fig. 3. Observed and estimated ...]
The performance of methods using data from one or more regional registries (methods 6 and 7) varied greatly by cancer site. Estimates for prostate, colorectal, lung and breast cancers were reasonable (less than 8% difference between estimates and observed data); method 6, however, underestimated lung cancer incidence in women in our study (−20.3%; −245 cases). Despite the use of observed data (instead of GLOBOCAN estimates), results from methods 8 and 9 were also almost exclusively underestimates and their accuracy varied greatly by cancer site and sex (Table 2 and Table 3).

Discussion
Our results, validated against the high-quality data available from the Norwegian Cancer Registry, confirm that projections of historical national data are among the best methods to predict recent cancer incidence. They also suggest that, in selected populations, a site-specific approach is warranted for cancers where the level of incidence is driven by changes in diagnosis patterns (e.g. thyroid) or screening (e.g. breast, prostate). Finally, they illustrate how the accuracy of national estimates based on geographic proxies, including data from regional registries or neighbouring countries, is highly dependent on the extent to which these datasets are representative of the scale and profile of the country of interest.
In Norway, where long-term national cancer incidence data series are available, the projection of historical rates 9 (method 1A) resulted in a relatively good estimation of recent incidence statistics. Projection-based methods captured medium- to long-term trends reasonably well but did not perform as well when there were recent changes in the trends. For example, prostate cancer rates increased by 4.3% annually in Norway between 1985 and 2008 13 but plateaued in recent years. 14 Thus method 2, which simply applies the most recent cancer incidence rates available to recent population data, performed better than a projection of historical rates in this context (Fig. 5). On the other hand, lung cancer rates have uniformly increased in Norwegian women 15 in recent years, explaining the good quality of estimates based on trends for this cancer (Fig. 6).
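The effect of a plateau on a trend-based estimate can be illustrated with a deliberately simplified sketch in Python. It replaces the NORDPRED and DEPPRED projection models with a constant annual percentage change, uses the 4.3% annual increase quoted above, and assumes a hypothetical base rate and a hypothetical 5-year projection horizon over which the true rate stays flat.

```python
def project_rate(base_rate, annual_change, years_ahead):
    """Extrapolate a rate assuming a constant annual percentage change
    (a simplification, not the NORDPRED or DEPPRED model)."""
    return base_rate * (1 + annual_change) ** years_ahead


base_rate = 100.0   # hypothetical rate per 100 000 in the base period
years_ahead = 5     # hypothetical projection horizon

projected = project_rate(base_rate, 0.043, years_ahead)  # trend continues
carried_forward = base_rate                              # most recent rate reused, as in method 2

# If rates have in fact plateaued at the base level, the projection overshoots.
overshoot_percent = 100 * (projected - carried_forward) / carried_forward
print(f"projected {projected:.1f}, carried forward {carried_forward:.1f}, "
      f"overshoot {overshoot_percent:.1f}%")
```

Under these assumptions the projection overshoots by roughly a quarter, whereas it would be the more accurate choice if the historical increase had continued.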
Applied to the Norwegian data, methods 3 and 4 were less accurate than the first two methods and underestimated the overall number of cases. They were notably less reliable for cancer sites with small numbers of deaths, such as thyroid cancer in men or testicular cancer. Although the incidence of testicular cancer has uniformly increased in Norway over recent decades, mortality from this cancer has declined since the late 1970s, leading to low numbers of annual deaths (13 deaths nationally in 2010). 14 In this context, methods 3 and 4 failed to accurately estimate incident cases in age groups where deaths are rare and tended to underestimate the overall cancer burden. Furthermore, these methods also depend on the representativeness of the proxy datasets used to compute the M:I ratios on which they rely.
In GLOBOCAN 2012, method 5 was mainly used in the Caribbean, Latin America and some Asian countries. Applied to Norway, it performed as well as or better than methods 3 and 4 for cancers with a poor prognosis such as lung or pancreatic cancer, for which the 5-year relative survival proportions in Norway were 15% and 6%, respectively, among men diagnosed in 2009-2012. 14 However, the method was inadequate for cancers with a good prognosis such as melanoma, breast or prostate cancers, where the 5-year relative survival proportion was above 80%. 14 For the latter two cancers, cure is not apparent at 5 years and survival proportions continue to decline in further years of follow-up, 16 thus invalidating the equation used to calculate incidence (Equation 1).
It is likely that method 5 combined with longer-term relative survival estimates would produce better incidence estimates for cancers with a good prognosis. In Norway, 10-year relative survival proportions for prostate and breast cancers are available and are lower, at 58% and 71%, respectively. 17,18 However, such data are less frequently available than 5-year relative survival proportions, particularly in countries where method 5 would be applied. In many low- and middle-income countries, where curative treatments may not be available and hence the M:I ratio is higher, the 5-year survival proportion may be a better proxy of case-fatality. For example, the 5-year relative survival proportion for breast cancer in Costa Rica was 68% for diagnoses in 1995-2000 19 while the M:I ratio was 31.8% based on data from 1998-2002. 20 The M:I ratio is thus close to the 32% of patients who did not survive 5 years, indicating that method 5 would produce reliable estimates in this setting.
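The Costa Rica comparison can be checked with a few lines of arithmetic, shown here in Python for illustration. The sketch assumes that method 5 estimates incidence as mortality divided by (1 − S), so that the ratio of the method 5 estimate to the incidence implied by the recorded M:I ratio equals the M:I ratio divided by (1 − S). The variable names are hypothetical, and the two source figures cover slightly different periods.

```python
survival_5y = 0.68   # 5-year relative survival, breast cancer, Costa Rica, 1995-2000
mi_ratio = 0.318     # mortality:incidence ratio, 1998-2002

# Proportion not surviving 5 years, used as a proxy for case fatality.
case_fatality_proxy = 1 - survival_5y

# For a given mortality M: estimate = M / (1 - S), reference = M / (M:I),
# so estimate / reference = (M:I) / (1 - S), independent of M.
estimate_over_reference = mi_ratio / case_fatality_proxy

print(f"1 - S = {case_fatality_proxy:.3f}, M:I = {mi_ratio:.3f}")
print(f"method 5 estimate relative to reference: {estimate_over_reference:.4f}")
```

Under these assumptions the method 5 estimate would fall within about 1% of the incidence implied by the recorded M:I ratio.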
Because of the paucity of cancer data, national incidence in low- and middle-income countries is often estimated using datasets from regional registries or neighbouring countries. Most of the GLOBOCAN 2012 estimates for Africa and south-east Asia were based on such data (methods 6 to 9). Applying these methods to the Norwegian data illustrated the problem of a lack of representativeness of proxy datasets used to derive national cancer incidence. For example, method 7, where data from the country's capital city were used, provided relatively good overall estimates for Norway. In many low-income countries, the differences are likely to be considerably greater where there are marked differences in the profile of cancer in rural and urban settings. As an example, the breast cancer rate in Mumbai, India (a major urban area) was 31.0 per 100 000 person-years in 2008-2009, more than 2.5 times the rate observed in Barshi (12.3 per 100 000), a rural area, in 2009-2010. 21
Producing accurate national cancer incidence estimates is a difficult task that depends on multiple factors: the availability of high-quality cancer registry data, the use of valid and reproducible estimation methods and the representativeness of proxy datasets used for calculations. Because this study was performed using high-quality cancer registry data from a high-income country, the impact of data quality issues and regional variations of the cancer burden on our results is likely to be minimal. The findings should mainly reflect the intrinsic characteristics of the different methods of estimation. On the other hand, it also means that our site- and method-specific results cannot be generalized to other countries and may not be valid in different settings. However, our study provides general conclusions regarding the context in which the different methods are likely to produce reliable estimates, provided that the required data are available.
The study provides a comparative assessment of the different methods of estimation of national incidence used in GLOBOCAN as well as some general guidance on the caveats associated with certain methods of estimation for specific cancer types. In particular, the results indicate that in countries such as Norway with longstanding high-quality population-based cancer registries, region-based or trend-based estimates perform reasonably well in comparison with recorded incidence. However, such an evaluation of the validity of the estimates themselves is only possible in a few countries with high-quality national data. Elsewhere, data quality issues or a lack of national representativeness of regional datasets could potentially undermine the validity of the estimates and the evidence-based evaluation process. Assessment of uncertainty would also require additional adjustment for the completeness, accuracy and representativeness of the source information.
Along with the continuous assessment and improvement of estimation methods, efforts should be targeted at supporting the development of cancer registration worldwide. The Global Initiative for Cancer Registry Development 22 is a global partnership launched in 2011 with the goal of increasing the coverage and quality of registries in low- and middle-income countries. The partnership plays a critical role in capacity-building, to attain more robust data for national and global cancer estimation purposes and to aid countries in the prioritization and evaluation of national cancer control plans.