Exploring Patterns and Trends with Selected Cancer Rates Reported by China National Cancer Registry: Alternative Perspectives and Findings

Background: National cancer registration reports offer huge potential for identifying patterns and trends of policy, research, prevention and treatment significance. Yet given the range of factors involved in cancer onset, case identification, progression and reporting, pinpointing this complexity requires systematic thinking and varied strategies of data analysis. Methods: The study extracted incidence rates (IRs) and mortality rates (MRs) of lung, stomach, colorectal and liver cancers for 2004, 2006 and 2009 from the relevant China National Cancer Registry (CNCR) reports and analyzed the data using line graphs, ratios and logistic growth modeling. Results: The study shows that: a) all line graphs of age-specific IRs and MRs of the 4 cancers had a typical S-shape, with substantial differences in smoothness, height and proximity; b) MR lines mimicked, and lay below, the corresponding (same cancer, population group and year of reporting) IR lines for almost all age groups except the 1 to 2 oldest ones; c) colorectal cancer had the lowest MR/IR ratios on average, followed by gastric and lung cancers, and all such ratios increased along the age spectrum; d) urban vs. rural ratios in IRs or MRs increased along the age axis for 3 of the 4 cancers but followed a typical V-shaped curve for stomach cancer; e) the lines of recent vs. early ratios in cumulative IRs or MRs lay noticeably closer together for urban areas than for rural areas; f) all age-specific IRs and MRs fitted logistic growth models very well (goodness of fit > 0.91), and the integrations of the models and the ages at which the models reached 5%, 50% or 95% of their highest values yielded interesting features. Conclusion: The study provides useful perspectives for analyzing age-specific IRs and MRs and reveals a number of interesting patterns and trends in the cancer counts reported by CNCR.


INTRODUCTION
Cancer registries (CRs) are gaining recognition worldwide [1,2]. Many countries, including China, have established large-scale, long-running CR systems [2][3][4]. These systems have accumulated large amounts of data on incidence rates (IRs) and mortality rates (MRs) of combined and specific cancers and thus hold huge potential for identifying patterns and trends of policy, research, prevention and treatment significance [4][5][6][7][8]. However, most CR data are published as raw datasets with primitive groupings or as summary reports (usually on an annual basis), which falls far short of exploring their full potential. To date, CR data have been used mainly to describe cancer distribution among different groups, to assess or predict cancer burdens, and to model age, period and cohort (APC) effects on cancers [9][10][11][12]. In addition to these, CR data may be used in many other ways. In a previous paper [13], we tried to identify some of the patterns and trends in the IRs/MRs of all cancers combined, based on the available reports published by the China National Cancer Registry (CNCR). The paper addressed several features of the age-specific IRs/MRs reported by CNCR and their possible contributing factors. These included: S-shaped age-specific IRs/MRs; identical patterns of MRs and IRs along the age spectrum; positive differences between age-specific IRs and MRs (i.e., IR minus MR) for almost all age groups except the two oldest (i.e., 80-84 and 85 years plus); big discrepancies between the secondary peaks of age-specific MR/IR ratios for urban and rural females; U-shaped urban versus rural ratios of age-specific IRs and MRs; and mixed trends in the IRs and MRs between different years. These provide useful perspectives and examples for exploring the mounting data from CRs and other relevant initiatives.
Cancers of different types or locations are heterogeneous in terms of causes, progression, symptoms, diagnosis, treatment and prognosis [14][15][16][17]. Collective characteristics observable for all cancers combined may differ substantially from those of specific cancers. Therefore, this paper examines patterns and trends in the incidence and mortality rates of specific cancers from perspectives similar to those adopted in our previous work and by others, and compares the findings for these specific cancers with those for all types of cancer as a whole. Given that the CNCR annual reports provide aggregate data on 46 specific cancers, it is impossible to address each of them within a single paper. Instead, we had to be selective and focused on only four leading cancers (lung, stomach, colorectal and liver). Leading cancers pose the most serious health threats and thus deserve top attention in any kind of study, while four was the largest number of cancers that could be fitted into manageable figures (like Fig. 1) and tables (like Table 1).

Data Source
The study used CNCR annual reports as source data. It extracted incidence and mortality rates for 2004, 2006 and 2009 from CNCR Annual Reports 2004, 2009 and 2012, respectively. CNCR Annual Report 2004 is the earliest available report of its kind, while CNCR Annual Report 2012 is the latest. All of the reports provide incidence and mortality counts by type of cancer, age, gender, registry site and region (urban vs. rural), among others. Due to space limits, this study extracted and analyzed only data on lung, gastric, colorectal and liver cancers (hereafter referred to as the 4-cancers) (see Appendix A). They are the four most common types of cancer in China according to CNCR Annual Report 2012.

Data Analysis
The study adopted mainly descriptive analysis. It calculated 4 kinds of indicators and portrayed their patterns and trends in line graphs or histograms using Microsoft Excel 2010. These indicators include: a) age-specific IRs and MRs by gender, region (urban or rural) and year of reporting; b) age-specific MR/IR ratios by gender, region and year of reporting; c) urban versus rural ratios of age-specific IRs and MRs by gender and year of reporting; and d) age-, gender- and region-specific ratios between the cumulative IR (or MR) reported in a later year (e.g., 2009) and that reported in an earlier year (e.g., 2006). Here, the IR (or MR) for a given cancer and group equals the number of incident cases (or deaths) of that cancer registered for the group divided by the total number of people within the group; an MR/IR ratio is the MR in a certain year divided by the IR in the same year; an IR (or MR) ratio between urban and rural areas is the IR (or MR) of urban areas in a certain year divided by the IR (or MR) of rural areas in the same year; and a cumulative IR (or MR) for a given age (say age X) is the sum of all IRs (or MRs) from age 0 up to age X reported in a given year. Reported IRs and MRs for some age groups (e.g., age 0 through to 25) were extremely low, which made ratios generated from these IRs or MRs highly variable and potentially misleading. To prevent such problems, the calculation of some of the indicators excluded these ages.
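The ratio indicators described above can be illustrated with a minimal Python sketch. It takes the MR/IR ratio as MR divided by IR, and every number, age grouping and variable name in it is hypothetical; none of the values come from CNCR reports.

```python
from itertools import accumulate

# Hypothetical rates per 100,000 for four coarse age groups, youngest first.
# These numbers are illustrative only and are NOT taken from CNCR reports.
age_groups = ["0-34", "35-54", "55-74", "75+"]
ir_urban_2009 = [2.0, 40.0, 200.0, 400.0]
mr_urban_2009 = [1.0, 24.0, 150.0, 380.0]
ir_rural_2009 = [2.0, 50.0, 160.0, 250.0]
ir_urban_2006 = [2.0, 38.0, 210.0, 390.0]


def ratio_series(numerators, denominators):
    """Element-wise ratio; None where the denominator is zero."""
    return [n / d if d else None for n, d in zip(numerators, denominators)]


# b) age-specific MR/IR ratio: MR divided by IR for the same group and year.
mr_ir_ratio = ratio_series(mr_urban_2009, ir_urban_2009)

# c) urban vs. rural ratio: urban IR divided by rural IR for the same year.
urban_rural_ir = ratio_series(ir_urban_2009, ir_rural_2009)

# d) cumulative IR up to each age group, then later-year / earlier-year ratio.
cum_ir_2009 = list(accumulate(ir_urban_2009))
cum_ir_2006 = list(accumulate(ir_urban_2006))
recent_early_ratio = ratio_series(cum_ir_2009, cum_ir_2006)

for row in zip(age_groups, mr_ir_ratio, urban_rural_ir, recent_early_ratio):
    print(row)
```

Because ratios built on very small rates are unstable, a real analysis along these lines would probably drop the youngest age groups before computing indicators b) to d), as the study did for part of its indicators.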
The study also performed a series of modeling exercises using SPSS version 16 as the calculation tool and the reported age-specific IRs or MRs as observed data, producing 3-parameter logistic growth equations of the form $P_x = p_{max} / (1 + e^{b - kx})$, where $x$ stands for age; $P_x$, the cancer incidence (or mortality) rate at a given age $x$; $p_{max}$, the highest cancer incidence (or mortality) rate across all ages; $k$, the growth rate; and $b$, a baseline parameter that determines the location of "the rapidly growing phase" of the growth curve (S-curve) along the age spectrum.
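As a rough illustration of the model form, the sketch below evaluates the logistic equation and recovers b and k from synthetic data. It does not reproduce the SPSS nonlinear regression used in the study: it assumes p_max is already known and fits a straight line to ln(p_max/P - 1) = b - kx by ordinary least squares. The function names, parameter values and the 18-point age axis are all hypothetical.

```python
import math


def logistic_rate(x, p_max, b, k):
    """Rate predicted by P_x = p_max / (1 + e^(b - k*x))."""
    return p_max / (1.0 + math.exp(b - k * x))


def fit_b_k(xs, rates, p_max):
    """Estimate (b, k) for a KNOWN p_max via the linearized form
    ln(p_max / P - 1) = b - k*x. Points with P <= 0 or P >= p_max are
    skipped because the logarithm is undefined there."""
    pts = [(x, math.log(p_max / r - 1.0))
           for x, r in zip(xs, rates) if 0.0 < r < p_max]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope  # b is the intercept, k is minus the slope


# Synthetic, noise-free "observations" generated from hypothetical parameters.
true_p_max, true_b, true_k = 300.0, 10.0, 0.8
xs = list(range(1, 19))  # hypothetical age axis with 18 points
rates = [logistic_rate(x, true_p_max, true_b, true_k) for x in xs]

b_hat, k_hat = fit_b_k(xs, rates, true_p_max)
print(round(b_hat, 6), round(k_hat, 6))
```

With real registry data all three parameters would normally be estimated jointly by nonlinear least squares; the linearized fit is only meant to show how b and k shape the curve.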
In addition, the study calculated the integrations of all the logistic growth equations derived via the above process; $A_{0.5MIR}$ (the age at which the IR of a given cancer reached 50% of its maximum value); the time lags between $A_{0.5MIR}$ and $A_{0.5MMR}$ (the age at which the MR of a given cancer reached 50% of its maximum value) and between $A_{0.95MIR}$ and $A_{0.05MIR}$; and MR vs. IR integration ratios (an MR vs. IR integration ratio equals the integration of the MR model for a specific subgroup divided by the integration of the IR model for the same subgroup).
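For a logistic curve of the form P_x = p_max / (1 + e^(b - kx)), these derived quantities have closed forms: the curve reaches a fraction q of p_max at x = (b - ln((1 - q)/q)) / k, and its integral is (p_max / k) ln(1 + e^(kx - b)) evaluated between the limits. The sketch below applies both formulas. The parameter values, the upper limit of the age axis and the sign convention for the time lag (MR midpoint minus IR midpoint) are assumptions made for illustration, not values or definitions taken from the paper.

```python
import math


def age_at_fraction(q, b, k):
    """x at which the logistic curve reaches fraction q of p_max (0 < q < 1)."""
    return (b - math.log((1.0 - q) / q)) / k


def logistic_integral(p_max, b, k, x_low, x_high):
    """Closed-form integral of p_max / (1 + e^(b - k*x)) from x_low to x_high."""
    def antiderivative(x):
        return (p_max / k) * math.log1p(math.exp(k * x - b))
    return antiderivative(x_high) - antiderivative(x_low)


# Hypothetical IR and MR parameters for one subgroup (not values from Table 1).
ir = {"p_max": 300.0, "b": 10.0, "k": 0.8}
mr = {"p_max": 280.0, "b": 11.0, "k": 0.8}

a50_ir = age_at_fraction(0.50, ir["b"], ir["k"])  # reduces to b / k
a50_mr = age_at_fraction(0.50, mr["b"], mr["k"])
lag_50 = a50_mr - a50_ir  # assumed convention: MR midpoint minus IR midpoint
span_ir = (age_at_fraction(0.95, ir["b"], ir["k"])
           - age_at_fraction(0.05, ir["b"], ir["k"]))  # equals 2*ln(19)/k

x_max = 18.0  # hypothetical upper end of the age axis, in the fit's own x units
ir_area = logistic_integral(ir["p_max"], ir["b"], ir["k"], 0.0, x_max)
mr_area = logistic_integral(mr["p_max"], mr["b"], mr["k"], 0.0, x_max)
mr_ir_integration_ratio = mr_area / ir_area  # MR integration over IR integration

print(a50_ir, lag_50, span_ir, mr_ir_integration_ratio)
```

All ages returned by these helpers are expressed in whatever unit x had when the model was fitted (single years or age-group index), so they may need rescaling before being read as years.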

Simple Line-graphs
Figs. 1a-p depict, in line graphs, the age-specific IRs and MRs of the 4-cancers by gender (males, females), region (urban, rural) and year of reporting (2004, 2006 and 2009) respectively (see Appendix B for detailed data). All these lines were typical S-shaped curves consisting of a relatively low and stable phase from age 0 to around age 35, a rapidly growing phase from around age 35 to 75, and a final phase in which the increase slowed down or even turned into a slight decrease. The lines showed substantial differences in smoothness, height and proximity. The lines representing IRs or MRs of lung cancer for urban males lay the highest, followed by the IR or MR lines of stomach cancer for rural males and those of lung cancer for rural males. The IR and MR lines for rural areas showed greater sub-trend variations than those for urban areas. The IR and MR lines of lung and liver cancers lay much closer to each other than those of colorectal and stomach cancers. All of the MR lines lay below the corresponding (same cancer and year) IR lines for almost all age groups except the 1 to 2 oldest ones (i.e., 80-84 and 85 years plus).

MR vs IR Ratios
Figs. 2a-p show, again in lines, the age-specific MR/IR ratios of the population aged 35 and older for different subgroups in 2009 (blue lines), 2006 (red lines) and 2004 (green lines). The y-coordinates of the lines (the ratios) ranged from 0.54 to 1.25, 0.41 to 1.60, 0.24 to 1.24, and 0.63 to 1.45 for lung, gastric, colorectal and liver cancers respectively. Colorectal cancer had the lowest MR/IR ratios on average, followed by gastric and lung cancers. All the lines showed an increasing trend from younger to older ages; the pace of increase remained relatively slow until around age 60 and then grew faster and faster. Greater sub-trend variations or fluctuations in the lines appeared for rural than for urban areas and for females than for males. For any given cancer and subgroup, the 3 colored (blue, red and green) lines displayed a similar trend and intertwined without apparent difference; yet the lines for 2009 appeared somewhat smoother than the other two.
Regarding the recent vs. early ratios in cumulative IRs and MRs (Fig. 4): a) all the lines ended within a ratio range from 0.62 to 1.43; b) all of the red and 13 of the purple lines ended above 1, but only half of the blue and 6 out of 16 of the green lines did so; c) the four kinds of colored lines making up the gender-specific panels for urban areas (Fig. 4, columns 1-2) lay noticeably closer together than those for rural areas (Fig. 4, columns 3-4) and, for both gender subgroups in rural areas, most of the blue and red lines lay below 1, while most of the green and purple lines lay above 1; d) only a small number of lines showed some decrease from age group 35 to 85+, e.g., the blue lines of lung, stomach and colorectal cancers among rural males (Figs. 4c, g, k) and of stomach cancer among rural females (Fig. 4h), and the green lines of lung cancer among rural males (Fig. 4c) and of liver cancer among urban males and females (Figs. 4m, n).
Table 1 provides parameter estimates of the logistic growth models of the age-specific IRs and MRs of the 4-cancers. Goodness of fit exceeded 0.91 for all subgroups. Yet the 3 parameters defining the models showed substantial variations: $p_{max}$ (the highest IR or MR) ranged from 56.211 (for colorectal cancer) to 772.583 (for lung cancer); $b$, from 6.046 (for liver cancer) to 16.532 (for lung cancer); and $k$, from 0.386 (for liver cancer) to 1.173 (for lung cancer). As shown in Fig. 5 and Appendix C, integrations of the IR and MR models ranged from 381.84 to 3018.49 and from 245.10 to 2852.02 respectively, while $A_{0.5MIR}$ ranged from 43.81 to 66.86. When examined on a cancer-by-cancer basis, the IR and MR integrations were greater in males than in females for all the 4-cancers and in urban than in rural areas for lung and colorectal cancers, while $A_{0.5MIR}$ showed a moderate yet consistent urban-over-rural difference. The time lag between $A_{0.5MIR}$ and $A_{0.5MMR}$ varied substantially (from -0.40 to 15.05), with longer lags for colorectal and stomach cancers than for lung and liver cancers and for urban areas than for rural areas. The time lag between $A_{0.95MIR}$ and $A_{0.05MIR}$ ranged from 25.10 for lung cancer to 67.53 for liver cancer. The MR vs. IR integration ratios were highest (115.43%) for liver cancer and lowest (59.94%) for colorectal cancer, and 8 out of these 48 ratios were even greater than 100%.
Ratios enable quantitative comparisons between the two indicators concerned. An age-specific MR to IR ratio is co-determined by: a) the slope of the trend (increase or decrease) in IRs along the age spectrum; b) the survival time of the cancer concerned; and c) the quality of the IRs and MRs reported. Therefore, the increasing age-specific ratios for all the subgroups (Fig. 2) may be due largely to accelerating increases, along the age span, in all the corresponding IRs. Survival time may be a major reason for the relatively lower MR/IR ratios of colorectal cancer, followed by stomach, lung and liver cancers. The finding that most of the lines in Fig. 2 rose from below to above 1 in the oldest 2 to 3 age groups may not necessarily mean that real MRs exceeded IRs for these groups. In general, the MR of a given cancer and group should be no higher than the IR of the same cancer and group. So the phenomenon may be due mainly to reduced service utilization by, and thus underdiagnosis among, the elderly [13].

Logistic Growth Models
Similarly, urban vs. rural ratios reflect the combined effects of physical factors (e.g., heredity, immunity), environmental risks (e.g., smoking, air pollution and sedentary work), service seeking, and case reporting (i.e., accuracy and completeness of the cases and deaths reported). Differences in environmental risks may be the main reasons for the higher IRs and MRs of stomach and liver cancers and the lower IRs and MRs of lung and colorectal cancers in rural than in urban areas. Improving nutrition, drinking water hygiene and case reporting for rural residents may have played an important role in narrowing the urban vs. rural gaps in IRs and MRs of stomach and liver cancers, as manifested by the blue lines lying above the red and green lines (Figs. 3e-h and 3m-p) [26-28]. Meanwhile, the decreasing discrepancies in IRs and MRs for residents aged around 70 and over, as displayed by the apparently higher green lines over the blue or red lines in Figs. 3a-d, suggest worsening relative air quality for rural residents due to escalating air pollution in rural areas and rapidly growing numbers of farmers seeking temporary jobs in cities.
Regarding ratios depicted in Fig. 4
Logistic growth models may help explore age-specific cancer rates in various ways. First, describing cancer incidence or mortality rates along the whole age span with logistic growth equations becomes a matter of estimating the parameters in the equations rather than uncovering rates for every age. Such a shift of focus may greatly reduce the resources needed, since logistic equations generally involve only a few parameters, and estimating these requires much less data than has usually been collected. Second, if there is sufficient evidence to believe that certain age-specific cancer rates follow the logistic growth law, then the goodness-of-fit estimates can be viewed as a quality indicator of the cancer counts reported. Of the goodness of fit (i.e., R values) of the 48 IR models listed in Table 1, only 7 were estimated as higher than those of the corresponding MR models (e.g., the model of IRs of lung cancer among urban males in 2009 vs. the model of MRs of the same cancer in the same subgroup and year); poorer goodness of fit was also observed for models of rural subgroups compared with those of urban ones. These findings suggest that the quality of the cancer counts reported by rural areas, and of those underlying IRs, was not as good as that of data from the urban registry system and of those underlying MRs. Third, mathematical integration of the logistic growth equations may be used to measure the overall burden of cancers. As shown in Figs. 5a-b, urban males were the hardest hit (by lung cancer), followed by rural males (by stomach cancer). Fourth, the ages at which the age-specific IR or MR of a cancer model reaches 5% ($A_{0.05}$), 50% ($A_{0.50}$) and 95% ($A_{0.95}$) of its highest value ($p_{max}$) may serve as indicator ages to inform data analysis and intervention planning. For example, $A_{0.05}$ may be used to define the starting age for some targeted interventions (e.g., screening), while the age range between $A_{0.05}$ and $A_{0.95}$ of a cancer may be viewed as the critical ages for stemming the epidemic.
The study suffers from several limitations. First, reported cancer incidence and mortality rates reflect not only the actual occurrence of cancers but also the performance of registry systems, and readers are cautioned about potential biases due to defects in cancer registration, e.g., underreporting and misclassification. Second, the interval between the earliest (2004) and the latest (2009) cancer rates was only 5 years, so our findings on differences between years may not necessarily represent long-term trends. Third, CNCR reports provide similar data on 58 of the most common types of cancer in China, yet our study included only four types due to space limits. Fourth, the study used aggregate data extracted from published reports, which did not allow for more detailed analysis. For example, the study did not address differences between sub-regions of China, e.g., between south and north or east and west China.

CONCLUSION
The study provides useful perspectives for analyzing age-specific IRs and MRs and reveals a number of interesting patterns and trends with cancer counts reported by CNCR.

CONSENT
It is not applicable.

ETHICAL APPROVAL
It is not applicable.