A Heuristic Solution of the Identifiability Problem of the Age-Period-Cohort Analysis of Cancer Occurrence: Lung Cancer Example

Background
The Age–Period–Cohort (APC) analysis is aimed at estimating the following effects on disease incidence: (i) the age of the subject at the time of disease diagnosis; (ii) the time period when the disease occurred; and (iii) the date of birth of the subject. These effects can help in evaluating the biological events leading to the disease, in estimating the influence of distinct risk factors on disease occurrence, and in the development of new strategies for disease prevention and treatment.

Methodology/Principal Findings
We developed a novel approach for estimating the APC effects on disease incidence rates in the frame of the Log-Linear Age-Period-Cohort (LLAPC) model. Since the APC effects are linearly interdependent and cannot be uniquely estimated, solving this identifiability problem requires setting four redundant parameters within the set of unknown parameters. By setting three parameters (one time-period effect, one birth-cohort effect, and the corresponding age effect) to zero, we reduced this problem to that of determining one redundant parameter, for which we used the effect of the time period adjacent to the anchored time period. By varying this identification parameter, a family of estimates of the APC effects can be obtained. Using a heuristic assumption that the differences between the adjacent birth-cohort effects are small, we developed a numerical method for determining the optimal value of the identification parameter, by which a unique set of all APC effects is determined and the identifiability problem is solved.

Conclusions/Significance
We tested this approach by estimating the APC effects on lung cancer occurrence in white men and women using the SEER data collected during 1975–2004. We showed that the LLAPC models with the corresponding unique sets of the APC effects estimated by the proposed approach fit the observational data very well.


Introduction
For more than 50 years, the importance of accurately accounting for the Age-Period-Cohort (APC) effects has been well recognized by epidemiologists and mathematicians in disease incidence and mortality studies. In such studies, the incidence rate is defined as the ratio of the number of events to the total person-years of experience. It is assumed that the numerator of this ratio has a Poisson distribution, so the standard error (SE) of the incidence rate is calculated as the square root of the number of events divided by the total person-years [1]. Often, it is also assumed that the logarithm of the incidence rate can be modeled as a linear function of specified regressors: the APC effects. Such models of the incidence rates belong to the so-called generalized linear models [2]. In particular, in the Log-Linear Age-Period-Cohort (LLAPC) model, the observed variable is the logarithm of the incidence rate, which is approximated by the sum of the APC effects [2]. The problem is how to estimate these effects from the observed incidence rates.

APC analysis
In this work, using long-term observational data, we determine the APC effects in the frame of the LLAPC model [2]. By definition [1], the crude incidence rate for the given age, time-period (TP) and birth-cohort (BC) intervals is the ratio of the number of cancer occurrences, $O_{i,j,k}$, to the total person-years at risk, $P_{i,j,k}$:

$$I_{i,j,k} = \frac{O_{i,j,k}}{P_{i,j,k}}, \tag{1}$$

where the age intervals are indexed as $i = 1,\ldots,n$; the time periods of cancer occurrences as $j = 1,\ldots,m$; the birth cohorts of cancer occurrences as $k = 1,\ldots,l$; and $n$, $m$ and $l$ are the numbers of age intervals, time periods, and birth cohorts, correspondingly. Let us consider that the temporal intervals indexed by $i$, $j$ and $k$ have the same size (for instance, the five-year intervals that are usually used in APC studies). In this case, these indexes and the numbers $n$, $m$ and $l$ are related in the following way [2]:

$$k = j - i + n \quad (i = 1,\ldots,n;\ j = 1,\ldots,m;\ k = 1,\ldots,l) \tag{2}$$

and $l = m + n - 1$. It should be noted that, according to (2), index $k$ is uniquely defined by indexes $i$ and $j$. Therefore, in (1), index $k$ can be omitted, while keeping in mind that the incidence rates also depend on the BC effects. The LLAPC model is usually presented by the following system of conditional equations:

$$Y_{i,j} = \mu + \alpha_i + \beta_j + \gamma_k \quad (i = 1,\ldots,n;\ j = 1,\ldots,m;\ k = 1,\ldots,l), \tag{3}$$

and

$$Y_{i,j} = \ln(I_{i,j}) = \ln\left(\frac{O_{i,j}}{P_{i,j}}\right) \quad (i = 1,\ldots,n;\ j = 1,\ldots,m), \tag{4}$$

where $Y_{i,j}$ is the logarithm of the observed incidence rate, $\alpha_i$ denotes the age effect, $\beta_j$ the TP effect, $\gamma_k$ the BC effect, and the constant term, $\mu$, is the intercept [2]. In this model, the weights for the observed data, $Y_{i,j}$, are chosen to be inversely proportional to their sampling variances, $SE^2(Y_{i,j})$:

$$W_{i,j} = \frac{1}{SE^2(Y_{i,j})}, \tag{5}$$

where

$$SE^2(Y_{i,j}) = \frac{SE^2(I_{i,j})}{I_{i,j}^2} = \frac{1}{O_{i,j}}. \tag{6}$$

Formula (6) is obtained under the assumption that the numbers of cancer occurrences in each group are independent random variables characterized by a Poisson distribution. It is also assumed that the variances of the incidence rates, $O_{i,j}/P_{i,j}^2$, are entirely due to variations in the number of cancer occurrences, $O_{i,j}$, which is small compared to the total person-years at risk, $P_{i,j}$ [3]. From (5) and (6) it follows that:

$$W_{i,j} = O_{i,j}. \tag{7}$$

The APC problem is to determine, from the system of $m \times n$ conditional equations (3) with weights (7), the following: (i) the $n$ estimates of the age effects, $\alpha_i$; (ii) the $m$ estimates of the TP effects, $\beta_j$; (iii) the $l$ estimates of the BC effects, $\gamma_k$; and (iv) the intercept, $\mu$. Additional constraints on the parameters must be imposed to obtain a solution. One approach is to set three effects (one of the TP effects, $\beta_{j_0}$, one of the BC effects, $\gamma_{k_0}$, and the corresponding Age effect, $\alpha_{i_0}$, where $i_0 = j_0 - k_0 + n$) to zero and then to use these settings as the reference levels. Another approach is to set the sums of these effects to zero [2]. In the present work, we use the first approach. From the aforementioned settings and from (1)-(7), it follows that:

1. $I^m = \exp(\mu)$ presents the modeled incidence rate of getting the cancer when the anchored parameters are $\alpha_{i_0} = 0$, $\beta_{j_0} = 0$, and $\gamma_{k_0} = 0$.
2. $I^m_{i,\cdot,\cdot} = \exp(\mu + \alpha_i)$ presents the modeled Age-specific incidence rate of getting the cancer in a given age interval, $i$, when TP and BC effects are absent.
3. $I^m_{\cdot,j,\cdot} = \exp(\mu + \beta_j)$ presents the modeled TP-specific incidence rate of getting the cancer for a given TP interval, $j$, when Age and BC effects are absent.
4. $I^m_{\cdot,\cdot,k} = \exp(\mu + \gamma_k)$ presents the modeled BC-specific incidence rate of getting the cancer for a BC interval, $k$, when Age and TP effects are absent.
5. $I^m_{i,j,k} = \exp(\mu + \alpha_i + \beta_j + \gamma_k)$ presents the modeled incidence rate of getting a particular type of cancer in a given Age interval, $i$, a TP interval, $j$, and a BC interval, $k$, when all of these effects are present.

In items 2, 3 and 4, the Age effects, $\alpha_i$, the TP effects, $\beta_j$, and the BC effects, $\gamma_k$, can be presented as logarithms of the incidence rate ratios: $\alpha_i = \ln(I^m_{i,\cdot,\cdot}/I^m)$, $\beta_j = \ln(I^m_{\cdot,j,\cdot}/I^m)$, and $\gamma_k = \ln(I^m_{\cdot,\cdot,k}/I^m)$, correspondingly. Thus, the $\alpha_i$, $\beta_j$, and $\gamma_k$ parameters are dimensionless, and their variations (with respect to the corresponding successive Age, TP and BC intervals) indicate the temporal trends of these effects.
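The quantities that enter the model for a single age-by-period cell (the crude rate, its Poisson standard error, the log-rate, the resulting weight, and the cohort index) can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are hypothetical, and the weight is derived with the usual first-order (delta-method) approximation for the variance of a logarithm, which is an assumption about how the weights are obtained.

```python
import math

def cohort_index(i, j, n):
    """Birth-cohort index from formula (2): k = j - i + n (1-based indexes)."""
    return j - i + n

def log_rate_and_weight(occurrences, person_years):
    """Return (Y, SE of the rate, weight) for one age-by-period cell."""
    rate = occurrences / person_years                 # crude incidence rate
    se_rate = math.sqrt(occurrences) / person_years   # Poisson SE of the rate
    y = math.log(rate)                                # observed log-rate
    var_y = (se_rate / rate) ** 2                     # first-order variance of ln(rate), equals 1 / occurrences
    return y, se_rate, 1.0 / var_y

# Hypothetical cell: 400 cases observed over 1,000,000 person-years
y, se_rate, w = log_rate_and_weight(400, 1_000_000)
```

Under these assumptions the weight of a cell reduces to its number of occurrences, so sparsely populated cells contribute less to the fit.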

Identifiability problem
The system (3) cannot be solved directly by the methods of multiple linear regression because the design matrix of the LLAPC system (3) is rank deficient. (This can be checked directly in practice, for example, by using the MATLAB function rank.) The rank deficiency arises because the APC effects are linearly interrelated. Consequently, these effects cannot be uniquely and simultaneously estimated (different sets of estimates of these parameters fit the data equally well). Mathematically, this problem falls into the category of identifiability problems that, in turn, are a special subclass of the more general class of ill-posed or incorrectly posed mathematical problems. Solving the identifiability problem, in particular, and ill-posed problems, in general, requires the use of additional assumptions and/or a priori knowledge regarding their solutions [4].
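The rank deficiency can also be checked without MATLAB. The following Python sketch builds an unweighted indicator design matrix (intercept plus one column per age, period, and cohort effect) and computes its exact rank by Gaussian elimination over rational numbers. The helper names `apc_design` and `matrix_rank` are hypothetical, and the dimensions (12 age groups, 6 time periods) are chosen to match the data used later in this paper. Multiplying rows by positive weights does not change the rank, so the unweighted matrix is sufficient for this check.

```python
from fractions import Fraction

def apc_design(n, m):
    """Rows are (i, j) cells; columns are intercept, n age, m period, l cohort dummies."""
    l = m + n - 1
    rows = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = j - i + n              # cohort index of the cell
            row = [0] * (1 + n + m + l)
            row[0] = 1                 # intercept
            row[i] = 1                 # age effect i
            row[n + j] = 1             # period effect j
            row[n + m + k] = 1         # cohort effect k
            rows.append(row)
    return rows

def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(a[0])):
        pivot = next((r for r in range(rank, len(a)) if a[r][col] != 0), None)
        if pivot is None:
            continue                   # no pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, len(a)):
            if a[r][col] != 0:
                f = a[r][col] / a[rank][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank

X = apc_design(n=12, m=6)
n_cols = len(X[0])
deficiency = n_cols - matrix_rank(X)
```

For these dimensions the matrix has 36 columns, and the computed deficiency is the number of parameters that must be fixed before the remaining ones can be estimated.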
Approaches that have been used in the APC analysis to solve the identifiability problem are reviewed in several papers (see, for example, [2,5,6] and references therein). In these approaches, either three effects (one of the TP effects, one of the BC effects, and the corresponding Age effect) are set to zero and used as reference levels, or the sums of these effects are equated to zero. However, these settings are still insufficient for solving the identifiability problem [2] and require the use of additional constraints on the set of parameter estimates to be determined. Although a variety of additional constraints and the utility of estimable functions (that are invariant for any particular set of model parameters) have already been proposed, the identifiability problem still remains largely unsolved [2,5,6].
In this work, we extended the well-known approach used in the APC analysis for solving the identifiability problem [2,3,7,8], where four redundant parameters within the set of unknown parameters to be determined are equated to zero. In our approach, we fixed (set to zero) only three redundant parameters and used them as reference levels. In contrast to the "traditional" approaches, where all four parameters are equated to zero, we determined an optimal value of the fourth parameter using an additional heuristic assumption (see below). We used the effect of the time period adjacent to the anchored time period as this parameter. We have shown that by varying this parameter from $-\infty$ to $\infty$, all possible solutions of the APC problem can be obtained. To the best of our knowledge, such a general solution of the APC problem (a complete family of estimates of the APC effects), which depends only on the one "identifiability" parameter, is given for the first time in the present work.

A heuristic assumption
To obtain an optimal value of the identification parameter, we used a heuristic assumption that the effects of the adjacent cohorts are close. This assumption is motivated by the fact that the multi-year adjacent birth cohorts overlap in time. Using this assumption, we developed a numerical method for determining the optimal value of the identification parameter. With the optimal value of this parameter, a unique set of the APC effects can be determined and thus the identifiability problem is overcome. The method proposed in this work for obtaining the optimal value of the identification parameter enables one to obtain distinct solutions of the APC identifiability problem, depending on the a priori assumptions made.

Proof-of-concept
We tested the proposed numerical method while estimating the APC effects on lung cancer (LC) incidence rates in white men and women, using data collected in the SEER 9 database during 1975-2004.

Data preparation
To test the proposed approach, we used the SEER databases that include the number of occurrences of different types of cancer and information on the population at risk obtained from the U.S. Census Bureau. In our study, data on LC occurrence in white men and women collected in SEER 9 during 1975-2004 were utilized. We used data from the nine registries rather than data from the currently available 17 registries, because the longitudinal nature of our study required data dating back three decades, when there were only nine registries.
From SEER 9, we extracted the first primary, microscopically confirmed LC cases stratified by gender and race. The numbers of LC occurrences in white men and women and the corresponding person-years at risk extracted from SEER 9 were grouped into six five-year cross-sectional TP groups: 1975-79, …, 2000-04; 18 five-year age groups: 17 groups, ranging from 0 to 84 years, and the 18th group including all cases for the ages 85+; and 17 BC groups corresponding to the birth year groups of 1890-94, …, 1970-74. In our study, we used only 12 five-year Age groups, from 30-34 years up to 85+, because the observed numbers of LC occurrences at younger ages were very small. The grouped data, tabulated by the age and time-period indexes, are presented in Tables 1, 2, 3, 4.

Statistical methods and software used
For the data presented in Tables 1, 2, 3, 4, the LLAPC model was applied and the corresponding design matrices of the systems of conditional equations for white men and women were obtained. These design matrices were checked for rank deficiencies using the MATLAB function rank. To solve these systems of conditional equations, we applied a novel approach (see below) using the weighted least-squares method and the MATLAB function regress. For determining the optimal values of the identification parameters, we used a program developed in-house, inpar, written in MATLAB, Version 7.10.0 (R2010a). The validity of the LLAPC models used for assessing the APC effects in the LC occurrences in white men and women was checked by three diagnostic plots [10]: (i) the normal probability plot of the standardized residuals; (ii) the residuals vs. the modeled values plot; and (iii) the observed vs. the modeled values plot.

A solution of the identifiability problem
Let us fix one of the TP effects, $\beta_{j_0}$, one of the BC effects, $\gamma_{k_0}$, and the corresponding Age effect, $\alpha_{i_0}$, where $i_0 = j_0 - k_0 + n$ (see (2)). By moving these effects to the left side of the system (3), the number of unknowns in the new system is decreased by three. In practice, these effects are used as reference levels and are usually set to zero.
In such a case, the solution of the APC problem is reduced to determining one parameter, the identification parameter. Let us use the effect $\beta_{j_0-1}$ (or $\beta_{j_0+1}$) of the TP adjacent to the anchored TP, $j_0$, as the identification parameter, designated by $\delta$. When the exact value of $\delta$ is known a priori, the system (3) can be additionally corrected for this effect by moving this parameter to the left side as well, which gives the corrected system (8). Note that, when the exact value of $\delta$ is known a priori, the corrected system (3) has the same weights (7) as system (3), and the design matrix of this weighted system does not have a rank deficiency (this can be directly checked by using the MATLAB function rank). For assessing the unknowns in the corrected system (3), a standard weighted least-squares method can be used. Thus, estimates of the intercept, $\mu^*$, of the $n-1$ Age effects, $\alpha^*_i$, of the $m-2$ TP effects, $\beta^*_j$, and of the $l-1$ BC effects, $\gamma^*_k$, together with their confidence intervals (CI), can be obtained. Here and below, asterisks ($*$) denote estimates or set values of the unknown parameters. It should be noted that, in general, these estimates depend on the given values of the four redundant parameters, $\alpha^*_{i_0}$, $\beta^*_{j_0}$, $\gamma^*_{k_0}$, and $\delta$.

By varying the identification parameter, $\delta$, within the interval of its expected variation, a family of estimates of the APC effects can be obtained. Indeed, let us suppose that the expected values of the identification parameter lie within an interval $[-L, L]$, where $L > 0$. In this interval, let us choose a net of points defined by a natural number $N$ larger than, say, 10 ($N > 10$). The consecutive net points can be used as the values of the identification parameter, $\delta_s$. For each $\delta_s$ value, one can obtain estimates of the APC effects ($\mu^*$, $\alpha^*_i$, $\beta^*_j$, and $\gamma^*_k$) and their CIs, as described previously. Thus, the corresponding family of estimates of the APC effects can be obtained.
Theoretically, by varying $\delta$ from $-\infty$ to $\infty$, one can obtain all possible estimates of the APC effects ($\mu^*$, $\alpha^*_i$, $\beta^*_j$, and $\gamma^*_k$) and their CIs.
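Why a single parameter indexes the whole family can be seen directly from relation (2). The short derivation below is a sketch under the anchoring described above (three effects set to zero at $i_0$, $j_0$, $k_0$); the shift $t$ and the primed symbols are introduced only for illustration and are not notation from the original method.

```latex
% Assumes k = j - i + n for every cell and k_0 = j_0 - i_0 + n for the anchored cell.
% The shift t is an illustrative free parameter.
\begin{align*}
\alpha_i' &= \alpha_i + t\,(i - i_0), \qquad
\beta_j'  = \beta_j - t\,(j - j_0), \qquad
\gamma_k' = \gamma_k + t\,(k - k_0), \\
\alpha_i' + \beta_j' + \gamma_k'
  &= \alpha_i + \beta_j + \gamma_k
     + t\,\bigl[(i - j + k) - (i_0 - j_0 + k_0)\bigr]
   = \alpha_i + \beta_j + \gamma_k ,
\end{align*}
% because i - j + k = n and i_0 - j_0 + k_0 = n.
```

The primed effects therefore reproduce every modeled value, keep the three anchored effects at zero, and change the effect of the adjacent time period by exactly $t$ (since $\beta'_{j_0-1} = \beta_{j_0-1} + t$). Moving along this one-parameter family is the same as changing the identification parameter.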
The optimal value of the identification parameter, $\delta$, can be determined within the interval of its expected variation using an additional assumption. For this purpose, the heuristic assumption that the differences between the effects of the adjacent birth cohorts are small can be used. This assumption is based on the fact that the multi-year adjacent birth cohorts overlap in time, and the identification of a cohort associated with a particular range of period and age is somewhat ambiguous [11][12][13].
Using this heuristic assumption, one can numerically determine the optimal value of the identification parameter by minimizing (with respect to $\delta$) the weighted average of the squared differences between the estimates of the adjacent BC effects, $(\gamma^*_{k+1} - \gamma^*_k)^2$. This minimization problem can be formulated as follows:

$$\delta_{opt} = \arg\min_{\delta} \frac{\sum_{k=1}^{l-1} W_k\,(\gamma^*_{k+1} - \gamma^*_k)^2}{\sum_{k=1}^{l-1} W_k}, \tag{11}$$

where the weights, $W_k$, are the reciprocals of the variances of the differences between the estimates of the adjacent BC effects, $W_k = 1/SE^2(\gamma^*_{k+1} - \gamma^*_k)$. This problem can be solved numerically by taking the net values (10) and calculating for each $\delta_s$ the corresponding weighted average (11). Thus, from these net values, the optimal value, $\delta_{opt}$, which minimizes this weighted average, can be obtained.
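The search for the optimal value can be sketched as a plain grid scan. The Python code below is a minimal illustration, not the in-house program: `roughness`, `optimal_delta`, and `toy_fit` are hypothetical names, the uniform grid of $N + 1$ points on $[-L, L]$ is an assumed form of the net, and `toy_fit` is a made-up stand-in for the weighted least-squares step that tilts a fixed set of cohort effects linearly with the identification parameter.

```python
def roughness(gammas, ses_of_diffs):
    """Weighted average of squared adjacent differences, weights = 1 / variance."""
    weights = [1.0 / se ** 2 for se in ses_of_diffs]
    diffs = [b - a for a, b in zip(gammas, gammas[1:])]
    return sum(w * d * d for w, d in zip(weights, diffs)) / sum(weights)

def optimal_delta(fit, L, N):
    """Scan a uniform grid on [-L, L]; return the delta with the smallest roughness.

    `fit(delta)` is a user-supplied routine (hypothetical here) that solves the
    corrected system for a given delta and returns
    (cohort-effect estimates, SEs of the adjacent differences).
    """
    best_delta, best_value = None, float("inf")
    for s in range(N + 1):
        delta = -L + 2.0 * L * s / N       # assumed grid; the original net may differ
        gammas, ses = fit(delta)
        value = roughness(gammas, ses)
        if value < best_value:
            best_delta, best_value = delta, value
    return best_delta

def toy_fit(delta, k0=3):
    """Toy stand-in for the regression step; the numbers are made up."""
    base = [0.30, 0.20, 0.00, -0.10, -0.30]   # cohort effects at delta = 0, anchor at k0
    gammas = [g + delta * (k - k0) for k, g in enumerate(base, start=1)]
    return gammas, [1.0] * (len(base) - 1)    # equal SEs for simplicity

delta_opt = optimal_delta(toy_fit, L=0.5, N=1000)
```

In real use, `fit` would wrap the weighted least-squares solution of the corrected system, and the grid bounds would be chosen to cover the expected range of the identification parameter.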

Assessing model adequacy
To check the goodness of fit of the modeled values obtained by a multiple linear regression analysis of the observed values, the $R^2$ statistic, as well as the $F$ statistic and its $p$ value, are usually used. However, to compute these statistics, the design matrix of the system of conditional equations presenting the model under consideration has to include a column of ones. Otherwise, the obtained numeric values of these statistics can be incorrect and even erroneous [14,15]. In our case, the design matrix of the weighted conditional equations of the corrected system (3) with weights (7) does not include a column of ones. Therefore, for assessing the validity of the results obtained by the proposed approach, we utilized the following diagnostic plots [10]: (i) the normal probability plot of the standardized residuals; (ii) the residuals vs. the modeled values plot; and (iii) the observed vs. the modeled values plot. Plot (i) allows one to assess the plausibility of the assumption that the standardized residuals, $e^*_{i,j}$ (the observed weighted values, $Y^c_{i,j}$, less the modeled weighted values, $(Y^c_{i,j})^*$, divided by their estimated SE), have a normal distribution. If the assumption of normally distributed residuals is correct, the plot should be sufficiently straight. Plot (ii) checks the aptness of the model. When the model is appropriate, the residuals should be randomly distributed around 0, so all but a very few $e^*_{i,j}$ (about 95% of the total number of residuals) should lie between $-2$ and $2$. Plot (iii) should exhibit points located close to the line with a slope of +1 going through the point (0, 0). This plot provides a visual assessment of the effectiveness of the model in making predictions.

Table 3. Person-years at risk, $P_{i,j}$ ($i = 1,\ldots,12$; $j = 1,\ldots,6$), in white men.

Table 4. Person-years at risk, $P_{i,j}$ ($i = 1,\ldots,12$; $j = 1,\ldots,6$), in white women.
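The numerical check behind plot (ii) of the model-adequacy diagnostics can be sketched as follows. The function names and the input numbers are hypothetical and serve only to show how the share of standardized residuals inside the interval from -2 to 2 would be computed.

```python
def standardized_residuals(observed, modeled, se):
    """(observed - modeled) / SE for each cell."""
    return [(o - f) / s for o, f, s in zip(observed, modeled, se)]

def share_within(residuals, bound=2.0):
    """Fraction of residuals lying in [-bound, bound]."""
    inside = sum(1 for e in residuals if -bound <= e <= bound)
    return inside / len(residuals)

# Made-up values for four cells
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [1.1, 1.9, 3.0, 3.0]
se = [0.1, 0.1, 0.2, 0.4]

residuals = standardized_residuals(observed, modeled, se)
share = share_within(residuals)
```

A share close to 0.95 or above is what the diagnostic expects for an adequate model; the toy data here contain one deliberately large residual.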

Results
In this section, we present the results of the testing of this approach, while estimating the APC effects on lung cancer (LC) incidence rates in white men and women, using SEER 9 data, collected over a 30-year time period.

Testing of the approach
The SEER 9 data collected during 1975-2004 for LC in white men and women were used for testing the proposed approach. In this testing, preparation of the SEER-based data was performed as described in the Materials and Methods section. The obtained numbers of cancer occurrences and the total person-years at risk for the given age intervals and time periods are presented in Tables 1, 2, 3, 4.
Using Table 5 and formulas (3) and (7), the design matrices for the LLAPC model of LC in white men and women were built and their rank deficiencies were checked (see Materials and Methods). The obtained rank deficiencies of these design matrices were equal to 4. Therefore, four parameters had to be fixed to determine the APC effects for LC in white men and women by using the corresponding systems of conditional equations (3) with weights (7). This was done in two steps: (i) by choosing one of the Age effects, one of the TP effects, and one of the BC effects as anchors and setting them to 0; and (ii) by determining the optimal value of the identification parameter, that is, the effect of the TP adjacent to the anchored TP.
To perform the first step, we chose the cell with indexes 9 and 6 (i.e., $i_0 = 9$ and $j_0 = 6$) as the anchored cell in Table 5. This means that the Age interval of 70-74 ($i_0 = 9$) and the TP of 2000-04 ($j_0 = 6$) were chosen as the anchors. Since the indexes $i$, $j$ and $k$ are linearly interrelated by formula (2), the anchored BC index was $k_0 = 9$. This index corresponds to the BC group of 1930-34. To perform the second step, we chose the TP effect adjacent to the anchored TP, i.e., $\delta = \beta_{j_0-1} = \beta_5$. Then, we moved this identification parameter, as well as the anchored parameters, to the left side of the system (3). For the anchored cell ($i_0 = 9$, $j_0 = 6$, $k_0 = 9$), we set the corresponding APC effects to zero and used these effects as the reference levels.
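The index arithmetic for the anchored cell can be reproduced in a few lines of Python. This is a sketch under one labeling assumption: cohort index 1 is taken to be the earliest birth-year group listed in the data preparation (1890-94), with five-year steps thereafter. The function names are hypothetical.

```python
def cohort_index(i, j, n):
    """Cohort index from the age index i and period index j: k = j - i + n."""
    return j - i + n

def cohort_label(k, first_birth_year=1890, width=5):
    """Birth-year label of cohort k, assuming k = 1 is the 1890-94 group."""
    start = first_birth_year + width * (k - 1)
    return f"{start}-{(start + width - 1) % 100:02d}"

# Anchored cell: 9th age group (70-74) in the 6th period (2000-04), 12 age groups
k0 = cohort_index(i=9, j=6, n=12)
label = cohort_label(k0)
```

With these assumptions the anchored index is 9 and the label is the 1930-34 group, the birth cohort named as the anchor in the description of Figure 2.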
For the obtained systems of conditional equations (8) with weights (7), we built the corresponding design matrices and checked their rank deficiencies by using the MATLAB function rank. We found that these matrices do not have a rank deficiency and that their full ranks were equal to 32. We applied the aforementioned numerical procedure for obtaining $\delta_{opt}$ from the net values (11), with $L = 0.5$ and $N = 1000$.
To determine the optimal value of the identification parameter, $\delta_{opt}$, we used our program, inpar, and obtained $\delta_{opt} \approx 0.14$ and $\delta_{opt} \approx 0.03$ for men and women, correspondingly. These optimal values of the identification parameter were used for estimating the APC effects ($\mu^*$, $\alpha^*_i$, $\beta^*_j$, and $\gamma^*_k$), as well as the lower ($CI_{lo}$) and upper ($CI_{up}$) bounds of their 95% confidence intervals, for LC in white men and women. For men, the obtained estimates of the intercept, $\mu^*$, and its 95% CI with the lower ($CI_{lo}$) and upper ($CI_{up}$) [...] Tables 6, 7, and 8, correspondingly. In these tables, the values of the anchored effects are presented in bold. In Table 5, the values of the identification parameters are presented in bold italic.

Panels 1E and 1F present the obtained trends of the Age effects on LC occurrence in white men and women, correspondingly. These trends increase from Age 30 until Age 70-75 and then decrease at older ages.

Figure 2 demonstrates the APC effects on LC incidence rates in white men and women, anchored to the Age interval of 70-74, the TP of 2000-04, and the BC of 1930-34. The rates for the anchored Age, TP and BC are presented by open circles. The error bars show the 95% confidence intervals.
Panels A and B of Figure 2 present the trends of the modeled TP-specific incidence rates vs. the TP interval indexes, $j$, for LC in men and women, correspondingly. The estimates of the modeled TP-specific incidence rates, $I^{m*}_{\cdot,j,\cdot} = \exp(\mu^* + \beta^*_j)$ ($j = 1,\ldots,m$), were obtained together with their variances, $SE^2$. For men, the TP-specific incidence rates of LC decreased from 1975 until 2004, while for women these rates increased from 1975 to 1990 and then remained nearly constant.
Panels C and D of Figure 2 present the trends of the modeled BC-specific incidence rates vs. the BC interval indexes, $k$, for men and women, correspondingly. The estimates of the modeled BC-specific incidence rates, $I^{m*}_{\cdot,\cdot,k}$, were obtained together with their variances, $SE^2$, using the formula:

$$I^{m*}_{\cdot,\cdot,k} = \exp(\mu^* + \gamma^*_k) \quad (k = 1,\ldots,l). \tag{14}$$

For both men and women, the BC-specific incidence rates of LC increase from the cohort of 1890-94 until the cohort of 1925-29, decrease until the cohort of 1950-54, and then remain almost unchanged.

Panels E and F of Figure 2 present the cross-sectional Age-specific incidence rates observed in the 2000-04 time period (dotted lines) and the estimates of the modeled Age-specific incidence rates anchored to the 2000-04 time period and to the 1930-34 birth cohort (solid lines) for LC in white men and women, correspondingly. The estimates of the modeled Age-specific incidence rates, $I^{m*}_{i,\cdot,\cdot} = \exp(\mu^* + \alpha^*_i)$ ($i = 1,\ldots,n$), were obtained together with their variances, $SE^2$. The modeled Age-specific incidence rates at the anchored ages are shown by the open circles. The error bars show 95% confidence intervals. As can be seen, the modeled Age-specific incidence rates of LC in men and women have "reverse bathtub" shapes: they increase with Age, reach a maximum (at the age interval of 75-79), and then fall at older ages. It is important to notice that the values of the modeled Age-specific incidence rates and the corresponding values of the observed cross-sectional Age-specific incidence rates are significantly different. This is because the estimates of the modeled Age-specific incidence rates are "cleaned up" from the TP and BC effects, while the observed cross-sectional Age-specific incidence rates are significantly influenced by these effects.

Figure 3 exhibits the results of assessing the validity of using the LLAPC model for determining the APC effects in the LC occurrences in white men and women.
Panels 3A and 3B exhibit the normal probability plots of the standardized residuals, $e^*_{i,j}$. The vertical axes present the obtained quantiles of the standardized residuals, and the horizontal axes show the corresponding quantiles of the standard normal distribution. For both men and women, the plots are sufficiently straight, except for several points that have very small or very large quantiles.
The vertical axes of panels 3C and 3D exhibit the standardized residuals, $e^*_{i,j}$, and the horizontal axes exhibit the modeled weighted values, $(Y^c_{i,j})^*$. As seen from Panel 3C for men, all but two values of the standardized residuals, $e^*_{i,j}$, fall into the $[-2, 2]$ interval, while for women, all of these values are distributed within this interval. This indicates that the multiple regression models we used are appropriate for presenting the corresponding observational data.
Panels 3E and 3F exhibit the observed weighted values, $Y^c_{i,j}$, on the vertical axes vs. the modeled weighted values, $(Y^c_{i,j})^*$, on the horizontal axes for men and women, correspondingly. For both men and women, the regression function used accurately models the actual observed values. Overall, we can conclude that the LLAPC models used in this work fit the observational data of LC in white men and women.

Discussion
For many decades, the problem of estimating the APC effects on cancer incidence rate data has intrigued researchers. The main difficulty in estimating these effects in the frame of the LLAPC model arises due to the fact that the APC effects are linearly interdependent temporal parameters and their values cannot be uniquely determined. Most of the known approaches for solving this identifiability problem have significant drawbacks and/or their computational implementation is complicated [2,16].
In this work, we developed a new computationally effective approach for solving the identifiability problem in APC analyses. We showed that the solution of this problem can be reduced to the problem of determining one unknown identification parameter, $\delta$. We used the effect, $\beta_{j_0-1}$, of the TP adjacent to the anchored TP, $j_0$, as this parameter. We showed that when the identification parameter is known a priori, the identifiability problem with multiple estimators does not arise and a unique set of estimates of the APC effects can be found.
By using a heuristic assumption that the differences between the BC effects of the adjacent cohorts are close to 0, we showed that the optimal value of the unknown identification parameter can be obtained by minimizing (with respect to $\delta$) the weighted average of the squared differences between the adjacent BC effects. In other words, this procedure allows one to determine the value of the identification parameter that provides the "smoothest" trend among all possible trends of the BC effects. This heuristic assumption is milder than the one utilized in [17], where the use of smooth functions for presenting a temporal variation of the BC effects is required for assessing the APC effects. It should be noted that the aforementioned assumption was successfully used in our previous papers [18,19].
In the present work, we extended the approach [2,4,8,9] that is well known as the "equate two effects" approach, in which all redundant parameters are equated to zero to solve the identifiability problem. Here, we used the LLAPC model with four redundant parameters to be identified. We equated one of the TP effects, one of the BC effects, and the corresponding Age effect to zero and used them as reference levels. We pointed out that by varying the fourth parameter, which we called the identification parameter, all possible solutions of the identifiability problem can be obtained. We proposed a method for obtaining the optimal value of the identification parameter, by which a unique set of the APC effects can be determined and thus the identifiability problem can be overcome.
We tested the proposed approach by estimating the APC effects on LC occurrence in white men and women. For this purpose, we used the Age-specific incidence rate data collected in the SEER 9 database during 1975-2004. By the aforementioned assumption and procedure, we determined the optimal values of the identifiability parameters and the corresponding unique sets of the APC effects on LC occurrence in white men and women.
We determined the modeled Age-specific incidence rates and showed that these rates have the "reverse bathtub" shape, falling at old ages. This is consistent with several publications (see, for instance, [20][21][22]) suggesting the existence of a plateau, followed by a decline, in the Age-specific cancer rates. In those studies, only the observational cross-sectional data were analyzed, while there was no accounting for the APC effects. In the present work, as well as in our previous studies [18,19], we have shown that the curves presenting the modeled Age-specific cancer incidence rates also have the "reverse bathtub" shape when the APC effects are taken into consideration. At the present time, the vast majority of the existing Age-specific models of carcinogenesis (see [23][24][25][26] and references therein) are based on the assumption that cancer rates increase with age. There are only three models [27][28][29] that describe the "reverse bathtub" shape of the Age-specific cancer rates. Of these three models, the Weibull-like model [29] appears to have a better biological background.
Our analysis shows that the TP-specific incidence rates of LC in men decreased from 1975 until 2004, while in women these rates increased from 1975 to 1990 and then remained nearly constant. Our results are consistent with the statement made in [30]: "…lung cancer incidence rates are declining in men and have leveled off after increasing for many decades in women. The lag in the temporal trend of lung cancer incidence rates in women compared to men reflects the historical difference in cigarette smoking between men and women; cigarette smoking in women peaked about 20 years later than in men." Our analysis also indicates that the variations of the BC-specific incidence rates of LC in men and women have similar shapes. This is a new result that was obtained by the approach presented in this work.
Overall, in our opinion, the present work provides the most efficient computational approach for determining the APC effects in the frame of the LLAPC model compared to other currently used approaches. The proposed approach can be used for the APC analysis of different types of cancer and other diseases as well.