Using cell replication data in mathematical modeling in carcinogenesis.

Risk estimation involves the application of quantitative models of dose versus response to carcinogenicity data. Recent advances in biology, computing, and mathematics have led to the application of mathematically complicated, mechanistically based models of carcinogenesis to the estimation of risks. This paper focuses on two aspects of this application: the ability to distinguish between models using available data, and the development of new models to keep pace with research developments.


Introduction
Using quantitative models to understand and interpret carcinogenesis is a complex and demanding task. To use models properly requires not only an understanding of mathematics and statistics, but also a fundamental knowledge of the data involved in this modeling and a clear picture of the underlying biology that generated the data. Thus, modeling exercises in this area constitute a true blending of disciplines. In this paper, we address some of the problems involved in modeling carcinogenesis from the point of view of the quantitative researcher, outlining assumptions necessary to exploit the data and the models, and we attempt to explain how these assumptions relate to the underlying biological processes being modeled. In addition, we give some details concerning alternative modeling approaches we are investigating and directions for future research in this area.
With few exceptions, the mathematical modeling of carcinogenesis has concentrated on the use of multistage models. In the mid-20th century, there were several theoretical discussions of how the biological theory of carcinogenesis might be turned into mathematical models (1)(2)(3). However, the first practical application of this class of models was made by Armitage and Doll (4). The Armitage-Doll (AD) model of carcinogenesis is illustrated in Figure 1a. The basic concept is that cells progress from a state in which they display normal cellular function into a malignant state via a fixed number of stages that must occur in a particular sequence. To describe cancer data adequately, the model generally required between four and seven stages. Each stage represents an additional critical mutation in the genome, with the final mutation leading to the formation of a single malignant cell, which progresses rapidly to an observable neoplasm. The model was applied in a variety of contexts, including the description of early- and late-stage carcinogens (5), the design (6,7) and analysis (8) of carcinogenicity experiments, and the estimation of low-dose cancer risks (9).
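The stage-counting argument can be made concrete with the familiar small-rate approximation to the AD model, in which age-specific incidence rises as a power of age, $t^{k-1}$, for $k$ stages. The sketch below is illustrative only: the function names, the assumption of constant stage-transition rates, and the treatment of target cells as independent are ours, not details taken from the analyses cited above.

```python
import math


def ad_hazard(t, rates, n_cells=1.0):
    """Approximate age-specific tumor incidence under an Armitage-Doll model.

    Small-rate approximation (assumed): with k stage-transition rates
    mu_1, ..., mu_k, the hazard is n_cells * (mu_1 * ... * mu_k) * t**(k-1) / (k-1)!.
    """
    k = len(rates)
    return n_cells * math.prod(rates) * t ** (k - 1) / math.factorial(k - 1)


def ad_cumulative_incidence(t, rates, n_cells=1.0):
    """Probability of at least one malignancy by age t (same approximation)."""
    k = len(rates)
    cumulative_hazard = n_cells * math.prod(rates) * t ** k / math.factorial(k)
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical example: six stages, each with rate 1e-3 per cell per unit time.
rates = [1e-3] * 6
print(ad_hazard(2.0, rates) / ad_hazard(1.0, rates))  # doubling age scales hazard by 2**5
```

Because the hazard depends on the rates only through their product, data on incidence versus age constrain the number of stages more directly than the individual rates.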
Later, several researchers (10)(11)(12) noted one obvious failure in the theoretical basis of the AD model: it did not take into account the growth kinetics of cell populations. Figure 1b illustrates a two-stage (TS) model of carcinogenesis, which allows for the birth and death/differentiation of cells. In this model, cells progress through two stages, normal cells and cells having one critical mutation, before the ultimate mutation that leads to the formation of malignancies. Each of these types of cells undergoes a birth-death process that allows the population of cells in a particular stage either to expand in number through mitosis or to shrink through cell death or terminal differentiation. Originally, the TS model was applied to tumor incidence data in the same manner as the AD model (12,13) and proved quite successful in explaining results that the AD model failed to describe adequately. In recent years, newer techniques in toxicology, cell biology, biochemistry, and other fields have allowed us to apply the TS model to a much broader range of experimental data (14,15). With few exceptions (16), when applied properly, this model has done an adequate job of describing the experimental results in the observable range.
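The role of clonal growth in the TS model can be seen in the expected number of initiated cells, which grows exponentially at the net rate (division minus death/differentiation). The sketch below assumes a constant pool of normal cells and uses a crude low-incidence approximation in which the tumor hazard is the malignant-conversion rate times the expected number of initiated cells; the function and parameter names are hypothetical, and this is not the exact solution of the stochastic model.

```python
import math


def ts_expected_initiated(t, n_normal, mu1, alpha, beta):
    """Expected number of initiated cells at time t in a two-stage model.

    Assumptions (for illustration): a constant pool of n_normal normal cells,
    initiation at rate mu1 per normal cell, and initiated cells that divide at
    rate alpha and die or terminally differentiate at rate beta.
    """
    growth = alpha - beta  # net clonal growth rate of initiated cells
    if math.isclose(growth, 0.0, abs_tol=1e-12):
        return mu1 * n_normal * t  # no net expansion: linear accumulation
    return mu1 * n_normal * math.expm1(growth * t) / growth


def ts_approx_hazard(t, n_normal, mu1, mu2, alpha, beta):
    """Crude low-incidence approximation: hazard ~ mu2 * E[initiated cells]."""
    return mu2 * ts_expected_initiated(t, n_normal, mu1, alpha, beta)
```

The exponential term is what separates the TS model from the AD model: a modest change in alpha - beta can change expected incidence substantially. This mean-value approximation is generally considered reliable only when tumor incidence is low.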
Even though the TS model seems to provide an adequate description of a large class of data, the advent of new information on carcinogenic mechanisms and the availability of faster computers have led to the creation of more complicated mathematical models of carcinogenesis. Figure 1c illustrates one such model (17). This multistage model of carcinogenesis expands the process of cellular mutation into a two-stage process, the first being movement of a cell from the normal state to a damaged state due to DNA damage and the second representing movement from the damaged state to the initiated state via fixation of that DNA damage by cell mitosis. Note that in this model, unlike the TS model, there is also repair of the DNA damage returning the damaged cell to the normal state. The major advantage of this damage/repair multistage model (DR) is that one can characterize many of the rates of the model from biochemical data and use the carcinogenicity data to estimate fewer parameters (18) or, in cases where all parameters are estimated from biochemical data, to check the validity of the model.
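A minimal sketch of the mean flows in a damage/repair model may help show how biochemical rates such as damage and repair enter the model directly. The compartments follow the description above, but the parameter names, the constant normal-cell pool, and the neglect of initiated-cell growth are hypothetical simplifications, not the parameterization of reference (17).

```python
def dr_expected_counts(t_end, n_normal, damage_rate, repair_rate, division_rate, dt=0.01):
    """Euler integration of expected damaged (D) and initiated (I) cell counts.

    Hypothetical parameterization: normal cells (held constant) acquire DNA
    damage at damage_rate; damaged cells are repaired back to normal at
    repair_rate, or have the damage fixed by division at division_rate and
    become initiated. Clonal growth of initiated cells is ignored.
    """
    damaged, initiated = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_damaged = damage_rate * n_normal - (repair_rate + division_rate) * damaged
        d_initiated = division_rate * damaged
        damaged += d_damaged * dt
        initiated += d_initiated * dt
    return damaged, initiated
```

At steady state the damaged pool is damage_rate * n_normal / (repair_rate + division_rate), so the fraction of damage that becomes fixed is division_rate / (repair_rate + division_rate). Quantities of this kind could, in principle, be constrained by biochemical repair data independently of tumor incidence.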
Although it is important that mathematical models be used to describe existing data, they can serve a much more useful role by creating a framework in which one can ask questions about alternative carcinogenic mechanisms and define experiments that allow us to answer these questions. This paper addresses two separate issues. The first concerns our ability to distinguish between the existing models of carcinogenesis using the data commonly available to the mathematical modeler. This is intended to provide some insight into the difficult issue of distinguishing between curve fitting and mechanistic modeling. The second issue concerns the development of new models of carcinogenesis as a consequence of data suggesting modifications to the existing theory. One modification concerns the presence or absence of stem cells in each stage of the carcinogenic process. Another addresses the potential for all cells in a given stage of the carcinogenic process to progress to a higher stage. These two issues are fundamental to our understanding of the carcinogenic process and form the basis for the mathematical models that are currently applied to carcinogenicity data. Alternatives to the default mechanisms used in this context are discussed.

Mechanistic Modeling of Carcinogenesis
To understand the difference between true mechanistic modeling and curve fitting, we must begin by studying the levels of information available for these exercises. Table 1 illustrates the types of information available to the researcher in attempting to model carcinogenesis. In Table 1 the available data relating to carcinogenesis are broken into four broad categories: biochemical data, cellular data, tissue-specific data, and data on the whole organisms. Under each category, examples of the types of data one might collect are given. This list is not intended to be exhaustive but to simply illustrate how one might view the available information.
True mechanistic modeling of carcinogenesis would involve using information from the lowest level (in this case biochemical data) to predict results at a higher level. For example, in true mechanistic modeling of carcinogenesis, we would use cellular and biochemical data to characterize a carcinogenic response and use tissue-specific cancer data only to verify that the model predictions are correct. Numerous researchers have described this as a bottom-up process of modeling carcinogenesis. Curve fitting, on the other hand, generally uses information obtained at the level for which inference is desired. In this case, you would have some model that describes an end point, say, cancer, and you would use direct information on that end point to estimate the parameters in this model. Consider an example. Suppose you want to use a TS model of carcinogenesis to explain the carcinogenic effect of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) in the liver of female rats. True mechanistic modeling of this effect would involve building a model for the distribution of TCDD to the tissues in the rat, the binding of TCDD to the Ah receptor, gene expression mediated by the TCDD-receptor complex, and the relationship of these end points to the mutation rates and birth/death rates in the TS model of carcinogenesis. The model would, in essence, have to be built from first principles, ignoring the available tumor incidence data and focusing only on the biochemical and cellular data. After all of these pieces of the model have been put together, tumor incidence could be predicted and compared with what was obtained from whole-animal experiments and human experience. If the model agrees with the data, it can be used as a working hypothesis of how TCDD leads to cancer. If it fails, one would attempt to understand why (at the cellular and biochemical level) and seek information within the context of the model to correct the failure.
Alternatively, one could fit the TS model directly to the carcinogenicity or mortality data available for TCDD. In this case, biological confidence in the model is reduced. This curve-fitting approach is nevertheless useful for obtaining working estimates of the mutation rates and birth/death rates, which might then be studied at the cellular or biochemical levels.
In general, neither of these two procedures is used exclusively in modeling carcinogenesis. Instead, a combined approach is generally used, in which some parameters in the model are obtained from cellular and biochemical experiments and others are obtained from the carcinogenicity data. A good example of this approach is given by .
This combined approach to modeling carcinogenesis leads to the question of how accurately one can characterize an assumed mechanism with the available data. There are many statistical and biological issues that have not been addressed in this context. We are endeavoring to address some of the statistical issues but currently have few results. However, we have addressed a similar problem by studying the degree to which tumor incidence data can be used to characterize carcinogenic mechanisms. To illustrate the problems involved in this field, we will discuss some of our findings concerning the estimation of model parameters and the characterization of the best model when using only tumor incidence data.

Statistical Issues in Model Discrimination
Statistical methods are routinely used to test for differences between various alternatives. For example, the simple t-test is used to test for the equality of the means in two treatment groups. Similar techniques can be used to assess the adequacy of one model for explaining a set of results as compared to an alternative model.  considered the possibility of distinguishing between the AD model and the TS model ( Fig. 1) using only information on tumor incidence, which is routinely available to modelers.
We first looked at the fit of the two models to the historical control data from recent National Toxicology Program (NTP) studies of carcinogenicity (16). Table 2 presents the results of fitting the two models to seven tumor sites in male and female B6C3F1 mice. The column labeled "TS" indicates whether the two-stage model fit the data (at the p = 0.10 level), with "+" indicating an adequate fit and "-" indicating lack of fit. Similarly, the AD column indicates the degree of fit for the Armitage-Doll model. It is clear from this table that when one model fits the data, the other also fits; likewise, when one model fails to fit the data, so does the other. Thus, using goodness of fit as a measure of model correctness, we are unable to differentiate between these two models on the basis of tumor incidence data. This is due partly to the magnitude of the variability in tumor incidence data and partly to the flexibility of these models when applied to these data. As a separate issue, we also considered how many stages in the AD model were consistent with the observed data. In Table 2, the column labeled "stages" shows that, for most of the data sets, several different numbers of stages agreed with the data, including cases where a model with two stages and a model with eight stages both agreed with the same data (circulatory system tumors and lung carcinomas in male mice).
The data used in this analysis were, in some sense, the best data available, because each data set included over 2000 animals and a considerable amount of information on age-specific tumor incidence. To study this issue for data sets of more realistic size, a simulation study was performed in which data were generated from one model, and the ability of the other model to provide a better fit to the data was assessed. In this case, experimental situations with fewer animals were used. The results of this analysis are summarized in Table 3. The exact cases that were studied are given in more detail in Kopp-Schneider and Portier (19). The point to note is that unless the true underlying model is an AD model with many stages, our chances of discriminating between these two models are dismal. Dose effects are just as difficult to assess. In recent research (20,21), we considered the possibility of distinguishing between treatment effects on mutation rates and treatment effects on birth rates within the context of the TS model. We concluded that, with the usual design of the long-term animal carcinogenicity experiment using chronic dosing, it is unlikely that a unique effect on either of these two rates could be clearly identified (20,21). It was also shown that modifying that design to include start-stop dosing would help somewhat, but it still did not dramatically improve our ability to characterize treatment effects properly. The best way to determine treatment effects on birth, death, and mutation rates may be to use data that relate directly to these end points. However, similar problems may also arise when models are applied to biochemical and cellular data. Portier et al. (22) have demonstrated this with two simple receptor models for the effects of TCDD on cellular proteins. Similar problems arise with the use of physiologically based pharmacokinetic models (23).
It is likely that this will also occur with cellular data because labeling index is a widely varying measure. Thus, it may be difficult to place any degree of confidence on parameters derived from these data (14), especially with regard to initiation effects versus promotional effects. This is an area that still requires considerable research.
In short, the toxicological and biological data routinely collected for use in modeling multistage carcinogenesis leave large gaps in our ability to distinguish between models. This does not imply that this information should not be used. On the contrary, we wholeheartedly support the application of these data in modeling carcinogenesis. However, extreme caution should be used in interpreting the resulting model parameters and in placing confidence in any predictions from these models.

Carcinogenesis
Multistage models have become the de facto standard for modeling carcinogenesis. Within this class of models, the TS model is perhaps the most popular among researchers, and the AD model is the most widely used. However, the biologist and the statistician/mathematician often use similar terminology while discussing quite different processes. In this section, we carefully outline some of the assumptions that typically go with the use of the TS model and discuss alternative models that may have a more realistic basis in the underlying biology. In addition, we discuss the role of these alternative models in studying carcinogenic mechanisms.
The first issue to be considered is the meaning of stages in multistage carcinogenesis. In mathematical parlance, stages constitute a sequence of events that must occur in a prescribed order for the carcinogenic event to be observed. Thus, in the TS model (Fig. 1b), malignant cells can only arise from initiated cells; there is only a single pathway to carcinogenesis. However, it is unlikely that carcinogenesis arises in such a simplistic fashion.
In contrast to multistage carcinogenesis, there is the largely forgotten class of multihit models. Hit theories of carcinogenesis hypothesize that a normal cell must be damaged multiple times before it becomes a malignant cell. There is no implied order to these hits; it is the accumulation of damage that differentiates this theory from multistage theory. Hit theories have been developed but were generally not accepted because they were perceived as simplistic and as less biologically plausible than the TS model.
The fundamental difference between the TS model and a two-hit model is whether a malignant cell is generated by a particular sequence of mutations or by cumulative mutations, where the sequence of mutational events is inconsequential. Thus, the TS model represents only one possible path to malignancy, whereas the two-hit model represents all possible paths to malignancy consisting of two mutational events. Restricting the order of events in the two-hit model yields the TS model. An alternative is to fuse the two theories into a single framework, which we refer to as multipathway/multistage theory. Support for the multipathway/multistage theory comes from current cancer research on oncogenes and suppressor genes. Oncogenes are thought to be genes whose activation accelerates cell growth. Suppressor genes are thought to work in the opposite manner; they are genes whose deactivation removes some restrictions on the mechanism that regulates cell proliferation. Thus, if oncogenes are activated and suppressor genes deactivated, the net result is believed to be a cell, and eventually a colony of cells, with little or no growth control.
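The effect of imposing an order on two events can be quantified in the simplest possible setting: a single cell, no proliferation, and two events occurring at constant rates a and b. These assumptions are ours and deliberately strip away the clonal expansion that the TS model includes, so the sketch isolates only the ordered-versus-unordered distinction.

```python
import math


def p_ordered(t, a, b):
    """P(both events by time t) when event B can only follow event A (stage order).

    The completion time is the sum of two exponential waits (hypoexponential).
    """
    if math.isclose(a, b):
        return 1.0 - math.exp(-a * t) * (1.0 + a * t)  # equal rates: Erlang-2
    return 1.0 - (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def p_unordered(t, a, b):
    """P(both events by time t) when the two hits occur independently, in either order."""
    return (1.0 - math.exp(-a * t)) * (1.0 - math.exp(-b * t))
```

Because the ordered completion time is the sum of the two waiting times while the unordered time is their maximum, the two-hit probability is never smaller than the two-stage probability under these assumptions.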
Experimental evidence suggests that at least two activated oncogenes are necessary for eventual tumor development (24,25). Oncogene activation may be the result of chromosome translocation, gene amplification, point mutation, or deletions/alterations in the chromatin. These activated oncogenes can be classified into two groups: myc-type oncogenes and ras-type oncogenes. These two groups of oncogenes complement each other, resulting in synergistic activity. This enhanced growth activity is one possible component in the production of a malignant cell (26,27).
The number of genetic events necessary to deactivate a tumor-suppressor gene has been conservatively estimated as two, but may be as many as three or four depending on the type of cancer (28,29). These lesions may be the result of large-scale deletions or rearrangements of segments of the chromosomes, inactivation of gene products, or point mutations. Once oncogenes are activated and tumor-suppressor genes deactivated, uncontrolled cell growth generally ensues.
Fearon and Vogelstein (30) state that multiple, distinct mutations may have to accumulate before a cell becomes malignant. These genetic changes may occur in a specific order; however, they indicate that the accumulation of changes is what matters most for malignancy to occur. This is believed to correspond to the genetic changes necessary to alter the expression of oncogenes and tumor-suppressor genes. A clear example of the multipathway/multistage theory of carcinogenesis is provided by the work of Vogelstein and others on the molecular biology of colon cancer (e.g., 30). Careful characterization of human tumors has shown a mixed pattern of genetic changes, which is more readily described by a combination of multiple pathways than by a strictly multistage process.
The mathematical modeling of carcinogenesis should proceed toward incorporating hits and stages within a unified theory. There has been some development along these lines, but it has been sparse. Portier (20) examined the effect of multiple two-stage paths to carcinogenesis, and Tan (21) partially extended this to a general theory. Much research remains to be done before such models can be used routinely.
To illustrate what might be meant by a multipathway/multistage model, consider the model in Figure 2. This is a simple two-stage model of carcinogenesis with a single path added to the first stage. In this model, a single mutation with rate $\mu_1$ transforms a cell into an initiated cell. The rate $\mu_1$ depends on the rate of replication, $\theta$, of the population in which the mutation is occurring. There are two populations from which initiated cells are drawn: the population of normal cells [so that the rate from normal cells to initiated cells is $\mu_1(\theta_0)$] and another population of hit cells [so that the rate of mutation from hit cells to initiated cells is $\mu_1(\theta_h)$].
These hit cells are the result of a mutation of the normal cells, which occurs at rate $\mu_0$. They are not necessary to the carcinogenic process but can contribute to it by enhancing the chances of generating an initiated cell. As in the TS model described earlier, a single mutation in the initiated cells results in a malignant cell.
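Under strong simplifying assumptions, the expected contribution of the optional hit pathway in a model of this kind can be written down directly. The sketch assumes a constant normal-cell population, hit cells that arise at rate mu0 per normal cell but neither divide nor die, and an initiation rate proportional to the replication rate of the source population (theta_normal or theta_hit), with a proportionality constant nu. All of these names and the proportional form are hypothetical choices for illustration, not the specification of Figure 2.

```python
def expected_initiations(t, n_normal, mu0, nu, theta_normal, theta_hit):
    """Expected numbers of initiating mutations by time t from each pathway.

    Hypothetical assumptions: n_normal is constant; hit cells arise at rate mu0
    per normal cell and are never lost or expanded, so H(s) = mu0 * n_normal * s;
    the initiation rate of a population replicating at rate theta is nu * theta.
    """

    def mu1(theta):
        # initiation rate assumed proportional to the replication rate
        return nu * theta

    from_normal = mu1(theta_normal) * n_normal * t
    # integral from 0 to t of mu1(theta_hit) * H(s) ds
    from_hit = mu1(theta_hit) * mu0 * n_normal * t ** 2 / 2.0
    return from_normal, from_hit
```

In this caricature the hit pathway grows quadratically in time and scales with theta_hit, so an agent that raises replication in hit cells could increase initiation without being necessary for it, which is the kind of contribution the hit/stage distinction is meant to capture.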
It is possible to approximate a hit/stage model with the simpler multistage model by allowing for time-dependent rates and other modifications. We do not support this approach, because the approximation is likely to be inadequate and could lead to false inferences from the model. This has been observed and discussed for other multistage models in the context of tumor incidence modeling (32,33,34), and it is unlikely that approximations for this class of models will be any different. Furthermore, it is far better to match the mathematical development to the biology. This not only limits confusion between biologists and mathematicians, it also allows us to exploit biological data on carcinogenic mechanisms directly (e.g., DNA damage rates, DNA repair rates, patterns of oncogene expression). In the long run, this will strengthen inferences drawn from the model and result in greater confidence in the utility of these models for risk assessment.
Another potential problem with the current collection of multistage models of carcinogenesis concerns the mathematics involved in the process by which births and deaths are modeled. The process currently used is a standard birth-death process, which is common in the area of probability theory (35). As defined mathematically, this process is questionable as a mechanistic descriptor for a biological system, especially for the growth kinetics of initiated cells. Its shortcomings are described below.
A stochastic birth-death process works in the following fashion. Cells act independently of all other cells in the system (this is referred to as a "linear" process). At random times, based on an exponential distribution, events occur in these cells. For a simple birth-death process, these events include the division of one cell into two cells (birth) and the removal of one cell from the system (death). Once the time of a cellular event has been determined, deciding on which event occurs is similar to flipping a weighted coin where the probability of heads (birth) is the birth rate divided by the sum of the birth rate and the death rate. The probability of tails (death) is 1 minus the probability of heads.
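The event-by-event description above is what a stochastic (Gillespie-type) simulation does. Because cells act independently, the waiting time to the next event anywhere in a population of n cells is exponential with rate n times (birth rate + death rate), and the weighted coin then decides which event occurs. The function name and parameters are illustrative; this is a sketch, not code from the studies cited.

```python
import random


def simulate_linear_birth_death(n0, birth_rate, death_rate, t_end, rng=None):
    """Return the number of cells at time t_end in a linear birth-death process.

    Every cell divides at rate birth_rate and dies (or terminally
    differentiates) at rate death_rate, independently of all other cells.
    """
    rng = rng or random.Random()
    per_cell_rate = birth_rate + death_rate
    if per_cell_rate == 0:
        return n0  # no events can ever occur
    p_birth = birth_rate / per_cell_rate  # probability the weighted coin shows "heads"
    n, t = n0, 0.0
    while n > 0:
        # exponential waiting time until the next event in any of the n cells
        t += rng.expovariate(n * per_cell_rate)
        if t > t_end:
            break
        n += 1 if rng.random() < p_birth else -1
    return n
```

Averaged over many runs, the population grows as n0 * exp((birth_rate - death_rate) * t), but individual clones vary widely, and a clone can go extinct even when birth_rate exceeds death_rate.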
The first major assumption of concern in these processes is that cells act independently of each other. It is well established that cells communicate and that a homeostatic feedback system controls the size of a given cell population. In general, if the number of cells in a population is large, the linear birth-death process is adequate for describing the kinetics of the system without concern for dependence between cells. However, for smaller populations, as is possible with initiated cell populations (e.g., premalignant focal lesions in the liver, papillomas in the skin), cellular dependencies may play a major role in the observed growth kinetics. This will be especially true when modeling potential carcinogenic effects due to regenerative hyperplasia, a mechanism that has been proposed for some compounds. The current models of carcinogenesis may not be adequate for addressing these issues. Time-dependent rates in multistage models have been used in this context, but it is not clear how well this approximation captures the interdependence between cells. There have been some attempts at developing the mathematical theory needed for nonlinear models, but this has so far proven intractable. The advent of fast, inexpensive computers offers some hope that numerical solutions will make these models usable, but, as illustrated in Portier and , an order-of-magnitude increase in computing speed is still necessary to make this approach truly practical for modeling carcinogenesis.
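One simple way to relax the independence assumption, offered purely as an illustration and not as a model proposed here, is to let the per-cell division rate fall as the clone grows, using a logistic form with a hypothetical carrying capacity. The process is then nonlinear, so simulation stands in for closed-form solutions, which is the source of the computational burden described above.

```python
import random


def simulate_density_dependent(n0, birth_rate, death_rate, capacity, t_end, rng=None):
    """Birth-death simulation with a hypothetical logistic feedback on division.

    The per-cell birth rate falls linearly as the clone approaches `capacity`,
    birth_rate * max(0, 1 - n / capacity), so cells no longer act independently.
    """
    rng = rng or random.Random()
    n, t = n0, 0.0
    while n > 0:
        b = birth_rate * max(0.0, 1.0 - n / capacity)  # feedback-reduced division rate
        total_rate = n * (b + death_rate)
        if total_rate == 0.0:
            break  # clone is frozen: no division and no death possible
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        n += 1 if rng.random() < b / (b + death_rate) else -1
    return n
```

Unlike the linear process, the clone size here saturates below the capacity rather than growing without bound, mimicking homeostatic control of a small population.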
Another concern with the birth-death processes used in multistage models is the lack of true stem cells in the model. A true stem cell would have different death characteristics from other cells in the tissue, in that it would persist for a very long time. In addition, most of the other cells in the population would be derived from this stem cell. Beyond stem cells, there are also likely to be size restrictions on the population of initiated cells, with birth and death rates that depend on the size of the clone, as well as limits on the number of divisions the progeny of a stem cell are likely to undergo. These processes, again, differ drastically from the simple linear birth-death process used in multistage models.
Another issue relates to the importance of stem cells in modeling carcinogenesis. More specifically, the issue is determining which cells in the premalignant clone should be considered initiated cells able to proceed to malignancy. Moolgavkar et al. (14) and  analyzed data on the clonal expansion of premalignant cells in the liver and skin, respectively, but do not relate their model results to tumor rates. To relate the findings for colonies of premalignant cells to the incidence of malignancy, one must make an assumption concerning the proportion of cells in the premalignant clone that may go on to malignancy. The usual two-stage model (Fig. 1b) allows all cells in the premalignant clones to act as initiated cells and to go on to malignancy. This is the model that was used by Moolgavkar et al. Kopp-Schneider and Portier modified this model to estimate the number of actively dividing (nonterminally differentiated) cells in the papillomas they were studying, because many of the cells in these papillomas are terminally differentiated and can no longer go on to malignancy. Some proportion of these actively dividing cells could constitute the initiated cell population with respect to carcinogenesis. One alternative would be a stem-cell-based two-stage model in which only a small percentage of the cells in the initiated cell population (as few as one) can go on to malignancy. When using information on the size and number of premalignant lesions in attempting to model carcinogenicity, this issue must be handled cautiously. Biologists need to expand research in this area to improve our knowledge of these processes. (As an aside, it is important to note that the term stem cell has different meanings in biology and modeling. In the context of the discussion above, the biological definition is intended.) Improved handling of stem cells is one area where mathematical modeling could improve the science of carcinogenesis.
A careful analysis should lead to experimental designs whose results clearly differentiate between a model in which all cells in a premalignant clone can go on to malignancy and one in which only a few cells can. One such design is an initiation-promotion study with two promotion phases separated by a period of no exposure. By waiting long enough between promotion phases, we should be able to gain some insight into the presence or absence of stem cells in the tissue and into the number of initiated cells important to the carcinogenic process. Research on this issue is currently underway.
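The consequence of the assumption about which cells in a premalignant clone can progress shows up even in a deliberately crude calculation. Holding clone size fixed and letting each competent cell convert to malignancy independently at rate mu2 (both simplifying assumptions of ours, with hypothetical names), the probability that a clone progresses depends on the number of competent cells rather than on total clone size.

```python
import math


def p_clone_progresses(clone_size, mu2, t, competent_fraction=1.0):
    """Probability that a clone of fixed size yields at least one malignant cell by t.

    competent_fraction = 1.0 mirrors the usual two-stage assumption (every cell
    in the clone can progress); a small fraction mimics a stem-cell-based model.
    """
    competent_cells = clone_size * competent_fraction
    return 1.0 - math.exp(-mu2 * competent_cells * t)


# Hypothetical clone of 10,000 cells, mu2 = 1e-7 per cell per day, one year.
print(p_clone_progresses(10_000, 1e-7, 365))              # all cells competent
print(p_clone_progresses(10_000, 1e-7, 365, 1 / 10_000))  # a single stem cell
```

Fitting the same tumor data under the two assumptions would therefore imply very different values of mu2, which is one reason for caution when lesion size and number are carried into a carcinogenicity model.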

Discussion
This paper has addressed two specific issues in the application of mechanistic models to understanding and evaluating carcinogenic risks. The first issue is the ability of the data currently being collected to provide us with meaningful estimates of the parameters in these models. There is little ability to discriminate between alternative models and mechanisms with the data that are currently available. A concerted effort is needed to find novel designs to address the application of mechanistic models of carcinogenesis. This effort must be truly interdisciplinary, including mathematicians, statisticians, and biologists working on all aspects of the carcinogenicity issue. This is especially important in discriminating between chemically induced changes in cell proliferation and chemically induced changes in mutation rates, in determining whether chemically induced mutations are necessary (stages) to the carcinogenic process or simply augment carcinogenesis (hits), and in determining the size and properties of the critical population of initiated cells (stem cells versus differentiated cells).
The second issue is basically a mathematical one: the development and application of more realistic models relating to the proliferation of cells. The current models do not truly characterize the biology as it is observed and understood. This could lead to bias in our interpretation of the analytical results and incorrect estimates of carcinogenic risks derived from the models. There needs to be additional research into the development of more realistic models and additional research into the limitations, if any, of current models for dealing with more complex carcinogenicity data.
One final issue should be noted. In an interdisciplinary field such as mechanistic modeling of carcinogenesis, it is vitally important that the model being used be described exactly and that all parties involved in applying the model have a working knowledge of its limitations and assumptions. Thus, biologists will have to spend a bit of time trying to understand the mathematical assumptions and properties of the models and mathematicians/statisticians will have to get better acquainted with the data they are addressing. Only in this way can the field move forward and true mechanistic modeling of carcinogenesis be achieved.