The road to embryologically based dose-response models.

The goal of researchers working in the area of developmental toxicology is to prevent adverse reproductive outcomes (early pregnancy loss, birth defects, reduced birth weight, and altered functional development) in humans due to exposures to environmental contaminants, therapeutic drugs, and other factors. To best achieve that goal, it is important that relevant information be gathered and assimilated in the risk assessment process. One of the major challenges of improved risk assessment is to better use all pertinent biological and mechanistic information. This may be done qualitatively (e.g., demonstrating that the experimental model is not appropriate for extrapolation purposes); semiquantitatively (using information to reduce the degree of uncertainty present under default extrapolation procedures), or quantitatively (formally describing the relationships between exposure and adverse outcome in mathematical forms, including components that directly reflect individual steps in the overall progression of toxicity). In this paper we review the recent advances in the risk assessment process for developmental toxicants and hypothesize on future directions that may revolutionize our thinking in this area. The road to these changes sometimes appears to be a well-mapped course on a relatively smooth surface; at other times the path is bumpy and obscure, while at still other times it is only a wish in the eye of the engineer to cross an uncharted and rugged environment.

During the last 5 years, significant changes in the risk assessment process for noncancer health effects of environmental contaminants have begun to appear. The first of these changes is the development and use of statistically based dose-response models (the benchmark dose approach) that better utilize data derived from existing testing approaches. Accompanying this change is the greater emphasis on understanding and using mechanistic information to yield more accurate, reliable, and less uncertain risk assessments. The next stage in the evolution of risk assessment will be the use of biologically based dose-response (BBDR) models to build factors related to the underlying kinetic, biochemical, or physiological processes, which may be perturbed by a toxicant, into the statistically based models. Such models are now emerging from several research laboratories. The introduction of quantitative models and the incorporation of biological information into them has pointed to the need for even more sophisticated modifications, which we term embryologically based dose-response (EBDR) models. Because these models are based upon the understanding of normal morphogenesis, they represent a quantum leap in our thinking, but their complexity presents daunting challenges both to the developmental biologist and the developmental toxicologist. However, the remarkable progress in the understanding of mammalian embryonic development at the molecular level that has occurred over the last decade should eventually enable these as yet hypothetical models to be brought into use.
A firm understanding of the mechanisms of normal development is required to adequately characterize mechanisms of abnormal development. Indeed, the paucity of complete descriptions of mechanisms of chemically induced dysmorphogenesis is in large part based on our poor understanding of normal developmental processes. For example, without an understanding of the forces that control the outgrowth and differentiation of the limb bud, how can we understand the formation of limb deformities? Advances in understanding morphogenesis on the molecular and biochemical level, for the first time, are providing the knowledge base necessary for developmental toxicologists to truly understand the mechanisms by which chemicals disrupt embryogenesis. Clearly, we are still far from having models of normal morphogenesis commonly available in the toolbox of the developmental toxicologist and risk assessor, but one day we may witness a revolutionary change not only in how we evaluate developmental toxicity in animal models but also in how toxicity is extrapolated to the human population.

Basic Elements of Dose-Response Assessment
For the purposes of this review, doseresponse assessment can be viewed as three critical steps: identification of the effect (and related exposure level) of most concern; a characterization of the uncertainty present in the database; and an estimate of the exposure level presumed to be free of risk to the human conceptus. In the first step, data from exposed experimental species, as well as any epidemiological information, is examined for the highest dose level that is without a significant adverse effect. This dose level is established by a combination of statistical analysis and expert opinion and is generally referred to as the NOAEL (the no observed adverse effect level). It is important not to confuse the concept of NOAEL with that of threshold for biological effects, as the former merely reflects the statistical power of an experiment to see an effect when in fact one does exist. Various regulatory agencies have provided guidelines for the design, conduct, and interpretation of such hazard identification studies for developmental toxicity (1). The lowest NOAEL in the database on a particular chemical is termed the critical NOAEL. In the second step, the adequacy, relevance, and uncertainties in extrapolating the NOAEL from the experimental species to the target species are estimated. Minimally, these extrapolations consider the sensitivity of the human, relative to test species, and the presence of 'potentially sensitive subpopulations. In the default situation, uncertainty factors of 10 are used to cover both sources of uncertainty. Other uncertainty or modifying factors may be applied to account for a lack of identification of a NOAEL, an incomplete database, or an expert opinion regarding the probability of risk. In the final step, the critical NOAEL is divided by the product of the uncertainty factors, as well as any expertderived modifying factors (MF) to obtain the reference dose [RfD] (or reference concentration (RfC) for an inhaled chemical). RfD NOAEL critical effect UF inrspeciesXUFintapecies XMF Lifetime exposures below the RfD or RfC are believed to be without appreciable risk to humans (1)(2)(3)(4).

Qualitative and Semiquantitative Approaches
Perhaps the easiest and most straightforward approach to incorporating mechanisms into dose-response assessment occurs when it can be demonstrated that the animal model yields results that are not extrapolatable (a process that can be called qualitative nonextrapolation). While this may be pertinent only on rare occasions, the impact is always profound. Excellent examples of this have been derived by research into the mechanisms of carcinogenesis, where we now understand that saccharin-induced bladder tumors in rodents and t2-microglobulin-induced kidney tumors in male rats have no homologous counterpart in humans. Likewise, if a metabolite is found to be the proximate toxicant and that metabolite is not formed in humans, then the subsequent effect would not be expected to occur either. In developmental toxicity, examples include the effects of Gram-negative antibiotics in rabbits in which the marked effect of the chemical class on the gut microflora causes a nutritional deficiency that secondarily produces effects on the offspring (5) and the effect of diflunisal-induced axial skeletal defects in rabbits that are due to a speciesspecific maternal hemolytic anemia (6).
Another approach to incorporating biological understanding into the doseresponse and risk assessment activities is to carefully examine the quality, consistency, and adequacy of the database in light of the default assumptions regarding the presumed sensitivity of humans relative to the experimental species and to the existence of susceptible subpopulations. If the database provides sufficient evidence, it is then possible to reduce the magnitude of the uncertainty factors to reflect the level of understanding of interspecies and intraspecies differences. Such an approach is exemplified by the Institute for Evaluating Health Risks Evaluative Process (4) in the assessment of the reproductive and development effects of lithium (7). The human and experimental evidence were judged sufficient to indicate that lithium causes developmental toxicity in the therapeutic range, but it fell short of indicating what the presumed safe level of exposure would be; extrapolation therefore was necessary. An expert review committee concluded that the uncertainty factor of 10 for interspecies differences could be reduced to 1005 on the basis of the facts that a) Li+2 is the active toxicant; b) there was a linear relationship between lithium exposure and plasma levels in both humans and experimental animals; c) adverse effects appear to occur at similar lithium levels in humans and animals; and d) the systemic target organs for lithium toxicity are similar in humans and animals. In addition, the intraspecies uncertainty factor was also reduced to 100°5 on the basis that the use of serum concentrations as a measure of delivered dose minimizes interindividual differences in absorption, hence accounting for some of the differences within a population. The aggregate uncertainty factor used was therefore 100.5 times 100°5, or 10. This effort is one of the first coordinated attempts to bring independent experts together for the specific purpose of providing the best scientific determination of risk of adverse reproductive outcomes (similar in nature to the effort of the International Agency for Research on Cancer for carcinogenesis); this effort also demonstrates the types of decisions that such an informed group can make regarding the magnitude of the uncertainties present in a typical example.

The Benchmark Dose Approach
Another avenue to improve the doseresponse component of the risk assessment process is to better use data generated from standardized testing procedures, independent of knowledge of toxicokinetic or toxicodynamic factors that may be used to adjust the magnitude of the uncertainty factors. Reliance on the NOAEL as the entrance point into the extrapolation process for noncancer effects has been the subject of much criticism (1). The most significant criticism has arisen from the fact that the procedure to obtain the NOAEL fails to encourage better experimental design. In fact, it even works actively to discourage such efforts: experiments with more dose groups and more subjects per group can only result in lower NOAELs because more statistical power is focused on between-dose-group comparisons. Thus, chemical manufacturers are effectively discouraged from submitting better toxicological data than the minimum required by regulatory agencies. Other criticisms include the need to repeat experiments that fail to demonstrate a NOAEL; the approach ignores the shape and variability of the dose-response curve; and NOAELs can represent considerable (and inconsistent) risk levels (8,9).
Many of these criticisms have been addressed by application of statistically based dose-response models in the benchmark dose (BMD) approach (10). In the BMD approach, a particular effect level is chosen and the dose inducing that response is calculated using a statistical model ( Figure 1). The BMD is then defined as the lower 95% confidence interval on that dose level. In principle, the response level is chosen near the low end of the observable range so no extrapolation is necessary (11). In this idealized graph, the experimental data points are depicted by the symbols, the smooth line is the model fit to the data, and the dashed line is the lower confidence limit on dose for a given response level. The BME is shown here as an extra 5% risk, and the BMD is the dose that corresponds to the intersection of the BME and the confidence interval. data from all experimental doses into use, and the use of the lower confidence interval on the dose estimate for a particular risk allows the experimental variability to enter into the output. BMDs from different end points or different studies would therefore be based on more similar response levels than occurs with the NOAEL.
To evaluate the utility of the BMD to standard developmental toxicity test data, a database of 246 studies was analyzed (12)(13)(14)(15). These studies used two doseresponse models applicable to any toxicological end point (the quantal Weibull model and the continuous power model), as well as three models (termed the RVR, NCTR, and LOG models) that incorporate aspects specific to developmental toxicity data (e.g., litter-size effects and intralitter correlations). These studies also examined quantal (Q) end points (the presence or absence of litters with at least one dead or malformed implant) and continuous (C) end points (the mean litter incidence of affected implants and fetal weight). BMDs for various added risk levels (1, 5, and 10%) were estimated from a variety of model formulations (e.g., the presence of a threshold or litter-size parameter). For incidence data, a total of 1,825 end point-specific BMDs and corresponding NOAELs were determined. For fetal weight, comparisons were based on a subset of studies for which individual fetal weight data were available, and only the continuous power model and the LOG model (with the litter size but without the threshold parameter) were used. To calculate a benchmark dose for reductions in fetal weight, it was first necessary to define what level of effect should be used in the assessment. Therefore, in a preliminary analysis, 18 different definitions of reduced fetal weight were considered in establishing the benchmark effect (BME) level that was similar in magnitude, on average, to traditionally determined NOAELs. Six BMEs for reduced fetal weight were used in the full analysis. These included reductions in the mean fetal weight by 5%, 0.5 standard deviations (SD), or 2 standard errors of the mean (SEM); a reduction in mean fetal weight to the 25th percentile of the control mean; and a 5 or 10% increase in incidence of fetuses weighing less than the 5th or 10th percentile, respectively, of the control litter mean. The NOAEL for reduced fetal weight was less than the highest experimental dose level in 85 of the 173 studies in this subset. X2 tests were used to assess goodness of fit while the magnitude of the log-likelihood estimates were used to compare the influence of optional model parameters. BMDs were then compared with traditionally determined NOAELs.
In the database, QNOAELs (the NOAEL for an end point based upon whether a litter contained an affected implant) were similar in magnitude to CNOAELs (the NOAEL for an end point based upon the mean incidence of affected implants within litters). Both generic and developmental-specific models provided acceptable fits to the data from these standard developmental toxicity bioassays. For the generic models, goodness of fit tests were rejected in less than 4% of all analyses. A very low frequency of nonconvergence (4/1,825) occurred, which seemed to coincide with dose-response patterns in which the response at low doses was higher than that at higher dose levels. For the developmental-specific models, incorporation of the litter size but not the threshold parameter marginally improved model fit, and the LOG model was slightly better than the RVR and NCTR models in terms of model fit (probably the result of its more flexible handling of the litter-size parameter).
In nearly every comparison, the median ratios of benchmarks to NOAELs were closer to unity than the means, suggesting the presence of non-normal distributions. In comparisons based upon the quantal approach, the best match on average between NOAELs and the BMDS was found for a QMBDLO (a 10% added risk from the quantal Weibull model). The median benchmark-to-NOAEL ratio at this risk level was 0.5, and 88% of the BMDs were within a factor of 5 of the NOAEL. When comparisons were based upon more continuous measures of response (the mean litter incidence), the best matches were found between the NOAEL and a CBMD05 (a 5% added risk from the continuous power model), and a BMD05 from any of the three developmental-specific models. For example, the median CBMDto-NOAEL ratio was 1.04, and 95% of the benchmarks were within a factor of 5 of the NOAEL; and only 9/486 (1.85%) of the comparisons differed by a factor of 10. All six operational definitions of reduced fetal weight listed above provided BMDs that were similar, on average to the NOAELs. The median benchmark to NOAEL ratios ranged from a low of 0.9 for a BME of a 2 SEM reduction in the average litter mean to a high of 1.24 for a 5% reduction in the average litter mean. In only 9/76 comparisons (11.8%) did any of the BMDs from the six definitions of a BME for reduced fetal weight differ from the NOAEL by more than a factor of 4. The largest such difference was 18-fold. Two aspects, often in combination, generally contributed to the differences of unusual magnitudes between the BMD05 for the reduced fetal weight BMEs and the corresponding NOAELs. The first of these was the use of a study design with wide dose spacing and the second was the presence of a shallow dose-response pattern. In the former instance, a very wide interval between the NOAEL and LOAEL dose would tend to produce NOAELs that might be considered to be artificially low (recall that unlike the benchmark dose, the NOAEL is constrained to be one of the experimental doses). In the latter instance, the shallow slope can make determination of the NOAEL more arbitrary and unstable. Combined, these two aspects can therefore be expected to create greater than normal heterogeneity in the BMD-to-NOAEL ratio. Therefore, close examination of the minority of studies that yielded divergent BMD-to-NOAEL ratios demonstrates a key advantage of the benchmark dose approach. If the BMD always gives us the same information as the NOAEL, there is little compelling reason to adopt a new system. Starting the extrapolation process from a point of consistently determined and comparable risk levels avoids some potentially misleading comparisons of the relative risk of two chemicals.
These comparisons demonstrate the feasibility of applying the benchmark dose methodology to developmental toxicity bioassays and provide convincing evidence Environmental Health Perspectives * Vol 104, Supplement * March 1996 of the risk level (5% on average) associated with traditionally derived NOAELs based upon continuous measures of response. The analysis also points to the conservative nature of dose-response models based upon quantal end points that reduce the data to whether a litter contained at least one affected implant (recall that the QBMD that best matched the QNOAEL was based upon an added risk of 10%, and even in that case the estimate was still conservative relative to the NOAEL). At the 10% risk level, QBMDs were two to three times smaller than the CBMDs, a difference attributable to both lower maximum likelihood estimates (MLE) of the dose corresponding to this risk level and to wider confidence intervals. For example, the median ratio for the CMLE05/CBMD05 was 1.6 with an upper 99th percentile of 4.0, whereas the median ratio of the QMLE05/QBMD05 was 3.3 with an upper 99th percentile of 15.6. In only 13 of 542 end points with significant quantal and continuous trends, the QBMDo5 was greater than the CBMD05. Such comparisons may have important implications for implementing the BMD approach for other noncancer end points in which data are more like those of the quantal end points that are based upon affected litters and point to the advantages of using individual implant data for developmental toxicity analyses. In any event, there were no major differences in performance or fitness between the generic, continuous, or developmentally specific benchmark models for any of the end point comparisons, suggesting that the choice of model is up to the user, provided that it adequately describes the data. Another advantage of the BMD approach is that it provides a stimulus for considering other dose-response methodologies in the area of noncancer health effects, a situation that appeared extremely remote only a few years ago.

Second Generation Models
The discussion to this point has been limited to assessing each end point of developmental toxicity individually. Several groups are now working on models that are capable of providing risk estimates for multiple adverse outcomes. End points of developmental toxicity nominally recorded in bioassays include the viability of an implant and the morphological status and growth of surviving implants. Emerging models can account for correlations between outcomes and can represent the overall probability of yielding a normal birth outcome. For example, Catalano et al. (16) presented a model of the probability of abnormality, which is defined as the probability that an offspring is either dead, malformed, or of low fetal weight. The model can be simply expressed as where P(d) is the overall probability of being normal at dose d, PI is the probability of death or resorption, and P2 is the probability of malformation or low weight conditional on survival. The models were fit using generalized estimating equations (GEE), which are computationally simpler than maximum likelihood methods and also have relaxed distributional assumptions. The probability of fetal death or resorption was initially modeled as a function of dose using a probit model with a power parameter. Next, outcomes among live fetuses were modeled using a twostage regression approach. The first stage regresses fetal weight as a function of dose with litter size as a covariate and allows for a correlation parameter to characterize the litter effect. Then, the procedure calculates the individual and average litter residuals from the fetal weight model. A cutoff value of 3 SD units beneath the control mean was considered abnormal. Next, a probit model was used to quantify the probability of malformation, with covariates for dose, individual and average weight residuals, and litter size. Finally, all three models were linked to obtain an overall risk of adverse outcome. An important model assumption was that malformation and weight are independent after conditioning on litter size. The multinomial approach should produce more conservative estimates of adverse outcome as a result of its increased power and sensitivity to detect effects among strongly correlated outcomes. Thus, in the analysis of the effects of diethylene glycol dimethyl ether (DYME) on mouse development, the individual estimates of the BMD05 for death, malformation, and weight were 152.4, 141.7, and 149.8 mg/kg, respectively, while the full multinomial yielded a combined BMD05 of 99.4 mg/kg. Similarly, Zhu et al. (17) examined an extended Dirichlet-multinomial covariance function to estimate jointly the regression parameters in Weibull dose-response models for both embryolethality and fetal malformations as applied to the large-scale study conducted by the National Center for Toxiocological Research on the developmental toxicity of 2,4,5-trichlorophenoxyacetic acid (2,4,5-T) in mice. Here, the fetal malformation rate was determined conditionally on both the number of implants and the number of viable fetuses. Using GEE to estimate the model parameters, the doses associated with 5% increased risk of response (ED05) were: 51.93 mg/kg for cleft palate, 55.55 mg/kg for prenatal death, and 46.51 mg/kg for what they termed overall toxicity. Krewski and Zhu (18) later extended the comparison of binomial and trinomial models to 11 datasets and found that when both end points were affected by dose, the ED05s were always lower for overall toxicity and the standard errors of the estimate tended to be smaller. When only one end point was affected by dose, the ED05 for overall toxicity approximated that for the affected end point.
As seen in both examples of multinomial approaches to developmental toxicity, there is a gain in sensitivity (conservatism) in estimating the joint probability of response when multiple end points are affected by exposure. As many of the measured end points are intercorrelated, and perhaps even related biologically, there seems to be considerable logic in models that are capable of estimating the overall risk of adverse outcome; there are also computational advantages of these models. However, they have not been extensively evaluated in large numbers of datasets, and the gains in precision generally appear to be relatively small.

Study Design Implications of the BMD Approach
The practical consideration of identifying the highest experimental group that does not differ significantly from the control group has led to study designs for developmental toxicology that generally consist of four dose groups (one control and three treated) of 20 litters each. The high dose is usually targeted to induce mild maternal toxicity, with lower doses set either by progressively halving the higher doses or by other factors, such as the desire to ensure that no adverse maternal or developmental effects are observed at the lowest experimental dose. Given the sample sizes and background response rates, these designs are generally capable of detecting a 3to 6-fold increase in embryonic death, a 5to 12-fold increase in malformations, and a 15 to 20% decrease in mean fetal weight (1). With the emergence of the BMD approach for dose-response assessment, the premium on identification of the NOAEL is likely to diminish in favor of designs that Environmental Health Perspectives * Vol 104, Supplement -March 996 11 10 yield smaller confidence intervals, and hence higher BMDs, around the benchmark effect level. Given this new consideration, it is important to analyze how elements of study design can influence estimation of the BMD. While there has been considerable effort placed on examining the influence of study design on cancer risk assessment models, the fact that target risk estimates for noncancer effects lie in the observable range (10°-10-1) as opposed to very low and experimentally nonconfirmable levels for carcinogenic effects (generally 10-5-106) provides a different set of issues related to study design. Thus, it is expected that the BMD will be less sensitive to model misspecification, provided that the models are flexible in terms of handling different dose-response patterns (11). The most important issue for BMD calculations is how the number of dose groups, their spacing, and sample size affect the precision and accuracy of the risk measure. These aspects were studied by simulating the doseresponse effects of 5-fluorouracil (5-FU) on the incidence of malformations and reduced fetal growth noted in a study by Shuey et al. (19). The primary findings of that study will be presented in the discussion on biologically based dose models. In the simulation study (20), fixed sample size designs were studied: a total of 80 litters were distributed as evenly as possible among four, five, six, seven, or eight dose levels, as well as fixed dose group designs in which 10, 13, 17, or 20 litters per group were assigned to either four or five strategically spaced dose levels. In this review, discussion will be focused on malformations as the outcome variable.
The observed and simulated dose response patterns for 5-FU-induced malformations are depicted in Figure 2A. Note that the response pattern is steep; elevated responses are first observed at 30 mg/kg and increase rapidly to reach nearly 70% by 40 mg/kg. In the original experimental data, the BMD for malformations was 26.5 mg/kg with a NOAEL of 30 mg/kg. Figure  2B presents a composite analysis of the results from five different simulated designs. The designs varied from four through eight dose groups, with all dose groups consisting of 10 litters. Fifty permutations were run for each design; benchmarks were calculated using the log-logistic model with litter size and intralitter correlations as optional parameters. NOAELS were determined by the NOSTASOT method (21). Several elements can be observed in Figure  2B: the proximity (and variability) of the  Dose, mg/kg Ise level that an underestimation of about 10% when formations the design carried two responding dose ilation); the levels, neither with a response level near the confidence ED05 (i.e., there was no dose group at 30 BMD); and mg/kg). In both of these cases, the confiexample, if dence interval for the MLE was large, with )se group at mean BMD levels as low as 50% of the -ified as the MLE. The dose-spacing and dose-response 8%) of the patterns for the design that contained 0, e ED05 can 10, 20, and 40 mg/kg are somewhat like aared error those observed with a typical design with bias can be an active developmental toxicant. The varirence from ability of the MLE was lowest and the magthe design, nitude of the BMD highest (e.g., smallest ar from the confidence intervals for the MLE) when the ere an over-design contained two positive dose groups, design with one of which was near the true ED05 (i.e., esponse and one dose group at 30   from simulations of different study designs for abnormalities given 10 litters per dose. For each study design, the box plot on the left is for the MLE, the box plot immediately to the right is for the BMDs, and the number of simulations (50 total) at each dose level giving a particular NOSTASOT is inscribed to the right. Box plots are as described in A. Adapted from Kaviock et al. (20).

The Biologically Based Approach
In vivo Thus far we have discussed how to better use data traditionally acquired in the hazard identification phase of risk assessment. We now turn our attention to approaches that require additional knowledge and characterization and, by doing so, offer greater promise of reducing the uncertainties present in extrapolating data from experimental systems to humans. In the first set of examples, the problem is approached from the top, i.e., identifying an effect and putative mechanism, acquiring the necessary information, and constructing a formal quantitative response model. These models, termed pharmacodynamically or biologically based dose-response (BBDR) models, break down the sequence of events-from administration of the chemical to expression of Environmental Health Perspectives -Vol 104, Supplement * March 1996 i 1 toxicity (in the case of developmental toxicity these would be an altered phenotype in the offspring)-into individual intervening steps and attempt to quantitatively describe each segment (Figure 3). Thus, to the extent possible and feasible, these models attempt to determine and quantify mechanisms of toxicity and how the relationships may change as a function of dose rate, route of administration, and molecular, biochemical, and physiological differences across species. Much of the early efforts in development of BBDR models has been in the area of carcinogenesis (22,23), but attention is now being turned to noncancer end points as well. Gaylor and Razzaghi (24) postulated a model to describe the induction of cleft palate by 2,4,5-T (using the same NCTR dataset used above for multinomial benchmark-dose comparisons). To simplify the model, they assumed that the chemical affected only one stage of development through a reduction in cell number, which in turn yielded palatal shelves that were too small to close, hence the cleft. A logistic function was used to describe the growth rate of cells, and an exponential function described the effect of target dose on the growth rate constant. The probability of a normal palate in a treated animal relative to a normal palate in a control animal was assumed to be equal to the ratio of the number of palatal cells present at the critical time of closure. The overall model was expressed as: ot'e-aDy -pot' e -e where P (D) is the probability of a cleft palate at dose D, Bo is the growth rate constant of palatal cells in untreated animals, and t' is the time to complete the ith stage of development. Note that two parameters, a and y, must be estimated for growth rate of the cells in the palatal region, as well as an estimate of the background incidence of clefting in control litters. For cross species extrapolation, it would be necessary to further assume that postulated relationships between administrated dose and target dose and with cell kinetics and morphogenesis are similar in the exposed and target species. They cautioned that, even in this simplified case, many assumptions had to be made and the estimates of response from the BBDR model may not improve those obtained by logistic analysis of  standard bioassay data. It is also worth noting that this approach did not consider among-animal variability in parameters and thus did not include an important component of modeling.
In an attempt to prospectively build a BBDR model, Shuey et al. (19) examined the response of gestation day (GD) 14 rat fetuses to the chemotherapeutic 5-FU. This agent was selected because steps in altering cellular biochemistry are generally known (inhibition of the enzyme thymidylate synthetase by the metabolite 5-fluoro-2'-deoxyuridylic monophosphate) and the consequences of depleted nucleotides on the cell cycle, and hence fetal growth and hind limb development, were quantifiable. The experimental design included examination of a wide range of dose levels, tissues, end points, and time points following dosing. The end points induded activity of thymidylate synthetase, the synthesis of DNA and protein, cell cycle kinetics, tissue morphometrics, and fetal morphology. In constructing the quantitative dose-response model, they focused on enzyme activity 1 hr after dosing, the percent of cells in S phase 8 hr after dosing, limb bud morphometry 24 hr after dosing, and the incidence of malformations on day 21 of gestation (7 days after dosing). Each step was described by a Hill equation; the individual equations were linked as proposed in the initial framework to yield an empirical model of the induction of developmental toxicity (Figures 4 and 5). The linked model slightly overestimated the incidence of hindlimb defects observed in the original bioassay. A Monte Carlo simulation to evaluate the amount of variability around the predicted relationship between the administered dose of 5-FU and the incidence of digit agenesis showed that a few combinations provided predicted curves near the original dose response. The model, though biologically based, must still be regarded as empirical because of the lack of a priori biological basis for the form of the quantitative expressions. As a consequence, the proposed model cannot be used to make predictions about other thymidylate synthetase inhibitors, effects following exposure at other critical times during development or via other routes of exposure, or in other species. Nevertheless, the effort provided clear evidence of the feasibility of constructing mechanistic models, as well as the somewhat daunting data intensiveness required of even relatively simple postulated cause-and-effect mechanisms. If experimental situations can be constructed in which portions of the response model can be verified with in vitro tissues from the test species and from human embryo tissues, then it might be possible to make the models less empirical in nature. In addition, if models could be built upon final common pathways of chemical perturbation (altered nucleic acid pools or altered cell cycle kinetics), then their use could be extended beyond the particular chemical under study. Leisenring et al. (25) have proposed such a model based upon cell kinetics and a branching model. In any event, the data intensiveness involved in the construction of models of this nature will, for the foreseeable future, limit their application to situations in which large segments of the population are exposed to low levels and where determination of safe exposure levels is extremely important or to economically important chemicals where the costs of regulation would warrant a concerted effort to minimize the uncertainties inherent in extrapolation. To generalize the predictiveness of the experimental model, a parallel modeling effort was undertaken using computer simulations of the toxicokinetics, cellular biochemistry, and cell kinetics.

Computer Simulation
The next step in the evolution of the dose-response model is to develop mathematical descriptions that capture the mechanisms that are responsible for the steps in the causal cascade from exposure to final effect. This has several advantages. If there is a good mapping between the biological structures and processes being modeled and the equations and parameters in the  model, and the biological processes in the animal for which the model is developed reflect the processes going on in the target species (usually humans), then interspecies extrapolation can be carried out by replacing the species-specific parameters. Under current dose-response methodology, there is little empirical support for low-dose extrapolation; however, the creation of the sort of biologically based dose-response model considered here presumes mechanistic understanding of the toxic processes, and lends credence to any low-dose extrapolation. A similar argument applies to route-to-route extrapolation. Finally, such a model incorporates data from a wider diversity of sources than conventional dose-response models. Dose-response bioassays and virtually any other relevant biological information could be incorporated into such models. In the development of such models, it is helpful if their structure reflects the biological structure of the system being modeled. Thus, it may help to divide the events to be modeled into steps as depicted in Figure 3, with each submodel linked together by common variables. Conventional dose-response models relate some measure of final outcome, such as incidence of malformations or average weights, directly to an administered dosage. It is more natural for this kind of BBDR to predict time courses of phenomena because it is usually easier to express models in terms of rates. So, for example, a pharmacokinetic submodel will predict the concentration of a compound at the target cell surface for a continuous range of times after dosing; submodels for the interaction with molecular receptors and subsequent changes in cellular behavior will predict the time course of such behaviors (e.g., the fraction of cells undergoing apoptosis at a given time). Lastly, the time course of the final end points in the causal cascade must be related to the observed adverse effect.
Consider as an example a model for the effects of 5-FU on cell-cycle kinetics in the rat fetus described above (19). A mathematical model is being developed to facilitate the understanding of the relationships among enzyme activities, nucleotide concentrations, and subsequent perturbations of cell kinetics observed in this system. Although the model is still in development (26)(27)(28), it is far enough along to allow discussion of the salient points in nonmathematical form.
The mathematical model is an attempt to quantify an admittedly simple conceptual model for the developmental toxicity of 5-FU: when pregnant dams are dosed on GD14 by subcutaneous injection, the 5-FU is absorbed from the site of injection into a large, metabolically inactive compartment. From there it enters the blood and is distributed to a metabolically active compartment, which includes renal elimination and the uterus. The original motivation for bui this particular model was to account f apparent discrepancy between the reli ship of 5-FU dose to maximum TS ir tion on one hand and that of dose t incidence of malformations and w deficits on the other (19). In brief, a dosage increases, the marginal incre-TS inhibition decreases sharply. The response for malformations has neo hockey-stick shape ( Figure 2A); at the dosages at which malformations begin to increase, the additional increase of TS inhibition due to increases in 5-FU dose has already become very small. The mathematical model reproduces this behavior ( Figure   6). Thus, what at first seemed to be evidence for the action of mechanisms other than initially hypothesized is actually the normal behavior of a highly nonlinear interactive system.
The 5-FU model as described here is far from being useful as a quantitative dose-response model for risk assessment; however, it serves as the source of some instructive points. First of all, the model reproduces the very steep increase in malformations seen in rats exposed in vivo beginning at 30 mg/kg in its prediction of cell cycle disruption (Figure 7). Although this -ymay yet turn out to be an artifact of model 40 construction, the observation points out the possibility of using such models to infer safe exposure levels based on a quanticurves tative understanding of the mechanism of onship. action of a toxicant. By looking at how n from much the threshold varies when the model is run with different values for the kinetic parameters, it would be possible to get at least a semiquantitative estimate of the d cell degree of intraspecies variability in the and threshold one might expect. Thus the model could serve as a link between the ludes variability of metabolic values, which could listed be observable experimentally, and the redict uncertainty factor for intraspecies variability, fetal which so far is usually determined purely -ludes formally. Finally, by putting hypotheses JMP) about mode of action in a quantitative lanations guage, experimental tests of those hypotheetics. ses can be carried out in a more rigorous nodel manner, facilitating decisions about the id 65 appropriateness of an animal model for  ise for the fracextrapolation to humans. In addition to refining the precise mathematical form of the model, we are currendy evaluating several model-generated hypotheses concerning the effects on nucleotide pool sizes, cell cycle kinetics, and rescue by presumably rate-limiting nucleotides in a simpler in vitro cell system. If these experiments are successful, similar information will be collected on embryos exposed in utero. All along, there will be an iterative interplay between model formulation and experimental data as we learn more about the underlying biological processes. It is reasonable to inquire whether the effort implied by this discussion of BBDR modeling could ever be justified in routine practice. Clearly, in our current state of knowledge, it would be unrealistic in the extreme to propose that all dose-response assessments for developmental toxicity should be conducted under this paradigm.
However, there may be toxic agents with such pervasive distribution and potentially toxic effects that such intense effort is justified: chemicals such as dioxins, PCBs, and other persistent environmental chemicals that mimic or inhibit the effect of endogenous hormones may provide examples. Such efforts may also be warranted for chemicals that are being proposed for significantly new uses in which the potential economic gain to the supplier and resulting widespread human exposure justifies extended examination of the hazards identified by more traditional toxicological approaches (e.g., fuel additives, alternative fuels). Perhaps the greatest potential for such models is not for assessments of specific chemicals but as research tools to help elucidate general mechanisms of toxicity. As the experience with such investigations accumulates, it may be that their results could be treated as a toolbox out of which models for new compounds could be constructed with only incremental increases in effort. In the meantime, the rigor required to construct BBDR models can only benefit toxicological mechanistic investigation.

The Embryologically Based Approach
The final approach we present is dearly the most visionary and hypothetical of the approaches. The EBDR approach begins not with an effect and mechanism but with the fundamental understanding of normal morphogenesis and, only secondarily, factors in how these events are perturbed by exogenous agents. Such models would be adaptable to the effects of multiple chemicals provided they captured the salient biological events. The biological understanding of morphogenesis at the molecular level, linked with mathematical theories and constructs of pattern formation, may open the door for these approaches. We will use recent advances in the understanding of the role of homeobox genes in development of the axial skeleton and limbs to illustrate how the emerging knowledge of positional signaling is providing information to take the heretofore theoretical models of pattern formation into potential use by developmental toxicologists.
Homeobox genes, so named for a 183 bp DNA sequence that yields a 61 amino acid protein sequence containing a DNA binding domain, were first described in Drosophila about 10 years ago (34). These genes are highly conserved across many phyla, and today some 38 Hox genes organized in four chromosome complexes are recognized in the mouse (Figure 8). The role of Hox genes in pattern formation in vertebrates is in part due to a feature termed collinearity, that is, there is a close relationship between expression along the anteriorposterior axis of the embryo and the gene's order along the chromosome (Figures 9  and 10) (35-37). Manak and Scott (36) provided several conserved rules governing Hox gene function in the developing vertebrae: a) their tissue expression and function follows along the order on the chromosome; b) more genes are expressed in the more posterior regions; c) loss of gene function leads to development of anterior structures where posterior structures should have formed (e.g., a rib on the 14th postcervical vertebra of a rodent); d) activation of genes where they should be off (gain-of-function mutations) leads to formation of posterior structures where anterior structures should be (e.g., the presence of only 12 pairs of ribs in rodents); e) each homeotic gene contains a single homeobox, which encodes a specific DNA-binding transcription factor; and f) most of the 5' ends of the transcription are oriented toward the end of the cluster.  coordinated appearance during early pattern formation implicates them in at least specifying segment identity if not the actual segmentation process. Other gene classes are also known to be involved in sculpting the formation of the vertebrae, with the Pax genes (especially those that include the paired box and homeobox) among the more well studied (40). During the last several years, a number of loss-of-function mutants created by recombinant genetic techniques illustrate the governing rules stated above (Table 1). Phenotypes similar to those of the knockout experiments are familiar to developmental toxicologists, and it should not be surprising to see that homeotic-type alterations in the vertebrae, particularly the anterior and posterior boundaries of ribs, are frequently induced by xenobiotics. Thus, anteriorizations (i.e., the taking on of the morphology of the immediately preceeding segment) of the first lumbar vertebrae as induced by maternal toxicity or stress (58,59); valproic acid (60); bromoxynil (61,62); salicylate (63); dimethadione (64) and retinoic acid (65) are but a few examples. Posteriorizations of the thoracolumbar border have also been observed (66)(67)(68), but this effect is clearly less frequent in the developmental toxicology literature. Toxicant-induced alterations are not limited to the thoraco-lumbar border and may also involve posteriorizations of the cervical-thoraco juncture, as demonstrated by experiments with methanol (69,70) and boric acid (67). In general, these agents induce frank dysmorphologies of the axial skeleton at higher dose levels; however, the phenotypes rarely, if ever, completely resemble the null hox phenotypes regardless of response level. Closer examination of subtle skeletal morphological features in developmental toxicity bioassays may strengthen the impact of the effects on the axial skeleton. We see changes at the major boundaries because that is where our attention has been focused and because of the ease of observing such changes. In addition, comparison with phenotypes from individual null mutations might not be expected to produce complete concordance due to other possible pathogenic pathways as well as thepotential redundancy in Hox gene function (52).
The study of pattern formation in the limb has perhaps received the most attention of any vertebrate organ (71)(72)(73)(74)(75). The limb begins as an outgrowth of the lateral somatic mesoderm as cell proliferation slows in the regions immediately anterior and posterior to the emerging bud. An apical epidermal ridge (AER) soon develops and is maintained by the underlying mesoderm. In turn, the AER supports proliferation of the underlying mesenchymal cells in what is termed the progress zone (PZ). Recent evidence suggests that fibroblast growth factor (FGF) 2 and FGF 4 may be the morphogenetic signal emanating from the AER. Continued cell proliferation in the PZ gradually establishes the proximaldistal axis of the limb. As shown by surgical removal of the AER at various stages, the longer cells stay in the PZ, the more distal the structures they will form. As cells leave the -PZ, they decrease their rate of proliferation and begin to differentiate; thus as they exit, they are believed to have acquired some aspects of positional identity. A small set of cells in the posterior area of the PZ then gains a special property that contributes to the anterior-posterior axis. This area, known as the zone of polarizing activity (ZPA), was first identified by its ability to induce digit duplications when grafted into the anterior region of the emerging bud (76). The polarizing ability of the ZPA was at one time thought to be due to its role in establishing a gradient of retinoic acid, but it has now been demonstrated to result from expression of sonic hedgehog (Shh), a gene related to Drosophila segment polarity gene hedgehog (77). The Shh gene product is an autocleavable protein whose active amino terminus  remains within the cell of origin and therefore does not meet the criterion of a true morphogen. Ectopically placed retinoic acid-soaked beads have been shown to induce Shh, create a new ZPA, and cause mirror-image digit duplications. As with the axial skeleton, Hox genes are important in providing positional information that defines axes of the limb, if not of the digits themselves. Genes in the Hoxa cluster have been reported to emerge in a collinear fashion from the PZ, such that the more 3' Hoxa-Ji is expressed more distally than Hoxa-13 (78). This gradient may be involved in proximal-distal axis definition. Similarly, the Hoxd cluster is expressed in a collinear fashion in response to the ZPA, with Hoxd-9 coming to be expressed in the most anterior and proximal region and Hoxd-10, d-11, d-12, and d-13 having progressively more posterior and distal expressions. Expression of the Hoxd cluster in the limb is dependent on the presence of the AER. The progressive expression also correlates with the onset of asymmetry within the limb, as the posterior half becomes larger than the anterior as Hoxd-1 1 and Hoxd-12 are activated, perhaps suggesting a role in mediating growth (73). By providing this overlapping gradient of expression, the five members of the Hoxd cluster help define the number and placement of digit types. The fact that the Hoxa gradient is orthogonal to the Hoxd gradient may give further positional identity to cells. However, results of knockout experiments have not necessarily supported some facets of this model of positional information. For example, following disruption of Hoxd-13 (which might be hypothesized to interfere with the most posterior digit based upon its expression pattern), all digits, but particularly digits II and V, are reduced in size, and half the animals possessed an additional rudimentary digit posterior to digit V (57). The early Hox gradients are also clearly dynamic in space and time, and the more restricted domains seem to fade with time into a more uniform expression pattern (73). As with the axial skeleton, the downstream events from Hox expression in the branching process and formation of bones in the limb remain to be discerned, although retinoic acid and its metabolites, binding proteins and receptors, are undoubtably involved (72,79). Finally, the Wnt genes appear to have a role in establishing the dorsal-ventral axis of the limb bud (80).
To what extent does our increasing knowledge of the molecular foundation of pattern formation allow us to judge the significance of toxicant-induced homeotic shifts or other structural perturbations during development? Are the Hox genes directly involved in providing positional information for subsequent pattern formation? Can we identify alterations in patterning-gene expression in the immediate stages following toxicant exposure when the agent is initiating the morphologic lesion? The difficulty in establishing relative landmarks at these early embryonic stages and quantitating molecular responses at the cellular level makes this problematic. Are the shifts in the boundaries the limit of their phenotypic expressions or just the tip of other responses that are more difficult to identify? If direct primary links between xenobiotics and altered gene function become evident, will we become more worried because we are perturbing the action of transcription factors key to morphogenesis; or do we become less worried as we understand the redundancies built into the overlapping expression of paralog members? Before we can answer such questions, we need to be able to quantify expression in space and time on the cellular level, characterize the extent of variability in controls, learn exactly how xenobiotics can affect expression (do they all, for instance, modulate local retinoic acid concentrations?), and understand the downstream events that translate the expression into cellular characteristics of particular segments (be it vertebrae or limb components). In these regards, it is desirable to have frameworks in place for assimilating the information and even testing hypotheses using computer models, as is being done for the 5-FU BBDR presented above. It is in this context that we introduce more theoretical formulations of pattern formation in biological systems.
Since Turing introduced the concept of morphogen in 1952 (81), modelers have explored the consequences of different theories of pattern formation through the behavior of mathematical models. Two major kinds of models have been devised to explain the complex patterns seen in development. The first type of model supposes that the development of pattern occurs in two stages. In the first stage, a field that provides positional information is laid down; it is a prepattern to be used by developing cells to determine their position in the field. During the second stage, cells sense their positions relative to this chemical coordinate system and react appropriately. This concept has been referred to as positional information (82,83). In the second type of model, the final pattern unfolds due to the manifestation of mechanical and chemical interactions among the developing cells. The volume by Othmer et al. (84) and the references cited therein provide a good introduction to this topic.
Models for developmental processes have been mostly abstract explanations of the consequrences of hypothesized mechanisms of pattern formation. Molecular embyrology, as exemplified by the discussion of the role of Hox genes and other factors in specifying segment identity, is now providing concrete expressions for the hypothesized mechanisms. Thus, models of pattern formation should take on new significance when combined with the detailed understanding of how locational information is actually expressed at the molecular level during development.
Historically, two principal mechanisms have been proposed to convey positional information in the mathematical models. The most common approach uses a gradient of chemicals to provide the positional information. In an interesting recent example, Levin (85) supposes that the products of two genes are initially distributed on a gradient and interact intracellularly to produce complicated patterns, making use of fractals and chaos theory to generate developmental patterns. A more common model relies on pairs of diffusing chemicals that interact with each other to form stable patterns in the concentration of the two chemicals (reaction-diffusion models). In the general form for such a system, one of the chemicals catalyzes its own synthesis as well as that of a second chemical, which inhibits the synthesis of the first chemical. Both chemicals diffuse away from their point of synthesis, but the inhibitor diffuses faster.
Thus, there arises a common developmental pattern of local self-enhancement and long-range inhibition. Simple models, with this structure as a point of departure and with a single spatial dimension as well as time, can demonstrate a number of spatial patterns such as gradients or stripes and can qualitatively reproduce the behavior of some experimental developing systems after they are perturbed (86).
In the other approach, mechanicochemical models, pattern formation and morphogenesis evolve simultaneously to produce the final pattern. Here a more simple positional signal, for example, a simple gradient in a chemical, interacts with other mechanical aspects of cells, such as differential adhesion and motility, to generate the final pattern. No prepattern is formed in these models; rather, it is the innate behavior of the cells themselves that forms the final morphogenetic pattern. As in reaction-diffusion models, successful mechanico-chemical models generate a pattern of local enhancement and longrange inhibition (87). Figure 11 shows an example of how a simple model involving chemotaxis and mitosis of pigment cells can reproduce the complicated skin pigmentation patterns seen in snakes (88). Sometimes, to account for the development of complex patterns, the underlying parameters of the model, such as the size of the developing tissue or organ, are allowed to change (87,88). The resulting pattern is an interaction between the stable pattern that would have evolved for fixed parameters and the change in the parameters.
For much mathematical modeling in developmental biology, the general form has been to show that some specific pattern could be generated by a particular mechanism. Often, the absence of much-detailed biological information, such as rates of reaction, actual cellular behavior, and even the identity of hypothetical reacting chemicals, has forced such models to be fairly abstract. Even so, their ability to show the consequences of simple hypotheses of interaction has been valuable, both for testing hypotheses about mechanisms of development and for augmenting biologists' intuition about such systems. However, in risk assessment, we are likely to have fairly detailed and specific information about how a chemical interacts at the subcellular level and how the cells' behavior changes as a consequence. To be able to link these changes to developmental changes, more detailed and specific models for morphogenesis are needed. The problem changes from demonstrating that a particular pattern could be generated by a given mechanism to quantifying how much a particular cellular behavior can change without affecting subsequent morphogenesis or, more generally, how much morphogenesis is affected by a given change in cellular behavior. Naturally, before we can solve this problem, much more needs to be learned about normal development, and more specific models for normal development need to be developed. This is perhaps an area where developmental toxicologists and developmental biologists can collaborate successfully. By using the plethora of developmental perturbations available through developmental toxicants and observing and modeling their effects on normal development, our knowledge of both normal and abnormal development should be greatly enhanced. The emerging knowledge base on molecular morphogenesis as exemplified by the axial skeleton and limbs appears ripe for the task.

Conclusions
Risk assessment for developmental end points is entering a state of flux after a relatively long period of inertness. The benchmark dose concept is certainly an improvement over the use of NOAELs and LOAELs (lowest observed adverse effect levels) for standard setting. Nevertheless, it presents the same problems for extrapolation as the NOAEL. Only by developing more complete pictures of how developmental toxicants perturb normal development will we be able to put extrapolation on a more empirical and scientific footing. This will not be easy and it will require that developmental biologists become familiar with developmental toxicology (and vice versa), that biologists become more comfortable with quantitative methods, and that biomathematicians be willing to work with the often complex and inelegant mathematical systems required to more realistically model specific biological systems. We also need to develop better ways to use partially developed models to enhance more statistical approaches to risk assessment, perhaps by modifying uncertainty factors or through dose scaling, as we occasionally use pharmacokinetic models now.