Overview of a workshop on quantitative models for developmental toxicity risk assessment.

A workshop was held to discuss potential advancements to improve the precision of risk estimates for developmental toxicity. This paper presents an overview of the discussions at the workshop, focusing on the risk assessment process and science policy considerations important in the use of quantitative models. Some of the pertinent biological considerations are reviewed, particularly those related to the repair capacity of the developing organism and how this affects the concept of a threshold for developmental toxicity effects, as well as the maternal and litter influences on developmental toxicity outcomes. Finally, the current status of use of quantitative approaches is described, possible short-term approaches are discussed, and future research needs in this area are outlined.


Introduction
Over the past 2 years, an Environmental Protection Agency (EPA) working group on Approaches to Quantitative Reproductive Risk Assessment has been evaluating various approaches to improve the precision of risk estimation in reproductive and developmental toxicology. The workshop summarized here was an outgrowth of those considerations. It focused on developmental toxicology because the strongest data bases, and those most amenable to risk estimation, are available in this area. The workshop was held May 7-8, 1987, at the Stouffer's Concourse Hotel, Crystal City, VA. Invited speakers and participants included reproductive and developmental toxicologists, epidemiologists, and statisticians, both from EPA and from outside the Agency. This workshop was the first of a series intended to discuss the development and use of more quantitative models for risk assessment in reproductive and developmental toxicology, to consider approaches that may be useful in the near future, and to stimulate research in the area.
*See Appendix for author affiliations and addresses.
Several of the speakers at the workshop reviewed the important risk assessment and science policy issues, as well as underlying biological assumptions and uncertainties. This overview represents a synthesis of these presentations. Four papers that accompany this overview address specific approaches that have been used in the mathematical evaluation of data in the area of developmental toxicology (1)(2)(3)(4).

Risk Assessment and Risk Assessment Guidelines
In 1986, the EPA published guidelines for risk assessment in five areas (5): carcinogenesis, developmental toxicity, mutagenesis, chemical mixtures, and exposure. Additional guidelines on male and female reproductive toxicity and on systemic toxicity are in preparation. These guidelines present the scientific principles and inferences used by the Agency in conducting risk assessments. They are also intended to promote quality and consistency in assessments across programs within and outside the EPA. The guidelines were developed based on the principles set forth by the National Academy of Sciences in its effort to describe the risk assessment process in the federal government (6). That document identified four components: hazard identification, dose-response assessment, exposure assessment, and risk characterization. Hazard identification and dose-response assessment are closely tied and represent the first steps in describing the animal data. For the vast majority of chemicals, the available data base is derived entirely from animal studies. In the exposure assessment, the level of human exposure from all potential sources is estimated. The final stage of the process, risk characterization, summarizes what is and is not known about particular chemicals and their impact on animals and humans. It evaluates the hazard in light of the estimated level of human exposure and describes the assumptions used and the uncertainties in the delineation of risk. Risk managers use the risk characterization to assist in the formulation of regulatory decisions.
The Guidelines for the Health Assessment of Suspect Developmental Toxicants define developmental toxicity broadly to include adverse effects on the developing organism that may be manifested as deaths, structural abnormalities, altered growth and/or functional deficits (7). Thus, developmental toxicity includes a variety of types of outcomes and does not focus only on birth defects. Exposure may occur prior to conception (to either parent), during prenatal development, or postnatally to the time of sexual maturation. In addition, developmental effects may be detected at any point during the lifespan of the organism, although effects manifested months to years postnatally are more difficult to link with a specific exposure.
The approaches currently used to extrapolate risks for developmental toxicity are outlined in the developmental toxicity risk assessment guidelines (7). One approach applies an uncertainty factor to the NOEL (no-observed-effect level) for the most sensitive animal species tested. The uncertainty factor is usually composed of a 10-fold factor to account for interspecies differences and a 10-fold factor for intraspecies variability. If a NOEL is not available, an additional 10-fold factor may be applied to the LOEL (lowest-observed-effect level). The other approach is the calculation of a margin of safety (MOS), which is the NOEL divided by the estimated level of human exposure from all potential sources. The MOS can then be evaluated for its adequacy to protect human health. Both approaches have several drawbacks. The primary one is that they use only one point on the dose-response curve (the NOEL or LOEL) and ignore the rest of the data. Also, because the variability around the NOEL or LOEL is usually not taken into account, these approaches may reward poor studies, i.e., studies that yield a higher NOEL because of their limited ability to detect small changes over background.
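The arithmetic of the two current approaches is simple enough to sketch directly. The following is a minimal illustration. It assumes the default 10 × 10 uncertainty factors described above, plus an extra 10-fold factor when only a LOEL is available. The function names and the NOEL, LOEL, and exposure values are hypothetical and chosen only for illustration.

```python
def acceptable_exposure(noel=None, loel=None, inter=10.0, intra=10.0, loel_factor=10.0):
    """Divide the NOEL (or, if none was found, the LOEL) by the uncertainty factors.

    inter: interspecies factor; intra: intraspecies factor;
    loel_factor: extra factor applied only when starting from a LOEL.
    (Hypothetical helper; default factors follow the 10 x 10 convention.)
    """
    if noel is not None:
        return noel / (inter * intra)
    if loel is not None:
        return loel / (inter * intra * loel_factor)
    raise ValueError("a NOEL or a LOEL is required")


def margin_of_safety(noel, human_exposure):
    """MOS = NOEL / estimated human exposure from all sources (same dose units)."""
    return noel / human_exposure


# Hypothetical values in mg/kg/day.
print(acceptable_exposure(noel=5.0))                     # NOEL-based level
print(acceptable_exposure(loel=20.0))                    # LOEL-based, extra 10-fold factor
print(margin_of_safety(noel=5.0, human_exposure=0.01))   # NOEL over exposure
```

Both results depend only on the single NOEL or LOEL value. The slope of the dose-response curve and the group sizes never enter, which is the drawback noted above.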
The initial efforts in quantitative extrapolation from high doses conventionally used in animal studies to low environmentally relevant doses and across species were made in the area of carcinogenicity. There are a number of critical differences in the inferences made for carcinogenicity compared with other types of toxicity, the primary difference being the assumption that there is essentially no threshold for carcinogenicity. For all types of toxicity except carcinogenesis and mutagenesis, the assumption is generally made that there is a threshold, i.e., a level below which one would not expect to observe an effect. These differences in the underlying assumptions made for carcinogenicity and for other forms of toxicity thus make it inappropriate to apply the same kinds of techniques for extrapolation in all areas. Therefore, better definition of appropriate means for extrapolation of noncancer effects is needed, particularly in the areas of reproductive and developmental toxicity.
Risk assessment for developmental toxic effects also differs from that for carcinogenesis because the effect under study occurs in a different unit (embryo/fetus or offspring) from the unit that is exposed (parent). This leads to very different data structures in developmental toxicity and carcinogenesis laboratory studies. In developmental toxicity studies, observations are made on individual members of each litter, and because of intralitter relatedness they are not independent. This seriously complicates modeling of the dose-response relationship and precludes direct use of the statistical methods developed for carcinogenic risk assessment.
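One standard way to see why intralitter relatedness matters is the variance-inflation ("design effect") factor for a proportion estimated from clustered binary data. The sketch below assumes that every litter has the same size n and that all litters share a common intralitter correlation ρ; both are simplifying assumptions. Under them, the variance of the estimated incidence equals the binomial variance multiplied by 1 + (n - 1)ρ. The litter counts and ρ in the example are hypothetical.

```python
def design_effect(litter_size, rho):
    """Variance inflation for a litter-clustered proportion: 1 + (n - 1) * rho.

    Assumes equal litter sizes and a common intralitter correlation rho.
    """
    return 1.0 + (litter_size - 1) * rho


def effective_sample_size(n_litters, litter_size, rho):
    """Number of independent fetuses that the clustered sample is roughly worth."""
    return n_litters * litter_size / design_effect(litter_size, rho)


# Hypothetical dose group: 20 litters of 12 fetuses, intralitter correlation 0.2.
print(design_effect(12, 0.2))              # variance inflation relative to binomial
print(effective_sample_size(20, 12, 0.2))  # independent-fetus equivalent of 240 fetuses
```

With these values, the 240 fetuses carry about as much information as 75 independent ones. Treating them as independent would therefore overstate the precision of the incidence estimate.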

Science Policy Issues
During the last decade, the EPA and other federal health and safety regulatory agencies have focused much of their attention on the control of chemical carcinogens. The frequency of cancer, the difficulty of detecting and treating it early, and its consequently high fatality rate have produced a high level of public concern. This concern is reflected in a number of congressional mandates directing EPA and other agencies to deal with the risks of cancer. The regulation of carcinogens has been aided to a large extent by the elucidation of mechanisms of carcinogenesis. It has also been aided by the incorporation of this information into quantitative models that can be applied consistently to estimate cancer risk.
However, serious health effects other than cancer also place social and economic burdens on society. For example, it is estimated that 6% of live births per year in the U.S. have congenital defects that are identifiable in the first year of life. This number approximately doubles with advancing age, and close to one-half of all hospital beds in children's wards are occupied by individuals with congenital defects (8). The etiology of only about one-third of the cases is known, and environmental agents and drugs account for only 4 to 5% of all developmental defects (9). Because adverse developmental effects generally are manifested early in life, the result is often many years of suffering at a great health-care cost to society. The Agency has been using traditional margin-of-safety (MOS) methods for developmental toxicity effects for some time; however, there are several advantages to moving beyond the MOS and uncertainty factor methods. For example, a quantitative risk assessment for developmental toxicity effects is an important step toward refining the dose-response relationship and interpreting scientific data for setting standards for human exposure. It should allow a more precise estimation of the risk at doses where human exposure can be expected to occur. It also makes more complete use of the dose-response data by considering the slope of the estimated dose-response curve, the spacing of the experimental doses, and the number of animals tested, whereas the MOS approach relies primarily on the NOEL. Ultimately, quantitative risk assessment provides a basis for comparing hazards from different substances in a controlled and predictable manner. It can thus be used to set priorities for the allocation of resources of both the Agency and society.
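To make "more complete use of the dose-response data" concrete, the sketch below fits a two-parameter logistic model to every dose group at once. It then solves for the dose estimated to produce 1% extra risk over background. Several choices here are assumptions made for illustration: the logistic form, counting litters (affected or not) rather than fetuses, the 1% target, and the data themselves. This is not one of the models presented at the workshop.

```python
import math


def prob(a, b, d):
    """Logistic dose-response (assumed form): P(d) = 1 / (1 + exp(-(a + b*d)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * d)))


def fit_logistic(doses, n, affected, iters=50, tol=1e-10):
    """Maximum-likelihood fit of (a, b) to grouped binomial data by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = iaa = iab = ibb = 0.0
        for d, m, y in zip(doses, n, affected):
            p = prob(a, b, d)
            resid = y - m * p              # observed minus expected count
            w = m * p * (1.0 - p)          # binomial information weight
            ga += resid
            gb += resid * d
            iaa += w
            iab += w * d
            ibb += w * d * d
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det   # solve the 2x2 Newton system
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def extra_risk(a, b, d):
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = prob(a, b, 0.0)
    return (prob(a, b, d) - p0) / (1.0 - p0)


def dose_at_extra_risk(a, b, target, hi):
    """Bisection on [0, hi]; assumes b > 0 so extra risk rises with dose."""
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if extra_risk(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical study: litters with at least one malformed fetus, per dose group.
doses = [0.0, 10.0, 30.0, 100.0]      # mg/kg/day
litters = [25, 25, 25, 25]
affected = [1, 2, 5, 15]

a, b = fit_logistic(doses, litters, affected)
d01 = dose_at_extra_risk(a, b, 0.01, hi=100.0)
print(f"a = {a:.3f}, b = {b:.4f}, dose at 1% extra risk = {d01:.2f} mg/kg/day")
```

Unlike the NOEL, the resulting dose depends on the slope of the fitted curve, the spacing of the doses, and the group sizes. The sketch has two limitations. It reports only a point estimate, ignoring sampling uncertainty. It also does not model the intralitter correlation discussed earlier; it sidesteps that issue only by counting litters rather than fetuses.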
As the scientific basis of quantitative risk assessments for adverse developmental effects continues to develop, certain science policy issues and critical questions must be addressed. Decisions will have to be made about which model is the most appropriate for the biological processes of concern. What criteria have to be met for scientists to recommend a quantitative model for risk assessments on developmental toxicants? Are different models required for different outcomes, e.g., birth defects, low birth weight, reduced prenatal and postnatal survival, reduced functional capacity? The issue of threshold must also be addressed. Because a threshold is generally assumed for developmental toxicity effects, the incorporation of such an assumption will affect the risk estimates obtained from the model. Should threshold considerations be incorporated into the model or should no threshold be assumed as is done in the cancer model? Of course, any model(s) that are chosen by the Agency should not be static but have the potential for incorporating new information.
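The effect of the threshold assumption on low-dose estimates can be seen with two simple extra-risk curves that agree at an experimental dose but differ below it. The following are all hypothetical choices made for illustration, not models endorsed at the workshop: the one-hit form, the threshold-shifted form, the threshold value τ, and the calibration point (10% extra risk at a dose of 10 units).

```python
import math

# Hypothetical calibration point: both models give 10% extra risk at dose 10.
CAL_DOSE, CAL_RISK = 10.0, 0.10
TAU = 5.0  # hypothetical threshold dose (assumption for illustration)


def no_threshold(d):
    """One-hit extra risk 1 - exp(-beta * d): any dose above zero carries some risk."""
    beta = -math.log(1.0 - CAL_RISK) / CAL_DOSE
    return 1.0 - math.exp(-beta * d)


def with_threshold(d):
    """Same form shifted to start at TAU: zero extra risk at or below the threshold."""
    beta = -math.log(1.0 - CAL_RISK) / (CAL_DOSE - TAU)
    return 1.0 - math.exp(-beta * max(0.0, d - TAU))


for d in (0.1, 1.0, 5.0, 10.0):
    print(f"dose {d:5.1f}: no threshold {no_threshold(d):.5f}, threshold {with_threshold(d):.5f}")
```

Both curves match at the calibration dose. At one-hundredth of that dose, however, the no-threshold curve predicts an extra risk of roughly 1 in 1000, while the threshold curve predicts none. Data at the experimental doses alone cannot distinguish the two.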
The development of quantitative risk assessment methods for developmental toxicants is an encouraging trend, correcting what many believe has been an unbalanced focus on cancer. The use of quantitative risk assessments for prioritizing the risk of exposure to developmental toxicants will enable federal agencies to better assess social and economic priorities and provide the most efficient use of the resources available for our health and environmental quality.

Biological Considerations for Developing Quantitative Models
Several issues considered at the workshop related to biological considerations important in the development of quantitative risk assessment models for developmental toxicity. These included the repair capacity of the developing organism, how this affects the concept of a threshold for developmental toxicity effects, and the role of the maternal and litter influence on developmental toxicity outcomes.

Mechanisms of Developmental Toxicity and Repair
A variety of mechanisms have been suggested as causes of birth defects. Genetic events (mutation, chromosome nondisjunction, altered gene function) and nongenetic events (insufficient supply of energy sources and substrates, enzyme inhibition, altered cell-to-cell contact, and osmolar imbalance) have both been described for compounds possessing teratogenic activity (9). Despite the diversity of initial cellular reactions, a common final pathway is believed to exist, involving insufficient cells or cell products to carry out morphogenesis. However, the complete sequence of events leading to teratogenesis has not been worked out for any agent. Depression of DNA synthesis and DNA damage do not appear to be as important as increased cell death in the initiation of teratogenic damage. For example, the embryo can tolerate substantial depressions in DNA synthesis, which, unless accompanied by excessive necrosis, do not lead to malformations (10,11).
The induction of teratogenic damage at the cellular level is clearly a multistage process in which a sequence of events must occur before the final malformation is expressed. At the organ/whole embryo level, Fraser (12) described and empirically demonstrated the multifactorial/threshold concept based on cortisone induction of cleft palate in mice. According to this concept, both genetic and extrinsic factors may influence the final outcome and thus affect the threshold dose. In this scenario, a critical mass of cells must be affected before a threshold is crossed and dysmorphogenic processes are set in motion. This model for the induction of developmental toxicity contrasts with the concept developed for carcinogenesis. There, a single hit, a single molecule, or a single unit of exposure is believed to be sufficient to initiate the transformation process.
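Fraser's multifactorial/threshold concept is commonly formalized as a liability-threshold model. In this model, each embryo's liability is the sum of many genetic and extrinsic contributions and is therefore approximately normal. A malformation occurs when liability exceeds a fixed threshold. The sketch below assumes unit-variance normal liability and a dose effect that shifts liability linearly. The two strains and all parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution for the liability scale (assumed)


def malformation_prob(dose, baseline, dose_effect, threshold=0.0):
    """P(liability > threshold) when liability ~ Normal(baseline + dose_effect * dose, 1).

    baseline and dose_effect are hypothetical parameters on the liability scale.
    """
    return STD_NORMAL.cdf(baseline + dose_effect * dose - threshold)


# Two hypothetical strains differing only in baseline (genetic) liability.
for dose in (0, 10, 20, 30):
    susceptible = malformation_prob(dose, baseline=-2.0, dose_effect=0.1)
    resistant = malformation_prob(dose, baseline=-3.0, dose_effect=0.1)
    print(f"dose {dose:2d}: susceptible {susceptible:.3f}, resistant {resistant:.3f}")
```

Each embryo has its own threshold dose, namely the dose that pushes its liability past the threshold. The group incidence nonetheless rises smoothly along a probit curve, and the genetically more susceptible strain responds at lower doses.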
For some time, it has been recognized that compensatory growth, or increased mitotic activity, occurs in embryonic tissue after teratogenic insult. Snow has used the term "restorative growth" to describe the process whereby mammalian embryos replace cells lost through tissue damage or deficit (13,14). Enhanced cell proliferation may be observed in every tissue of the embryo with increased necrosis, although malformations only occur at specific sites of damage. Differential cell death, most likely due to differences in cell cycle characteristics, is the necessary prerequisite and initial trigger to restorative growth. As a result of differential cell death, imbalance is created among cell types in a tissue and/or organs within an embryo. The result is an uncoordinated growth process that is likely to cause mistiming of inductive interactions between tissues that are initially perturbed by differential cell death.
There have been a number of attempts to measure DNA damage and repair in embryonic tissues following treatment with cytotoxic teratogens (15,16). Even though DNA damage has been identified by a variety of techniques, measurement of DNA repair has been hampered by the cell death caused by these agents. Disappearance of DNA damage cannot be clearly attributed to cellular repair processes or to removal of affected cells from the population through necrosis. Even in cases where DNA damage has been shown to persist after the necrotic episode, it has not been possible to correlate it with dysmorphogenesis. Consequently, little is understood about repair processes at the cellular level beyond restorative growth.

Consideration of the Threshold Concept for Developmental Toxicity
The concepts developed concerning induction and repair of prenatal toxicity have led to formulation by Wilson of the following major principle of developmental toxicology (9): "Manifestations of deviant development increase in frequency and degree as dosage increases from the no-effect to the totally lethal level." Developmental toxicologists have generally assumed that all agents causing prenatal toxicity do have a threshold below which no effect of any kind can be demonstrated. This assumption is based on the bulk of developmental toxicity studies conducted since 1966 in which embryolethality and malformations were measured. Dose-response curves in developmental toxicity studies are typically steep, and embryotoxic responses depend upon the reaction of integrated groups of cells, tissues, and organs (17).
While the existence of thresholds for the dichotomous responses of malformation and embryolethality is generally accepted, the continuous variable of growth retardation has not been subjected to the same scrutiny. Conceptually, a threshold should exist for growth retardation insofar as a number of events must occur and accumulate before growth retardation is manifest.
The safety factor approach used by regulatory agencies assumes that the specified acceptable exposure level for a developmental toxicant is below the threshold dose of most, if not all, exposed members of the population. Kaplan et al. (18) have studied the validity of this assumption under a simple model that describes the variability of individual threshold doses in the human population. Their results show that the fraction of the population whose threshold doses are below an acceptable exposure level may be significant. This fraction represents an upper bound on the added population risk resulting from exposure to the toxicant. They also suggested that in some instances the traditional safety factor of 100 may be inadequate. For example, if the slope of the experimental dose-response curve is shallow, or if there is evidence that the animal threshold is well below the lowest experimental dose considered, a safety factor larger than the traditional value may be necessary. In other cases, the conclusions are more equivocal, and the safety factor of 100 may or may not be adequate. Estimating the risk associated with a given acceptable exposure level with an appropriate model would aid in evaluating the additional risk that might be incurred if a threshold assumption were not appropriate.
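A minimal version of this kind of calculation gives the fraction of the population whose thresholds fall below an acceptable exposure level. For illustration, it assumes that individual human threshold doses are lognormally distributed; this is not necessarily Kaplan et al.'s exact formulation. The median threshold, geometric standard deviations, and acceptable level below are hypothetical.

```python
import math
from statistics import NormalDist


def fraction_below(acceptable_level, median_threshold, gsd):
    """Share of an (assumed) lognormal threshold distribution below the acceptable level.

    gsd is the geometric standard deviation of individual threshold doses.
    """
    z = math.log(acceptable_level / median_threshold) / math.log(gsd)
    return NormalDist().cdf(z)


# Hypothetical: NOEL of 5 mg/kg/day divided by 100 gives an acceptable level of 0.05;
# the median human threshold is assumed to be 2 mg/kg/day.
level = 5.0 / 100.0
for gsd in (2.0, 3.0, 5.0):
    print(f"GSD {gsd}: fraction with threshold below level = {fraction_below(level, 2.0, gsd):.2e}")
```

The answer is driven mainly by the spread of individual thresholds. When thresholds are tightly clustered, the same 100-fold factor leaves a negligible fraction at risk. When they are widely spread, the fraction at risk is on the order of 1%.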

Maternal and Litter Influence on Developmental Outcomes
A number of maternal factors are known to have significant bearing on the outcome of standard developmental toxicity bioassays (as well as on human births). For example, the genetic makeup of the pregnant female and fetus has long been known to be of crucial importance in developmental responses to exogenous agents. In both laboratory and epidemiological studies, researchers have demonstrated maternal heritable factors that may significantly alter the probability of defects in the offspring. Maternal disease concurrent with pregnancy may also significantly affect in utero development. For example, rubella, cytomegalovirus, and Toxoplasma infections in humans, as well as mycoplasma and cytomegalovirus infections in mice, have been shown to cause developmental toxicity (19). In most cases, it is difficult to differentiate between direct fetal effects and those induced secondarily as a result of maternal disease. A wide spectrum of dietary insufficiencies, ranging from protein-calorie malnutrition to significantly reduced quantities of vitamins, trace elements, and/or enzyme cofactors (20,21), are also known to adversely affect normal embryo/fetal development. In addition, numerous studies have examined the effects of maternal stress on embryo/fetal outcome and have suggested an association of various forms of stress with adverse effects in the developing system.
Of the factors discussed, genetics and disease are of relatively little practical significance in laboratory studies, because they are controlled to a considerable extent by the use of genetically well-defined and healthy laboratory animals. The vast majority of developmental toxicology protocols used for regulatory purposes call for overt maternal toxicity, and basic knowledge of the mechanisms involved in developmental toxicity is currently lacking. The effects of maternal toxicity and attendant stress are therefore a critical question about which little is known. Data derived from populations of animals in which severe maternal toxicity has occurred must be interpreted carefully until it is possible to distinguish the embryo/fetal effects associated with such toxicity. Maternal toxicity and stress are extremely difficult to define. All may agree that a decrease in maternal weight gain or an increase in maternal deaths is a clear manifestation of toxicity. There may be considerable debate, however, about other agent-induced maternal effects, such as decreased plasma cholinesterase, hepatic hypertrophy, or hypo/hypertension.
Another important point is that many human developmental toxicants act at or very near the maternally toxic dose level (e.g., alcohol, methylmercury, chemotherapeutics) so that the presence of maternal toxicity should not invalidate the data generated at that dose level. Research efforts should be directed toward the identification of developmental effects that are associated with either well-defined types of maternal toxicity or general stress induced by such toxicity.
Any attempt to extrapolate risk from laboratory animal species to man must deal with a basic mismatch: large litters are commonplace in the test species, whereas multiple births are a rarity in humans. There has been an ongoing debate among teratologists concerning the proper experimental unit on which to base statistical analyses. Haseman and Hogan (22) and Haseman and Soares (23) investigated this issue using data from studies on dieldrin or dipterex. They concluded that malformations and fetal deaths did not occur independently among the fetuses within a litter, so the individual fetus should not be regarded as the basic sampling unit. Pack (24) has recently published a method for analyzing data that show greater variation than a simple binomial model would predict, such as data from animal litters. This method takes both inter- and intralitter incidences of experimental effects into account. The accompanying paper by Chinchilli and Clark (3) also discusses this issue.
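The extra-binomial variation between litters can be quantified with a simple method-of-moments estimate of the intralitter correlation ρ. The sketch assumes that every litter has the same size n. It uses the beta-binomial variance relation Var(p_i) = π(1 − π)[1 + (n − 1)ρ]/n, where p_i is the proportion affected in litter i and π is the underlying incidence. It is an illustration only, not the method published by Pack (24) or the approach of Chinchilli and Clark (3). The litter counts are hypothetical.

```python
from statistics import mean, variance


def intralitter_correlation(affected_per_litter, litter_size):
    """Method-of-moments estimate of rho, assuming equal-sized litters.

    Compares the observed between-litter variance of the litter proportions
    with the variance expected if all fetuses responded independently.
    """
    props = [y / litter_size for y in affected_per_litter]
    p_bar = mean(props)
    s2 = variance(props)                            # sample variance, divisor k - 1
    binomial_var = p_bar * (1.0 - p_bar) / litter_size
    return (s2 / binomial_var - 1.0) / (litter_size - 1)


# Hypothetical dose group: 10 litters of 10 fetuses, malformed fetuses per litter.
counts = [0, 0, 1, 0, 5, 0, 2, 0, 0, 6]
rho = intralitter_correlation(counts, litter_size=10)
print(f"estimated intralitter correlation: {rho:.3f}")
```

A value near zero would be consistent with fetuses responding independently. Here the malformations are concentrated in a few litters, so the estimate is well above zero, supporting the litter rather than the fetus as the sampling unit.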

Summary of Workshop Discussions: Current Status and Future Directions
Short-Term Approaches
A number of issues were discussed at the workshop, and numerous questions were raised concerning future research needs. Approaches were proposed by Gaylor (2) and by Faustman et al. using the Rai and Van Ryzin model (1); either of these, or possibly some combination of the two, should be considered further. However, problems remain to be addressed. For example, the models as presented use data from studies in which all pups within each litter have been examined for all types of effects. These data are available if one uses the fresh visceral dissection technique (25). However, many laboratories use the Wilson technique (26), in which the pups within a litter are divided into groups for examination of either visceral or skeletal defects. How does dividing litters, so that not all types of data are available on all pups, affect the risk estimates that are derived? More work is also needed on litter size considerations, on how intralitter correlations may affect the models and the risk estimates, and on the use of litter data from animal studies for extrapolation to humans, for whom a single offspring usually results from each pregnancy. The accompanying papers by Chinchilli and Clark (3) and Butler and Kalasinski (4) address certain aspects of this issue.

Philosophical Considerations
There are also important philosophical considerations related to the way in which risk estimates are presented to the risk manager and to the public. For example, the approach currently used derives an exposure level (Reference Dose, Acceptable Daily Intake) below which no increased risk above background levels would be expected, based on the current assumption of a threshold. A more quantitative approach, however, derives a risk estimate (e.g., no greater than 1 in 10,000). Such an estimate implies some level of residual risk, even though the dose level may be the same as that obtained by using uncertainty factors and assuming a threshold. The presentation of such risk estimates will require careful thought and some level of education. Scientists and health care specialists must also consider what would be an acceptable level of risk, based on background levels and on what is known about the biological processes underlying these phenomena. Currently used procedures of 100- to 1000-fold safety factors reduce the estimated risk levels to a range of no more than 1 in 1000 to 1 in 10,000 cases (27). Is this level of risk acceptable?
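The arithmetic behind the quoted range can be sketched under two labeled assumptions. First, a study with typical group sizes could fail to detect an extra risk of up to about 10% at its NOEL; this detection limit is hypothetical. Second, the risk below the NOEL is no greater than a straight line drawn from zero dose to the NOEL risk. Dividing the NOEL by a safety factor then divides this upper-bound risk by the same factor.

```python
def upper_bound_risk(risk_at_noel, safety_factor):
    """Linear low-dose upper bound (assumption): risk at NOEL/SF is at most risk_at_noel/SF."""
    return risk_at_noel / safety_factor


# Hypothetical undetected extra risk at the NOEL of up to 10%.
for sf in (100, 1000):
    r = upper_bound_risk(0.10, sf)
    print(f"safety factor {sf}: risk no greater than about 1 in {round(1 / r):,}")
```

Under these assumptions, the 100- to 1000-fold factors correspond to the 1 in 1000 to 1 in 10,000 range cited above. A dose-response curve that bends more steeply downward below the NOEL, or a true threshold, would make the actual risk smaller than this bound.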

Policy Considerations
Obviously, there are also policy considerations concerning the use of models. For example, when is enough information available to begin using an approach or a model? How much validation is necessary before an approach is put into use? Is a threshold assumption sufficiently conservative for these types of effects? In other words, how often can we be certain of setting acceptable levels low enough to protect the majority of the population?

Future Research Needs
A number of points were raised in the workshop that require further research. Obviously, further testing of the models currently being developed is called for, using more data sets including those that are less robust than the ones previously used, and determining whether more dose levels would add significantly to the precision of the risk estimates from these models.
Second, an issue of great importance is the relationship among various end points (e.g., the relationship between maternal and developmental toxicity, or among the various end points of developmental toxicity). These relationships affect the type of dose-response model used and the estimation of risk. As Selevan and Lemasters (28) point out, end points observed at birth are the result of a long and complex process, and any one type of adverse effect may result from a number of factors. A very high exposure could result in early fetal loss, while a lower one might result in a congenital malformation observed at birth. The probability of a congenital defect might therefore fall with increasing exposure because of the increasing probability of fetal loss. This would produce an inverted U-shaped dose-response curve for the incidence of congenital malformations. Assessment of the risk of malformations must therefore include adjustment for mortality. The patterns of these interrelationships are thus important in developing appropriate dose-response models for developmental toxicity end points.
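The inverted-U pattern can be reproduced with a toy competing-risks model in which prenatal death and malformation among survivors both rise with dose. The exponential forms and rate constants below are hypothetical. The point is only that the incidence observed at term can fall at high doses even though the underlying malformation risk keeps rising, and that adjusting for mortality restores a monotone curve.

```python
import math

DEATH_RATE, MALFORMATION_RATE = 0.5, 0.3  # hypothetical rates per unit dose


def p_death(d):
    """Probability of prenatal death (assumed exponential form)."""
    return 1.0 - math.exp(-DEATH_RATE * d)


def p_malformed_if_survives(d):
    """Probability that a surviving conceptus is malformed (assumed exponential form)."""
    return 1.0 - math.exp(-MALFORMATION_RATE * d)


def malformed_at_birth(d):
    """Fraction of all conceptuses that are born alive and malformed."""
    return (1.0 - p_death(d)) * p_malformed_if_survives(d)


def dead_or_malformed(d):
    """Composite endpoint: any adverse outcome (death or malformation)."""
    return 1.0 - (1.0 - p_death(d)) * (1.0 - p_malformed_if_survives(d))


doses = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
for d in doses:
    print(f"dose {d}: at birth {malformed_at_birth(d):.3f}, "
          f"among survivors {p_malformed_if_survives(d):.3f}, "
          f"dead or malformed {dead_or_malformed(d):.3f}")
```

The "at birth" column rises and then falls. Both mortality-adjusted summaries increase steadily with dose: the incidence among survivors and the composite of death or malformation.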
Third, the development of more standard approaches is needed for using pharmacokinetic data to determine dosimetry and to better define dose-response relationships. Although there has been much discussion about how pharmacokinetics would be useful in this respect (29), very little has been done to apply these techniques in the regulatory setting.
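As a minimal example of pharmacokinetic dosimetry, a one-compartment model with first-order elimination converts an administered dose into an internal dose metric. That metric is the area under the concentration-time curve: AUC = F × Dose / CL, with CL = k × V and k = ln 2 / t½, where F is bioavailability, CL is clearance, V is the volume of distribution, and t½ is the elimination half-life. The species labels, half-lives, and volume of distribution below are hypothetical. The sketch also ignores placental transfer and the timing of exposure.

```python
import math


def auc_single_dose(dose, half_life, volume, bioavailability=1.0):
    """AUC for a one-compartment model with first-order elimination.

    dose in mg/kg, half_life in h, volume (of distribution) in L/kg;
    returns mg*h/L. All example inputs below are hypothetical.
    """
    k = math.log(2.0) / half_life        # elimination rate constant, 1/h
    clearance = k * volume               # L/h/kg
    return bioavailability * dose / clearance


# Same administered dose, hypothetical species differing only in half-life.
auc_rodent = auc_single_dose(dose=10.0, half_life=2.0, volume=0.7)
auc_human = auc_single_dose(dose=10.0, half_life=10.0, volume=0.7)
print(f"rodent AUC {auc_rodent:.1f}, human AUC {auc_human:.1f}, ratio {auc_human / auc_rodent:.1f}")
```

Equal administered doses in mg/kg can thus correspond to quite different internal doses. A fixed interspecies factor can cover differences of this kind only approximately.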
Fourth, heterogeneity within the human population in terms of sensitivity to toxic agents traditionally has been accounted for using a 10-fold factor in the risk assessment process. In the case of developmental toxicity, where estimates of the incidence of spontaneous abortion range from 30 to 80%, is it appropriate to assume that the variability within the human population really falls within an order of magnitude? Clinicians routinely make estimates about the risks for various types of malformations in children. Would the data on which these estimates are based be useful in determining the range of sensitivity of the human population?
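Whether a 10-fold factor spans human variability can be posed as simple arithmetic under two assumptions, both for illustration only. First, individual sensitivity thresholds are lognormally distributed. Second, the factor should reach from the median person to a sensitive tail such as the 1st percentile. Under these assumptions, the required factor is GSD^z, where GSD is the geometric standard deviation and z is the corresponding standard normal quantile (about 2.33 for the 1st percentile).

```python
from statistics import NormalDist


def required_factor(gsd, tail=0.01):
    """Ratio of the median threshold to the lower-tail percentile (lognormal assumption)."""
    z = NormalDist().inv_cdf(1.0 - tail)
    return gsd ** z


def largest_gsd_covered(factor=10.0, tail=0.01):
    """Largest geometric SD for which `factor` still reaches the given tail."""
    z = NormalDist().inv_cdf(1.0 - tail)
    return factor ** (1.0 / z)


for gsd in (1.5, 2.0, 3.0):
    print(f"GSD {gsd}: factor needed to cover the 1st percentile = {required_factor(gsd):.1f}")
print(f"a 10-fold factor covers the 1st percentile only if GSD <= {largest_gsd_covered():.2f}")
```

On these assumptions, a 10-fold factor suffices only when the geometric standard deviation of individual thresholds is below roughly 2.7. This gives the question raised above a quantitative form.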
Finally, the use of more biologically based models was discussed. Of course, developing such models requires a good deal of information on the underlying biological processes and mechanisms of toxicity. Unfortunately, very little is known about mechanisms of developmental toxicity, but more information is being gathered about the underlying developmental processes involved [e.g., the restorative growth concept (14)]. How much of this type of information could be brought to bear on the development of biologically based models for developmental toxicity effects?
In summary, this workshop brought together reproductive and developmental toxicologists, epidemiologists, and statisticians to examine ways to improve estimations of risk for human developmental toxicity. Experience in the quantitation of carcinogenesis risks was considered in these discussions. As stated earlier, this workshop was seen as the first in a series of such efforts to focus attention on this area and to encourage research. It is hoped that this goal will be fulfilled and that future workshops and symposia on this topic will see the development and application of approaches for the regulatory setting.