Developmental toxicology: status of the field and contribution of the National Toxicology Program.

The NTP has conducted developmental toxicity studies on more than 50 chemicals, often in multiple species. Several chemicals caused developmental toxicity in the absence of any toxicity to the mother. Although hazard to humans is determined by the level of exposure to the chemical and its inherent toxicity, those agents that selectively disturb the development of the conceptus are of particular concern because other manifestations of toxicity would not warn the mother of overexposure. Whether the LOAEL (lowest-observed adverse effect level) for maternal toxicity was high or low did not correlate with the potential of chemicals to cause developmental toxicity. The form of developmental toxicity that determined the LOAEL most frequently was decreased body weight in mice and rats, but not rabbits, where the LOAEL was determined more often by an increase in resorptions. Several in vitro and short-term tests appear promising as screens to predict the outcome of developmental toxicity studies in mammals. However, the only screens that have undergone formal validation studies are those evaluated by the NTP. Improvements in our ability to predict risk to humans have been limited by our knowledge of the mechanisms by which agents cause developmental toxicity. Thus, future growth is dependent on a better understanding of the biological processes that regulate normal development, therein providing the necessary framework for understanding mechanisms of abnormal development.


Introduction
Although submammalian species were used for many decades to evaluate the effect of environmental agents on the development of the conceptus, the use of mammalian species dates back only to around the turn of the last century (1). Experimental mammalian teratology as we have known it in the past several decades came into being in the early 1960s as a result of the thalidomide tragedy. The term "developmental toxicology" was first brought into published writings by James Wilson in the early 1970s (2). Developmental toxicology has evolved during the past couple of decades at the interface between teratology and toxicology. The primary contributions of teratologists were the understanding of the anatomy and the embryology of normal and abnormal development and the clinical importance thereof. The contribution of the toxicologist was the use of information from animal models and human studies to predict the potential of agents to adversely affect development of the unborn, predicting on a population basis rather than an individual basis. The primary interest of the teratologist was the cause of major malformations, rather than the complete spectrum of alterations from normal that might result from exposure to some endogenous or exogenous agent.
Thalidomide was clearly a classical teratogen in humans in that it caused major malformations, and did so at levels of exposure that were not associated with other toxic effects to the mother. Experience in toxicology and teratology laboratories during the years after thalidomide taught us that very few chemicals were classical teratogens in the sense of the thalidomide experience. Many agents, however, caused other forms of abnormal development that were manifest as minor malformations or variations, changes in embryonic and fetal growth, the occurrence of deaths, and functional alterations as a result of exposure during pregnancy. We also learned during the first couple of decades of intensively testing chemicals that the type of defect observed in the laboratory animals (rats, rabbits, mice, and hamsters) did not always predict the type of defect that might be found in humans. As a result, the absence of a major malformation in a laboratory animal study did not assure the absence of major malformations in humans. Thus, the decision was made that animal testing should not be limited to finding "other thalidomides" because any manifestation of developmental toxicity in animals may predict some type of abnormality in humans. The emphasis in animal testing, therefore, switched from looking for classic teratogens to looking for developmental toxicants, agents that would cause an increase in the occurrence of any one of the four manifestations of developmental toxicity: death, structural abnormalities, altered growth, or functional deficits.
The first extensive description of the distinction between teratology and developmental toxicology and its impact on screening for agents that could potentially adversely affect human development was contained in the Guidelines for the Health Assessment of Suspect Developmental Toxicants published by the Environmental Protection Agency in 1986 (3). This document thoroughly addressed state-of-the-art toxicity test methods, provided an extensive review of a sound rationale for interpreting the results of developmental toxicity studies, and provided a useful perspective on the distinction between teratology and developmental toxicology.
The purpose of this paper is to comment on the evolving status of the field of developmental toxicology, to summarize the results of the developmental toxicology testing conducted by the National Toxicology Program (NTP) and discuss generalizations that can be drawn from that body of knowledge, and to describe what must be done for the field of developmental toxicology to continue to evolve.

Maturity of Developmental Toxicology
There are a number of criteria by which the evolutionary status or maturity of any field of toxicology can be judged. One major criterion is whether there are agreed-upon test methods for detecting the particular type of toxicity. In the case of developmental toxicology, we have been using essentially the same protocol for about 25 years. Thousands of chemicals have been evaluated using this protocol. Prior to thalidomide (1961), there were no official test guidelines for evaluating teratogenic or developmental toxic effects of agents that might be submitted to a regulatory agency for approval. Reproductive studies were conducted that provided little confidence that an agent would not cause developmental toxicity.
After the thalidomide tragedy, a series of workshops by teratologists during 1964 and 1965 led to the establishment of criteria and protocols for testing agents for teratogenic potential. These principles of testing for teratogenic potential were captured in official guidelines of the Food and Drug Administration in 1966 (4). These guidelines, the product of a small number of insightful teratologists, have remained essentially unchanged since 1966. The emphasis has expanded, as mentioned above, to screen for developmental toxicity rather than just teratogenicity, but the protocol is essentially as it was originally described. In his review in 1985, Schardein (5) stated that reports of the teratogenic potential of more than 2800 chemicals were found in the literature. By the time of this writing, surely the number exceeds 4000. Thus, there is considerable experience in many laboratories worldwide using this standard protocol for evaluating teratogenicity, and during the past 15 years, for evaluating developmental toxicity. The experience in measuring for functional deficits is, of course, much more limited. Thus, the field of developmental toxicology is mature from the standpoint of agreed-upon test methods, with less experience in measures of functional deficits compared to other end points of developmental toxicity.
Another measure of the maturity of the field of developmental toxicology is the identification of known human toxicants. About 50 drugs or chemicals have been associated in anecdotal or case reports or on the basis of epidemiological data with one or more of the four manifestations of developmental toxicity in humans (1). There are undoubtedly human developmental toxicants that have not been identified. Clearly, the number of agents identified as developmental toxicants in laboratory animals exceeds those that have been identified as such in humans. Whether this reflects the high dose levels used in animal studies for identifying developmental toxicity, whether animals are inherently more sensitive to these manifestations across broad categories of chemicals, or whether the number of developmental toxicants in humans is seriously underestimated is not known. Likely, each of these factors is true to varying degrees for different agents. All of the agents that are confirmed developmental toxicants in humans are also developmental toxicants in laboratory animals. Thus, there is widely accepted evidence of adverse effects in humans attributed to agents that also cause developmental toxicity in laboratory animals.
Maturity of a field can also be measured by the amount of experience with relevant animal models. As already mentioned, the published literature contains references to as many as 4000 agents that have been evaluated using a teratogenicity or developmental toxicity protocol. There is undoubtedly no other standard protocol for another end point of toxicity using an in vivo animal model (except, perhaps, for acute toxicity tests) where there is more experience. However, our depth of understanding of the test model is limited. The development of the embryo and fetus is so complex that we do not have a lot of detailed knowledge about normal development beyond morphological events that occur during gestation. Therefore, developmental toxicity data have been interpreted on an empirical rather than a mechanistic basis. In the absence of thorough knowledge of the normal physiological processes, it is difficult to understand what accounts for abnormal development following exposure to some agent. Current efforts of developmental biologists to improve our understanding of normal development at the molecular level will be extremely important to better understanding mechanisms of abnormal development. We have reached a plateau of understanding of our animal models that will likely not be exceeded by simply testing more chemicals.
Another measure of maturity is the level of sophistication of extrapolation of animal data to predict the possibility of adverse effects in humans. Developmental toxicity data are interpreted on the basis that there is a threshold. Because of our lack of understanding of mechanisms and the assumption that there is a threshold for developmental toxicity (unless there is clear evidence of a genetic basis), regulators have predicted human safety on the basis of calculated margins of human exposure relative to dose levels in animal studies associated with the absence of an adverse effect. As a result, decisions about safe levels of exposure to drugs or chemicals relative to developmental toxic potential are not based on risk but are instead based on some uncertainty factor that is judged to be adequate or not on an agent-by-agent basis.
Thus, the maturity of our approach for extrapolating from animals to humans must take into account several points of uncertainty. First, the etiology of the majority of developmental toxicity observed in humans is unknown. We have to assume that some unknown portion of that disease load is associated with exposure to chemicals acting as causative agents by themselves or in combination with other factors (nutrition, stress, etc.) or chemicals. Second, in the absence of chemical-specific mechanistic knowledge, we must assume that any adverse effect in animals predicts some effect in humans. Better understanding of our animal models and mechanisms of abnormal development will permit extrapolation between species with greater confidence and should permit the use of more sophisticated models for extrapolation that are mechanistically based.
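The margin-based reasoning described above can be illustrated with a minimal sketch. The function names, the units, and the numerical values are hypothetical and are not taken from any regulatory guideline or NTP study; the sketch only shows the arithmetic of comparing an animal no-effect dose with an estimated human exposure and judging the result against an uncertainty factor chosen agent by agent.

```python
def margin_of_exposure(animal_noael: float, human_exposure: float) -> float:
    """Ratio of the animal no-effect dose to the estimated human exposure.

    Both arguments must be expressed in the same units (for example mg/kg/day).
    """
    if animal_noael <= 0 or human_exposure <= 0:
        raise ValueError("doses must be positive")
    return animal_noael / human_exposure


def margin_is_adequate(margin: float, required_factor: float) -> bool:
    """Compare a margin with an uncertainty factor judged case by case."""
    return margin >= required_factor


# Hypothetical numbers for illustration only, not data from any study.
example_margin = margin_of_exposure(animal_noael=100.0, human_exposure=0.5)
```

Such a calculation expresses a margin, not a risk estimate, which is the limitation noted in the text.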
Still another manifestation of the maturity of a field is its knowledge of mechanisms of action. As already mentioned, the great complexity of normal development has precluded understanding the mechanisms by which most toxicants adversely affect development. Critical steps in the development of abnormalities are understood to some extent and in some cases the proximate toxicant is known, but this is knowledge of the mode of action rather than the ultimate mechanism of action.
In summary, the field of developmental toxicology is quite mature in the sense of having standardized protocols for evaluating toxicity, protocols that have been used for thousands of chemicals, except for functional end points where there is less experience. Thus, toxicologists and teratologists in many laboratories worldwide have considerable experience with the protocol, the background incidence of abnormalities, and interpreting the results of such studies. However, our minimal knowledge of the mechanism of action of teratogens and developmental toxicants limits the level of sophistication of interpretation and extrapolation of data from such studies.

Interpretation of Developmental Toxicity Data
It would be a mistake to assume that, after 25 years of experience using a standardized protocol for evaluating developmental toxicity, there would be commonality within the field about the interpretation of the results of such studies. While there is consensus on many issues, there are still disagreements about the interpretation of some subtleties of the data. The disagreements stem from several factors, including the amount of dependence on statistical procedures, whether one is looking at the data for evidence of a teratogenic response or developmental toxicity, and whether one is looking for a specific type of effect or all effects considered collectively. A workshop was held at the National Institute of Environmental Health Sciences (NIEHS) in June 1991 to discuss the relative importance of the various data components of a developmental toxicity study, both maternal and developmental end points, how to interpret changes in those end points, and how to integrate the developmental and adult toxicity data to reach a conclusion about potential harm to humans. Participants were selected on the basis of experience from within a broad spectrum of backgrounds including clinical, basic research, regulatory toxicology, statistics, etc. The major criteria that were discussed include the ratio of the no-observed adverse effect levels for adult and developmental toxicity (the A/D ratio), potency, knowledge of the pattern or type of effect, the existence of species concordance, and the extent of mechanistic knowledge. The participants concluded that many criteria need to be considered when interpreting the results of developmental toxicity studies and that no single criterion such as the A/D ratio or potency was a sufficient basis for interpretation of the data.
As with many other manifestations of toxicity, potential hazard to humans was considered to be a function of human exposure to an agent and its developmental toxicity, the latter reflecting the inherent potential of a substance to cause an adverse effect on the developing conceptus under some defined condition. All of the criteria discussed in the workshop were considered to be of importance in characterizing developmental toxicity, the relative importance of individual criteria being a function of the question under consideration (6). Future progress in understanding mechanisms of developmental toxicity will likely help to refine the process of identifying potential hazards to humans.
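As a small illustration of the A/D ratio discussed at the workshop, the following sketch divides the adult NOAEL by the developmental NOAEL. The function name and the example doses are hypothetical. A ratio above 1 indicates that the developmental NOAEL lies below the adult NOAEL; in keeping with the workshop's conclusion, the ratio is one criterion among several and is not offered here as a decision rule.

```python
def ad_ratio(adult_noael: float, developmental_noael: float) -> float:
    """A/D ratio: adult NOAEL divided by developmental NOAEL (same units)."""
    if adult_noael <= 0 or developmental_noael <= 0:
        raise ValueError("NOAELs must be positive")
    return adult_noael / developmental_noael


# Hypothetical doses in mg/kg/day, for illustration only.
example_ratio = ad_ratio(adult_noael=300.0, developmental_noael=100.0)
```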

Summary of NTP Developmental Toxicity Studies
This section is intended to provide a thorough review of NTP developmental toxicity studies conducted at the NIEHS. In addition to providing a brief summary of all of the studies, several generalizations regarding the complete database will be discussed. Particularly, the following questions will be addressed related to chemicals: For which chemicals was developmental toxicity observed in the absence of maternal toxicity? What chemicals affected multiple end points (death, malformations, variations, growth retardation)? Was there any segregation of developmental toxicity according to the LOAEL for maternal toxicity? Other questions will be addressed related to the animal models and interspecies comparisons: What developmental toxicity was observed at the LOAEL? What additional developmental toxic effects were observed above the LOAEL? What is the interspecies predictiveness?
The NTP database consists of 85 studies on 50 chemicals: 32 in rats, 39 in mice, 13 in rabbits, and 1 in hamsters. Except for six of the chemicals that were tested within the NIEHS facility, all other studies were conducted in two laboratories outside of the NIEHS. The inhalation studies (except for the arsine studies) were conducted at Battelle Pacific Northwest Laboratories, Richland, Washington, under an interagency agreement with the NTP. The noninhalation studies (except those done at NIEHS) were conducted in the laboratories of the Research Triangle Institute, Research Triangle Park, North Carolina. All chemicals were tested between the years of 1980 and 1991 in American Association for Laboratory Animal Care-certified laboratories under Good Laboratory Practices conditions. The protocol for conducting the studies and interpreting the presence of alterations from normal was uniform among the three laboratories, although individual studies were tailored to specific questions on individual chemicals. Within the three laboratories, the same key personnel were involved in the studies throughout the entire experimental period.
It must be made clear that the 50 chemicals were not selected on a random basis and are, therefore, not representative of the universe of chemicals. Chemicals were selected for testing on the basis of human exposure, preexisting data suggesting that these may be developmentally toxic in single species or in in vivo or short-term screening data, structure-activity considerations, or a specific request for a developmental toxicity study by a government agency. Therefore, any generalizations drawn from the results of studies on these 50 chemicals may not apply more generally.
The route and mode of exposure of these 85 studies are summarized in Table 1. Compared to the universe of chemicals, a disproportionately high proportion of the chemicals tested in the NTP studies were water soluble and were given in an aqueous vehicle.
The accounting of studies conducted through the NIEHS is summarized in Table 2. The NTIS number is provided for anyone who wants to obtain a copy of the study report from the National Technical Information Services. Studies that were conducted in-house at NIEHS do not have an NTIS number as their study report consisted of a published manuscript, which is cited. Both the NOAEL (no-observed adverse effect level) for adult and developmental toxicity are provided along with the LOAEL (lowest-observed adverse effect level) for adult and developmental toxicity to emphasize the importance of the dose selection on the determination of these two numbers. In those studies where developmental toxicity was observed at some dose level, the effect that determined the LOAEL is identified along with any additional toxicities that were observed above the LOAEL. All of the raw data from these studies reside in the NTP Archive in Research Triangle Park, North Carolina. Several special studies are not included in Table 2 because of the design or purpose of the studies. Included among those not listed, for example, is a study on 1,1,1-trichloroethane to evaluate the repeatability of a cardiac malformation reported in the published literature. This study did not include a complete evaluation of other possible alterations beyond the cardiovascular system (7). A comparison of the control data using distilled water versus corn oil as vehicles has been reported (8). The initial comparison of these two vehicles suggested some statistically significant differences. A subsequent study conducted at one time using large numbers of animals failed to repeat the observation that was suggested based on a collection of control groups from individual studies (9).
The criteria by which the NTP interpreted the results of these studies are consistent with those identified earlier as the conclusions of the NTP workshop held to review the criteria for interpreting developmental toxicity studies. A major consideration was the presence of developmental toxicity in the presence or absence of maternal toxicity. The results of each study have been analyzed on the basis of statistically or toxicologically significant increases in the occurrence of fetal deaths or resorptions, changes in fetal body weight at the time of Caesarean section, or significant increases in the incidence of malformations or variations. The presence of changes in these end points was considered relative to the presence of maternal toxicity. Thus, each study has been classified into one of four categories on the basis of the outcome summarized in all dose levels so that NOAELs were not determined.
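The four-category classification described above, in which developmental toxicity is scored as present or absent relative to maternal toxicity, can be sketched as follows. The "+/-" notation and the function names are assumptions made for illustration, and the study list is hypothetical rather than drawn from the NTP tables.

```python
from collections import Counter


def outcome_code(developmental_toxicity: bool, maternal_toxicity: bool) -> str:
    """Return a code of the form developmental/maternal, e.g. '+/-'."""
    dev = "+" if developmental_toxicity else "-"
    mat = "+" if maternal_toxicity else "-"
    return f"{dev}/{mat}"


def tally_outcomes(studies):
    """Count studies per outcome code.

    `studies` is an iterable of (developmental_toxicity, maternal_toxicity)
    boolean pairs, one pair per study.
    """
    return Counter(outcome_code(dev, mat) for dev, mat in studies)


# Hypothetical study results, for illustration only.
example_tally = tally_outcomes(
    [(True, False), (True, True), (False, True), (True, False)]
)
```

In this notation, "+/-" marks developmental toxicity in the absence of maternal toxicity, and "-/+" marks the absence of developmental toxicity in the presence of maternal toxicity.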
The specific protocols used for the developmental toxicity studies varied somewhat from study to study and between laboratories, but the basic protocol was designed to have at least three treated groups and one control group and at least 20 pregnancies, regardless of species, in each group. Pregnant females were killed 1 day before the expected delivery date for each species, and the number of corpora lutea, implants, live and dead fetuses, resorptions, and fetal body weight were recorded at the time of Caesarean section. External malformations and variations were recorded based on examinations at the time of Caesarean section. Fetal sex was determined on the basis of external or internal genitalia. Examination for visceral malformations was conducted according to the method of Staples (10). All fetuses of each litter were examined for visceral malformations and variations and were subsequently cleared and stained for examination for skeletal alterations. All studies conducted at Research Triangle Institute were done using a replicate design. Data were analyzed for the presence of pairwise differences from control as well as the presence of trends.
Those chemicals that caused developmental toxicity in the absence of any significant maternal toxicity are summarized in Table 4. The number of species tested for each chemical varied. The number of species in which developmental toxicity was observed in the absence of maternal toxicity is indicated by the column heading. For example, studies were conducted on boric acid in three species, two of which showed developmental toxicity in the absence of maternal toxicity. Regarding the possibility that exposure of humans to these or other chemicals might represent a potential risk, it is important to reiterate that potential risk is a function of both exposure and the ability of the chemical to produce developmental toxicity. Among those chemicals that have some propensity to cause developmental toxicity, it seems logical that chemicals such as thalidomide that tend to cause developmental toxicity in the absence of any adverse effect on the mother represent somewhat more of a concern than agents that are developmentally toxic only at levels of maternal exposure that cause other manifestations of toxicity, except in those cases where humans are exposed at dose levels toxic to the adult.
Another consideration regarding the nature of the developmental toxic effects caused by different chemicals is the profile of toxic effects observed. Disregarding the dose level at which effects might be observed, it seems intuitively important to be more concerned about agents that cause a significant increase in a variety of alterations as opposed to a change in only one end point because of the multiplicity of end points involved with multiple effects and the increased probability that some mechanism would be operative in humans. Those chemicals that cause significant increases in death, growth retardation, malformations, and variations are summarized in Table 5. Interestingly, five of the seven chemicals on this list are also among those that caused developmental toxicity in the absence of maternal toxicity (Table 4).
A review of the LOAELs for maternal toxicity listed in Table 6 reveals a wide range of numbers for the 50 chemicals. This is true for inhalation and other routes of exposure. In addition, for some of the chemicals that were tested in multiple species, there were significant species differences in the maternal LOAELs. For example, in the case of acetone by the inhalation route, the maternal LOAELs differed roughly by a factor of two for mouse and rat. In the case of boric acid, the difference was much greater, with mice being much more tolerant than rats or rabbits. There was a 5-fold difference in maternal LOAEL for inhaled 1,3-butadiene. A similar but reversed difference existed for inhaled n-hexane, with mice being much more tolerant than rats.
With this wide range in the maternal LOAELs, which determines the highest dose level in the experimental studies, the question arises whether developmental toxicity segregated according to the range of maternal LOAEL. In other words, was there a predilection for developmental toxicity based on the potency for maternal toxicity? The information presented in Table 7 summarizes the outcome of the developmental toxicity studies according to the maternal LOAEL. Chemicals are grouped according to those with a LOAEL for maternal toxicity of less than 50 mg/kg, or LOAELs in the range of 50-500 mg/kg, 500-1000 mg/kg, or greater than 1000 mg/kg. Table 8 summarizes an analysis of these data. Chemicals that caused developmental toxicity in the absence of maternal toxicity or those agents that were clearly not developmentally toxic (no developmental toxicity in the presence of significant maternal toxicity) were distributed across the various categories of maternal LOAEL. Thus, selective effects on the developing conceptus did not segregate according to maternal LOAEL, suggesting a lack of association of developmental toxicity potential and that for other measures of maternal toxicity used to select dose levels.
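The grouping by maternal LOAEL described above can be sketched as a simple binning and cross-tabulation. The four ranges come from the text; how the shared boundaries (500 and 1000 mg/kg) are assigned is an assumption, as are the function names, the outcome labels, and the example records, which are hypothetical and not NTP data.

```python
from collections import Counter, defaultdict


def maternal_loael_bin(loael_mg_per_kg: float) -> str:
    """Assign a maternal LOAEL (mg/kg) to one of the four ranges in the text.

    Boundary handling is an assumption: 500 and 1000 both fall in '500-1000'.
    """
    if loael_mg_per_kg < 0:
        raise ValueError("LOAEL must be non-negative")
    if loael_mg_per_kg < 50:
        return "<50"
    if loael_mg_per_kg < 500:
        return "50-500"
    if loael_mg_per_kg <= 1000:
        return "500-1000"
    return ">1000"


def tabulate_by_bin(records):
    """Cross-tabulate outcome labels by maternal LOAEL range.

    `records` is an iterable of (maternal_loael_mg_per_kg, outcome_label).
    """
    table = defaultdict(Counter)
    for loael, outcome in records:
        table[maternal_loael_bin(loael)][outcome] += 1
    return {loael_range: dict(counts) for loael_range, counts in table.items()}


# Hypothetical records, for illustration only.
example_table = tabulate_by_bin(
    [
        (25.0, "selective"),
        (250.0, "not toxic"),
        (750.0, "selective"),
        (1000.0, "selective"),
        (2000.0, "not toxic"),
    ]
)
```

If outcomes segregated by maternal potency, one label would cluster in one or two ranges; a spread across all ranges corresponds to the lack of association reported in the text. The sketch applies to studies with doses in mg/kg and would need different units for inhalation studies.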
In addition to the above discussion that focused primarily on chemical-specific developmental toxicity, several points warrant discussion regarding the animal model used for these studies and comparisons between species. Because of known species differences in the profile of spontaneous and induced alterations in development and because of the difference in the length of time between the last day of chemical administration and the time of Caesarean section, one would expect to find species differences in the profile of developmental toxicity observed. Thus, the results of these studies have been summarized according to species regarding the developmental toxicity observed at or above the LOAEL (Table 9). The analysis of this information by species is summarized in Table 10. As expected for mice and rats, the observation that determined the LOAEL the greatest percentage of time was a decrease in fetal body weight. The number of times the LOAEL was determined by an increase in the incidence of variations in all three species is noteworthy because the LOAEL was determined by an increase in the incidence of variations in one study in each species (3-10% of the studies with some developmental toxicity). The importance of the profile of developmental toxicity observed above the LOAEL versus the effect observed at the LOAEL (Table 9) probably deserves more consideration than has been customary. For certain chemicals, the only effect observed at the LOAEL was, for example, a significant decrease in fetal body weight. At higher dose levels, other end points were "recruited" as part of the profile of toxicity. With other chemicals, the effect observed at the low dose level occurred to a progressively greater extent at higher doses, but there was no recruitment of other end points. Intuitively, this would seem to say something about the potential for developmental toxicity in other species. Further analysis of this idea would seem to be warranted.
Table footnote: outcome is reported as positive or negative developmental toxicity/positive or negative maternal toxicity.
We have also questioned whether fetal body weight or fetal body weight plus resorptions would predict the outcome based on analysis of all end points of developmental toxicity studies. If this were true, the expensive investment of resources to conduct the analysis for malformations and variations would not be necessary in an initial screen, since concordance between species is low and the usefulness of the data would not be compromised. The result of this analysis is summarized in Table 11. Unfortunately, a decision on the basis of the combination of the change in body weight or incidence of resorptions did not identify all of the agents in which developmental toxicity was detected. This ranged from 75% of the agents in rats to 86% in mice. A significant percentage of the agents caused malformations and variations without any change in fetal body weight or resorptions.
The NTP data set is not optimum for answering the question of how well one laboratory animal species predicts the outcome for other animal species because studies were not necessarily conducted in two or three species. In many cases, the chemicals were selected because one species/study was already reported in the literature, and we expanded the database to include an additional species. However, the best data we have for comparing the species responses are presented in Table 3, where the studies that were conducted in multiple species are summarized according to the outcome category. The most revealing observation is the number of chemicals that were found to be selective developmental toxicants in more than one species. This includes boric acid, ethylene glycol diethyl ether, and ethylene glycol monobutyl ether. The number of chemicals that were in the category where there was no developmental toxicity in the presence of significant maternal toxicity was greater, including acrylamide, arsine, bisphenol A, diethylene glycol diethyl ether, and diphenhydramine. Definitive conclusions about interspecies predictiveness would require review of all available data on a larger number of chemicals.

Status of in Vitro Teratology Screens
In vitro teratology assays have the potential to be of great utility both in mechanistic and screening applications. However, the considerable effort using in vitro assays to better understand mechanisms of action of teratogens and developmental toxicants has had limited success because of the complexity of embryogenesis and the multiplicity of mechanisms of abnormal development. What we know about mechanisms has largely been gained through such studies, but we still know very little about mechanisms of abnormal development. The use of in vitro systems for screening large numbers of untested chemicals has not been a panacea either. Very few screens have been validated and none is being used for wide-scale screening of chemicals from diverse classes of chemical structure or function. The only assays that have undergone rigorous evaluation in independent laboratories are two examined by the NTP: the human embryonic palatal mesenchymal cell growth inhibition assay and the mouse ovarian tumor cell attachment inhibition assay (11).
Table footnotes: Abbreviations: NC, not classifiable; LOAEL, lowest-observed adverse effect level. Outcome code is reported as positive or negative developmental toxicity/positive or negative maternal toxicity. Two times daily. LOAEL was the highest dose in nine studies; no developmental toxicity was seen in seven studies. LOAEL was the highest dose in five studies; no developmental toxicity was seen in three studies. Percentage (number of studies with no developmental toxicity/total number of studies).
A workshop was sponsored by the NTP in 1989 to reevaluate the need for and use of in vitro teratology assays, to examine the validation process for in vitro tests, and to discuss progress in the validation of in vitro teratology screens. A summary of this conference has been published (12). The participants of this conference enthusiastically supported further development of short-term in vivo and in vitro systems both as prescreens for developmental toxicity and as experimental systems to explore mechanisms of action of toxicants. Several industrial laboratories have developed in vitro screens for assaying particular families of chemicals where a combination of in vivo and in vitro developmental toxicity information is already established. The in vitro screens are used to characterize other members in the family of chemicals. There was general agreement, though, that too few in vitro teratology prescreens have been evaluated under multiple-laboratory conditions with common, agreed-upon test agents to draw firm conclusions regarding the merit and reproducibility of in vitro teratology prescreens. There was strong endorsement of the need to develop an updated reference list (gold standard) of chemicals of known developmental toxicity potential to enhance further development and validation of prescreens. This latter recommendation has been pursued by the NTP through the formation of a committee to develop such a new reference list.

Summary Observations on the Status of Developmental Toxicology and Role of the NTP
In the context of the maturation of the field of developmental toxicology over the past 20 years and the role of the NTP over the past 11 years, several points warrant further discussion. First are comments on some specific chemicals, then on animal models, and observations about specific contributions by the NTP.
The toxicity caused by certain chemicals within the NTP database is noteworthy because of the severity and the nature of the response. For example, administration of boric acid to mice caused a 33% decrease in fetal body weight at a concentration in feed (0.4%) that provided a dose of about 1 g/kg/day. At the oral maximum tolerated dose to rabbits (250 mg/kg/day), there was no effect on fetal body weight. In contrast to no response in the rabbit and a modest response in the mouse, there was a 53% decrease in fetal body weight in litters of rats given boric acid in feed at a concentration (0.8%) that provided a dose of 539 mg/kg/day. Thus, for the end point of decreased fetal body weight after oral administration during major organogenesis, there was a significant species difference in response, with an unusually severe response in the rat. (Boric acid may represent a useful model chemical for studying growth retardation.) Another interesting response is the severe and divergent effects of 2',3'-dideoxycytidine and 2,3,4,7,8-pentachlorodibenzofuran (PCDF) in C57BL/6N mice. Although both are true teratogens in the sense of causing an increase in structural malformations, their profiles differed widely. Cleft palate and dilated renal pelvis were observed in nearly 100% of litters and pups at dose levels of PCDF that caused little or no maternal toxicity (these pups were examined only for cleft palate and dilated renal pelvis, plus resorptions and weight, because TCDD and the structurally related tetrachlorodibenzofuran caused no other structural alterations). In contrast to this very selective toxicity caused by these dioxins and furans, 2',3'-dideoxycytidine caused a significant increase (again about 100% of litters and fetuses) in a wide variety of malformations representing many organ systems, but did not cause a significant increase in cleft palate or renal lesions. There was only a marginally significant increased trend for cleft palate.
The importance of this difference in profile of malformations caused by these two potent mouse teratogens for risk assessment considerations is unclear and awaits more mechanistic research.
Another subset of chemicals that is unique includes those that caused developmental toxicity not only in multiple species, but also in multiple end points in those species. This includes boric acid, diethylene glycol dimethyl ether, diethylhexyl phthalate, and ethylene glycol diethyl ether. Not all of the chemicals tested by the NTP were evaluated in multiple species, and this review does not take into account other published literature, but from our database, these four chemicals caused a developmental toxicity response that raises more concern about potential risk than many other chemicals.
The potential correlation between the ability of a chemical to cause nonreproductive toxicity and to cause developmental toxicity was important to the NTP as it relates to our mission to characterize the toxicity of chemicals. If there was a correlation, it would have been a useful guide to help us discover new, previously unidentified developmental toxicants of particular public health concern. In the absence of such a predictive criterion, other factors will be used to select chemicals for testing.
In a search to reduce the cost of testing by eliminating nonproductive portions of the standard developmental toxicity protocol, we confirmed that no single end point of toxicity predicts a response at all other end points. Thus, a protocol that includes measures of body weight, resorptions, and the incidence of malformations and variations appears to be necessary for broadscale screening of chemicals.
Identification and quantification of maternal toxicity in these NTP studies was more thorough and consistent over the years than for any other publicly available collection of developmental toxicity data. Despite this background of data and experience, subtle measures of maternal toxicity continue to challenge interpretation on a consistent basis. For example, transient and reversible pharmacologic effects may or may not be considered evidence of toxicity. Changes in organ weight consistent with physiological adaptation may be statistically significant but not of toxicological importance. These and other findings that tend to be chemical specific and of uncertain toxicological importance continue to be interpreted on a chemical-by-chemical basis.
Lastly, the data and experience of the NTP have played a focal role in decisions that have affected the whole field of developmental toxicology. NTP test results continue to support regulatory decisions and the data help provide bases for regulatory test guidelines and risk assessment guidelines. Workshops sponsored by the NTP have addressed critical issues and have provided a neutral and scientific arena to resolve divergences in the field and to foster discussions of directions and priorities for the field. Recent workshops include one on validation of in vitro teratology screens (September, 1989), another on the interpretation of Segment II test results (June, 1991), and most recently one on lactation as a target for chemical-induced toxicity and as a means of neonatal exposure to toxicants (March, 1992).
Consistent with its charter, the NTP has made significant contributions to methods development and validation for developmental toxicity screens. Through the National Institute for Occupational Safety and Health component of the NTP, the Chernoff-Kavlock test was rigorously evaluated, including a workshop to pull together data and experience with the test. Drosophila is being evaluated as a potential screen. Two in vitro systems were the subject of formal validation studies by the NTP, the only developmental toxicity screens to receive such an intensive evaluation. To further guide validation efforts in the future, the NTP has organized a committee of experts to develop a new list of reference chemicals to provide focus to this important process.
The NTP database has served as a valuable resource in recent efforts to develop better methods to analyze developmental toxicity data. Because of the completeness of the data, the consistency of the protocol and quality control over the years, and the public availability of the data, the database has been used by academic and regulatory scientists to evaluate the use of the benchmark dose approach and other ways to model developmental toxicity data.
The NTP has been able to conduct studies that are important to the field but would never be conducted by the private sector and would not be supported through grant mechanisms. For example, the NTP conducted retrospective and prospective studies to determine if there was any toxic effect from the use of corn oil as a vehicle.
In summary, NTP data and scientists have served an important role in the continued evolution of the field of developmental toxicology. Future plans of this national program assure this as a continuing role.

Future Directions in Developmental Toxicology
Although considerable protection of public health has been afforded through the work that has been done during the past 25 years, future progress is dependent on several main areas of further research and model development.

Information Related to Animal Models in Current Use
As mentioned earlier, developmental toxicity studies are currently interpreted on an empirical basis because of our lack of understanding of the processes involved in normal development and a lack of understanding of mechanisms of toxicity. To go beyond an empirical observation, we must better understand the biological basis of the physiological processes involved in normal embryonic, fetal, and postnatal development. Until we understand normal development, we will not be able to understand mechanisms of abnormal development to any great extent. Modern tools of developmental biology and molecular biology provide a new opportunity to better understand normal development. The field must incorporate these techniques to take a significant step forward beyond the empirical interpretation of our studies.
Another area of work that would enhance our ability to use the information from animal studies in predicting potential hazards to humans would be a more thorough evaluation of the known human positive agents using standard animal models. Many of the agents that we consider to be developmental toxicants or teratogens in humans have not been thoroughly studied using standard developmental toxicity protocols. Once an agent is confirmed as a human toxicant, studies by most investigators are limited to the specific area of research interest of that investigator. As a result, much of the work involves follow-up studies on a specific malformation and attempts to mimic the effects observed in humans in an animal model. Thus, we have limited data to evaluate how well the type of response in a developmental toxicity study predicts that observed in humans. This gap could be addressed by conducting standard developmental toxicity studies on those agents known to cause toxicity in humans. Such data would permit the first substantive evaluation of the questions of interspecies concordance of toxic effects and relative sensitivity of species based on potency and diversity of toxic response. For example, the NTP studies confirm the high frequency of growth retardation in mice and rats as the effect that determined the LOAEL. Is this a function of the short time between the last dose of chemical and the time of Caesarean section and, therefore, unique to rodents, or is it predictive of toxicity in humans?
Further work must also be done to evaluate the importance of confounders (adult toxicity, altered food or water consumption, etc.) on the interpretation of developmental toxicity studies. The importance of these confounders in the interpretation of data, along with the importance of variants, remains a source of disagreement in the interpretation of test data.
More quantitative methods for analyzing test data are clearly required to better predict the extent and type of risk for humans. However, in the absence of mechanistic understanding, it will be difficult to make major steps forward in these quantitative methods of predicting risks. An interim step is being taken to at least use more of the database upon which to make safety decisions, based on the derivation of benchmark doses (13). This approach still involves the use of uncertainty factors and does not predict risk per se, but it is definitely a step in the right direction because it incorporates the slope of the dose-response curve rather than determining safety simply from a NOAEL.
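The contrast between a benchmark dose and a NOAEL can be illustrated with a minimal sketch. It is not the procedure of reference 13: it assumes a simple one-slope exponential model for extra risk, a crude least-squares fit through the origin, and invented group data; it also ignores litter effects and confidence limits, both of which matter in real developmental toxicity analyses. All function names and numbers are hypothetical.

```python
import math


def extra_risk(p_dose, p_control):
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    return (p_dose - p_control) / (1.0 - p_control)


def fit_slope(doses, affected, totals):
    """Crude slope estimate for the assumed model: extra risk = 1 - exp(-b * d).

    The first group must be the control (dose 0). The model is linearized as
    -ln(1 - extra risk) = b * d and fitted by least squares through the origin.
    """
    p = [a / n for a, n in zip(affected, totals)]
    p0 = p[0]
    num = 0.0
    den = 0.0
    for d, p_d in zip(doses[1:], p[1:]):
        y = -math.log(1.0 - extra_risk(p_d, p0))
        num += d * y
        den += d * d
    return num / den


def benchmark_dose(slope, bmr=0.10):
    """Dose at which the fitted model reaches the benchmark response (extra risk)."""
    return -math.log(1.0 - bmr) / slope


# Hypothetical data: dose (mg/kg/day), affected fetuses, fetuses examined.
doses = [0.0, 50.0, 100.0, 200.0]
affected = [2, 6, 11, 19]
totals = [40, 40, 40, 40]

b = fit_slope(doses, affected, totals)
print("fitted slope:", b)
print("benchmark dose at 10% extra risk:", benchmark_dose(b, 0.10))
```

Unlike a NOAEL, which can only be one of the tested doses, the benchmark dose in this sketch depends on the fitted slope and therefore on every dose group.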
Clearly, the field also needs to identify markers of effect and susceptibility. To move the field of epidemiology forward into the realm of molecular epidemiology requires the development of sensitive markers to better define exposure, to improve our prediction of pregnancy outcome, and to help identify sensitive subpopulations of people.

Chemical-specific Information
We need to continue to define mechanisms and modes of action, the site of action, and the identity of proximate toxicants to better understand chemical-induced developmental toxicity. Major steps forward in extrapolation between species will only come with increased knowledge of target-site dosimetry. Our definitions of exposure as currently used for most studies (mg/kg given orally or applied to the skin, ppm or mg/m3 in inhaled air) are poor surrogates for dose. Much of the confusion in the literature about extrapolation between species is probably attributable to our poor definition of dosimetry and inadequate scaling efforts. The relative importance of peak blood level versus area-under-the-curve considerations is definitely important to pursue. Our ability to predict on the basis of structure activity and reactivity is very limited in the field of developmental toxicology. Predictions within chemical families, such as glycol ethers, are relatively accurate but predictions across families of chemicals are poorly founded.
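The distinction between peak blood level and area under the curve can be made concrete with a short sketch. The concentration-time profiles below are invented for illustration (one resembling a bolus dose, one resembling slower intake such as dosing in feed); the function name and units are assumptions, and the trapezoidal rule is used only as the simplest common way to approximate the area.

```python
def peak_and_auc(times_h, conc):
    """Return (peak concentration, area under the curve by the trapezoidal rule).

    times_h: sampling times in hours, in increasing order.
    conc: blood concentrations at those times (any consistent unit).
    """
    if len(times_h) != len(conc) or len(times_h) < 2:
        raise ValueError("need at least two matching time/concentration points")
    peak = max(conc)
    auc = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        auc += 0.5 * (conc[i] + conc[i - 1]) * dt
    return peak, auc


# Hypothetical profiles with the same area but very different peaks.
bolus = peak_and_auc([0.0, 1.0, 2.0, 4.0, 8.0], [0.0, 8.0, 4.0, 2.0, 0.0])
slow = peak_and_auc([0.0, 2.0, 8.0, 10.0], [0.0, 2.5, 2.5, 0.0])
print("bolus-like profile (peak, AUC):", bolus)
print("slow-intake profile (peak, AUC):", slow)
```

Two exposures with the same administered dose in mg/kg can thus differ in peak while matching in area, which is one reason administered dose alone is a poor surrogate for target-site dose.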

New Test Methods
It is clear that better screens would be helpful in prioritizing agents that should be tested in more definitive protocols, as confirmed at the NTP workshop on in vitro methods. These screens would include in vitro methods, test systems using alternate species, and improved short-term mammalian tests. We need animal models that are specific for discrete parts of the complex biological processes that account for normal development. These models should be selected to incorporate specific points of vulnerability in development, points that would predict the outcome of the more complex developmental toxicity screens that are a composite of all of these parts of development.

Exposure Parameters
Additional work needs to be conducted to refine our ability to measure internal doses at critical sites. Further work must be done to identify sensitive subpopulations and other factors that account for observed differences in interindividual susceptibility. Because risk assessment involves scaling factors that probably vary by age and end point, additional work to provide a better basis for these scaling factors is warranted. Also, risk assessment will continue to require extrapolation between routes of exposure, an area where data are frequently too limited to permit informed extrapolation.
Many people from several different laboratories made significant contributions to the conduct of the studies summarized in this paper. Particular thanks must be given to Carole Kimmel, Richard Morrissey, and Jerrold Heindel for their role on behalf of the NTP in all stages of the studies. Special thanks are also given to Catherine Price, Julia George, and Rochelle Tyl (Research Triangle Institute), and Terry Mast (Battelle Pacific Northwest Laboratories) who served as the capable study directors for most of these studies. Recognition must also be given to the technical crews who did the hands-on laboratory work and also the chemists, statisticians, and laboratory animal veterinarians who helped with the studies and reporting of the results. Thanks to Judy Bullard for typing this manuscript and tables.
David Rall, to whom this issue of Environmental Health Perspectives is dedicated, deserves special thanks for his consistent support of the efforts of the Reproductive and Developmental Toxicology Program of the National Toxicology Program.