Identifying environmental factors harmful to reproduction.

Reproduction is essential for the continuation of the species and for life itself. In biological terms, living and reproducing are essentially one and the same. There is, therefore, no sharp division between identifying factors harmful to reproduction and identifying factors harmful to life or vice versa. Detection of harmful factors requires balanced use of a variety of methodologies from databases on structure-activity relationships through in vitro and in vivo test systems of varying complexity to surveys of wildlife and human populations. Human surveys provide the only assured means of discriminating between real and imagined harmful factors, but they are time consuming and provide information after the harm has been done. Test systems with whole animals provide the best prospects for identifying harmful factors quickly, but currently available methods used for testing agrochemicals and drugs need a thorough overhaul before they can provide a role model. Whether there is a need for new methodology is doubtful. More certain is the need to use existing methodology more wisely. We need a better understanding of the environment--whatever it is--and a more thoughtful approach to investigation of multifactorial situations.


Introduction
Reproduction ensures the continuity of the species, the race, the family, and of life itself. Living and reproducing are so interwoven that mammals, including humans, are structured physically, physiologically, and even psychologically to ensure efficient reproduction. The complexity of this interrelationship is such that separation of effects on reproduction from those affecting life in general is an artifice. Separation is a device of intellectual convenience that, if pursued too earnestly, can be misleading. It can be even more misleading to attempt to place different aspects of reproduction into distinct categories. For this reason, this paper may stray outside anticipated borders.
Concern about factors affecting reproduction is innate and inevitable. Concern about effects on reproduction, for which there is no immediate explanation, is even more inevitable. In trying to identify causes for unexplained effects, it is inevitable that society would look for any harmful factors in the environment. It is also inevitable that the objectivity of the search will be confounded by emotion and instinctive fears.
Huntingdon Research Centre Ltd., P.O. Box

Human Surveys
As to how we may detect and identify harmful factors, the first and most obvious method is direct observation of the reproductive outcomes in human populations. In earlier centuries these observations were often speculative and highly subjective. This often led to erroneous conclusions, such as attributing malformations to visitations from demons, or to nefarious sexual practices. Nowadays, attribution to the witch's curse would be unbelievable but, nevertheless, the witch hunt continues with chemicals and pharmaceuticals being the target. The continued legalistic attacks on the drug Bendectin provide a prime example of how primitive fears can override logic in even the most advanced societies (1-4).
Ultimately, for all its difficulties, direct observation of humans provides the only certain means of identifying factors harmful to human reproduction. Nevertheless, even with modern technology, it may take years rather than months to obtain certain results, and it still equates to "shutting the stable door after the horse has bolted."

Wildlife Surveys
Another means of identifying environmental factors that might affect human reproduction is to monitor wildlife populations. Species at the end of food chains could be especially sensitive indicators of adverse environmental situations. Such species are perpetually close to the limit for survival, and the population density can be markedly affected by an effect on any of the species in the food chain.
An example often cited (5-7) is the effect of the persistent chlorinated hydrocarbon insecticides on populations of raptors, such as the kestrel, and on populations of small mammals and insectivores. Discontinuing the use of these insecticides has led to a marked recovery in populations of these species in countries such as Great Britain.
Unfortunately, in some other parts of the world, discontinuing insecticide use also provided a reprieve for malaria-carrying mosquitoes, with adverse consequences to humans. Also, replacement of DDT with more acutely toxic insecticides caused a number of deaths. Ironically, it is debatable whether human reproduction was ever affected by these insecticides.
Monitoring wildlife populations is notoriously difficult. It requires years of observation to avoid confusion with natural fluctuations in population density. It is available only for those countries that can afford it for the protection of wildlife itself. The most serious drawback would be the lack of understanding of both the vagaries of studying animal populations and of toxicology. At best, monitoring wildlife populations may provide a source of hypothesis generation regarding harmful factors in the environment.

Testing with Model Systems
A third method of identifying adverse environmental factors is testing with model systems. The attraction is that testing offers the prospect of early identification, with greater economy, compared to surveys of human or animal populations. Testing offers the prospect of identifying adverse factors before human populations are seriously affected, not after. The confounding factors that bedevil the epidemiologist can be controlled to a considerable extent. The question is, can these prospects be realized in practice? This is the question I attempt to address here.
I would like to say that we have established efficient test systems, but this would not be credible. The confounding factors of epidemiology are simply replaced by confounding factors of a different type. Identification of factors before humans are affected is feasible only for new factors. What about the many unexplored factors that already exist? For established environmental factors, we have to consider that the effects may be subtle and difficult to detect. Already, the more obviously harmful factors have been identified by direct observation of the human condition. These known factors would certainly include radiation, disease, pestilence, famine, malnutrition, ignorance, and poverty. Perhaps we should be a lot better at alleviating these more obvious harmful factors before adding further, more subtle ones.
We also have to ask the question of what test systems would be appropriate. I do not think that there has been any concerted effort to devise a test strategy specifically addressing the problems provided by the environment. What is the environment, anyway? The world provides us with an infinite variety of ever changing environments, but many people do not appear to be able to see beyond the factory wall, or even the laboratory bench. Whatever an individual's concept of the environment is, it is still a dynamic complex of interacting factors. Danger may lie not in the presence of a factor but in the absence of one, for example, the absence of a trace element or nutrient. Danger may lie in an imbalance rather than the presence or absence of any one factor, and it may lie in the change rather than the situation at any one time.
Against this dynamic, multifactorial situation, most test systems are aimed at determining the effects of a single factor in a defined set of conditions. Perhaps more to the point, attitudes and philosophies have become conditioned to this mental "environment," and they may not be the most appropriate for testing environments.
In the absence of a specifically designed strategy for environmental factors, the most obvious role models are the regulatory tests required for chemicals and pharmaceuticals (8-11). In their favor, most, if not all, of the materials known to be harmful to human reproduction have shown adverse effects in these test systems. Most certainly, these testing methods have prevented the addition to the environment of some potentially harmful materials.

Negative Aspects of Current Guidelines
Conversely, on the debit side, the test systems also have prevented the introduction of materials that would have been beneficial to humans. The problem is that there is no reliable measure of the proportion of harmful materials avoided to the proportion of beneficial materials lost. It seems a serious omission that we have no suitable mechanism to monitor the efficiency of these regulatory test methods and no suitable mechanism to control the idiosyncrasies of those who operate them. There is strong circumstantial evidence to indicate that, in the absence of a reliable reference point, testing methods for chemicals and pharmaceuticals have become inefficient, uneconomical, and, at times, ridiculous.
For example, for international registration of a new drug, it is not uncommon for more rats to be used for testing for effects on reproduction than for all other toxicity requirements combined. Manson (11) has quoted use of more than 6,000 rats as a general figure; numbers of 10,000 or more have been used in some cases. I know of at least one case where even more animals have been used as the manufacturer has striven to conduct guideline tests in rats when the rat was an unsuitable model for humans. Testing on this scale is unreasonable. Clearly, for testing all the unknowns in the environment it would be untenable.
In the absence of a reliable check system, guidelines also have accumulated a number of unvalidated and doubtful procedures (8,12,13) such as: a) the requirement for prolonged (9-10 weeks) premating treatment of males, b) standardization or culling of litters, c) random selection of second-generation offspring, d) inclusion of interim sacrifices for fetal examination, and e) inadequate specifications for choice of dosages. Explanation as to why these procedures can be considered as flaws is beyond the scope of this paper.
Also, in the absence of a check system, it would appear that many people have lost sight of the primary purpose of testing, namely, to detect materials potentially hazardous to reproduction. The scientific content has been diminished, and, instead, testing has become an expensive game for administrators, lawyers, and officials. The fundamental similarities between the aims and scope of guidelines have been neglected. The small differences between them have been exaggerated and exploited. Quite clearly, if existing guidelines are to be used as a role model for testing environmental factors, it would be necessary to eliminate these negative features and get down to basics (8,12-16).

Fundamentals
If we look at the basic similarities of these guidelines, they all attempt to cover all aspects of reproductive toxicity (Fig. 1). This is achieved either in one study or in a combination of studies. Reproductive toxicity is often subdivided into two elements. One element concerns effects on the fully mature, functional adult; the other concerns effects on the developing organism. As mentioned earlier, this is a separation of convenience because the two elements are indivisible components of an integrated process.
For various reasons, most attention is given to effects that may be induced during the period of development from conception through puberty (Fig. 2). During this time the sensitivity and response to the environment may differ quite markedly from that of the mature, functional adult. It is important to note that development continues well after birth. Organic lead provides an example of a substance for which exposure during infancy is most to be feared (17) and for which the consequences may not become apparent until later in life. Another example is diethylstilbestrol, for which exposure during perinatal stages of development results in delayed manifestations such as reproductive tract tumors of females appearing at puberty (18) and psychological disorders becoming apparent in early adult life (19).
The full scope of developmental toxicity, particularly the late manifestations of functional deficit, is not fully appreciated. For historical and psychological reasons, most attention is given to prenatal effects evident before or at birth. The greatest preoccupation is with localized effects on growth that result in birth defects. This concern with birth defects (teratogenesis) prompted the introduction of guidelines for reproductive toxicity and has greatly influenced their evolution ever since, although not necessarily for the better.
The emphasis on teratogenesis is completely out of proportion relative to the chance of occurrence. For many reasons teratogenesis is the least likely manifestation of developmental toxicity, and this is reflected in the low prevalence of coincidental malformations in all mammalian species (Fig. 3). Even order-of-magnitude increases in prevalence would have little impact on the viability of populations.
It is the rarity of malformation that causes devastating shock to the family. It is the shock that induces fear and dread that distorts rational assessment. The rarity of malformations makes for extreme difficulty in assured detection of an increased prevalence by direct observation. In fact, in testing, the indirect method of observing other manifestations that always associate with teratogenesis provides a more assured means of detection and discrimination. These other manifestations such as death, altered weight and increased prevalence of minor structural changes are also important in their own right. If efficient testing systems are to evolve, we need to arrive at a better balance between the real risk of teratogenesis and the exaggerated perception of risk that has prevailed to date.
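The difficulty of detecting an increased prevalence of a rare outcome can be illustrated with a standard sample-size calculation for comparing two proportions. The following is a minimal sketch and not a method used in this paper: the baseline prevalences, the doubling of prevalence, the significance level, and the power are all assumed values chosen for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def subjects_per_group(p_control, p_exposed, alpha=0.05, power=0.80):
    """Approximate group size needed to detect p_control -> p_exposed.

    Uses the usual normal-approximation formula for comparing two
    independent proportions with a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_exposed * (1 - p_exposed)
    effect = p_exposed - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Assumed, illustrative prevalences (not taken from Fig. 3).
rare = subjects_per_group(0.005, 0.010)   # rare outcome, prevalence doubled
common = subjects_per_group(0.10, 0.20)   # common outcome, prevalence doubled
print(rare, common)
```

Under these assumptions, a doubling of the rare outcome needs groups numbering in the thousands, whereas a doubling of the more common outcome needs groups of only a few hundred. This is consistent with the argument that more frequent associated manifestations offer a more assured route to detection than direct counting of malformations.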
In a broader perspective, to detect most of the direct and indirect ways that reproduction may be impaired, minimum requirements would include exposure from just prior to mating of mature adults through puberty of the offspring. Observations would need to be continued through conception in the second generation for detection of latent manifestations. In other words, testing requires a combination of exposure and observations through one complete reproductive or life cycle (Fig. 4). As we are dealing with a highly integrated process, only whole mammals, such as rats or mice, can be perceived as reliable surrogates for humans. The existing test design that most closely fits these requirements is the two-generation study required for testing of agrochemicals and food additives. It would also appear to be the design most likely to be of use for testing environmental factors because it is intended for protection against involuntary exposure of large populations rather than voluntary use by the individual. Current versions of the different agencies unfortunately show irritating minor differences (Table 1), and they could be replaced (12) by a simpler, more practical design (Fig. 5). In this design the first generation could be exposed to the test conditions from shortly before mating; exposure and observations for various manifestations of reproductive toxicity are continued through the second generation.
Prenatal effects are detected by their impact on postnatal observations; this is also the principle of the Chernoff/Kavlock assay (21), which has been proposed as a means of detecting factors in the environment that might induce malformation. If need be, continued mating of one or both generations readily transforms this design into a fast breeding study (22). The fast breeding study has also been proposed as a means of detecting environmental factors harmful to reproduction. It has earlier, if forgotten, origins as a test for the nutritional value of mouse diets. The two-generation study is an apical test which, if interpreted correctly, is quite efficient at detecting whether an effect has occurred. Unfortunately, in the current climate of regulatory testing, the most popular methods of data analysis and interpretation leave something to be desired (Table 2). Further, the apical nature of the study that makes it efficient in detecting an effect also makes it difficult to determine the specificity and origin of any effect observed. For more specific identification of effects, a segmental approach is preferred. Details of the three-segment design studies are given in another paper presented at this workshop (25). As mentioned previously, the current three-segment design used for drugs is inefficient. Perhaps a way to better segmental designs is to re-derive them as subdivisions of a two-generation study.
As a step on the way, it can be seen that the current European Community (EC) segment 1 design (Fig. 6) is a two-generation study in which there is no direct exposure of the second generation. A second difference is that some females are killed before term so they may be examined for prenatal effects. These features weaken the power of the study more than they increase the specificity.
A first step to attaining a noticeable level of specificity is to subdivide either the two-generation study or an EC segment 1 study into two parts. One part emphasizes investigation for developmental toxicity; the other emphasizes investigation of adult fertility.
For investigation of developmental toxicity, exposure starts at implantation and continues through weaning. A second generation is reared to maturity to enable detection of latent effects. The design is almost identical to that of the Japanese experiment 3 but also incorporates exposures during organogenesis (Fig. 7).
For the complementary study for effects on adult fertility, exposure is initiated just before mating and terminated at or just after implantation. Females are killed and examined at about day 14 of pregnancy. This provides a fertility study equivalent to the current Japanese experiment 1 (Fig. 8).
In combination, these two parts cover all the exposures and observations required for testing pharmaceuticals, except for intensive examination for fetal abnormalities. As indicated earlier, I do not think this a serious omission. Objectively, the risk of inducing malformations is slight, and if it occurred there would be a high probability of detection by postnatal end points. Further, in contrast to the devastating impact of malformations on individuals, the impact on populations is negligible.
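The way the two parts combine can be sketched as exposure windows over an ordered list of life-cycle stages. This is only an illustrative sketch: the stage names and their granularity are assumptions made here to simplify the descriptions above, and the helper `window` is hypothetical rather than part of any guideline.

```python
# Ordered life-cycle stages; the names are this sketch's simplification.
STAGES = [
    "premating", "mating", "implantation", "organogenesis",
    "late_gestation", "lactation", "weaning",
]


def window(start, end):
    """Return the set of stages from start to end, inclusive."""
    i, j = STAGES.index(start), STAGES.index(end)
    return set(STAGES[i:j + 1])


# Fertility part: exposure from just before mating to implantation.
fertility_exposure = window("premating", "implantation")
# Developmental part: exposure from implantation through weaning.
developmental_exposure = window("implantation", "weaning")

combined = fertility_exposure | developmental_exposure
uncovered = [stage for stage in STAGES if stage not in combined]
overlap = fertility_exposure & developmental_exposure
print(uncovered, overlap)
```

In this simplified representation the two windows meet at implantation and together leave no stage from premating through weaning without exposure. The sketch says nothing about the observations made in each part, which is where the omission of detailed fetal examination arises.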
However, for those who believe otherwise, one solution would be to extend the exposure period of the fertility study and conduct detailed fetal examinations for detection of abnormalities (Fig. 9). This design would provide a better match between observations and exposure periods than the current Japanese design and better group sizes than the corresponding part of the EC segment 1.
Another solution would be to leave the fertility study as it is and add the familiar, conventional segment 2 study (Fig. 10) to provide a true three-segment design. This would be far less wasteful of time and animals than current three-segment designs because unnecessary duplication of reared second generations and fetal examinations is avoided. A major weakness in all past, present, and proposed tests is that, irrespective of the duration of the premating treatment period, mating is an insensitive means of detecting effects on males. A more sensitive method, especially for detecting effects on spermatogenesis, is the direct pathological examination of testes and accessory glands.
Such examinations can be done in repeated-dose toxicity studies as well as in designated reproduction studies.
In overview, the various designs can be seen as a sequence starting with a single two-generation study, progressing through a two-part subdivision, then a three-part subdivision, and beyond to case-by-case designs (Fig. 11). Through this sequence, the emphasis of testing changes from detection of any adverse effect through increasing specificity and characterization of effects.

Application to Environmental Factors
Simplifying individual tests and linking them logically provides a role model for testing major new factors or strongly suspected existing factors. The focal point is a two-generation study for detection of any effect. Should one be detected, the segmental subdivisions can be used for secondary evaluation and clarification. Results of testing suggest that secondary evaluation would not be frequent, as only a small proportion of materials induce specific or selective effects on reproduction.
This still leaves the problem of dealing with unexplored factors in the environment. Given the enormous number of unexplored factors and the complexity of the environment, even the prospect of conducting a single two-generation study is daunting. There is a need for an even simpler methodology that at least can indicate the priority for testing. In this respect the test that provides the maximum information for the minimum effort is a one-generation study involving exposure and observation of animals from just prior to mating through weaning of the offspring; in other words, the first part of the two-generation study (Fig. 5).
Extending testing beyond a one-generation study greatly increases the time and effort, but produces very little extra information. Omission of the F1 generation and associated investigations for delayed manifestations may be a justifiable risk given the fact that such effects are rare. This is demonstrated by the low frequency with which effects are first detected in the second generation (23,24) of a two-generation study (Table 3).
Regarding the preoccupation with teratogenicity, results from numerous Chernoff/Kavlock assays indicate that postnatal observations effectively detect teratogens indirectly. For example, a rat reproduction study (Table 4) provides unequivocal evidence that thalidomide would be harmful to reproduction when, in the same species, a standard "test for teratogenicity" does not. Unfortunately, the most convincing testimony to the value of such a one-generation study is not openly available but hidden in company archives. Tests with the same or similar format are used by many reproductive toxicologists as preliminary studies to the studies required for testing chemicals and pharmaceuticals.
At levels of testing simpler than a short one-generation study, the situation is reversed because the saving in time and effort is negligible, whereas the diminution in scope and relevance to human reproduction increases enormously. This is true not only for in vitro tests but also for the very similar Chernoff/Kavlock assay and for the familiar segment 2 (embryotoxicity) study.

Summary
In summation, we need to use all of the methods mentioned to identify factors in the environment harmful to human reproduction. The expensive, time-consuming wildlife surveys and the reputedly cheap short-term tests would appear, for different reasons, to be limited to providing a source of hypothesis generation. A simple one-generation study would seem to provide the best means of selecting priorities for further testing. It might provide the best means for initial exploration of suspect multifactorial environments to which relatively large populations may be exposed.
Stripped of the layers of administrative extravagance and simplified, the testing methods now used for chemicals could provide a means of investigating new factors or strongly suspected existing factors. They provide the first point at which we might entertain the idea of presuming the absence of effect. Most certainly we need to put effort into human surveys as these provide the only certain way of identifying whether or not environmental factors will affect human reproduction. Whether there is a need to develop new methodology is doubtful: I suspect that we have methodology but lack the wisdom to use it wisely. We need better understanding of the environment and need to develop an appropriate philosophy in order to use existing methods within their limitations and in a balanced and objective manner.