Male reproductive toxicology: comparison of the human to animal models.

The human male is of relatively low fertility and thus may be at greater risk from reproductive toxicants than are males of the common laboratory animal model species. Lack of knowledge of the physiological differences that contribute to interspecies variation between man and animals can prevent the effective application of animal data to the assessment of human reproductive risk. Evaluation of spermatogenesis from testicular histology, while uncommon, can provide valuable information about human reproductive risk. The measurement of sperm count or concentration has long been the most feasible approach for human semen evaluation, but may be an insensitive indicator of reproductive function because of high sample-to-sample variability. Interspecies extrapolation factors can be calculated by comparing the reduction in sperm count in humans and test species after exposure to drugs or chemicals. These factors can provide a realistic assessment of relative risk, provided that the sperm are counted at the appropriate time after exposure. However, the degree to which extrapolation factors derived for one agent, and only from sperm counts, can be generalized is not known. Monitoring of sperm motility and morphology parameters is also a common means of evaluating human semen quality, but these techniques are also hampered by the relatively high interindividual and intersample variability. Computer-assisted and morphometric approaches show promise of decreasing the subjective nature of these evaluations and increasing their value in risk assessment procedures. Improvements in predicting human reproductive risk can be expected to come from increased knowledge about reproductive mechanisms in man and animals, together with the utilization of objective measures of cellular indicators of male reproductive function.


Introduction
The human male is of relatively low fertility as compared to most animal species. An estimated 15 to 20% of all American couples are infertile (1), less than one-third of all conceptions result in a live birth (2), and some 20 to 30% of all developmental defects have genetic origins (3). Although between 10 and 20 chemicals or drugs have been directly demonstrated to adversely affect human male reproduction (1), the majority of our information concerning reproductive toxicants has been collected using animal models. Thus, the scientific process for the evaluation of human reproductive risk from drugs and chemicals includes extrapolation from data obtained using these models. Yet, interspecies differences may prevent the effective application of information gained from the study of animal species to the assessment of human risk, and the lack of understanding of the factors contributing to differences in response among species will inevitably weaken the conclusions drawn from such studies. This report compares and contrasts several indices of reproductive efficiency in animals and man. The discussion is not intended to be a comprehensive review of interspecies extrapolation or of the characteristics of animal models. For more details in these important areas, the reader is directed to several excellent reviews on both subjects (4-13).
*Department of Genetic Toxicology, Chemical Industry Institute of Toxicology, Research Triangle Park, NC 27709.

Nature of Interspecies Variability
Interspecies variation in response to toxic chemicals can arise from differences in many parameters. Data gathered from humans may differ from data from test species for procedural reasons, including variations or inconsistencies in statistical treatment of the data, in the design of the animal study (including dosing concentration and route of administration of the test compound), or in the epidemiological evaluation procedures in the human study (9). Major variability in response occurs because of genetic and physiological differences between animals and man (Table 1). Several recent volumes (4-6) have catalogued the physiological and toxicological basis for the expression of interspecies variation, and recent reviews have dealt with more specific issues, including comparative drug metabolism (10), the use of scaling factors for relating species body size to various physiological and pharmacokinetic characteristics (8), and the efficacy of using animal models to predict the potential chronic effects of toxic xenobiotics (9).

Table 1. Possible physiological differences among species.
Metabolism
    Enzyme types and specificities
    Kinetics
Intracellular pathways of toxicity
Membrane biochemistry and receptors
Absorption, distribution, storage, and excretion
Specific organ function

Genetic differences account for many of the variations that occur between animals and man; physiological functions often vary not only among species but even among strains of the same species (9). The variations may be expressed as both quantitative and qualitative differences in the metabolism of drugs and chemicals, for example. DBA/2J mice are resistant to aryl hydrocarbon hydroxylase induction by 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), as compared to C3H or C57BL/6J strains, at least in part because of a lower concentration of TCDD-specific cellular receptors (14,15). Similarly, the differential toxicity of methyl chloride in rats and mice is thought to be related to species-specific variations in glutathione conjugation of the chemical (16). Some interspecies differences may be minimal, as in the case of oral and pulmonary absorption or pulmonary and renal excretion (which are generally comparable among vertebrate species), but others are pronounced, as in the case of dermal absorption (which varies because of morphological differences among species) (8). Similarly, storage and distribution of compounds tend to differ little among vertebrate species, but more significant variations may occur in target organ sensitivity, membrane receptors, and enzyme kinetics (7).
Differences in specific organ function may play a particularly large role in the etiology of animal/human variation in reproductive risk. In general, the human male is likely at relatively greater risk from toxic agents because of differences in gonadal function. The number of sperm per human ejaculate is typically only two- to fourfold higher than the number at which fertility is significantly reduced (17), whereas the number of sperm in a rat, rabbit, or bull ejaculate is many times (up to 1400-fold) the number that will produce maximum fertility (11). To illustrate, in experiments where rats were surgically or chemically rendered oligospermic, Aafjes and co-workers (18) reported that epididymal sperm counts could be reduced as much as 90% without significantly affecting fertility. Other species differences in reproductive parameters are summarized in Table 2, which has been compiled from the literature. Although some spermatogenesis parameters are similar, note that human males have markedly smaller relative testis size and the lowest rate of daily sperm production per gram testis by a factor of more than three. Moreover, the percentages of progressively motile sperm and morphologically normal sperm are lower in human semen than in semen from any of the animal models studied (13).
The relative fertility of humans and of animal models is portrayed in Figure 1. Assuming a relationship between some theoretical reproductive parameter "X" (e.g., sperm number, motility, production rate, etc.) and fertility, animal models are far to the right on the plateau of maximum fertility. Human males, on the other hand, are much nearer the minimum X required (Fig. 1A). Given a 60 to 80% decrease in parameter X, males of the usual test species remain maximally fertile, but fertility in the human male begins to be compromised (Fig. 1B). It is also clear from this diagram that the fertility of animal models may be an insensitive indicator of human reproductive risk.
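The plateau argument of Figure 1 can be made concrete with a toy calculation. The sketch below is only an illustration, not a model taken from the literature: it assumes a piecewise-linear fertility curve that rises in proportion to parameter X up to a threshold and is flat beyond it. The function name `relative_fertility` and the safety margins used (2 for the human, the lower end of the two- to fourfold range cited above, and 100 for an animal model) are hypothetical choices.

```python
def relative_fertility(margin: float, reduction: float) -> float:
    """Relative fertility (0 to 1) after a fractional reduction in parameter X.

    margin    -- normal value of X divided by the minimum X needed for
                 maximum fertility (assumed values, e.g. 2 for man, 100 for a rat)
    reduction -- fractional decrease in X caused by exposure (0 to 1)

    Assumes an idealized curve: fertility rises linearly with X up to the
    threshold and stays on the plateau (1.0) above it.
    """
    if margin <= 0 or not 0.0 <= reduction <= 1.0:
        raise ValueError("margin must be > 0 and reduction must lie in [0, 1]")
    remaining = margin * (1.0 - reduction)  # X after exposure, in threshold units
    return min(1.0, remaining)


if __name__ == "__main__":
    # Hypothetical margins; a 60% reduction corresponds to Figure 1B.
    for label, margin in (("human (assumed margin 2)", 2.0),
                          ("animal model (assumed margin 100)", 100.0)):
        print(label, round(relative_fertility(margin, 0.60), 2))
```

Under these assumptions a 60% reduction leaves the animal model on the plateau while the human value falls below it, which is the situation drawn in Figure 1B.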
Using specialized laboratory procedures, it is possible to bring the fertility of animal test species into the range of human fertility (i.e., to move them to the left on the fertility curve in Figure 1, in the region indicated by the arrow). For example, artificial insemination procedures can decrease the efficiency of the reproductive process in animals by decreasing the number of sperm inseminated (13) and thus increase the sensitivity of the fertility test. Alternatively, the sensitivity of natural breeding studies can be improved by decreasing the size of the large sperm reserves in rats by using a protocol of continuous breeding prior to testing (19). By reducing epididymal sperm reserves, this technique produces a rat which, in terms of its reproductive efficiency, is apparently much more like a human. These and other procedures that take into account the physiological differences between model species and humans should result in more sensitive testing procedures to detect fertility changes in humans.
If animal fertility is not a reliable measure of reproductive risk to humans, what sort of end points might be useful? Common laboratory animal tests are listed in Table 3, along with a summary of their practicality in rats, rabbits, and humans. These two test species were selected for emphasis, since rats are commonly used in toxicological studies and have well-characterized reproductive processes and because rabbits are the smallest species from which ejaculates can be conveniently obtained. Rabbits are normally the only species used in longitudinal studies, in which fertility changes and seminal parameters are evaluated in the same animal over time. Longitudinal studies can also be performed with rats by recovery of uterine sperm after mating with ovariectomized, hormonally primed females (20). The sperm recovered are not directly comparable to ejaculated human sperm, as the ejaculate has been diluted by vaginal and uterine fluids and the cells have passed through the cervix, which may act as a barrier to abnormally shaped or functioning sperm. Thus, parameters measured in this population may not be representative of the ejaculate as a whole. Studies of comparative testicular histology in humans and animal models can provide valuable information about the effect of drugs and chemicals on sperm production rates and spermatogonial survival. Although similar in many respects, certain differences in human and animal spermatogenesis and testicular histology should be emphasized.

FIGURE 1. Idealized fertility curve, illustrating relative fertility of man (H) and common laboratory animal models (M), plotted versus some theoretical reproductive parameter, "X." (A) The position of normal model species and humans reflects the greater reproductive competence of the test species. The arrow indicates the region to which fertility should be reduced to increase the sensitivity of animal models. (B) After a 60% reduction in parameter X, the model species is still fully fertile, but human fertility has decreased from maximum.

Comparative Testicular Histology
Spermatogenesis is the process by which spermatogonial stem cells divide and differentiate to spermatozoa, and methods for its critical evaluation have been thoroughly reviewed (21). The spermatogenic cycles of man and animals have many similarities: each step of the process is precisely timed and complex associations of cells mature in synchrony (22). In any given area of the seminiferous epithelium, there are five or six generations of germ cells that appear together, forming cell associations of definite and fixed composition (23).
Specifically, one or two generations of spermatids are always associated with one or two generations of spermatocytes and with spermatogonia at given stages of their development. Each cell association, which can be classified by cytological or histological criteria, forms a stage of spermatogenesis, and from 6 (human) to 14 (rat) stages have been described (23,24). The cycle of the seminiferous epithelium consists of the complete series of spermatogenic stages as they appear sequentially over time in the same area of the seminiferous epithelium. The length of the cycle corresponds to the time from the disappearance of one particular stage to its reappearance in the same area of the seminiferous tubule, and ranges from 8.6 days in the mouse to 16 days in humans (Table 2). The duration of spermatogenesis, i.e., the length of time it takes for a given stem cell to produce mature spermatozoa, corresponds to the length of approximately 4 to 4.5 cycles in laboratory animals and humans (ranging from 35 days in the mouse to 74 days in humans).
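The relationship between cycle length and total duration can be checked with simple arithmetic. The following sketch uses only the mouse and human values quoted above; the dictionary layout and the function name `cycles_per_spermatogenesis` are illustrative choices, not part of any published method.

```python
# Cycle length and duration of spermatogenesis (days), as quoted in the text.
SPECIES = {
    "mouse": {"cycle_days": 8.6, "duration_days": 35.0},
    "human": {"cycle_days": 16.0, "duration_days": 74.0},
}


def cycles_per_spermatogenesis(cycle_days: float, duration_days: float) -> float:
    """Number of epithelial cycles spanned by one round of spermatogenesis."""
    if cycle_days <= 0:
        raise ValueError("cycle_days must be positive")
    return duration_days / cycle_days


if __name__ == "__main__":
    for name, values in SPECIES.items():
        ratio = cycles_per_spermatogenesis(**values)
        print(f"{name}: {ratio:.1f} cycles")
```

The ratios come out close to 4.1 for the mouse and 4.6 for the human, in line with the approximate figure of 4 to 4.5 cycles.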
A cross-section of a mammalian testis displays numerous cross-sections of the seminiferous tubules (Fig. 2). In most mammals, the stages of spermatogenesis appear not only sequentially in time, but also sequentially in space (along the length of the seminiferous tubule). This so-called wave of the seminiferous epithelium (Fig. 3) has been described for numerous model species (25) and results from the synchronous differentiation of the cellular associations. The serendipitous result of the wave is that each seminiferous tubule will be of only one spermatogenic stage when viewed in cross-section (Fig. 3). This considerably eases the job of the pathologist who wishes to quantify the toxic effects of drugs and chemicals in the animal testis.
In contrast, the arrangement of germ cells in the human seminiferous epithelium appears relatively chaotic when the tubule is viewed in whole mount. Specific cellular associations (of which there are six) are seen to appear in irregular zones throughout the seminiferous epithelium (Fig. 4), and there are frequent heterogeneous stages, characterized by the inappropriate presence or absence of germ cells in typical cell associations (24). Thus, a cross-section through a tubule will reveal a mixture of cells of different stages (Fig. 4). More recently, Schulze and Rehder (26) utilized morphological and morphometric procedures to study human spermatogenesis. They report that, rather than being haphazardly arranged, increasingly mature spermatocyte populations are arranged on helices that are contracted conically toward the lumen of the seminiferous tubule. These authors note that this arrangement may be a natural outcome of the markedly lower rate of spermatogonial division in humans as compared to other mammals. Nonetheless, a typical cross-section of testis will present an appearance similar to that seen in Figure 4. As a result, quantitation of toxic effects on spermatogenesis is a much more difficult task in humans, and direct comparison to animal models may not always be possible or warranted. Other, more quantitative, methods have been described for determination of sperm production rates from testicular histology in humans, including enumeration of spermatocytes and/or determination of spermatid number (21,27); these can also be applied to assess reproductive toxicity if testicular samples can be obtained. Because of difficulties in obtaining human testicular biopsies, assessments of reproductive toxicity in humans are commonly restricted to the analysis of semen or sperm quality, including sperm concentration and count, the percentage of motile sperm, and the percentage of sperm with abnormal morphology.
Ejaculates cannot effectively be obtained from the rat, and epididymal sperm samples cannot routinely be obtained from humans, so comparative studies of rat and human sperm quality must perforce compare epididymal rat sperm to ejaculated human sperm. How do these end points compare in humans and animal models?

Comparison of Sperm Counts
Humans have markedly lower rates of sperm production and overall sperm counts than do the usual animal models (Table 3). Measurement of sperm numbers has long been the most feasible and, thus, the most common method for the evaluation of human semen quality. However, although 20 million sperm per milliliter is widely regarded as the lower limit of the normal sperm count (12), over 80% of infertile males are reported to have higher counts (28), suggesting that some other reproductive parameter is also contributing to the infertility. Thus, sperm counts, though easy to obtain, may not be the most sensitive of indices for infertility or even testicular toxicity. Indeed, coefficients of variation (CV) of over 0.50 have been reported for repeated sperm counts from fertile donors over a period of 1 year (12), and we have measured CVs as high as 0.89 in counts from over 500 semen samples from 159 men (Working and Levine, unpublished). Variability in sperm counts and concentrations in semen or epididymal sperm from test species is markedly lower (CVs of around 0.25 to 0.35), but still high enough to hinder the ready detection of reproductive effects.
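For readers unfamiliar with the statistic, the coefficient of variation is the sample standard deviation divided by the mean. A minimal sketch follows; the donor values are invented purely for illustration and are not data from any of the studies cited.

```python
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean (dimensionless)."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("at least two measurements are required")
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average


if __name__ == "__main__":
    # Hypothetical repeated sperm concentrations (millions per ml) from one donor.
    donor_counts = [95, 40, 150, 62, 110]
    print(f"CV = {coefficient_of_variation(donor_counts):.2f}")
```

Because the CV is scaled by the mean, it allows the variability of end points measured in different units (counts, percent motility, swimming speed) to be compared directly.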
Comparisons of the reduction in sperm count in humans and animal models after exposure to drugs or chemicals can be useful, provided that appropriate time points for the sperm count are selected. Interspecies extrapolation factors (IEF), which are ratios of the dose of a drug or chemical needed to produce a given reduction in sperm count in a model species to that required in humans, have been calculated for several chemotherapeutic drugs and ionizing radiation (30,31). When the IEF is low, there is low relative risk to humans (i.e., a much larger dose is required in humans than in animals to cause equivalent reductions in sperm count). Conversely, when the IEF is high, the human reproductive risk is high.
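The definition of the IEF given above reduces to a single ratio. The sketch below is a hedged illustration of that definition and of how an IEF, once derived, might be applied to animal data. The function names and the example doses are hypothetical, and both doses are assumed to be expressed in the same units and to produce the same fractional reduction in sperm count.

```python
def interspecies_extrapolation_factor(animal_dose: float, human_dose: float) -> float:
    """IEF = dose needed in the model species / dose needed in humans,
    for the same reduction in sperm count (doses in identical units)."""
    if animal_dose <= 0 or human_dose <= 0:
        raise ValueError("doses must be positive")
    return animal_dose / human_dose


def predicted_human_dose(animal_dose: float, ief: float) -> float:
    """Estimate the human dose giving the same effect as animal_dose,
    assuming the IEF derived for one agent applies (an unverified assumption)."""
    if animal_dose <= 0 or ief <= 0:
        raise ValueError("animal_dose and ief must be positive")
    return animal_dose / ief


if __name__ == "__main__":
    # Hypothetical doses chosen only to illustrate the arithmetic.
    ief = interspecies_extrapolation_factor(animal_dose=3.0, human_dose=1.0)
    print(ief)                             # IEF above 1: humans are the more sensitive species
    print(predicted_human_dose(6.0, ief))  # equivalent human dose for an animal dose of 6.0
```

An IEF above one therefore indicates greater human sensitivity, and an IEF below one indicates that a larger dose is needed in humans than in the model species.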
Meistrich and co-workers note that the time at which sperm counts are determined will vary with the cell type affected by treatment and with the kinetics of spermatogenesis. Thus, selection of the proper time point for quantitation of sperm number is essential. For example, the IEF determined after exposure to ionizing radiation (31) is approximately three (meaning that man is three times more sensitive than mouse) when sperm counts are made at the point when they reach a minimum (only 4 days in mouse, over 150 days in man). These data suggest that the survival of different cell stages is being measured in mouse and man, so IEFs based on the time of minimum counts are likely to be misleading. Alternative methods for calculating IEFs use sperm counts obtained at the minimum time when stem cells could have matured to sperm, which yields an IEF of 11 to 21 (31). However, this method may overestimate the relative human risk because of known species differences in the repopulation kinetics of stem cells. In animals, stem cell regeneration and seminiferous epithelium repopulation (with concomitant sperm production) typically begin simultaneously (32). In contrast, current evidence suggests that epithelial repopulation in humans does not begin until the entire stem cell population has been replenished (33). Sperm count recovery in human males has been reported to take as long as 3 to 4 years after exposure (34). A final method for obtaining an IEF is to perform sperm counts at the time when recovering counts have reached a maximum, yielding, for animals and humans exposed to ionizing radiation, an IEF of about one (31). This last value will not be influenced by variations in spermatogenic kinetics among species, or by the particular cell stage damaged, and may be the most accurate means of deriving IEF values.
The degree to which IEFs obtained from acute exposures to chemotherapeutic agents or ionizing radiation and derived only from sperm counts can be generalized is not known (35). In theory, however, IEFs once derived can be applied to data from animal studies to predict human risk in the absence of human data.

Comparison of Sperm Motility
Quantitation of sperm motility is a common means for assessing the quality of semen samples collected during routine clinical studies. The relationship between sperm motility and fertility in humans has long been recognized (36), and the determination of sperm motion parameters has proven to be a valuable diagnostic tool in assessing the quality of human semen (37-39). Toxicological studies of the reproductive effects of chemical exposure in animals also often assess the quality of epididymal sperm or sperm in semen as an indicator of reproductive function, usually by subjective visual assessments of sperm movement (11,40).
Although subjective measures of sperm motility may not be useful for accurately assessing relative animal and human reproductive risk, numerous methods for the objective determination of human and animal sperm motility and velocity have been developed in recent years. The most convenient and practical of these are videomicrographic methods, which are often computer-assisted (29,38,41-43). Using standard videomicrographic techniques, Blazak et al. (41) reported a CV of 0.16 for the percentage of motile sperm determined in single samples from 15 adult Fischer 344 rats, and we measured a CV of 0.15 for the percentage of motile sperm in samples from 50 adult rats using a computer-assisted motility analyzer (43) (Table 4). Slightly more biological variation is seen in human samples. Katz et al. (29) calculated an overall CV of 0.27 for percent motility in repeated semen samples from fertile donors over 1 year, with per man CV ranging from 0.12 to 0.49. In more recent work (Working and Levine, unpublished), using a computer-assisted videomicrographic system, we determined a CV of 0.35 for percent motility in over 500 semen samples from 159 men of unspecified fertility, and CVs ranging from 0.06 to 0.48 per man for four repeated samples from the same donors over a 6-month period (Table 4). Sperm swimming speed can also be measured using videomicrographic procedures, and this parameter exhibits somewhat less variability (Table 5). The CV for mean swimming speed in sperm from 10 rats was 0.07 (41) and in sperm from 50 rats was 0.09 (43). Once again, human samples had similar, but somewhat higher, variance. Repeated samples from fertile donors yielded an overall CV of 0.19, with a per man range of 0.06 to 0.31 (29). In our study, the CV for mean swimming speed for sperm from 538 semen samples was 0.20, with per man values (from four samples over a 6-month period) ranging from 0.02 to 0.49 (Working and Levine, unpublished).
Clearly, the percentage of motile sperm and mean swimming speed are less variable than sperm count, which had CVs ranging from 0.24 in the rat (41) to over 0.80 in the human (29). These more stable values may provide a better indicator of toxic effect and may be of more value in discriminating between fertile and nonfertile human samples than simple sperm counts or concentrations.
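One way to see why the less variable end points may be more useful is to ask how many samples would be needed to detect a given change. The sketch below is an illustrative aside, not a calculation from the studies cited. It uses the standard normal-approximation formula for comparing two group means, n = 2[(z(1-alpha/2) + z(power)) x CV / delta]^2 per group, and assumes independent, roughly normally distributed measurements, a two-sided alpha of 0.05, and 80% power. The function name and the 25% effect size are hypothetical choices.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(cv: float, relative_change: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate samples per group needed to detect a relative change in
    the mean, given the coefficient of variation (normal approximation)."""
    if cv <= 0 or relative_change <= 0:
        raise ValueError("cv and relative_change must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * cv / relative_change) ** 2)


if __name__ == "__main__":
    # CVs taken from the text; the 25% change is an assumed effect size.
    for label, cv in (("rat percent motility", 0.15),
                      ("human sperm count", 0.89)):
        print(label, samples_per_group(cv, relative_change=0.25))
```

With these assumptions, an end point with a CV of 0.15 needs about 6 samples per group to detect a 25% change, whereas one with a CV of 0.89 needs roughly 200.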

Comparison of Sperm Morphology
Sperm morphology can be assessed both subjectively and objectively, and by either method it is clear that human semen typically contains a high percentage of abnormally shaped sperm (44); indeed, men are not considered to have a fertility problem until the frequency of abnormal cells exceeds 50% (12). Most test species have much lower proportions of abnormal cells, generally less than 5%, and the wide variety of sperm head shapes in test species often precludes the direct comparison of alterations in sperm head shape in animals and man.
Subjective measures of human sperm morphology are complicated by interlaboratory and interscorer variability in the assignment of sperm shapes to user-determined morphological categories, but the development of standard classification schemes has partially solved this problem (44,45). Also promising are videomicrographic-based methods for the morphometric analysis of sperm morphology (46,47), in which sperm head shape and size characteristics are measured using image analysis techniques. Using such an approach, Katz and co-workers (46) measured the maximum width and length and the circumference of the human sperm head in single specimens from fertile and infertile men (Table 6). There was greater variability per ejaculate in infertile men, and the length/width ratio effectively distinguished the fertile and infertile groups, suggesting that this method may have some utility as an indicator of reproductive changes. Methods for the objective measurement of animal sperm morphology that use similar morphometric techniques (48) or flow cytometry (49) have also been developed and may permit more direct comparison of exposure-related changes in sperm head morphology in test species and the human male.
Clearly, no single reproductive end point in any laboratory animal model can serve as an accurate indicator of reproductive risk in the human male. Current procedures that use laboratory models are often inadequate, usually because they are too subjective and fairly insensitive. Many approaches pay little attention to the physiological differences between man and the common test species, and thus fall short in their ability to predict human reproductive hazard. Future improvements can be expected to come from increased knowledge about reproductive mechanisms in man and animals, together with the utilization of objective measures of cellular indicators (e.g., sperm motility and morphology) of male reproductive function and of test procedures that increase the sensitivity of the animal model systems (e.g., artificial insemination).
The author acknowledges the useful comments and expert assistance of Dr. Karin S. Bentley, Dr. Mark E. Hurtt, and Kathleen Mohr. Thanks also to Ravi Mathew and Michelle Hayes for their excellent statistical analysis of the human sperm data. Drawings were done by Susan Sadler.