Have sperm densities declined? A reanalysis of global trend data.

In 1992 a worldwide decline in sperm density was reported; this was quickly followed by numerous critiques and editorials. Because of the public health importance of this finding, a detailed reanalysis of data from 61 studies was warranted to resolve these issues. Multiple linear regression models (controlling for abstinence time, age, percent proven fertility, specimen collection method, study goal, and location) were used to examine regional differences and the interaction between region (United States, Europe, and non-Western countries) and year. Nonlinear models and residual confounding were also examined in these data. Using a linear model (adjusted R^2 = 0.80), means and slopes differed significantly across regions (p = 0.02). Mean sperm densities were highest in Europe and lowest in non-Western countries. A decline in sperm density was seen in the United States (studies from 1938-1988; slope = -1.50; 95% confidence interval (CI), -1.90 to -1.10) and Europe (1971-1990; slope = -3.13; CI, -4.96 to -1.30), but not in non-Western countries (1978-1989; slope = 1.56; CI, -1.00 to 4.12). Results from nonlinear models (quadratic and spline) were similar. Thus, further analysis of these studies supports a significant decline in sperm density in the United States and Europe. Confounding and selection bias are unlikely to account for these results. However, some intraregional differences were as large as the mean decline in sperm density between 1938 and 1990, and recent reports from Europe and the United States further support large interarea differences in sperm density. Identifying the cause(s) of these regional and temporal differences, whether environmental or other, is clearly warranted.

Among the adverse health endpoints that have been linked to endocrine-altering chemicals in the environment, male reproductive dysfunction, and impaired semen quality in particular, is of special concern. An analysis of 61 studies of sperm density concluded that "...reports published world wide indicate clearly that sperm density has declined appreciably during 1938-1990" (1). This study stimulated considerable controversy. It was argued that fitting a linear model to these data was inappropriate (2-4), although the post-1970 increase that had been suggested was later found to be nonsignificant (p = 0.36) (5). Olsen et al. (2) utilized several nonlinear univariate models (spline, step function, and quadratic), which fit the data equally well and somewhat better than the linear model. These nonlinear models suggested a decline in sperm density until some time in the 1970s, at which point the slope became positive (for the spline) or the curve turned upward (for the quadratic); alternatively, the horizontal line dropped (for the step function). Thus, these models imply quite different trends in sperm density. Because of the public health importance of this question, we conducted a review of the original studies and reanalyzed these data using multiple regression methods.

Methods
Analysis of previous studies. Studies published between 1930 and 1990 that included data on sperm density were screened for eligibility by Carlsen et al. (1). The authors' protocol excluded studies that included men in infertile couples and men referred because of genital abnormalities and studies that selected men on the basis of sperm count. Studies that used nonmanual methods for counting sperm were also excluded whenever possible, although laboratory methods were not always specified. Sixty-one studies published between 1938 and 1990 were included. [See Carlsen et al. (1) for a complete list of references.] Using linear regression, the authors found that sperm density decreased linearly during the study period at the rate of -0.93 x 10^6/ml/year, decreasing from 113 x 10^6/ml to 66 x 10^6/ml (p<0.0001).

Current analysis. We reviewed all articles cited by Carlsen et al. (1) and excluded three non-English language studies (6-8) and two studies that included men who conceived only after an infertility workup (9,10). The remaining 56 studies, including 97% of the 14,947 subjects in Carlsen et al. (1), were included in this reanalysis. Data on the following variables were abstracted and included in all multiple regression models: mean (or median) sperm density, publication year, study location, study goal, percent of men with proven fertility, semen collection method, age, and abstinence time. Variables indicating the completeness of this information were also included.
The arithmetic mean of sperm density was used when available; otherwise, the median or geometric mean, adjusted for the difference from the arithmetic mean, was used (n = 3 studies). The 56 studies were stratified into regions: the United States (27 studies published 1938-1988), Europe and Australia (16 studies published 1971-1990), and other (non-Western) countries (13 studies published 1978-1989). Interaction terms to assess differences between regions in mean sperm density and slope were included in all multiple regression models. Multiple regression (using procedures for generalized linear models and regression analysis) was used to fit linear, step, spline, and quadratic models (11). It has been suggested that nonlinear models fit this data set better than the linear model because of an apparent upward turn in sperm density during the last 15-20 years of the study period (2,3). The step model assumes that mean sperm density is constant until it drops suddenly to a second constant level. The spline model fits two lines, with a possible change in slope, at a preselected time during the study period. The quadratic model includes a second-order term (year-squared), which allows for a single upward (or downward) curve in sperm density. Year was transformed for quadratic models to aid in convergence of this more complex model by using year' = (year - 1930) as the independent variable. All regressions are weighted by the number of study subjects, and results are given in units of 10^6/ml. Because of missing information, confounding could not be completely controlled. We used information on the modeled variables, as well as additional variables abstracted from these studies, to look for evidence of residual confounding and bias. These additional variables include method of counting sperm, criterion for proven fertility, population source (sperm donors, volunteers, etc.), use of transformed sperm counts, and years of sample collection.
Detailed definitions of all covariates are available upon request.
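The model forms described above can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical noise-free data set of eight studies in two regions, and a knot at 1970 chosen only for illustration; the function names are invented here, and the covariates adjusted for in the actual analysis (abstinence time, age, proven fertility, collection method, study goal) are omitted. It shows only how region-specific intercepts and time terms can be combined with weighting by number of subjects, not the authors' actual procedure.

```python
import numpy as np

def design_matrix(year, region, n_regions, form="linear", knot=1970):
    """Build region-specific columns for one of four trend shapes.

    year   : 1-D float array of publication years
    region : 1-D integer array of region codes (0 .. n_regions - 1)
    form   : "linear", "step", "spline", or "quadratic"
    knot   : preselected change point (an assumed value in this sketch)
    """
    t = year - 1930.0                       # year' = year - 1930
    after = (year > knot).astype(float)     # 1 for studies published after the knot
    if form == "linear":
        terms = [t]
    elif form == "step":
        terms = [after]                     # sudden shift to a second level
    elif form == "spline":
        terms = [t, after * (year - knot)]  # change in slope after the knot
    elif form == "quadratic":
        terms = [t, t ** 2]                 # second-order (year-squared) term
    else:
        raise ValueError(f"unknown form: {form}")
    cols = []
    for r in range(n_regions):
        in_r = (region == r).astype(float)
        cols.append(in_r)                            # region-specific intercept
        cols.extend(in_r * term for term in terms)   # region-by-time interaction
    return np.column_stack(cols)

def weighted_fit(X, y, weights):
    """Weighted least squares: minimize sum(w * (y - X @ b) ** 2)."""
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical, noise-free data: two regions with different intercepts and slopes.
year = np.array([1940, 1955, 1970, 1985, 1972, 1978, 1984, 1990], dtype=float)
region = np.array([0, 0, 0, 0, 1, 1, 1, 1])
n_subjects = np.array([50, 120, 300, 80, 40, 200, 150, 90], dtype=float)
true_intercept = np.array([120.0, 200.0])
true_slope = np.array([-1.5, -3.0])
density = true_intercept[region] + true_slope[region] * (year - 1930.0)

X = design_matrix(year, region, n_regions=2, form="linear")
coef = weighted_fit(X, density, n_subjects)
# coef is ordered [intercept_0, slope_0, intercept_1, slope_1]
print(coef)
```

Because the synthetic data are exactly linear within each region, the fit recovers the region-specific intercepts and slopes used to generate them. Passing form="step", "spline", or "quadratic" changes only the time terms, so the fitted shapes can be compared on the same footing.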

Results
Univariate linear and nonlinear models were fit to the 56 studies, and results were similar to those obtained by Olsen et al. (2) using all 61 studies. These are included for comparison to multiple regression models (Table 1).
Initial analysis showed significant interregional differences in mean sperm density (p = 0.02). A series of multiple regression models was then fit to these data. Results of the linear model (Model I) are shown in Table 2. Significant differences between regions were found for both intercepts and slopes. The greatest decline is seen for the European studies (-3.13/year; CI, -4.96 to -1.30). For U.S. studies, the slope is less steep (-1.50/year; CI, -1.90 to -1.10), but steeper than that for the univariate model (-0.95). The slope for non-Western countries was positive and differed significantly from the European and U.S. slopes, although the confidence interval was wide (1.56/year; CI, -1.00 to 4.12). This latter group of studies did not fit any of the models well, reflecting the heterogeneity of the countries included (e.g., Brazil, India, Israel, Libya, Hong Kong, Kuwait, Nigeria, Tanzania, and Thailand), the small number of studies (12), and the short time during which these were published (12 years). Figure 1A contains the fitted regression lines for the three regions. The adjusted R^2 for the full linear model including all covariates was 0.80, compared to 0.36 for the univariate model and 0.62 for a model that only included terms for region, year, and the interaction of region and year.
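As a quick consistency check on the slopes and 95% confidence limits reported for the linear model, the sketch below confirms that each interval is centered on its slope and notes which intervals exclude zero. The standard errors it derives assume a normal approximation (z = 1.96), which is an assumption of this sketch and not a value given in the text.

```python
# Slopes and 95% confidence limits as reported for the linear model
# (units: 10^6/ml per year): (slope, lower limit, upper limit).
reported = {
    "United States": (-1.50, -1.90, -1.10),
    "Europe": (-3.13, -4.96, -1.30),
    "non-Western": (1.56, -1.00, 4.12),
}

def summarize(slope, lower, upper, z=1.96):
    """Return interval midpoint, approximate SE, and whether zero is excluded."""
    midpoint = (lower + upper) / 2        # equals the slope for a symmetric interval
    half_width = (upper - lower) / 2
    approx_se = half_width / z            # normal approximation (assumed here)
    excludes_zero = lower > 0 or upper < 0
    return midpoint, approx_se, excludes_zero

summary = {name: summarize(*values) for name, values in reported.items()}
for name, (midpoint, approx_se, excludes_zero) in summary.items():
    print(f"{name}: midpoint={midpoint:.2f}, SE~{approx_se:.2f}, excludes zero={excludes_zero}")
```

The U.S. and European intervals lie entirely below zero, whereas the non-Western interval spans zero, which matches the statement that no decline was demonstrated in non-Western countries.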
Nonlinear models were also fit to these data. The quadratic model (Fig. 1D), which also fit the data well (adjusted R^2 = 0.78), demonstrates the absence of curvature in the U.S. studies. Some downward curvature is seen for the European studies, while means for non-Western studies show some upward curvature. However, none of the second-order (year^2) terms was significant in the model with regional interaction. As seen in Table 2, results for the spline model (Model II) and linear model (Model I) were almost identical, differing only in a slight (nonsignificant) change in the U.S. slope post-1970 (from -1.52 to -1.47; p = 0.97). The similarity of these two models can be seen in Figure 1A and B. These data also fit a step function (Model III), with a significant post-1970 decrease in sperm density in all regions (see Table 2 and Figure 1C). None of these models suggests a post-1970 rise in sperm density except, possibly, in non-Western countries. The apparently improved fit of the nonlinear univariate models reported previously (2) was an artifact of confounding by region and the interaction of region and study year.

Confounding (by abstinence time, age, specimen collection method, geographic region), selection bias (changing definitions of normal men and proven fertility), measurement error (methods of counting sperm, variability in sperm counts, study year as independent variable), and statistical artifacts (choice of incorrect model, assuming sperm density normally distributed) are now explored using data from these and other studies (Table 3).
Confounding. MacLeod and Gold (13) and Magnus et al. (12) found that sperm concentration was 50-69% greater in samples collected after 10 days of abstinence than after 3 days (p<0.05). Bendvold (14) reported that mean abstinence time decreased from 7.5 to 4.4 days between 1956 and 1986. James (15) suggested that an increase in marital coital frequency may have contributed to the decline in sperm density reported in that study by shortening average abstinence time. Among the 56 studies analyzed, those with no information on abstinence times were published somewhat earlier than those with reported (or protocol-specified) abstinence time (1976 vs. 1981; p = 0.13). Twelve studies specified the actual mean or range of abstinence times, while 30 studies included only a protocol-recommended abstinence time. However, adherence to this recommendation is uncertain. The protocol of Auger et al. (16) requested an abstinence of 3-5 days, yet only 66% of men complied. Although compliance with the recommended abstinence time was not assessed in most studies, a similar lack of compliance was likely in all studies. Therefore, because abstinence time was unknown, or known by protocol only, in the majority of these studies, it could not be adequately controlled and remains a likely confounder.
The relationship between age and sperm count is complicated by the increase in abstinence time with age. In a study of 484 fertile men, Schwartz et al. (1) found that mean abstinence time increased from 3.8 days among men less than 26 years of age to ... (18), in a study with no control for abstinence time, found sperm concentrations over 40 x 10^6/ml higher in grandfathers (mean age 67 years) than in fathers (mean age 29 years). These results may be misleading because daily sperm production has been shown to decrease with age (19). In a univariate analysis of the 42 studies that included information on age of subjects, a significantly lower sperm concentration was seen in studies that included men over 40 (p = 0.04). These studies were published later (mean publication year 1982) than studies in which all men were under 40 (mean year 1969) or in which age was not reported (mean year 1962); thus, age may confound this analysis.

Zavos and Goodpasture (20) found that when semen samples were obtained using a collection device during intercourse, sperm concentration was 56% higher than when samples from the same subjects were collected by masturbation (p<0.01). Most of the studies we analyzed stated that semen collection was by masturbation, but some specified other methods (n = 5); in some studies, the collection method was unspecified (n = 10). Sperm concentration was significantly lower in studies with samples collected by masturbation than in the other 15 studies (73.2 x 10^6/ml vs. 99.5 x 10^6/ml; p<0.01). Because alternative collection methods tended to be used in earlier studies (mean publication year 1956), the high concentrations in earlier studies may, at least in part, be attributable to collection methods.

Geographic region has long been recognized as an important confounder of sperm ...
Table 2. Four models of temporal and regional variation in sperm density controlling for proven fertility, abstinence time, age, specimen collection method, study goal, and interaction of region and study year.

[Figure legend: European studies (1971-1990); non-Western studies (1978-1989).]

... (24) noted that trends in semen quality differed by geographic area. We compared broad geographic regions, finding significant between-region differences in slopes and intercepts. In addition, considerable variation is present within these regions. In fact, within-region differences in sperm concentration, such as those reported by ...

Selection bias. Because of the nature of sample collection, no random sample of semen quality is possible and selection bias will always be of concern, though biases may differ by method of population selection. Studies of normal men may be biased by the use of an arbitrary (and changing) cutoff in sperm count to define normality (26). In 1937, Meaker (27) ... 10^6/ml; mean publication year 1982). Thus, trends for both groups of men may be biased by changing selection criteria.
Measurement error. Variation in sperm count, first studied in 1938 (29), is an important source of measurement error. Variation in sperm count can be partitioned into that attributable to the analytic method, to the technician, and to the subject himself. Although Carlsen et al. (1) limited their analysis to studies that used manual counting methods, even these changed during the study period (30). The laboratory protocols issued by the World Health Organization (WHO) have served to minimize this source of variability (31,32). These protocols recommend that counts be obtained by hemocytometer, as was done in most of the 56 studies analyzed, although some more recent studies used other counting devices, such as the Coulter Counter (Coulter Electronic Sales Co., Hialeah, FL). However, since the correlation between counts obtained using the hemocytometer and the Coulter Counter has been shown to be very high (0.99) (33), these changes probably had little effect on these data. Intertechnician variability is also a relatively small source of error, with a coefficient of variation estimated to be 6.1% (34). Nonetheless, authors agree that the within-subject coefficient of variation is appreciable (40-46%) (35-37). While this variability serves to decrease the precision of the trend estimates, it is not likely to introduce bias.
Use of publication year instead of sample collection years was noted by Farrow (38) as a possible source of bias. Publication year was used by Carlsen et al. (1) because collection years were seldom provided. Only nine studies included this information; all of these were published after 1980, and the interval between median collection and publication ranged from 1 to 7.5 years (mean interval 4.5 years). The effect of this error is uncertain. The selection of the independent variable for these trend analyses is a more important consideration. Whether sperm density is regressed on year of sample collection, as was done by Carlsen et al. (1), or on year of birth, as was done by Auger et al. (16) and Irvine et al. (39), depends on the hypothesis under investigation. The implications of this choice have been clarified recently by Keiding (40).
Statistical artifacts. After taking region into account using a model with interaction for region and year, the quadratic, spline, and linear models are nearly equivalent. A step function can be fit to these data, but the rationale for using it is less clear because it assumes an abrupt jump in sperm density at a single time point. All multiple regression models showed that sperm density decreased with time; no post-1970s rise was seen, except perhaps in non-Western countries. Thus, the criticism that Carlsen et al. (1) had inappropriately fit a linear model (2-4) appears to be addressed by the use of modeling that takes region into account and separates the data from Western and non-Western countries.
Because sperm density is not normally distributed (28), the use of logarithmic or square root transformations and median (or geometric mean) rather than arithmetic mean (2,28) has been suggested. These transformations, while theoretically desirable, are not possible without access to the raw data. Moreover, such transformations are unlikely to alter these results appreciably; a univariate regression line fit to median sperm density, available in 15 studies, yielded a slope similar to that seen for mean density (slope = -1.27; p = 0.0002 for medians). As discussed by Carlsen et al. (1) and others (2,3,5), studies included in this analysis were not uniformly distributed over time and space. In fact, the distribution of study publication dates and locations reflects the recent interest and technological development in this field, particularly in the United States and Europe. Eighty-five percent of subjects came from studies published between 1975 and 1990; France and the United States accounted for 71% of all subjects; and most of the world's population has not been studied at all. This unbalanced design decreases statistical power, but should not introduce bias. It may, however, limit the generalizability of the results.
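The distinction drawn above between the arithmetic mean, the median, and the geometric mean for non-normally distributed sperm densities can be seen in a small example. The ten values below are hypothetical and right-skewed, invented for illustration and not taken from any of the studies analyzed; the sketch uses only the Python standard library.

```python
import math
import statistics

# Hypothetical right-skewed sperm densities (10^6/ml) for a single study.
densities = [12, 25, 40, 48, 60, 75, 90, 130, 210, 380]

arithmetic_mean = statistics.mean(densities)
median = statistics.median(densities)
geometric_mean = statistics.geometric_mean(densities)

# The geometric mean is the back-transformed mean of the log densities.
mean_of_logs = statistics.mean([math.log(x) for x in densities])

print(f"arithmetic mean: {arithmetic_mean:.1f}")
print(f"median:          {median:.1f}")
print(f"geometric mean:  {geometric_mean:.1f}")
print(f"exp(mean log):   {math.exp(mean_of_logs):.1f}")
```

With skewed data of this kind the arithmetic mean is pulled upward by the few high values, while the median and geometric mean sit lower. Computing the geometric mean requires the individual observations, which is why such transformations cannot be applied retrospectively to studies that report only a mean.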

Discussion
Declining semen quality and environmental causes of such a decline are not new concerns. Nelson and Bunge (22) ... and found a decline in sperm concentration during 1949-1981, which was inversely correlated with several environmental exposures. James (15) concluded that sperm counts declined with publication year, at least after 1960, based on a global analysis of 29 studies of sperm concentration over a 45-year period. Conclusions from these analyses were, therefore, consistent with that of Carlsen et al. (1), despite the inclusion of different time periods and studies and the use of different exclusionary criteria.
Although the first non-U.S. study in our analysis was not published until 1971, additional data suggest that sperm densities outside the United States were high before that time. Varnek (1944) (6) and Robles (1947) (8), included in Carlsen et al. (1) but not in this analysis (because they were published in Danish and Spanish, respectively), had mean sperm densities of 85.4 x 10^6/ml and 103.2 x 10^6/ml, respectively. Davidson (1949) (43), not included in Carlsen et al. (1) although eligible, reported a mean density of 143 x 10^6/ml in 15 fertile British men. The mean sperm density from five studies, published in 1944-1962, which include 2,456 infertile European men (14,44-47), was 98.5 x 10^6/ml. It is reasonable to assume that sperm densities from fertile European men would have been at least as high and therefore would have been consistent with pre-1970 data from the United States.
The multiple linear regression model shows an even steeper decline in sperm density in the United States (1938-1988) and in Europe (1971-1990) than that reported by Carlsen et al. (1), but no decline in non-Western countries, where data are sparse and are available only since 1978. When regional differences are considered, the data do not support either a "hockey stick" (spline) or an upwardly curving (quadratic) function, as has been suggested (2-4). Although confounding was controlled to the extent possible in these analyses, residual confounding may have contributed to the observed decline, but is unlikely to entirely explain it. Statistical factors (i.e., failure to transform nonnormally distributed data, use of mean vs. median, and use of publication year vs. year of sample collection) are not likely to have influenced these results appreciably. The overall downward trend does not rule out a decline in some regions and an increase (or no change) in others, nor does it rule out considerable intraregional differences, even within the United States or Europe. Recent reports from France (16,48,49), Scotland (39), Belgium (50), and the United States (23,24,51) are conflicting, supporting a decline in France, Belgium, and Scotland, but not in the United States. A recent study from London (52) suggests that differences in sperm density within a single city may be important. This between-area variation may be due to real differences between environments and populations. This analysis demonstrates that the decline in sperm density reported by Carlsen et al. (1) is not likely to be an artifact of bias, confounding, or statistical analysis. We have not addressed the cause(s) of this decline or assumed an environmental etiology.
Cross-sectional comparisons of semen quality now under way in comparably selected populations in several countries may identify areas of low (and high) sperm concentration. Careful exposure assessment will be required to identify etiologic factors. In the future, these studies should include a broader representation of non-Western populations and banking of semen and serum to facilitate studies of biomarkers of exposure and trends in those exposures.

Environmental Health Perspectives, Volume 105, Number 11, November 1997