Statistical issues in risk assessment of reproductive outcomes with chemical mixtures.

Establishing the relationship between a given chemical exposure and human reproductive health risk is complicated by exposures or other concomitant factors that may vary from pregnancy to pregnancy. Moreover, when exposures are to complex mixtures of chemicals, varying with time in number of components, doses of individual components, and constancy of exposure, the picture becomes even more complicated. A pilot study of risk of adverse reproductive outcomes among male wastewater treatment workers and their wives is described here. The wives of 231 workers were interviewed to evaluate retrospectively the outcomes of spontaneous early fetal loss and infertility. In addition, 87 workers participated in a cross-sectional evaluation of sperm/semen parameters. Due to the ever-changing nature of the exposure and the lack of quantification of specific exposures, six dichotomous variables were used for each specific job description to give a surrogate measure of exposure. Hence, no quantitative exposure-response relationships could be modeled. These six variables were independently assigned by two environmental hygienists, and their interrater reliability was assessed. Results are presented, and further innovations in statistical methodology are proposed.


Introduction
Quantitative risk assessment of reproductive outcomes from exposures to chemical mixtures poses challenging statistical problems. First, there are the challenges that are common to many areas of risk assessment, namely, those of dose-response modeling, low-dose extrapolation, and interspecies conversion. Next, the study of reproductive outcomes itself poses unusual statistical problems, e.g., nonindependence between outcomes in siblings. Finally, when the exposures are to complex mixtures, varying over time in number of agents, doses of individual toxicants, and constancy of exposure, statistical analysis becomes even more complicated.
In this paper we discuss some of these issues in detail and make recommendations. To illustrate these issues, we briefly describe a pilot study of risk of adverse reproductive outcomes among male wastewater treatment workers and their wives. Publications describing the specific results of this pilot study in greater depth are currently in preparation.

Pilot Study
The purpose of this study was twofold. The first objective was to assess retrospectively the reproductive capacity of workers chronically exposed to chemically contaminated wastewater in sewage, focusing on the end points of spontaneous early fetal loss and infertility. The second objective was to assess cross-sectionally semen quality in this same worker population.
To meet the first objective, 317 workers were identified as meeting study criteria. Of these, the wives of 231 workers agreed to be interviewed regarding their reproductive history. One hundred sixty-eight were wives of metropolitan sewer district (MSD) workers, and the remaining 63 were wives of workers at the water works (WW). The exposed group consisted of 137 wives of workers in the sewer maintenance and wastewater treatment divisions of MSD, while the comparison group comprised 90 wives of workers from the MSD and WW with no or very minimal exposure to chemicals, e.g., residential meter readers. In addition, the wives of four workers with chemical exposure from sources other than wastewater were also interviewed but were excluded from the analysis.
Answers to the reproductive questionnaire were obtained by telephone interview. The instrument was modified from a preexisting instrument with proven reliability and validity (1,2). Questions were asked regarding general health status, outcome data on all pregnancies, gestational exposure to smoking, use of alcohol and drugs, as well as a history of infertility problems, a family history of reproductive dysfunction, and a detailed contraceptive history.
To address the second objective, semen samples were obtained from 87 workers at MSD (59 exposed workers, 28 comparison workers). These samples were first evaluated with respect to sperm concentration, percent motile, percent normal morphology, and percent viable. In addition, a computer-linked digitizing system was used to obtain quantitative measures of sperm morphology, swimming speeds, and swimming patterns.

Defining Exposure to Chemical Mixtures
The workers were exposed intermittently to a myriad of substances, both in the processing of sludge and in the repair and maintenance of sewer pipes and drains. The variability of total organic vapor concentrations in the influents of two treatment plants in the MSD has been described previously (3,4). Where industrial hygiene measurements have been taken for specific compounds, chlorobenzene (range, <10–11 µg/L), ethylbenzene (<10–96 µg/L), methyl chloride (<10–100 µg/L), methylene chloride (<10–360 µg/L), tetrachloroethylene (<10–62 µg/L), toluene (48–780 µg/L), and 1,1,1-trichloroethane (<10–380 µg/L) have been detected more than 50% of the time.
Because of the complexity of defining exposure in this population, several exposure definitions were needed to explain collectively the nature of exposure. No one model could be defined a priori as the best. An MSD industrial hygienist and the project environmental engineer worked with the project team to develop models to characterize worker exposure to industrial and residential sewage. Each of 276 job activities was assigned several exposure attributes by both the industrial hygienist and the environmental engineer. These attributes included level (exposed/unexposed), location of exposure (field/plant), job function (surveyor, tradesman, operator, or labor/semi-skilled), and exposure intensity. Exposure intensity included five variables, each with two or three categories. Numeric values were assigned to each category. These five variables were: a) less than daily or daily contact; b) open air, semi-enclosed space, or confined space; c) dry or wet work; d) quiescent or turbulent flow; e) residential only, residential and industrial, or industrial only waste. Quantification of exposure levels was not possible due to the high daily variability of the types and intensities of exposures.
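The abstract notes that the interrater reliability of the two raters' exposure assignments was assessed but does not name the statistic. For categorical codes such as these, Cohen's kappa is a common choice; the minimal Python sketch below assumes that statistic and uses hypothetical ratings, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for one attribute ("daily contact": 1 = daily,
# 0 = less than daily) assigned to ten job activities by each rater.
hygienist = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
engineer = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(hygienist, engineer), 3))  # 0.6
```

Here the raters agree on 8 of 10 activities, but half of that agreement would be expected by chance, so kappa is 0.6 rather than 0.8.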

Analysis of Fetal Loss
The reproductive effects of chemical agents are frequently assessed by examination of data obtained from litters or families. It has long been recognized (5) that these data have an inherent litter effect, i.e., a tendency for siblings to respond more alike than nonsiblings. Several pregnancy outcomes show evidence of a litter effect in human reproduction. For example, occurrence of a fetal loss is associated with increased chance of subsequent fetal loss within the same family (6).
The question of the correct sampling unit in the presence of this special type of dependence among observations in animal studies has been extensively debated in the literature (7)(8)(9)(10). In effect, analyses that use the fetus as the sampling unit artificially enlarge the sample size (11). To date, however, few procedures have appeared in the literature to deal with dependence among observations within the same family in studies of human reproductive history. Selevan (2) has recognized the existence of this problem and the lack of statistical methods to deal with it.
If a contingency table is constructed by cross-tabulation of the exposure variable (yes or no) against the pregnancy outcome (loss or no loss) across all fetuses, a chi-square test or Fisher's exact test of the hypothesis that the proportion of losses does not differ between the two exposure groups will have an inflated type I error rate (11)(12)(13).
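The inflation can be illustrated with a small simulation, under assumptions of our own choosing: each family draws its own loss probability from a beta distribution, exposure is assigned at the family level with no true effect, and all pregnancies are pooled into one 2 × 2 table. The function names and parameter values are hypothetical.

```python
import math
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]] and its 1-df p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def naive_rejection_rate(n_sims=2000, n_families=40, pregnancies=4,
                         beta_params=(1.0, 3.0), seed=1):
    """Share of null data sets in which the pooled pregnancy-level test rejects at 0.05.

    Each family draws its own loss probability from a Beta distribution, which
    makes pregnancies in the same family dependent. Exposure is assigned per
    family and has no effect, so every rejection is a type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # counts[exposed] = [losses, non-losses], pooled over all pregnancies
        counts = {True: [0, 0], False: [0, 0]}
        for family in range(n_families):
            exposed = family % 2 == 0
            p_loss = rng.betavariate(*beta_params)
            for _ in range(pregnancies):
                counts[exposed][0 if rng.random() < p_loss else 1] += 1
        (a, b), (c, d) = counts[True], counts[False]
        _, p_value = chi_square_2x2(a, b, c, d)
        if p_value < 0.05:
            rejections += 1
    return rejections / n_sims


print(naive_rejection_rate())
```

With four pregnancies per family and Beta(1, 3) family risks, the within-family correlation is 1/(1 + 3 + 1) = 0.2, and the usual design-effect approximation 1 + (m − 1)ρ = 1.6 suggests a rejection rate near 0.12 rather than the nominal 0.05.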
In animal studies, the beta-binomial model has been proposed to allow for the litter effect (14): the outcomes in a given litter are binomially distributed with success parameter p, and p is distributed across litters as a beta random variable. In this model, which has been shown to fit experimental data better than simple Poisson or binomial models (15), the observations within the same family are conditionally independent for a given p, but are unconditionally dependent.
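The beta-binomial probabilities follow by integrating the binomial likelihood over the beta distribution of p. A minimal sketch, with illustrative parameter values a and b for the beta distribution, compares them with the binomial distribution having the same mean; under this model the correlation between two littermates' outcomes is 1/(a + b + 1).

```python
from math import comb, exp, lgamma


def log_beta_fn(x, y):
    """log B(x, y) via log-gamma, for numerical stability."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return comb(n, k) * exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


def within_litter_correlation(a, b):
    """Correlation between two littermates' binary outcomes under the model."""
    return 1.0 / (a + b + 1.0)


# Litters of 4 with mean response probability a / (a + b) = 0.25.
a, b, n = 1.0, 3.0, 4
bb = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
binom = [comb(n, k) * 0.25**k * 0.75 ** (n - k) for k in range(n + 1)]
print([round(x, 4) for x in bb])     # heavier tails at k = 0 and k = 4
print([round(x, 4) for x in binom])
print(within_litter_correlation(a, b))  # 0.2
```

The extra mass at "no responders" and "all responders" is the overdispersion that a simple binomial model cannot capture.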
Rai and Van Ryzin (16) have proposed a proportional-hazards-like model for the analysis of teratological studies in which the probability, $P(d,s)$, of response for an offspring from a female at dose $d$ and with litter size $s$ is given by

$$P(d,s) = \left[1 - e^{-(\alpha + \beta d)}\right] e^{-(\theta_1 + \theta_2 d)s}.$$

They further assume that litter size $S$ is a discrete random variable with probability density function $f(s, \phi \mid d)$. They assume, however, independence among offspring in a given litter, and thus they do not truly adjust for the litter effect.
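For illustration, the response probability can be coded directly. This sketch assumes the form commonly quoted for the Rai and Van Ryzin model, P(d, s) = [1 - exp(-(alpha + beta*d))] * exp(-(theta1 + theta2*d)*s); the parameter values are hypothetical, and the litter-size distribution is not modeled. Because the expected number of affected offspring is obtained simply as s × P(d, s), the sketch also makes the independence assumption visible.

```python
import math


def rai_van_ryzin_prob(d, s, alpha, beta, theta1, theta2):
    """Response probability for an offspring at dose d in a litter of size s.

    P(d, s) = [1 - exp(-(alpha + beta * d))] * exp(-(theta1 + theta2 * d) * s)
    """
    return (1.0 - math.exp(-(alpha + beta * d))) * math.exp(-(theta1 + theta2 * d) * s)


def expected_affected(d, s, **params):
    """Expected number of affected offspring in a litter of size s, treating
    littermates as independent (the assumption criticized in the text)."""
    return s * rai_van_ryzin_prob(d, s, **params)


# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha=0.05, beta=0.8, theta1=0.01, theta2=0.02)
for s in (6, 10, 14):
    # With theta1 + theta2 * d > 0, larger litters have a lower per-offspring risk.
    print(s, round(rai_van_ryzin_prob(1.0, s, **params), 4))
```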
Human reproductive studies differ from animal studies in that the human family, as a rule, consists of siblings born across time rather than simultaneously. Thus, although genetic factors would still favor the presence of dependence between pregnancies, the analysis must also accommodate factors that vary within a mother from pregnancy to pregnancy (such as gravidity).
One such approach is Kissling's uniform-logistic model. Let $Y_{ij} = 1$ if pregnancy $j$ of family $i$ ends in an adverse outcome and $0$ otherwise, let $X_{ij} = (X_{ij1}, \ldots, X_{ijK})$ be the set of covariables measured on family $i$, pregnancy $j$, and let $X_k^*$ denote the low-risk value of the $k$th covariable. Then

$$\operatorname{logit} P(Y_{ij} = 1 \mid \beta_0^*) = \beta_0 + \boldsymbol{\beta}' X_{ij} = \beta_0^* + \boldsymbol{\beta}' \tilde{X}_{ij},$$

where $\beta_0^* = \beta_0 + \boldsymbol{\beta}' X^*$ and $\tilde{X}_{ijk} = X_{ijk} - X_k^*$. If $\beta_0^*$ has a uniform distribution over $(a,b)$, then the resulting mixture, the uniform-logistic model, is given by

$$P(Y_{ij} = 1) = \frac{1}{b-a}\int_a^b \frac{e^{u + \boldsymbol{\beta}'\tilde{X}_{ij}}}{1 + e^{u + \boldsymbol{\beta}'\tilde{X}_{ij}}}\,du = \frac{1}{b-a}\ln\frac{1 + e^{b + \boldsymbol{\beta}'\tilde{X}_{ij}}}{1 + e^{a + \boldsymbol{\beta}'\tilde{X}_{ij}}}.$$

Kissling shows that as $b \to a$, the logistic probability is the limiting value of the uniform-logistic probability. She further develops the likelihood for $N$ families, with the $i$th family having had $n_i$ pregnancies resulting in $k_i$ adverse outcomes, and shows that $\boldsymbol{\beta}$ is invariant to the choice of the low-risk covariates, $X^*$. Kissling uses this model to analyze an occupational reproductive data set and compares it with the simple stratified analysis and the logistic model, finding the same results in all three analyses, with the logistic model giving a marginally better fit than the uniform-logistic model. No simulation comparisons were made.

We are currently investigating several alternative models that are variations on the uniform-logistic model described above. To allow for a richer class of models, the beta-logistic (BL) model (logistic probability for adverse outcome mixed with a beta distribution), the logistic-logistic (LL) model (logistic probability for adverse outcome mixed with a logistic distribution), and the gamma-one-hit (GO) model (one-hit probability for adverse outcome mixed with a gamma distribution) are being investigated. Since the uniform distribution is a beta distribution with parameters 1 and 1, the uniform-logistic model is a special case of the BL model. The class of beta distributions contains bell-shaped, U-shaped, J-shaped, and reverse-J-shaped density functions. Hence, the beta distribution gives a wide choice for the distribution of background risk of adverse reproductive outcome, which is indexed by the intercept term of the logistic probability.
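The closed form of the uniform-logistic probability follows because the derivative of log(1 + e^u) is the logistic function. The sketch below, in which `eta` stands for the covariate term β'(X − X*) and all values are hypothetical, checks the closed form against numerical integration and illustrates the b → a limit. It covers only the marginal probability for a single pregnancy; the family likelihood integrates the product of the pregnancy probabilities over the shared intercept, which this sketch does not do.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def uniform_logistic_prob(a, b, eta):
    """Marginal P(adverse outcome) when the logistic intercept ~ Uniform(a, b).

    Closed form: [log(1 + e^(b + eta)) - log(1 + e^(a + eta))] / (b - a).
    (math.exp overflows for exponents above about 709.)
    """
    if b <= a:
        raise ValueError("require a < b")
    return (math.log1p(math.exp(b + eta)) - math.log1p(math.exp(a + eta))) / (b - a)


def midpoint_average(a, b, eta, steps=10_000):
    """Numerical check: average of the logistic probability over intercepts in (a, b)."""
    h = (b - a) / steps
    return sum(logistic(a + (i + 0.5) * h + eta) for i in range(steps)) / steps


eta = 0.5  # hypothetical value of beta' * (X - X*)
print(uniform_logistic_prob(-2.0, 0.0, eta), midpoint_average(-2.0, 0.0, eta))
# As b -> a the mixture collapses to the ordinary logistic probability.
print(uniform_logistic_prob(-1.0, -1.0 + 1e-6, eta), logistic(-1.0 + eta))
```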
The LL model is also being investigated, since there may be no a priori reason to restrict background risk to a finite range, as the BL model does.
Finally, the one-hit probability is used as the probability of adverse outcome, since it is used frequently in other areas where genetic changes are thought to lead to the outcome of interest (e.g., cancer risk assessment). Since the intercept term of the one-hit probability must be positive, the gamma family provides a rich class of distributions over the positive real numbers, resulting in the GO model.
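The text does not give the GO model's parameterization. Assuming the one-hit probability 1 − exp(−(λ + η)), with intercept λ following a gamma distribution with shape k and rate r and covariate term η = β'X ≥ 0, the mixture has a closed form because E[exp(−λ)] = (r/(r + 1))^k for such a gamma variable. The sketch checks that closed form by Monte Carlo; note that Python's `random.gammavariate` takes a scale, not a rate, as its second argument. All names and values are hypothetical.

```python
import math
import random


def gamma_one_hit_prob(eta, shape, rate):
    """Marginal P(adverse) = 1 - exp(-eta) * (rate / (rate + 1)) ** shape,
    for one-hit probability 1 - exp(-(lam + eta)) with lam ~ Gamma(shape, rate)."""
    return 1.0 - math.exp(-eta) * (rate / (rate + 1.0)) ** shape


def monte_carlo_check(eta, shape, rate, draws=100_000, seed=7):
    """Average the one-hit probability over simulated gamma intercepts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        lam = rng.gammavariate(shape, 1.0 / rate)  # second argument is a SCALE
        total += 1.0 - math.exp(-(lam + eta))
    return total / draws


print(gamma_one_hit_prob(0.1, 2.0, 4.0), monte_carlo_check(0.1, 2.0, 4.0))
```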
All of these models allow for covariates varying from pregnancy to pregnancy, as well as for dependence between pregnancies in the same family. To date, these models have been shown to have unique, consistent estimators for their parameters (18,19).
Simulation studies are ongoing to establish the large-sample performance of these models prior to application. To date, we have shown that for the beta-logistic model, the 95% confidence intervals constructed from estimates of $\beta$ in 100 simulated data sets cover the nominal $\beta$ used to generate these data sets, for nominal values of $\beta$ of 0, 1, and 2, with beta distribution parameters $(p,q)$ taking on values of (1,1), (3,1), and (3,2.5). We also used logistic regression to analyze these same 9 groups of 100 randomly generated data sets, ignoring the dependence between pregnancies in the same family unit. In this analysis, the estimates of $\beta$ also agreed very closely with the nominal values, and the estimates of the intercept agreed with the expected values of the nominal beta distribution used, i.e., $p/(p+q)$. On close inspection, this is to be anticipated, since the variability of these beta random variables is very small relative to the mean on the interval used, (0,1). We are currently conducting simulation studies where the logistic intercept terms are randomly distributed as beta variables on the interval (-2,0), with $(p,q)$ chosen such that the beta distribution has expected value equal to 0.3, a value for the baseline risk of spontaneous early fetal loss that is consistent with that found in the literature (20).
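A scaled-down version of the naive-logistic comparison can be sketched as follows. The design here (3000 families of three pregnancies each, a per-pregnancy exposure indicator with probability 0.5, and intercepts drawn from Beta(p, q) on (0, 1)) is our assumption, not the authors' simulation protocol, and only point estimates from a single data set per setting are examined.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_beta_logistic(beta, p, q, n_families=3000, pregnancies=3, seed=11):
    """Hypothetical BL data: family intercept ~ Beta(p, q) on (0, 1),
    exposure x ~ Bernoulli(0.5) independently for each pregnancy."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_families):
        intercept = rng.betavariate(p, q)  # shared by all pregnancies in the family
        for _ in range(pregnancies):
            x = 1.0 if rng.random() < 0.5 else 0.0
            y = 1 if rng.random() < logistic(intercept + beta * x) else 0
            xs.append(x)
            ys.append(y)
    return xs, ys


def fit_logistic(xs, ys, iters=15):
    """Ordinary (naive) logistic regression by Newton-Raphson, ignoring families."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = logistic(b0 + b1 * x)
            w = mu * (1.0 - mu)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add H^{-1} g, with H the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


for p, q in [(1, 1), (3, 1), (3, 2.5)]:
    xs, ys = simulate_beta_logistic(beta=1.0, p=p, q=q)
    b0, b1 = fit_logistic(xs, ys)
    print(f"(p,q)=({p},{q}): intercept {b0:.2f} vs p/(p+q) {p / (p + q):.2f}, slope {b1:.2f}")
```

Because intercepts confined to (0, 1) vary little on the logit scale, the naive fit recovers roughly p/(p + q) and the nominal slope; a wider intercept interval such as (-2, 0) would be expected to show larger departures.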

Analysis of Infertility
Several methods have been developed in the last 10 years for the retrospective analysis of human fertility. Wong (21), Levine et al. (22)(23)(24), and Starr and Levine (25) used indirect standardization to obtain a standardized fertility ratio (SFR) to compare the observed and expected numbers of live births in exposed and unexposed groups. These methods have been applied primarily in occupational epidemiology studies to address concerns that workplace exposures may be associated with adverse effects on workers' reproductive capacities. More recently, Boyle and Starr (26) have developed a proportional hazards approach for fertility evaluations. Both the SFR approach and the proportional hazards approach divide each woman's reproductive history into age-specific 1-year intervals. Each person-year of observation is classified by whether a live birth occurred in that year. Other events (e.g., spontaneous early fetal loss, elective abortion, stillbirth, congenital malformation) do not contribute to the analysis.
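The SFR itself is a ratio of observed to expected live births, with the expected count obtained by applying reference birth probabilities to the person-years in each stratum. A minimal sketch with hypothetical strata, rates, and counts:

```python
def standardized_fertility_ratio(person_years, reference_rates, observed_births):
    """Indirectly standardized fertility ratio.

    person_years    : stratum -> person-years at risk in the study group
    reference_rates : stratum -> reference live-birth probability per person-year
    observed_births : live births actually observed in the study group
    Returns (SFR, expected births).
    """
    expected = sum(py * reference_rates[s] for s, py in person_years.items())
    return observed_births / expected, expected


# Hypothetical strata keyed by (age group, parity); the SFR described in the
# text also stratifies by birth cohort and race, omitted here for brevity.
person_years = {("20-24", 0): 80, ("20-24", 1): 40, ("25-29", 0): 90, ("25-29", 1): 110}
reference_rates = {("20-24", 0): 0.10, ("20-24", 1): 0.15, ("25-29", 0): 0.12, ("25-29", 1): 0.09}
sfr, expected = standardized_fertility_ratio(person_years, reference_rates, observed_births=38)
print(f"expected {expected:.1f}, SFR {sfr:.2f}")
```

An SFR above 1 means more live births than the reference rates predict for the same person-years; only live births enter the numerator, which is the limitation discussed below.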
Alternatively, Baird et al. (27) have proposed the use of time to pregnancy as a measure for study of the reproductive effects of environmental and occupational exposures. Weinberg et al. (28) used the Cox proportional hazards model for discrete outcome data (29) for assessing prospectively the reduction of fecundability in women with prenatal exposure to cigarette smoking. The outcome of interest was the occurrence of pregnancy and the time to occurrence, not the outcome of the pregnancy. In all of these methods, the unit of observation is some period of time, either a time interval such as 1 year, or a menstrual cycle. Each interval of observation is assumed to be independent of all other intervals of observation. The adequacy of this assumption is not known (30).
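To make the per-interval unit of observation concrete, here is a life-table estimate of the per-cycle conception probability (the discrete hazard) from hypothetical time-to-pregnancy data. It treats each cycle at risk as a separate Bernoulli trial, which is the independence assumption questioned above, and it omits covariates; a discrete-time proportional hazards model would add them.

```python
def per_cycle_fecundability(cycles, conceived):
    """Life-table estimate of the conception probability in each cycle.

    cycles[i]    : number of cycles couple i was observed (>= 1)
    conceived[i] : True if observation ended with conception in that cycle,
                   False if censored (e.g., stopped trying, lost to follow-up)
    Returns h where h[k-1] = conceptions in cycle k / couples at risk in cycle k.
    """
    hazards = []
    for k in range(1, max(cycles) + 1):
        at_risk = sum(1 for c in cycles if c >= k)
        events = sum(1 for c, e in zip(cycles, conceived) if c == k and e)
        hazards.append(events / at_risk if at_risk else 0.0)
    return hazards


def cumulative_conception(hazards):
    """P(conceived by cycle k) = 1 - product over j <= k of (1 - h_j)."""
    out, still_waiting = [], 1.0
    for h in hazards:
        still_waiting *= 1.0 - h
        out.append(1.0 - still_waiting)
    return out


# Hypothetical data for eight couples.
cycles = [1, 1, 2, 3, 3, 4, 6, 6]
conceived = [True, True, True, True, False, True, False, True]
h = per_cycle_fecundability(cycles, conceived)
print([round(x, 3) for x in h])
print([round(x, 3) for x in cumulative_conception(h)])
```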
In the SFR analysis, the exposed and unexposed groups were compared with respect to their ratios of observed to expected numbers of live births, where the expected number of live births was calculated on the basis of U.S. national birth probabilities specific for female birth cohort, age, parity, and race. In calculating the SFR, the assumption is made that once a worker is exposed, he continues to be exposed. For approximately 10% of the MSD exposed group, this was not the case, i.e., they held unexposed jobs at times interspersed during their exposed period. In general, the SFR does not accommodate adjustment for covariates other than those associated with the U.S. birth probabilities. For instance, the effects of periods of contraceptive use or changes in specific exposures across time cannot be modeled using the SFR analysis. In addition, because the SFR analyzes only live births, it cannot summarize a subject's reproductive history in terms of spontaneous early fetal losses, live births, stillbirths, and neonatal deaths. Further research is needed to develop alternative methods for analyzing fertility. One avenue to pursue is the application of intensity function analysis for multiple failure time data to intervals between reproductive events of different types (31). This method requires very detailed reproductive, general medical, and occupational histories, and linkages between them, to address this complex issue.

Analysis of Sperm Parameters
Major advances in techniques for analyzing human semen quality have occurred in the last decade. The traditional, labor-intensive, manually derived measures of human semen (volume, sperm concentration, percentage of motile sperm, sperm morphology) are being supplanted by computer-assisted systems. Computer-assisted sperm analysis (CASA) systems consist of video tools and sophisticated computers which together can automate image digitization, hence providing quantitative measures of sperm motility and morphometry (32). While CASA allows for rapid, efficient characterization of large numbers of sperm samples, problems still exist. Standard operating settings for these systems are rarely defined (33). Current technology also limits CASA (32), since these systems must be able to differentiate sperm cells from other objects, measure sperm heads, count cell densities, and detect and measure cell motion and its pattern.
Other difficulties are presented by the choice and application of statistical methods to the resulting CASA measurements. The means and standard deviations may not adequately describe data that are asymmetrically distributed, and exposure may affect the central tendency and shape of these sperm characteristics. Hence, other distributional parameters (e.g., 5 and 10% trimmed means, median, minimum, maximum, 1st, 5th, 10th, 25th, 75th, 90th, 95th, and 99th percentiles, range, and interquartile range) should be examined (33). Transformations should be made as necessary to stabilize the within-person variability relative to within-person central tendency and to achieve normality (34); otherwise, nonparametric methods should be used (33). Multivariate analysis of variance (35) should be used to assess the characteristics of the entire response distribution simultaneously.
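The distributional summaries listed above can be computed per subject with Python's standard `statistics` module. In this sketch the trimming convention (dropping the floor of the stated fraction of observations from each tail) and the "inclusive" percentile method are our choices; other software may use slightly different definitions. The sample values are simulated, not study data.

```python
import random
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping floor(n * proportion) observations from each tail."""
    data = sorted(values)
    cut = int(len(data) * proportion)
    kept = data[cut:len(data) - cut] if cut else data
    return statistics.fmean(kept)


def distribution_summary(values):
    """Shape-aware summary of one subject's sperm measurements."""
    # With n=100, quantiles() returns the 99 cut points for the 1st..99th percentiles.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    summary = {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "trimmed_mean_5": trimmed_mean(values, 0.05),
        "trimmed_mean_10": trimmed_mean(values, 0.10),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": pct[74] - pct[24],
    }
    for k in (1, 5, 10, 25, 75, 90, 95, 99):
        summary[f"p{k}"] = pct[k - 1]
    return summary


# Simulated, right-skewed "velocities" for one hypothetical subject (µm/sec).
rng = random.Random(3)
velocities = [rng.lognormvariate(3.6, 0.35) for _ in range(200)]
print({k: round(v, 1) for k, v in distribution_summary(velocities).items()})
```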
Issues also arise in designing studies to predict reproductive success from the CASA measurements (36). Definitions of fertility/infertility have not been standard. Cross-sectional and case-control studies linking fertility status with sperm evaluations use current status or current sperm characteristics to predict the potential for future performance. In addition, characteristics of the female partner must be incorporated into studies of male fertility status. Meistrich and Brown (37) have developed equations for the probability of infertility given sperm count based upon other published cross-sectional data.
Further research into the appropriate transformations for these data is needed. For instance, the utility of the Box-Cox transformation (38) for these data should be pursued. The Box-Cox procedure identifies the most appropriate transformation from a rich class, which contains the log and square-root transformations as special cases. Moreover, the relationships between the data from the automated systems and the summary measures of count, percent viable, percent motile, and percent normal morphology need to be explored. For instance, four measures of motility were taken (percent motile in the sample, absolute velocity, linear velocity, and swimming pattern of many sperm within a given sample). There is a need to summarize across measures within an individual to define sperm integrity with respect to motility, in addition to the need to summarize across individuals to compare groups. For instance, can percent motile be related to some summary index of sperm swimming speed? Application of multivariate methods to these data will provide further means of reducing these data to meaningful indices of individual sperm attributes within subject, as well as of detecting differences in these indices and their interrelationships between exposure groups.
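As an illustration of how the Box-Cox procedure selects a transformation, the sketch below maximizes the profile log-likelihood of λ over a grid for a single i.i.d. sample, assuming a normal model for the transformed values; λ = 1 corresponds to no transformation (up to a shift), λ = 0.5 to the square root, and λ = 0 to the log. Applying it to the study's data would require adding covariates or subject effects to the model, which this sketch omits. The data are simulated.

```python
import math
import random
import statistics


def box_cox(values, lam):
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v**lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam for an i.i.d. normal model (constants dropped)."""
    n = len(values)
    var = statistics.pvariance(box_cox(values, lam))  # MLE of the variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Grid value of lam with the largest profile log-likelihood."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if grid is None:
        grid = [i / 20 for i in range(-40, 41)]  # -2.0 to 2.0 in steps of 0.05
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))


# Simulated log-normal data: the selected lambda should be near 0 (the log).
rng = random.Random(5)
sample = [rng.lognormvariate(0.0, 0.8) for _ in range(300)]
print(best_lambda(sample))
```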

Results of the Pilot Study
In this study, each pregnancy was classified as occupationally exposed or not by linking the conception date with the worker's employment history. A pregnancy was considered exposed if an exposed job occurred within the 4-month period prior to conception and unexposed if no exposed job had occurred during this same period. In the unexposed group, 13.7% of pregnancies (14/102) ended in a spontaneous early fetal loss, compared to 5.2% of pregnancies (4/77) in the exposed group (chi-square = 3.5, df = 1, p = 0.06). The exposed and unexposed groups were found to be comparable with respect to the following risk factors for spontaneous early fetal loss: maternal age, race, smoking during pregnancy, chronic disease, history of prior fetal loss, maternal reproductive dysfunction, and paternal reproductive dysfunction. A Mantel-Haenszel analysis was conducted to compare the exposed and unexposed groups with respect to spontaneous early fetal loss after stratifying for these individual factors, and no significant differences were found. Logistic regression was also performed to assess the effects of all risk factors simultaneously. In this case, exposure again was found to be significantly negatively associated with spontaneous early fetal loss. Because of these negative results, the models under development described above were not applied to the results of this study. The results of the infertility analysis in this study population are described elsewhere (39); the overall SFR was 1.17, which was not statistically significant.
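The reported chi-square can be checked from the counts given above. This sketch assumes the test was Pearson's chi-square without continuity correction, which reproduces the reported values; like the original analysis, it treats each pregnancy as independent, the assumption questioned in the section on fetal loss.

```python
import math

# Spontaneous early fetal losses from the pilot study (pregnancy as the unit).
unexposed_loss, unexposed_total = 14, 102
exposed_loss, exposed_total = 4, 77

a, b = unexposed_loss, unexposed_total - unexposed_loss
c, d = exposed_loss, exposed_total - exposed_loss
n = a + b + c + d

# Pearson chi-square without continuity correction, 1 df.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

print(f"unexposed {a / (a + b):.1%}, exposed {c / (c + d):.1%}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

The statistic is about 3.53 with p of about 0.06; with Yates' continuity correction it would drop to about 2.65.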
With respect to the analysis of the sperm swimming speed and morphometry data in this pilot study, two approaches were taken. In the first, a nested analysis of variance was performed on the 2440 sperm measured, with individual subject nested within group (exposed or unexposed). In the second, a weighted general linear model was fitted to each subject's average measurement (e.g., average absolute velocity), with independent variables of group, age, and smoking status (never smoked, ever smoked), the weight being each subject's standard deviation of that measurement (e.g., the standard deviation of absolute velocity for each subject). In neither analysis were the data consistent with a normal distribution (p < 0.05 for the Kolmogorov-Smirnov test in both analyses). Nested analysis of variance on the log- and square-root-transformed data gave results similar to the analysis of the untransformed data.
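In the nested analysis, the group effect should be tested against the variation between subjects within groups, not against the sperm-to-sperm variation, for the same reason that fetuses are not the sampling unit in litter data. A minimal sketch with hypothetical toy data shows both F ratios; the weighting used in the study's second approach is not reproduced here, and with unequal numbers of sperm per subject the F ratio is only approximate.

```python
def nested_anova(data):
    """Measurements nested in subjects nested in groups.

    data: group -> list of subjects, each subject a list of measurements.
    Returns F for groups tested against subjects-within-groups (the appropriate
    denominator when subjects are the sampling unit), the naive F against
    measurement-level error, and the degrees of freedom of each.
    """
    values = [x for subjects in data.values() for subj in subjects for x in subj]
    grand = sum(values) / len(values)
    ss_group = ss_subject = ss_within = 0.0
    n_subjects = 0
    for subjects in data.values():
        group_values = [x for subj in subjects for x in subj]
        g_mean = sum(group_values) / len(group_values)
        ss_group += len(group_values) * (g_mean - grand) ** 2
        for subj in subjects:
            s_mean = sum(subj) / len(subj)
            ss_subject += len(subj) * (s_mean - g_mean) ** 2
            ss_within += sum((x - s_mean) ** 2 for x in subj)
            n_subjects += 1
    df_group = len(data) - 1
    df_subject = n_subjects - len(data)
    df_within = len(values) - n_subjects
    ms_group = ss_group / df_group
    return {
        "F_vs_subjects": ms_group / (ss_subject / df_subject),
        "df_vs_subjects": (df_group, df_subject),
        "F_vs_within": ms_group / (ss_within / df_within),
        "df_vs_within": (df_group, df_within),
    }


# Hypothetical toy data: two subjects per group, three sperm per subject.
toy = {
    "unexposed": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "exposed": [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
}
print(nested_anova(toy))  # F_vs_subjects 8.0 on (1, 2) df; F_vs_within 108.0 on (1, 8) df
```

Treating each sperm as independent inflates the F ratio more than tenfold here, just as treating each fetus as independent inflates the pooled chi-square.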
The results of the sperm and semen analyses are being finalized. As an example, in the analysis of absolute velocity, the groups were not significantly different by nested analysis of variance (average ± SE = 38.03 ± 0.58 µm/sec in the unexposed, 41.85 ± 0.44 µm/sec in the exposed, p > 0.10) nor by the weighted analysis of variance (36.36 ± 2.72 µm/sec in the unexposed, 40.66 ± 1.53 µm/sec in the exposed after adjustment for age and smoking status, p > 0.10).

Summary
Quantification of exposure to mixtures that change constantly in individual components and doses is especially difficult. When compounded by the issues arising in the analysis of reproductive data, the picture grows even more complex. In addition to the problems discussed here, questions arise in developing measures of exposure and relating them to specific outcomes, in particular how to summarize exposure and what time period around the outcome to use for summarization. Hence, the area of quantitative risk assessment for reproductive outcomes with exposures to chemical mixtures provides many exciting avenues for further research.