Studying Human Fertility and Environmental Exposures

In their review of approaches to studying the influence of environmental exposures on human fecundity, Tingen et al. (2004) compared several ways of assessing fecundity.
Fecundity, the probability of pregnancy in couples having regular intercourse without contraception, can be assessed by applying appropriate statistical approaches to time-to-pregnancy (TTP) data. Tingen et al. provided a thorough presentation of the detailed prospective approach to assessing TTP. We agree that the advantages of this approach, in which daily urine samples are collected, include allowing estimation of the daily probability of pregnancy within a menstrual cycle and study of the early survival of the embryo; however, we have reservations about the authors' conclusion that the detailed prospective approach should be seen as the gold standard for studying the effects of environmental exposures on fecundity.
We believe that prospective TTP studies, whether detailed or not, have one main limitation, which lies in the difficulty of defining the target population precisely: These studies are often based on the inclusion of couples who soon plan to attempt conception or to stop using contraceptive methods. In our opinion, this population is ill-defined and lacks a sampling frame, which makes the estimation of participation rates difficult. Indeed, many published detailed prospective TTP studies had unreported or low participation rates, opening the door to selection biases. We also doubt that these "super pregnancy planners," who program their pregnancy attempts months ahead, are representative of the general population. For example, detailed prospective TTP studies have sometimes included couples with a higher-than-average educational level (Wilcox et al. 1988) or those who use natural family planning methods that are not widely used (Dunson et al. 2002). These characteristics may be associated with the probability of pregnancy and with the environmental exposures of interest, thus resulting in possible biases.
These limitations of the prospective approach do not justify a preference for retrospective studies. As pointed out by Tingen et al., the exclusion of infertile couples in most retrospective studies is indeed of particular concern; it reduces statistical power and leads to underestimation of the effect of the environmental exposure of interest (Slama et al. 2004).
The current duration approach, an approach not mentioned by Tingen et al., makes it possible to include infertile couples without resorting to detailed prospective studies. The current duration approach relies on the inclusion of couples who are currently trying to conceive or who are having intercourse without contraception (Keiding et al. 2002; Olsen and Andersen 1999). The recruited couples are asked how long they have been having unprotected sexual intercourse. Follow-up of these couples is not required (Keiding et al. 2002), but it is possible to obtain information on the occurrence of a pregnancy. In this case, the approach is based on principles from the case-cohort design (Olsen and Andersen 1999).
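The sampling scheme behind the current duration approach can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the estimator used by Keiding et al. (2002): it assumes a stationary population in which attempts start at a constant rate, a hypothetical exponential TTP with a mean of 6 months, and no couples who give up trying. All function names and parameter values are hypothetical. Under these assumptions, renewal theory implies that the density of the observed current durations is proportional to the survival function of TTP, so the share of couples still trying after t months can be approximated as the density at t divided by the density at 0.

```python
import random


def simulate_current_durations(n_starts=200_000, window=200.0, mean_ttp=6.0, seed=0):
    """Simulate a cross-sectional survey taken at time 0 (all values hypothetical).

    Returns two lists for the couples still trying on the survey date:
    the elapsed (current) durations and the full, eventually realized TTPs.
    """
    rng = random.Random(seed)
    current, full = [], []
    for _ in range(n_starts):
        elapsed = rng.uniform(0.0, window)       # months since the attempt began
        ttp = rng.expovariate(1.0 / mean_ttp)    # assumed exponential time to pregnancy
        if ttp > elapsed:                        # not yet pregnant on the survey date
            current.append(elapsed)
            full.append(ttp)
    return current, full


def survival_from_current_durations(current, t, bin_width=1.0):
    """Crude estimate of S(t) as g(t) / g(0), using two histogram bins of equal width."""
    near_zero = sum(1 for y in current if 0.0 <= y < bin_width)
    near_t = sum(1 for y in current if t <= y < t + bin_width)
    return near_t / near_zero


current, full = simulate_current_durations()
mean_current = sum(current) / len(current)
mean_full = sum(full) / len(full)
s6 = survival_from_current_durations(current, 6.0)

print(f"couples still trying at survey: {len(current)}")
print(f"mean elapsed duration (months): {mean_current:.2f}")
print(f"mean full TTP among sampled couples (months): {mean_full:.2f}")
print(f"estimated share still trying after 6 months: {s6:.2f}")
```

In this simulated setting the couples caught by the cross-sectional sample have longer-than-average attempts (length-biased sampling), which is why the analysis works from the distribution of elapsed durations rather than treating the sample as a random draw of pregnancy attempts.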
In the current duration approach, data on the frequency of sexual intercourse, the duration of the menstrual cycle during the attempt at pregnancy, and environmental exposures can be collected with virtually no recall bias. The collection of urine or other biologic samples is possible, at least from the date of inclusion; that is, some time after cessation of contraceptive use. The advantage of the current duration approach is that the inclusion criterion (currently having sexual intercourse without contraception) is more clear-cut than that of the prospective approach. This approach thus has a clearly defined sampling frame. We are currently testing this approach on a representative population of French women 18–45 years of age.
The four approaches to assessing TTP are based on different inclusion schemes. The retrospective approach is based on the inclusion of couples who have already had a pregnancy; prospective approaches (detailed and not) are most often based on the inclusion of couples who will soon discontinue contraceptive use; and the current duration approach is based on the inclusion of couples currently trying to conceive. We believe that none of these methods can currently be considered a gold standard. In particular, unlike Tingen et al., we do not think that the potential bias from the exclusion of pregnancies occurring during contraceptive use is specific to the retrospective approach, because prospective (and current duration) studies seldom include couples using contraceptive methods.
Instead, we believe that the existence of new, alternative approaches should provoke comparative studies, leaving room for debate before conclusions are drawn about which approach is preferable for a given purpose.

Studying Human Fertility
We very much welcome the National Children's Study, which promises to raise the study of factors affecting reproduction and development to a new level. An impressive and exciting range of new methodologies is being developed (Chapin and Buck 2004; National Children's Study 2004). However, we think it important to correct some of the inaccurate statements concerning the use of retrospective time to pregnancy (TTP) made by Tingen et al. (2004). We do not see prospective methods and the retrospective approach as alternatives; they are complementary, each having its own strengths and weaknesses. Unfortunately, Tingen et al. presented a negative and distorted view of retrospective TTP studies, describing things that are "often" or "typically" done but that do not represent current best practice; then they used their description to denigrate all such studies. Although it is true that retrospective studies are subject to multiple potential "bias in recruitment, recall, and behavior or exposure trends" (Tingen et al. 2004), careful sampling and questionnaire design and use of appropriate methods of analysis can address most of these issues.
Retrospective studies are not necessarily pregnancy based. They can be conducted in random population-based samples and frequently are cross-sectional or birth cohort studies (Joffe 2000; Joffe and Li 1994; Karmaus et al. 1999; Sallmén et al. 1995; Schaumburg and Boldsen 1992; Schaumburg and Olsen 1989; Thonneau et al. 1999), thereby overcoming the problem that only women who eventually conceived are included. Even in pregnancy-based studies, if there are concerns about differential prenatal care (an issue in the United States but not in western Europe, for example), recruitment could be based on births rather than pregnancies, obviously with loss of nonbirth outcomes. If sampling is population based, it is feasible to ascertain periods of unprotected intercourse not leading to conception (generally stipulating a minimum duration such as 6 months); these attempts can be added to the pregnancy-related TTP values to generate the "time of unprotected intercourse" (Karmaus et al. 1999). Tingen et al. presented simple issues of questionnaire design negatively, but these problems can be easily solved. For example, if data are collected in relation to the starting time instead of the conception time, behavior change does not lead to bias but only to nondifferential loss of information.
A central issue is planning bias, the question being how to exclude accidental (unplanned) pregnancies without bias occurring if the exposure variable is associated with the degree of "plannedness." Retrospective studies can readily investigate this by following the standard guidance to collect full information for all pregnancies, including all covariates, and to carry out parallel analyses with "unplanned pregnancy rate" as the outcome variable. Prospective studies are unable to do this because only planners are recruited. Tingen et al. stated that in TTP studies, "women are asked to recount their contraceptive and sexual history." This is incorrect; in TTP studies, women are not asked for this detailed information because it would be invasive and inaccurate. Instead, women are simply asked how long it took to conceive, a question that is acceptable and that most can answer. The replies give an accurate representation of the true TTP distribution (Baird et al. 1991; Joffe et al. 1993, 1995; Zielhuis et al. 1992), even with recall of up to 20 years (Joffe et al. 1995). Although digit preference (and other nondifferential misclassification) can occur, the implication is only that more respondents are required than would be the case with perfect information. Nevertheless, stable estimates of the TTP distribution can be obtained with approximately 200 values in each exposure group, or fewer in the case of ordered categories such as successive 5-year periods (Joffe 2000).
We agree that a major limitation of retrospective studies is that it is impossible to obtain detailed, timed information on exposures and key biologic events such as ovulation, and difficult to ascertain certain covariates such as frequency or timing of intercourse. This is the key strength of the prospective design. On the other hand, retrospective studies are representative because, as already noted, sampling from the general population is available and planning bias can be handled. The questions are easily administered and answered, and the response rate is high. Even response bias can be avoided by nesting the TTP questions within a more general population survey, thus decoupling survey nonresponse from differential fertility or other motivation that would convert low response rates to response bias (Joffe 2000). Selection bias remains a potential problem for some retrospective designs but can be handled by appropriate statistical analysis allowing for truncation effects (Scheike and Jensen 1997).
Not only are prospective studies time-consuming and costly, and therefore likely to be rarely used, but they have important methodologic drawbacks. For example, it is impossible to distinguish the approximately 3% of couples who are sterile from those who merely take a long time to conceive (> 10% typically take > 12 months), unless follow-up is extremely long.
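The difficulty of separating sterile couples from slow conceivers can be made concrete with a simple two-group calculation. The sketch below assumes, purely for illustration, that 3% of couples are sterile (the figure quoted above) and that all other couples share one constant per-cycle probability of conception of 0.18, with one cycle treated as roughly one month. Real populations are heterogeneous, and the function names and the 0.18 value are hypothetical.

```python
def fraction_not_pregnant(cycles, sterile=0.03, fecundability=0.18):
    """Share of all couples not yet pregnant after `cycles` cycles.

    Two-group mixture: sterile couples never conceive; the rest conceive
    with the same constant per-cycle probability (an assumed value).
    """
    return sterile + (1.0 - sterile) * (1.0 - fecundability) ** cycles


def sterile_share_among_waiting(cycles, sterile=0.03, fecundability=0.18):
    """Among couples still not pregnant after `cycles` cycles, the share who are sterile."""
    return sterile / fraction_not_pregnant(cycles, sterile, fecundability)


for n in (6, 12, 24, 48):
    waiting = fraction_not_pregnant(n)
    share = sterile_share_among_waiting(n)
    print(f"after {n:2d} cycles: {waiting:.3f} not pregnant, {share:.2f} of them sterile")
```

With these illustrative inputs, about 12% of couples have not conceived after 12 cycles and only about a quarter of them are sterile; the sterile share among couples still waiting rises to roughly three quarters after 24 cycles, so the two groups separate only with long follow-up.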
More seriously, prospective studies are dominated by the lack of a sampling frame (except in occupational studies) and by a potent combination of planning bias and response bias. They can include only couples who deliberately plan and are willing to volunteer for onerous monitoring. This is acceptable for internal comparisons (e.g., studying day-specific conception rates, each subject being her own control) but raises serious problems with external validity. Tingen et al. referred to this only in their Table 1 ("Participants might be less representative of target population") but not in the text; in contrast, Buck et al. (2004) admitted that women who plan their pregnancies may be systematically different from those who do not, that this may adversely affect external validity to a degree that cannot be empirically evaluated, and that the findings may not be generalizable to all women.
A 606 VOLUME 112 | NUMBER 11 | August 2004 • Environmental Health Perspectives Correspondence

We agree that the sampling frame is a major methodologic problem in fecundity studies of all designs. The current duration strategy of enrolling couples currently attempting pregnancy is a promising approach, particularly when couples are followed after enrollment to obtain detailed prospective information. Data from the menstrual cycles before enrollment can then be combined with detailed data from cycles during the study period using recently proposed statistical methods (Dunson 2003).
However, it is important to note that this innovative combination of retrospective and prospective designs still does not address the vexing problem of couples who do not have a clearly defined pregnancy attempt. Demographic surveys and qualitative research reveal that many, perhaps most, pregnancies are not exactly planned in the sense of an exactly defined onset of intention to become pregnant (Trussell et al. 1999). Even the onset of sexual intercourse without contraception may not always be easy to define reliably, with periods of use interspersed with periods of nonuse. Ultimately, a complete evaluation of this issue will need to include couples using contraception, at least at study enrollment. Some studies have done this, at least for barrier contraceptives (Eskenazi et al. 1995).
Joffe et al. comment on alternative retrospective designs that can be considered to address the problem of a nonrepresentative sample. We agree that prospective studies are limited by the fact that individuals willing to participate may not be representative of the general population (as in prospective epidemiologic studies of other health outcomes). However, many of Joffe et al.'s comments on the prospective design are unduly negative. For example, the stated methodologic problem of it being "impossible to distinguish the approximately 3% of couples who are sterile from those who merely take a long time to conceive" is not specific to the prospective design, but a general issue in distinguishing sterility from infertility in the absence of known causes of sterility.
The "best" design (if it exists) really depends on the scientific questions of interest. Retrospective and population-based studies have an important role in assessing population fecundability in demographic studies, in studying effective fecundability, and in surveillance for possibly significant environmental exposures. However, our focus is on studies investigating the potentially complex and time-varying effects of environmental exposures on biologic fecundability. Intercourse timing relative to ovulation has a critical role, not only in determining the overall probability of conception in a menstrual cycle, and hence time to pregnancy, but also in predicting later outcomes, such as early pregnancy loss (Wilcox et al. 1998). Confounding resulting from differences in exposed and unexposed individuals in their sexual behavior, including timing and frequency of intercourse, is a major concern. There can be problems even if the individuals have the same intercourse frequency because there is substantial variability in the timing of ovulation (Wilcox et al. 2000). In addition, prospective data on mucus and hormones potentially provide important information about biologic mechanisms.
For all of these reasons, we continue to recommend that whenever possible, detailed prospective data of the type that we have outlined should be collected in epidemiologic studies of fecundity, as well as in studies that seek to relate periconception exposures to later reproductive and developmental outcomes. Daily sampling of urine (via samples sent to the laboratory, or onsite with commercially available computerized devices) is one way to achieve this, but not the only one. We detailed other currently available and feasible approaches in our article.

The WTC Disaster and Asbestos Regulations
Landrigan et al. (2004) reported on exposure to asbestos as a result of events involving the World Trade Center (WTC). Their results are somewhat lower than those reported by others (Lange 2004) for this unfortunate event. I reported a single asbestos bulk sample at 40% asbestos (Lange 2004), although Landrigan et al. suggested that most are in the range of 1–3%. Because there was one "high" bulk sample observed, it is likely that numerous other locations had similar "elevated" asbestos levels. Airborne exposures were also elevated for a considerable time period after the event (Lange 2004). Although measurements were reported as task-length averages (TLA), it is likely that some personal samples (Lange 2004) exceeded the U.S. Occupational Safety and Health Administration permissible exposure limit (PEL) of 0.1 fibers/cm³ well after the first few days of the event. For example, during 1 January 2002–11 February 2002 in the west area, the arithmetic mean exposure and an upper reported value (0.500 fibers/cm³-TLA) were above the PEL (Lange 2004).
Landrigan et al. (2004) also reported clearance samples as fibers per millimeter squared, which likely should be in structures per millimeter squared. It should be noted that this clearance standard has not been shown to be health based, and structures per millimeter squared cannot be equated with or converted to fibers per millimeter squared.
Even with evidence of higher exposure levels, on the basis of reported data (Lange 2004), it is unlikely that exposure to asbestos itself will result in any actual health effects. This is because the asbestos was mostly chrysotile (Landrigan et al. 2004) and the duration of exposure for most workers was short (Lange 2003). However, as previously reported (Lange 2001, 2002, 2004), regulatory agencies ignored their own regulations at the WTC, whereas asbestos concentrations (bulk and air) for other locations would probably trigger a regulatory response and most likely a citation with a requirement of some action plan. Thus, it appears that there are two standards to be taken from the WTC, one for agencies themselves and another for all others.
The author declares he has no competing financial interests.

The WTC Disaster: Landrigan's Response
My colleagues and I thank Lange for his letter confirming our finding that asbestos was present in settled dust as well as in airborne samples obtained at Ground Zero, the site of the World Trade Center, and for his having agreed with us that this asbestos almost certainly represented an exposure hazard for workers. The asbestos that was detected in the dust at Ground Zero originated from asbestos that had been sprayed onto the steel skeleton of the Twin Towers as fireproofing when the structure was being built. It was well known that asbestos was applied in the North Tower up to about the 40th story and at other locations throughout the structure before the practice of spraying on asbestos was banned in New York City in the early 1970s (Nicholson et al. 1971; Reitze et al. 1972). Concentrations of asbestos in the dust at Ground Zero were highly variable, and the level in any particular sample reflects the location of sampling and the composition of the dust that happened to be in that area. We agree with Lange's view that workers likely had intermittent exposures to asbestos that would have arisen unpredictably when, for example, they picked up a steel beam or turned over rubble and liberated asbestos fibers into the air. The asbestos hazard to workers was magnified by the fact that the U.S. Occupational Safety and Health Administration (OSHA) failed to require constant use of respirators at Ground Zero.
We disagree strongly with Lange's statement that "it is unlikely that exposure to asbestos itself will result in any actual health effects." Lange appears to base his assertion, first, on the fact that most of the asbestos at the World Trade Center was chrysotile asbestos, and second, that duration of exposure for most workers was brief.
Unfortunately, neither of those factors conveys protection. We remain concerned that there now exists a risk for mesothelioma caused by occupational exposure to asbestos for the brave men and women who worked and volunteered at Ground Zero.
All types of asbestos fibers, chrysotile included, have been shown in laboratory as well as clinical studies to be capable of causing malignant mesothelioma (Nicholson and Landrigan 1996). All types of asbestos fibers, chrysotile included, have been declared proven human carcinogens by OSHA, the U.S. Environmental Protection Agency, and the International Agency for Research on Cancer. Pathologic studies have found short chrysotile fibers, the predominant type of fiber in World Trade Center dust, to be the predominant fiber in mesothelioma tissue (Dodson et al. 1991; LeBouffant et al. 1973; Suzuki and Yuen 2002). Moreover, mesothelioma has been reported in persons with relatively low-dose, nonoccupational exposure to asbestos of brief duration (Anderson 1982; Camus et al. 1998; Magnani et al. 2001). The greatest future risk of mesothelioma would appear to exist among first responders who were covered by the cloud of dust on 11 September 2001 as well as in other workers employed directly at Ground Zero and workers employed in cleaning asbestos-laden dust from contaminated buildings. Although we agree with Lange that the number of mesothelioma cases will probably not be great, we think it quite misleading to state that no risk exists.
The author declares he has no competing financial interests.

Philip J. Landrigan
Mount Sinai School of Medicine
New York, New York
E-mail: phil.landrigan@mssm.edu

Trichloroethylene and Cardiac Malformations
In a report of cardiac malformations in rats exposed to trichloroethylene (TCE) in drinking water, Johnson et al. (2003) used two (1.5 and 1,100 ppm) of the four treatment concentrations that they reported in a previous study (Dawson et al. 1993) (Table 1). Dawson et al. (1993) did not report the number of litters per group, so that correlation was not possible. Regardless, it would be an astonishing coincidence for two studies to produce exactly the same number of fetuses in each group. Still more astonishing is the identical number of "abnormal hearts." Nothing reported by Johnson et al. (2003) gives notice that previously published data are being reported again, but that seems to be the inescapable conclusion. If this is a republication of 1993 data, then there has also been reclassification of "defects" with the passage of time.
Another feature of the article by Johnson et al. (2003) that attracted our attention was the uncharacteristically large control group (55 litters). One can surmise that in the earlier study (Dawson et al. 1993), each group would have consisted of approximately 10 females, which is consistent with the size of exposed groups (9–13) reported by Johnson et al. Their control group, however, was unprecedentedly large, both in the context of conventional study design and relative to the other groups in this study. Johnson et al. (2003) provided no rationale for designing their study with a concurrent control five times larger than the treatment groups, which leads us to ask whether the control group reported here is, in fact, a composite of controls from multiple, perhaps five, different studies. The immediate impact of this large control group is that the very cardiac "abnormalities" at the 1.5 ppm dose that did not differ significantly from controls in 1993 become statistically significant in 2003.
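The statistical point about control-group size can be illustrated in general terms. The sketch below uses purely hypothetical counts, not data from either paper: an exposed group with 10 affected fetuses out of 200 is compared with a control group that has the same 2.5% affected rate but is either the same size (5 of 200) or five times larger (25 of 1,000). It computes a one-sided Fisher exact p-value from the hypergeometric distribution; the function name and the choice of test are assumptions made for illustration, and the calculation ignores litter effects.

```python
from math import comb


def fisher_one_sided(cases_exposed, n_exposed, cases_control, n_control):
    """One-sided Fisher exact p-value for an excess of cases in the exposed group.

    Under the null hypothesis the number of cases falling in the exposed
    group is hypergeometric; the p-value is the probability of observing
    at least `cases_exposed` cases there, given the table margins.
    """
    total_cases = cases_exposed + cases_control
    total = n_exposed + n_control
    upper = min(total_cases, n_exposed)
    tail = sum(
        comb(n_exposed, k) * comb(n_control, total_cases - k)
        for k in range(cases_exposed, upper + 1)
    )
    return tail / comb(total, total_cases)


# Hypothetical counts: same observed proportions, different control-group size.
p_small = fisher_one_sided(10, 200, 5, 200)     # control group the same size
p_large = fisher_one_sided(10, 200, 25, 1000)   # control group five times larger

print(f"one-sided p with 200 controls:  {p_small:.3f}")
print(f"one-sided p with 1000 controls: {p_large:.3f}")
```

With the observed proportions held fixed, the p-value is smaller for the larger control group, which is the general mechanism by which pooling controls increases statistical power.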
Conventional developmental and reproductive toxicology assays in mice, rats, and rabbits consistently fail to find adverse effects of TCE on fertility or embryonic development aside from embryo- or fetotoxicity associated with maternal toxicity [Cosby and Dukelow 1992; Dorfmueller et al. 1979; Hardin et al. 1981; Healy et al. 1982; Manson et al. 1984; National Toxicology Program (NTP) 1985, 1986; Schwetz et al. 1975]. Johnson and Dawson, with their collaborators, are alone in reporting that TCE is a "specific" cardiac teratogen (Dawson et al. 1990; Goldberg et al. 1992; Johnson et al. 1998; Loeber et al. 1988). We have always considered those findings suspect, and our comparison of data from the studies of Dawson et al. (1993) and Johnson et al. (2003) serves only to intensify our reservations. Studies from this group have potential for important public health and public policy implications, so it is particularly important for the scientific and regulatory communities to have confidence in the conduct and reporting of those studies.
We are also concerned that S.J. Goldberg, one of the authors of the publications alleging that TCE is a selective cardiac teratogen, has been a plaintiff expert in TCE lawsuits and failed to reveal that fact in his publications.

Trichloroethylene: Johnson et al.'s Response
We share Hardin et al.'s belief that any apparent conflict of interest should be reported. We note that Brent provided testimony for the defense in TCE litigation, notably for the same case in which Goldberg (based on his extensive epidemiologic and laboratory research on the effects of TCE) acted as an expert witness for the plaintiff. We did not report Goldberg's experience acting as an expert witness because the role of an expert witness is to provide unbiased, factual explanations of extant data. We believe this does not constitute a conflict of interest; we have included a caveat about extrapolating data to humans in our publications. To our knowledge, none of our data have been used inappropriately.
The work published in 1993 (Dawson et al.) and in 2003 (Johnson et al.) was actually performed during a much shorter period of time. Many extraneous factors contributed to the late publication of the 2003 paper. Data from our previous work were included in the more recent paper because we needed "boundary values" between or below which we were looking for a threshold or a critical level. This was a long-term study, and it would have been an inappropriate use of animals to repeat the earlier animal studies for those groups. We should have stated more clearly that we were using the groups already studied to prevent repetition and to conserve animal resources, as recommended by the Animal Welfare Act (1990); however, we did refer to our previous paper. Our 2003 publication contained new data as well as previously published data. We welcome this opportunity to clarify our method.
Our alleged reclassification of defects in our Table 2 (Johnson et al. 2003) merely reflects careful reevaluation by the cardiologist and minor updates in terminology that mirror current clinical usage to clarify the nature of a defect (e.g., great vessel defect vs. the more specific aortic hypoplasia; L-transposition vs. abnormal looping). There are other minor numerical differences in the tables (Table 2, Johnson et al. 2003, and Tables 1 and 3, Dawson et al. 1993), not remarked upon by Hardin et al., which derive from the more extensive statistical analysis in the later paper. In an apparent typographic error, we failed to report a pulmonary valve defect for the 1.5 ppm TCE group in the 2003 paper. This should have been included in Table 2; however, it would not have changed the number of hearts with defects.
Again, because this was a long-term continuous project, we did use all of the controls together in a cumulative manner. We used the larger sample size with data collected over a long period because it increases the generalizability of our data, demonstrating clearly the background rate and the variability around rate estimates. Control values were consistent throughout our studies. The larger sample size did increase statistical power somewhat in our most recent paper (Johnson et al. 2003), again without inappropriate use of further valuable animal resources. It should be noted that the increase in statistical power is small compared to the increase generated by the effect sizes and the increase in the number of dose groups, data that can only be generated in a long-term project. Our statistical analysis was simple and conventional. Hardin et al. are incorrect in stating that the differences at the 1.5-ppm dose were statistically significant in our recent paper (Johnson et al. 2003). The p-values were reported in Figures 1 and 2 of our paper as 0.14 and 0.08, respectively, values not conventionally seen as statistically significant. Different levels of statistical significance used in each of the studies for each of the groups were carefully listed in the tables and figures and explained in the text.

Table 1. Cardiac malformations in rats exposed throughout pregnancy to drinking water containing 1.5 or 1,100 ppm TCE. Column headings: Cardiac abnormalities; Heart malformations; TCE dose (1.5 ppm, 1,100 ppm).
There are many references in the scientific literature about effects of halogenated hydrocarbons on development. We included only a few of these in our articles. We are a multidisciplinary team and have studied both TCE and its major metabolites, often basing some of our work on the findings of others in the field without duplicating the work of others. We have consulted with other prominent researchers in the field from time to time in establishing our experimental design or in interpreting our results.
We have found only heart defects associated with these compounds, despite looking for other effects. This work has been consistent with the original epidemiological studies on which our laboratory work was based. We have been funded by government and other nonbiased agencies requiring competitive grant application and accountability. We have presented our results as peer-reviewed published articles in excellent journals. Our work has all been carried out at The University of Arizona. A major strength of our studies was microdissection of each heart by investigators fully versed in the pathology of congenital cardiac malformations as well as noncardiac anatomy.
We fully agree with Hardin et al. that studies in this area "have potential for important health and public policy implications, so it is particularly important for the scientific and regulatory communities to have confidence in the conduct and reporting of those studies." We believe that our studies have been rationally planned, are statistically and scientifically sound, and are of value for this purpose. We welcome this opportunity for postpublication discussion of results.