Declaring chemicals "not carcinogenic to humans" requires validation, not speculation.

The description of the project suggests that the study was designed to investigate the effects of incinerator emissions on the local population in two suburbs of Antwerp, Belgium: Wilrijk and Hoboken. However, the study was part of a larger research project commissioned by the Flemish government to determine the feasibility of, and the issues to be considered in developing, environmental health monitoring. More specifically, the study of Den Hond et al. (2002) was designed to determine whether evaluating biomarkers measured as part of the regular medical examination of adolescents at the end of their secondary education would be feasible. We were asked by the Flemish government to review the results of this project (Cuijpers et al. 2000; Goetghebeur et al. 2000). We feel that the objectives and hypotheses underlying a feasibility study should not be conflated with hypotheses arising from specific concerns in certain a priori-defined regions.
A major shortcoming of the design is the lack of randomness regarding both the study areas and the recruitment of the participants. The study areas were well-known polluted regions within the conurbation of Antwerp. This knowledge may have influenced the choice of tests to be performed. Researchers should be cautious regarding a priori choices of study areas, whether or not the sponsor influences the choice.
The use of volunteers instead of randomly selected participants is another flaw, introducing an extra risk of confounding. As a consequence, for example, the proportion of boys to girls in the three study areas differed considerably: from 0.7 to 3.0. In this case there may have been a selection bias of adolescents in lesser physical condition who volunteered for examination.
Reported results were the outcome of analyzing a multitude of associations among the empirical data; they did not emerge after a rigorous test of an a priori-formulated hypothesis, at least not to our knowledge. Such findings are valuable because they may generate hypotheses for further research. In this respect we agree with the authors' closing sentence (Den Hond et al. 2002): "… [F]urther studies should be undertaken to confirm or to refute our interpretation of the present findings."
The sexual development of the study subjects was judged by the examining physicians. The correspondence in ratings varied between fair and good (as validated by kappa coefficients) (Fleiss 1981). It is then important to exclude a systematic over- or underestimation by one of the examining physicians. Such a test was not reported in the study.
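To illustrate why agreement statistics alone do not exclude systematic bias, the following minimal Python sketch computes an unweighted Cohen's kappa alongside the mean signed difference between two raters. The ratings, the function names, and the choice of unweighted kappa are assumptions made for illustration; they are not data or methods taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same stage by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over stages.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def mean_difference(rater_a, rater_b):
    """Mean signed difference (A minus B); a non-zero value hints at systematic bias."""
    return sum(a - b for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical maturation stages (1-5) for ten adolescents; NOT data from the study.
physician_a = [2, 3, 3, 4, 4, 5, 5, 3, 4, 5]
physician_b = [2, 3, 4, 4, 5, 5, 5, 3, 4, 5]

kappa = cohen_kappa(physician_a, physician_b)
bias = mean_difference(physician_a, physician_b)
print(f"kappa = {kappa:.2f}, mean difference (A - B) = {bias:+.1f}")
```

In this made-up example the two raters agree on eight of ten subjects, yet every disagreement goes in the same direction, which is the kind of systematic difference that a kappa coefficient by itself would not reveal.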
Finally, we would like to remark on the reference the authors made to an earlier report on pregnancy outcome in Flanders: "… [I]n 1997 the Flemish government reported a higher percentage of medically assisted conceptions in the district around the waste incinerators compared with the rest of Flanders.…" We feel that this report is incorrectly and selectively referenced, suggesting that the findings are in line with earlier results. Taking into account the mother's age, no significant differences in mode of delivery were found in the earlier study. Furthermore, that report does not contain statements about the necessity of medical assistance at delivery. Part of the above criticism was outlined in the review reports but unfortunately not addressed in the article by Den Hond et al. (2002).
To conclude, we would like to reiterate that we commend the authors for their highly relevant and extensive research work. However, they should have placed their study more clearly in the perspective of the Flemish environmental health project. Their conclusions generate interesting hypotheses for further work but should not be viewed as the results of a cause-and-effect study.

"Sexual Maturation in Relation to Polychlorinated Aromatic Hydrocarbons…": Den Hond et al.'s Response
The Flemish Government commissioned the Environment and Health Study after a competitive call for research proposals in 1998 (Onderzoek Milieu en Gezondheid 1998). From the outset, this research project included two distinct population-based surveys, one in adolescents (16-18 years of age) and another in adults (21-65 years of age). We assume scientific responsibility only for the project in adolescents.
Our study (which was carried out in 1999) sought to investigate whether, in adolescents, biomarkers can show exposure to and health effects of common environmental pollutants. We planned to maximize the gradient in environmental exposure by recruiting adolescents from a rural control area (Peer, Belgium) and a polluted industrial suburb. For logistical reasons, the adolescents were recruited from and examined at local schools. Life-long residence in the study areas was a prerequisite for participation. The Flemish Government imposed the selection of the polluted area because of a long-standing controversy about the possible impacts on public health of two waste incinerators in Wilrijk and a large nonferrous smelter in Hoboken. Health authorities assumed that our study would settle the local concern about detrimental health outcomes associated with the waste incinerators in Wilrijk. In spite of the imposed selection of the polluted area, our study had the advantage of a clear-cut contrast in exposure between the control and polluted areas, as evidenced by all available records on environmental monitoring.
Molenberghs et al. found the lack of randomness a major shortcoming of our study design. However, they miscalculated the proportion of boys to girls, which was 1.5 (60/40) in Peer, 1.0 (21/21) in Wilrijk, and 2.1 (39/19) in Hoboken. We believe that, in the context of the imposed selection, we maximally exploited the remaining degrees of freedom to avoid serious selection bias. Indeed, participants and nonparticipants had similar age, sex distribution, parental social class, and regional residence. In the industrial suburb they resided at similar distances from the main sources of pollution, which strongly suggests random selection with regard to exposure. Although we cannot extrapolate to the general Flemish population, our conclusions are applicable to the adolescents enrolled in our population-based sample.
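The boy-to-girl proportions quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes that the figures in parentheses are counts of boys and girls per area, and the variable and function names are illustrative.

```python
# Boys and girls per area, read from the figures in parentheses in the response.
# Treating them as head counts (rather than percentages) is an assumption.
counts = {"Peer": (60, 40), "Wilrijk": (21, 21), "Hoboken": (39, 19)}


def boy_girl_ratio(boys, girls):
    """Proportion of boys to girls, rounded to one decimal as in the text."""
    return round(boys / girls, 1)


ratios = {area: boy_girl_ratio(boys, girls) for area, (boys, girls) in counts.items()}

for area, ratio in ratios.items():
    print(f"{area}: {ratio}")
```

Under that assumption the ratios reproduce the stated 1.5, 1.0, and 2.1, a narrower spread than the 0.7 to 3.0 range cited by Molenberghs et al.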
After our Data Monitoring Committee, which was enlarged by three international experts for peer review, had discussed the study outcomes on 12 May 2000, our results became available for disclosure to the public and subsequent scientific publication. The health administration then appointed Molenberghs and colleagues as the official external reviewers of our study (Goetghebeur et al. 2000). These statisticians seem to imply that our conclusions should have been restricted to the feasibility of biomonitoring in adolescents and that we should not have disclosed the unexpected medical findings suggesting relevant environmental health risks. To what extent their criticisms are justified and independent remains open for discussion. In this context, it is interesting to note that, on 25 July 2000, the Flemish Minister of Public Health addressed a letter to the principal investigator of our study (J.A.S.) demanding that disclosure of the findings be postponed for 6 months until the study in adults was completed. This embargo was lifted on 21 November 2000, a few weeks after the municipal elections.
In contrast to the criticism of Molenberghs et al., we actually based our study on a priori hypotheses associating defined biomarkers of exposure with relevant effect markers. This is also the case for the specific article on the endocrine effects of polychlorinated aromatic hydrocarbons (PCAHs) (Den Hond et al. 2002) and other recent papers that emanated from the adolescent study, for example, those on the immune (Van Den Heuvel et al. 2002) and genotoxic (Verheyen et al., unpublished data) effects. Because we were concerned about the reproducibility of the staging of sexual maturation according to Marshall and Tanner's scale (Marshall and Tanner 1969, 1970) and the measurement of testicular volume using Prader's orchidometer (Dörnberger et al. 1987), we mounted a separate validation study, the results of which have been reported and discussed elsewhere.
Molenberghs et al. confuse medically assisted conceptions with medically assisted deliveries. Nevertheless, we acknowledge that age is a major determinant of fertility in women. However, the officially published report did not include an age-adjusted analysis of the proportion of medically assisted conceptions (Advoet et al. 1999).
We hope that, with these clarifications, our study is more clearly placed in the perspective of the Flemish Environment and Health Study. We agree with Molenberghs et al. that our findings, although in line with the concept of endocrine disruption and Sharpe and Skakkebaek's hypothesis, require further research. As in all cross-sectional epidemiologic studies, we did not prove causation. However, numerous experimental studies support the causal association between exposure to PCAHs and effects on sexual maturation.

Declaring Chemicals "Not Carcinogenic to Humans" Requires Validation, Not Speculation
Environmental Health Perspectives • VOLUME 111 | NUMBER 4 | April 2003
Regulatory agencies should provide detailed guidelines on how to use mechanistic and epidemiologic data to dismiss positive cancer evidence obtained from studies in experimental animals. Roberts and Ashby (2002) bemoaned that the U.S. Environmental Protection Agency (EPA) does not provide opportunity to conclude that there is no evidence of carcinogenicity from well-conducted and adequately powered epidemiologic studies. In their letter, Roberts and Ashby provided examples for which they claimed that there is enough information to draw valid conclusions of "no effect." Our examination of their examples and viewpoints reveals several critical issues that they have ignored, but which must be addressed in any future regulatory agency guidelines that may allow the dismissal of carcinogenic effects in animals based on mechanistic and epidemiologic data.
Roberts and Ashby (2002) appealed to the International Agency for Research on Cancer (IARC) guidelines (IARC 2000a) for evaluating epidemiologic studies to rectify the absence of procedures in the U.S. EPA's guidelines (U.S. EPA 1999) that would allow the conclusion that there is no carcinogenic hazard based on well-conducted epidemiologic studies. However, Roberts and Ashby disregarded the IARC distinction between the classifications "inadequate evidence of carcinogenicity" and "evidence suggesting lack of carcinogenicity" in humans. Requirements for the latter are intentionally stringent, including multiple, mutually consistent, adequately powered studies covering the full range of human exposures that exclude with reasonable certainty bias, confounding, and chance and provide individual and pooled estimates of risk near unity with narrow confidence intervals. In particular, IARC (2000a) cautions that "latent periods substantially shorter than 30 years cannot provide evidence for lack of carcinogenicity."
With the IARC criteria in mind, it is surprising that Roberts and Ashby (2002) pointed to epidemiologic studies on clofibrate as an example, where they claim there is sufficient power to detect a cancer increase and that there are enough data to conclude that there is no carcinogenic effect. Although the World Health Organization trial on prevention of ischemic heart disease with clofibrate includes 208,000 man-years of observation in the mortality update (WHO 1984), the 5-year treatment period plus the 8-year mortality follow-up period is insufficient to adequately determine the extent of drug-related cancer incidences or deaths, or to make a valid conclusion of no risk of cancer. Cancer is a group of diseases that most often have latency periods greater than 20 years for clinical manifestation. Further, because there were only 206 total cancer deaths among the 5,331 men in the clofibrate group (WHO 1984), this study has inadequate power to demonstrate lack of a carcinogenic hazard, which should be evaluated for individual sites. This study also provided no information on cancer risk to women. The meta-analysis by Law et al. (1994) did not provide any estimates of cancer risk for clofibrate, though it was cited by Roberts and Ashby (2002) as part of the negative human evidence on the carcinogenicity of this hypolipidemic drug. Certainly, exposure and follow-up durations, gender differences, and study power are critical issues for regulatory agencies to address when formulating guidelines for use of epidemiologic studies to support a claim of no evidence of carcinogenicity in humans despite positive cancer data in experimental animals.
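The duration argument above rests on simple arithmetic, which the following Python sketch makes explicit. It uses only figures quoted in this letter; the variable names are illustrative, and the crude proportion computed at the end is a descriptive figure, not a power calculation or a site-specific risk estimate.

```python
# Figures quoted in the letter (WHO 1984; IARC 2000a).
treatment_years = 5
mortality_followup_years = 8
total_observation_years = treatment_years + mortality_followup_years

typical_latency_years = 20   # "latency periods greater than 20 years"
iarc_latency_threshold = 30  # "substantially shorter than 30 years"

cancer_deaths = 206
men_in_clofibrate_group = 5331

# Crude share of the clofibrate group that died of any cancer during observation.
# This pools all cancer sites and says nothing about individual sites.
crude_proportion = cancer_deaths / men_in_clofibrate_group

print(f"Total observation: {total_observation_years} years")
print(f"Shortfall vs. typical latency: {typical_latency_years - total_observation_years} years")
print(f"Shortfall vs. IARC threshold: {iarc_latency_threshold - total_observation_years} years")
print(f"Crude cancer death proportion: {crude_proportion:.1%}")
```

The 13 years of observation fall short of both latency figures, and the 206 deaths are spread across all cancer sites combined, which is why the letter argues that site-specific conclusions of no hazard cannot be drawn from this trial.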
On the use of mechanistic data for evaluating chemical carcinogens, we have repeatedly stressed the need for rigorous testing of mechanistic hypotheses (Huff 1995; Melnick et al. 1997). In this regard it is instructive to examine the basis for IARC's conclusions on the relevance of the carcinogenicity of peroxisome proliferators in experimental animals. As noted by Roberts and Ashby (2002), IARC opined that the mechanism of tumor induction by clofibrate [and di(2-ethylhexyl)phthalate (DEHP)] would not be operative in humans. For DEHP, there is sufficient evidence of carcinogenicity in rats and mice, yet IARC downgraded the classification of this chemical from "possibly carcinogenic in humans" to "not classifiable as to its carcinogenicity to humans" (IARC 2000b). Downgrading DEHP was based on the working group's conclusion that in addition to peroxisome proliferation, cell proliferation induced by DEHP is due to activation of the peroxisome proliferator-activated receptor (PPARα) and that peroxisome proliferation had not been documented in human hepatocyte cultures. However, in considering IARC's evaluation, it is important to recognize that humans possess a functional PPARα and that the working group noted that effects of DEHP in human liver have not been adequately evaluated. Cell proliferation mediated by PPARα was considered by the IARC working group to be the critical step in the carcinogenicity of peroxisome proliferators.
Several deficiencies and critical data gaps in the peroxisome proliferation mode-of-action hypothesis have been noted (Melnick 2001). Moreover, recent studies have shown that hepatocyte proliferation and peroxisome proliferation occur by different mechanisms, and that PPARα-independent effects that occur in Kupffer cells are required for the induction of cell proliferation and suppression of apoptosis (Parzefall et al. 2001; Peters et al. 2001; Rose et al. 1999). Although the demonstration of the important role of Kupffer cells in peroxisome proliferator-mediated effects is fairly recent, it seems odd that this subject was not mentioned in the IARC evaluation of DEHP (IARC 2000b; Melnick 2003). Based on current information, it is clear that peroxisome proliferators cause liver tumors in rodents, but the mechanism of action is not understood (Peters et al. 2001). Tomatis (2002) warned the occupational and environmental health communities that serious public health consequences might follow if decision-making bodies rely on untested mechanistic hypotheses that are later shown experimentally to be incorrect. Thus, when formulating guidelines for acceptance of mechanistic hypotheses, we expect regulatory agencies to require rigorous testing and validation before using incomplete data sets to downgrade the categorization of chemical carcinogens. Such guidelines must also address whether differences in mechanistic events among species are truly qualitative rather than quantitative in nature. For quantitative differences, the guidelines should also require information on the range of parameter variability in exposed humans so that sensitive subpopulations are not ignored in these categorizations. The issue is not singularly the adequacy of mechanistic or epidemiologic evidence, but certainty for protecting public health.