A benchmark dose analysis of prenatal exposure to polychlorinated biphenyls.

Benchmark dose (BMD) analysis is used to determine levels of exposure to environmental contaminants associated with increased public health risk. In this study we used a benchmark approach to evaluate the risks associated with prenatal exposure to polychlorinated biphenyls (PCBs). We evaluated for intellectual impairment a cohort of children whose prenatal PCB exposure had been assessed from biologic specimens. We calculated BMDs and lower-bound confidence limits (BMDLs) for four end points using four sets of risk criteria. BMDLs were estimated using three different statistical methodologies. The BMDs and BMDLs were remarkably consistent across the four end points for each set of risk criteria, but differed substantially for the different risk criteria. The proportion of the sample considered at risk ranged from 9.8% for the least protective criteria to 74.1% for the most protective. Two methodologies, likelihood ratio and bootstrapping, generated generally similar BMDLs. BMD analysis provides a straightforward, reliable method for evaluating levels of exposure associated with increased public health risk. In the analyses performed in this study, the number of individuals considered at risk depended more on the risk criterion selected than on the outcome assessed.

Polychlorinated biphenyls (PCBs) are a family of synthetic hydrocarbon compounds used extensively between the 1930s and mid-1970s for a variety of industrial purposes such as insulating materials and dielectric fluids in electric transformers and capacitors. Although banned in most industrial nations, they continue to be among the most ubiquitous environmental contaminants. Four prospective longitudinal studies have linked background levels of prenatal PCB exposure to poorer behavioral and intellectual functioning in infancy and childhood: the North Carolina study (1), the Michigan study (2), and studies in the Netherlands (3) and Oswego, New York (4,5). Prenatal PCB exposure is associated with less optimal newborn behavioral function (4,6), poorer infant recognition memory (5,7), lower levels of general intellectual competence during the preschool period [(3,8,9), but see Gladen et al. (10)], and poorer performance on standardized tests of verbal IQ and reading comprehension at age 11 years (2). These effects have been linked specifically to PCB exposure during the prenatal period; virtually no adverse effects have been observed in relation to postnatal exposure from breast-feeding.
Until recently, most risk assessments conducted for environmental regulation have attempted to identify a no-observed-adverse-effect level (NOAEL), that is, an exposure level below which adverse effects are not observed. The NOAEL methodology was designed for data from laboratory animal experiments in which groups of animals are administered discrete doses of a toxic agent. The NOAEL is defined as the highest dose at which no adverse effect is observed. The NOAEL approach is not well suited for data from studies relating human exposure to adverse effects, where the distribution of exposure levels is usually continuous. In this context it requires the imposition of arbitrary cut points on the continuous data to create discrete exposure groups, a procedure that can generate spurious thresholds in the data and can reduce the power to detect adverse effects.
Some human studies using linear multiple regression analysis to evaluate teratogenic effects have performed supplementary analysis in which the children are divided into discrete groups according to exposure level. Visual inspection of bar graphs displaying the group means generated by these analyses often suggests a threshold of exposure below which adverse effects are not observed. In Rogan et al. (6), for example, poorer newborn behavioral function was evident only in the 49 infants most heavily exposed to PCBs, the top 5.7% of the sample. In Jacobson and Jacobson (2), adverse effects of prenatal PCB exposure on verbal IQ were observed only in the top 16.8% of the sample, although poorer reading comprehension was observed at somewhat lower exposure levels. Adverse effects on intellectual competence were evident at even lower levels in the Netherlands study (3), and poorer infant recognition memory was observed at lower exposure levels in Oswego (5) than in the Michigan study (7). One problem in assessing thresholds in human studies is that threshold values can vary considerably depending on the reliability of the measure of developmental outcome, the sensitivity of the end point at the age at which it is assessed, and the exposure group cut points selected by the researcher (11). Improvements in the reliability and sensitivity of biologic assays for PCBs are likely responsible for the detection of adverse effects at lower exposure levels in the Netherlands and Oswego than in the North Carolina and Michigan studies.
Given the high degree of between-study variability in threshold values and the fact that for some end points no thresholds are seen, it is often very difficult to identify definitive threshold values in human data. One alternative that has been used in risk assessments in recent years is benchmark dose (BMD) analysis (12). BMD analysis assumes that the relation between exposure and adverse outcome is continuous, thus avoiding the identification of specific threshold values, and uses statistical and policy criteria to establish cutoffs for risk assessment. The BMD approach begins by identifying a criterion for adverse effect. For some end points, such as IQ score, the criterion might be based on clinical criteria (e.g., an IQ < 70 is considered to constitute mental retardation). In most analyses, however, the criterion for adverse effect has been the bottom 5th percentile in the distribution of the test scores in a nonexposed population; the latter criterion is referred to as a p0 of 0.05. The BMD is defined as the level of exposure that will increase the risk of performance below the designated cutoff score by a prespecified amount (e.g., from 5% to 10% or from 5% to 15%). This increase is referred to as the benchmark response (BMR). Given a p0 of 0.05, a BMR of 0.05 represents a doubling of risk (i.e., from 5% to 10%), whereas a BMR of 0.10 represents a tripling (5% to 15%).
It can be argued that the p0 = 0.05 criterion used in most benchmark analyses to date is not sufficiently protective because it focuses exclusively on the increased risk of a deficit in the moderate-to-severe range. In our research on prenatal PCB exposure in a predominantly white, middle-class cohort in western Michigan, we found no evidence of moderate-to-severe deficit. All but one of the children performed in the normal range, and the one child who was mentally retarded was excluded on the grounds that she was an outlier (2). Nevertheless, our analyses showed that prenatal PCB exposure at or above 1.25 µg/g was associated with a tripling of the incidence of poor performance in the low-normal range, defined as more than 1 standard deviation below the mean. Although the children performing in that range were not more likely to require special education services, it can be assumed that, given their low IQ and reading scores, they had to struggle to keep up in a normal classroom. In light of these findings, it seems reasonable also to use benchmark analysis to determine the level of exposure associated with an increase in relatively subtle deficits. If subtle deficit is defined as a score more than 1 standard deviation below the sample mean (i.e., in the bottom 16th percentile), the cutoff would be termed a p0 of 0.16 in benchmark methodology (13). Given a p0 of 0.16, a BMR of 0.05 would represent a 31.3% increase in risk (i.e., from 16% to 21%); a BMR of 0.10 would represent a 62.5% increase (16% to 26%).
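The relative increases quoted above follow directly from dividing the BMR by p0. The short sketch below reproduces that arithmetic for the four p0-BMR combinations considered in this paper; the function name is illustrative and not part of any published benchmark software.

```python
def relative_risk_increase(p0, bmr):
    """Relative increase in risk when the abnormal-response rate rises from p0 to p0 + bmr."""
    return bmr / p0


# The four combinations of cut point (p0) and benchmark response (BMR) used in the text.
for p0 in (0.05, 0.16):
    for bmr in (0.05, 0.10):
        exposed_risk = p0 + bmr
        increase = relative_risk_increase(p0, bmr)
        print(f"p0={p0:.2f}, BMR={bmr:.2f}: risk {p0:.0%} -> {exposed_risk:.0%}, "
              f"relative increase {increase:.3f}")
```

A relative increase of 1.0 corresponds to a doubling of risk and 2.0 to a tripling, which is why the same BMR is a much smaller proportional change when p0 is 0.16 than when it is 0.05.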
One problem that arises in research on PCBs and other ubiquitous environmental contaminants is the difficulty of identifying a truly nonexposed population to determine the test score cutoff associated with a p0 of 0.05 or 0.16. Many neurobehavioral tests lack general population norms, and even where norms exist, such as for IQ tests, they may not be relevant for the population being assessed in a given study. In studies of populations in which virtually all individuals have some degree of exposure, we can extrapolate from the dose-response curve for the study sample to determine a test score that would correspond to the cutoff for the bottom 5th or 16th percentile in a truly nonexposed population. The value of the test score where the dose-response curve crosses the y-axis (at the point of zero exposure) would be considered the mean for a hypothetical nonexposed population. A normal curve with that mean and a variance equal to the mean square error from the regression line can be used to represent the nonexposed population. From this distribution one can determine the cut point below which the bottom 5% or 16% of the hypothetical nonexposed population might be expected to score. If p0 is 0.05 and the BMR is 0.10, the BMD will be the level of exposure at which 15% (i.e., an additional 10%) of the population would be expected to score below the cut point. In benchmark analysis the BMDL, defined as the lower bound of the 95% confidence interval (CI) for the BMD, is used as the principal criterion for regulatory purposes. The BMDL is used instead of the BMD to provide a margin of safety and ensure that the most sensitive individuals in the population are protected.
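The construction of the hypothetical nonexposed population can be sketched in a few lines. The example below assumes a fitted regression intercept (the predicted score at zero exposure) and a mean square error are already available; the numeric values are placeholders chosen for illustration, not estimates from the Michigan data.

```python
from statistics import NormalDist


def nonexposed_cutoff(intercept, mse, p0):
    """Score below which a fraction p0 of the hypothetical nonexposed population falls.

    intercept: predicted outcome at zero exposure (mean of the nonexposed population)
    mse:       mean square error of the regression (variance of that population)
    """
    nonexposed = NormalDist(mu=intercept, sigma=mse ** 0.5)
    return nonexposed.inv_cdf(p0)


# Placeholder values for illustration only.
intercept, mse = 100.0, 15.0 ** 2
cut_05 = nonexposed_cutoff(intercept, mse, 0.05)  # cutoff for moderate-to-severe deficit
cut_16 = nonexposed_cutoff(intercept, mse, 0.16)  # cutoff for subtle deficit
print(f"p0 = 0.05 cutoff: {cut_05:.1f}; p0 = 0.16 cutoff: {cut_16:.1f}")
```

The p0 = 0.16 cutoff always lies closer to the nonexposed mean than the p0 = 0.05 cutoff, which is what makes it the more protective criterion.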
Our study is the first to apply a BMD analysis to data on the effects of prenatal exposure to PCBs. We examined four end points from the Michigan data set: three from the 11-year follow-up and one from the 4-year assessment. For one end point, full-scale IQ, we compared three methods for computing the BMDL: maximum likelihood estimate, likelihood ratio, and bootstrapping. For all four end points, we compared the BMDs and BMDLs using two cut points, p0 of 0.05 and 0.16, and two response criteria, BMRs of 0.05 and 0.10.

Materials and Methods
Sample. The Michigan cohort was recruited in four maternity hospitals in western Michigan in 1980-1981, a period during which Lake Michigan fish were relatively heavily contaminated with PCBs (14). Two hundred forty-two mothers who had eaten at least 11.8 kg of Lake Michigan fish during the previous 6 years and 71 mothers who did not eat these fish participated in the newborn phase of the study. We used three biologic  Table 1.

Measures of exposure. Cord and maternal serum samples were obtained shortly after delivery, and maternal milk samples within 0.2-4.5 months postpartum (median = 1 month). All specimens were analyzed for PCBs by packed column gas chromatography, using the Webb-McCall method (17,18). Because of the limitations of this analytic methodology, PCB values were not detectable in 70% of the cord and 22% of the maternal serum samples. Because placental transfer provides the sole route of fetal exposure to these compounds, which are in equilibrium in fat deposits throughout the body, maternal serum and milk concentrations provide alternatives to cord serum for evaluating prenatal exposure (1). To improve reliability and sensitivity in the assessment of fetal exposure, the cord serum and maternal serum and milk values were converted to z-scores and averaged to provide a single measure; serum values were included only if they exceeded the detection limit (9). Eleven children whose cord and maternal serum PCB values were both nondetectable and for whom no milk specimen was available were assigned a prenatal exposure score at the 10th percentile of the distribution. The composite z-scores generated by these analyses were converted to their equivalent values in terms of maternal milk PCB concentration (micrograms per gram on a fat basis) by multiplying each z-score by the sample standard deviation for maternal milk PCB concentration (0.39) and adding the sample mean (0.84).
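The composite exposure measure can be illustrated with a minimal sketch. The standard deviation (0.39) and mean (0.84) are the milk values given above; the helper names and the use of None for a missing or nondetectable measurement are assumptions made for this example.

```python
MILK_SD = 0.39    # sample SD of maternal milk PCB concentration (µg/g fat), from the text
MILK_MEAN = 0.84  # sample mean of maternal milk PCB concentration (µg/g fat), from the text


def composite_z(z_scores):
    """Average the available z-scores; None marks a missing or nondetectable value."""
    available = [z for z in z_scores if z is not None]
    if not available:
        raise ValueError("no usable exposure measure for this child")
    return sum(available) / len(available)


def to_milk_equivalent(z):
    """Convert a composite z-score to the maternal milk PCB scale (µg/g fat)."""
    return z * MILK_SD + MILK_MEAN


# Hypothetical child: cord serum nondetectable, maternal serum z = 1.0, milk z = 0.0.
z = composite_z([None, 1.0, 0.0])
print(f"composite z = {z:.2f}, milk-equivalent = {to_milk_equivalent(z):.3f} µg/g")
```

Averaging only the measures that are available lets every child contribute an exposure score even though most cord serum values were below the detection limit.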
Because virtually all the children in this sample were exposed to measurable quantities of PCBs, we used the dose-response curve to extrapolate the test scores that would correspond to the cutoffs for the bottom 5th and 16th percentiles in a truly nonexposed population. To determine the equivalent of no exposure on our composite measure, the z-scores corresponding to zero on each of the components of that measure were determined and averaged together, yielding a composite z-score of -1.65. The value of each end point on its dose-response curve corresponding to an exposure of -1.65 was considered the mean for a hypothetically nonexposed population. For full-scale IQ, a normal curve derived from that mean and a variance equal to the mean square error of the regression yielded a cutoff of 92.9 for a p0 of 0.05 and 100.3 for a p0 of 0.16. Cutoffs for these two values of p0 were derived for each of the other end points as well.
Outcome measures. We performed BMD analyses on four cognitive outcomes previously found to be related to prenatal PCB exposure in the Michigan cohort (2,8). The three end points from the 11-year follow-up study were full-scale IQ from the WISC-R, word comprehension from the Woodcock Reading Mastery Tests-Revised (19), and average reaction time from Kail's (20) mental rotation task. In the mental rotation task, the child must determine whether a letter displayed on a computer screen is forward or backward (i.e., mirror image). Latency to press a forward or backward button on the computer is tabulated. Task difficulty is varied by rotating the stimulus letter at varying angles (e.g., 30°, 60°, 90°) clockwise from the vertical. One end point, the McCarthy Memory Scale (15), was examined from the 4-year assessment.
Control variables. Twenty control variables were evaluated for the analyses of each of the end points: the 18 indicated in Table 1, age of child when tested, and child examiner. We used correlational analysis to determine which control variables needed to be adjusted statistically to control for confounding (except for examiner, where one-way analysis of variance was used). Because a control variable cannot be the true cause of an observed deficit unless it is related to both exposure and outcome (21), association with either exposure or outcome can be used as the criterion for statistical adjustment. In this study we selected control variables in relation to outcome, as recommended by Kleinbaum et al. (22). All control variables even weakly related to each outcome (at p < 0.10) were controlled statistically by regressing the outcome in question on the control variables related to it. We used the residualized outcome scores in the benchmark analyses reported here. We added the mean value for each outcome to its residual score to restore it to its original units.
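The residualization step can be sketched as follows. This is a minimal illustration rather than the code used in the study: it fits an ordinary least-squares regression of the outcome on the selected control variables with a small hand-written solver, then adds the outcome mean back to the residuals. The data and function names are hypothetical.

```python
def ols_fit(controls, outcome):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in controls]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, outcome))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def residualize(controls, outcome):
    """Outcome adjusted for the controls and restored to its original units."""
    beta = ols_fit(controls, outcome)
    mean_y = sum(outcome) / len(outcome)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in controls]
    return [y - f + mean_y for y, f in zip(outcome, fitted)]


# Hypothetical data: two control variables per child and an outcome score.
controls = [(12.0, 30.0), (16.0, 25.0), (10.0, 35.0), (14.0, 28.0), (18.0, 22.0), (11.0, 33.0)]
scores = [98.0, 110.0, 92.0, 105.0, 115.0, 99.0]
adjusted = residualize(controls, scores)
print([round(a, 1) for a in adjusted])
```

The adjusted scores keep the mean of the original outcome but are uncorrelated with the control variables, so any remaining association with exposure cannot be attributed to those controls.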

Calculation of BMDs and BMDLs. Calculation of the BMDs followed the framework given by Crump (12), a brief outline of which is presented here. For a given dose d, there is a resulting continuous response X governed by a distribution function F. One of the parameters of F, Θ(d), depends upon d, and the remaining parameters, represented by α, do not. Given the distribution function F and the value of p0, a cutoff score x0 can be calculated with responses more extreme than x0 considered abnormal. Using the above notation when larger responses are more adverse, the probability of having an abnormal response as defined by x0 given a subject's dose d is given by
$$P(d) = 1 - F\bigl(x_0;\ \Theta(d),\ \alpha\bigr),$$
where F is the distribution function that governs the continuous response X; d is the dose the subject received; x0 is the score on the outcome that corresponds to the chosen p0; Θ(d) is the dose-dependent parameter that describes the dose-response relationship; and α is the vector of parameters that does not depend on d.
The BMD is calculated by solving the following equation, in which P(d) denotes the probability of an abnormal response at dose d, so that P(0) equals p0:
$$P(\mathrm{BMD}) - P(0) = \mathrm{BMR}.$$
The outcomes assessed in this paper were all approximately normally distributed, and a Gaussian distribution function was used. The dose-dependent parameter Θ(d) was assumed to be a linear function of d, and the standard deviation of the errors α was assumed not to depend on the dose. We used maximum likelihood estimation (MLE) to obtain the regression estimates used to calculate the BMDs.
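Under these assumptions (Gaussian response, mean linear in dose, constant standard deviation) the BMD has a closed form. The sketch below assumes that lower scores are adverse, as for IQ, and that dose is measured from zero exposure; the slope and standard deviation are hypothetical values, not the study's estimates. Setting the probability of scoring below the cutoff equal to p0 + BMR gives BMD = σ·[z(p0 + BMR) − z(p0)] / |slope|, where z is the standard normal quantile function.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sd_shift(p0, bmr):
    """Shift of the mean, in SD units, that raises the tail probability from p0 to p0 + bmr."""
    return _STD_NORMAL.inv_cdf(p0 + bmr) - _STD_NORMAL.inv_cdf(p0)


def bmd_linear_normal(slope, sigma, p0, bmr):
    """BMD for a normal outcome whose mean declines linearly with dose.

    Assumes lower scores are adverse, so the slope must be negative.
    """
    if slope >= 0:
        raise ValueError("no adverse (negative) dose-response slope")
    return sigma * sd_shift(p0, bmr) / -slope


# Hypothetical regression estimates for illustration only.
slope, sigma = -5.0, 10.0
for p0, bmr in [(0.05, 0.10), (0.05, 0.05), (0.16, 0.10), (0.16, 0.05)]:
    print(f"p0={p0:.2f}, BMR={bmr:.2f}: shift={sd_shift(p0, bmr):.3f} SD, "
          f"BMD={bmd_linear_normal(slope, sigma, p0, bmr):.3f}")
```

Because the BMD is proportional to the required shift in standard deviation units, two different p0-BMR combinations that imply nearly the same shift will yield nearly the same BMD for a given end point.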
We used three methods to calculate the BMDLs. The first, MLE, is based on the asymptotic normal distribution of the maximum likelihood estimate of the BMD. The second is based on the asymptotic chi-square distribution of the likelihood ratio statistic, which Crump (12) recommends for computing the BMDL because it is more robust than MLE. For low-dose extrapolation problems, the distribution of the BMD is skewed and the lower limit based on the asymptotic normal approximation is unreliable. We calculated the likelihood ratio-based confidence limits using the profile likelihood method (23), a general approach for calculating confidence limits for parameters such as the BMD that are defined in terms of more basic parameters. The profile likelihood method reduces the log-likelihood function to a function of a single parameter of interest by treating the other parameters as nuisance parameters and maximizing over them. In the third method, a bootstrap approach (24), the BMD is estimated from 10,000 samples drawn at random (with replacement) from the data set. Each bootstrap sample has the same n as the original study sample, but in any given bootstrap sample some subjects are randomly selected for inclusion more than once, while others are omitted. Using the percentile method (25), we calculated the BMDL as the cut point below which 5% of the bootstrapped BMD values fell.

Results
Figure 1 presents a scatterplot relating the 11-year full-scale IQ scores (residualized for control variables and rescaled as described above) to the composite prenatal PCB exposure measure. Table 2 compares the BMDs and BMDLs for full-scale IQ for four p0-BMR combinations, with the BMDLs derived using the three computational methods described above. Not surprisingly, the BMDs vary considerably depending on the criterion for adverse effect (p0) and the increase in risk (BMR) being stipulated.
The BMD for the least protective criteria, a BMR of 0.10 for moderate-to-severe deficit (p0 = 0.05), is 1.44 µg/g. Only 9.8% of the cohort were exposed above that level. The BMD for the most protective criteria, a BMR of 0.05 for subtle deficit (p0 = 0.16), is 0.58 µg/g. Using those criteria, 74.1% of the Michigan sample would be considered at increased risk from prenatal PCB exposure. It is interesting that the BMDs for a smaller increase in severe risk (p0 = 0.05, BMR = 0.05) are very similar to those for a greater increase in the risk of a subtler deficit (p0 = 0.16, BMR = 0.10). The reason for this similarity is that both sets of criteria represent a 0.35 standard deviation shift on the normal curve.
The three computational methods generated similar BMDLs for the least-protective criteria (p0 = 0.05, BMR = 0.10), possibly because the values estimated by all three methods are well within the range of the observed data (Figure 1). By contrast, the BMDL estimates were quite different for the most protective criteria (p0 = 0.16, BMR = 0.05), especially the MLE estimate, which was much lower than those generated by the other two methods. For the two intermediate sets of criteria, the likelihood ratio and bootstrap approaches provided generally similar BMDL estimates, whereas the MLE estimates were lower, confirming Crump's (12) concern that under some circumstances MLE estimates might be unreliable. One major advantage of the bootstrap approach is that it makes no assumptions about the shape of the BMD distributions. For that reason and because it is much less difficult to compute, we used bootstrapping to determine the BMDLs for the other three end points (Table 3). Figure 2 shows the dose-response relations generated when the four end points included in this article were examined in relation to prenatal PCB exposure divided a priori into five groups. For three of these end points, visual inspection of the bar graphs suggests that no adverse effect is seen below 1.25 µg/g, whereas the threshold for reading word comprehension appears to be notably lower (1.0 µg/g). By contrast, with the BMD methodology (Table 3), the BMDLs for reading word comprehension are virtually indistinguishable from those for full-scale IQ. Although the BMDLs for the 4-year McCarthy Memory Scale are consistently the highest across all four sets of analyses, the magnitude of the differences between the 4-year and 11-year BMDLs is very small.

Discussion
BMD analysis is particularly well suited for risk assessment based on continuous data from human exposure studies, in which it is often difficult to identify discrete dose-response thresholds. It has been used, for example, in recent risk assessments for prenatal methylmercury exposure in both the United States (26) and Canada (27). Crump (12) recommends MLE analysis to generate the BMD values, but because standard methods for computing confidence limits based on MLE are often unreliable for calculating BMDLs, he recommends using likelihood ratio-based confidence limits for the BMDL. The latter are computationally complex, however, and not available in standard computer packages. This study is among the first to use a bootstrap approach to generate BMDLs. The likelihood ratio and bootstrap approaches generated similar BMDLs for three of the four sets of p0-BMR criteria examined in this study. Two major advantages of bootstrapping are that it makes no assumptions regarding the distribution of the data, and it is relatively easy to program in packages such as SAS (28), S-PLUS (29), and Resampling Stats (30).
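To illustrate how little programming the bootstrap percentile approach requires, the following is a minimal sketch in Python rather than in the packages named above. It is not the code used in this study: it assumes a normal linear dose-response model in which lower scores are adverse and dose is measured from zero exposure, it runs on simulated data, and all function names are illustrative. The demonstration uses 2,000 resamples to keep the run time short; the default of 10,000 matches the number described in the Methods.

```python
import random
from statistics import NormalDist


def fit_line(dose, score):
    """Least-squares slope and intercept, plus the ML estimate of the residual SD."""
    n = len(dose)
    mean_d, mean_s = sum(dose) / n, sum(score) / n
    sxx = sum((d - mean_d) ** 2 for d in dose)
    sxy = sum((d - mean_d) * (s - mean_s) for d, s in zip(dose, score))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_d
    sse = sum((s - intercept - slope * d) ** 2 for d, s in zip(dose, score))
    return slope, intercept, (sse / n) ** 0.5


def bmd_from_fit(slope, sigma, p0, bmr):
    """BMD under a normal linear model in which lower scores are adverse."""
    if slope >= 0:
        return float("inf")  # no adverse trend in this sample
    std = NormalDist()
    return sigma * (std.inv_cdf(p0 + bmr) - std.inv_cdf(p0)) / -slope


def bootstrap_bmdl(dose, score, p0, bmr, n_boot=10_000, seed=0):
    """Percentile-method BMDL: the value below which 5% of bootstrapped BMDs fall."""
    rng = random.Random(seed)
    n = len(dose)
    bmds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        slope, _, sigma = fit_line([dose[i] for i in idx], [score[i] for i in idx])
        bmds.append(bmd_from_fit(slope, sigma, p0, bmr))
    bmds.sort()
    return bmds[int(0.05 * n_boot)]


# Simulated data for illustration only.
sim = random.Random(1)
dose = [sim.uniform(0.0, 3.0) for _ in range(200)]
score = [110.0 - 5.0 * d + sim.gauss(0.0, 12.0) for d in dose]

slope, _, sigma = fit_line(dose, score)
bmd = bmd_from_fit(slope, sigma, p0=0.05, bmr=0.10)
bmdl = bootstrap_bmdl(dose, score, p0=0.05, bmr=0.10, n_boot=2_000)
print(f"BMD = {bmd:.2f}, bootstrap BMDL = {bmdl:.2f}")
```

Resampling whole subjects (dose and score together) preserves the joint distribution of the data, which is why the percentile limit requires no assumption about the shape of the BMD distribution.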
A priori division of the continuous distribution of prenatal PCB exposure levels into distinct exposure groups appears to reveal nonlinearities in the dose-response relationships (Figure 2) that might provide the basis for identifying a NOAEL. These nonlinearities may be misleading, however, because they may be influenced by the selection of the exposure group cut points, which have no biologically meaningful basis. Because the number of children in each group is relatively small, there is also a risk that a few individuals may unduly influence a group mean.
[Figure caption, partial: "... Table 3) to prenatal PCB exposure, based on the asymptotic normal distribution of the maximum likelihood estimate. The confidence limits flare at both ends of the distribution because the regression line is measured most reliably at the center of the dose-response distribution."]
A major advantage of the benchmark methodology is that the BMD and BMDL are derived from the slope of the entire dose-response function. Although most benchmark analyses performed to date have used a linear model, this methodology can also be applied to nonlinear dose-response curves (12). In this study the BMDs and BMDLs derived from benchmark analysis were remarkably consistent across four different end points at two different ages. Benchmark analyses performed on data sets from three different studies of prenatal methylmercury exposure also showed remarkable cross-end point consistency within each of the studies examined (27). The BMDs and BMDLs differed considerably, however, among the three methylmercury studies. Similarly, we would anticipate that BMDs and BMDLs derived from the more recent Netherlands (3) and Oswego studies (4,5), which used more sensitive measures of prenatal PCB exposure than were available for the Michigan study (2), might well be lower than those reported here.
Another methodology that has sometimes been used to identify no-effect levels is nonparametric regression (31,32). Nonparametric regression fits a series of curves in overlapping segments corresponding to small regions of the dose-response relationship, to which a scatterplot smoothing technique is then applied. The resulting curve is often nonlinear, relatively flat at lower levels of exposure, and becomes steeper as exposure increases. However, it is usually not possible to discern a discrete no-effect cutoff in this continuous distribution. Moreover, because each of the segments used to construct the nonparametric curve is based on a relatively small number of cases, the curves generated for different end points frequently bend at quite different thresholds (32).
Besides providing BMD criteria likely to be stable across a range of end points (at least within a given study), the BMD approach enables the risk assessor to derive regulatory standards for exposures for which no reliable NOAEL values have been detected. The absence of a reliable NOAEL is also common in laboratory animal experiments, where adverse effects are often evident even at the lowest doses tested. Moreover, a NOAEL derived from one study is often superseded by evidence of adverse effects at lower exposure levels in subsequent studies that use more reliable exposure measures or more sensitive end points. In most cases, therefore, it may be most reasonable to assume that there is no true no-effect level and to derive BMDs and BMDLs from the dose-response curve.
One important feature of the benchmark methodology is that it focuses the attention of the policy maker more directly on the cost-benefit tradeoffs that environmental regulation almost always entails. Given that it is unlikely to be economically feasible to completely rid the environment of every substance for which there is some evidence of adverse effect, the benchmark approach requires the regulator to determine the level of risk of increased adverse effect he or she is willing to tolerate. The data reported here make clear how dramatically the selection of regulatory criteria can alter the proportion of the population deemed to be at risk. Analyses based on visual inspection of the dose-response data (Figure 2) suggest that 15-35% of the Michigan cohort was at risk from prenatal PCB exposure. Based on the least-protective benchmark criteria we tested, only about 10% of the Michigan cohort would have been considered at risk, whereas using our most stringent criteria, almost 75% would have been deemed at risk. It is surprising that the least-protective criteria examined in this paper are the ones that have been used most frequently in benchmark analyses performed to date (33,34). These criteria (a p0 of 0.05 and a BMR of 0.10) are designed to protect against the tripling of the risk of moderate-to-severe deficit (from 5% in a nonexposed population to 15%). It would seem more appropriate to tolerate at most a doubling of the incidence of moderate-to-severe deficit and/or to base risk assessment on the prevention of more subtle deficit.
[Figure caption, partial: "... Table 3 and rescaled as described in the text) to prenatal PCB exposure divided a priori into five groups."]