Characterization of a developmental toxicity dose-response model.

The Rai and Van Ryzin dose-response model proposed for teratology experiments has been characterized for its appropriateness and applicability in modeling the dichotomous response data from developmental toxicity studies. Modifications were made in the initial probability statements to reflect more accurately biological events underlying developmental toxicity. Data sets used for the evaluation were obtained from the National Toxicology Program and U.S. EPA laboratories. The studies included developmental evaluations of ethylene glycol, diethylhexyl phthalate, di- and triethylene glycol dimethyl ethers, and nitrofen in rats, mice, or rabbits. Graphic examination and statistical evaluation demonstrate that this model is sensitive to the data when compared to directly measured experimental outcomes. The model was used to interpolate to low-risk dose levels, and comparisons were made between the values obtained and the no-observed-adverse-effect levels (NOAELs) divided by an uncertainty factor. Our investigation suggests that the Rai and Van Ryzin model is sensitive to the developmental toxicity end points, prenatal deaths, and malformations, and appears to model closely their relationship to dose.


Introduction
Current methodology for quantitative developmental toxicity risk assessment is based on the identification of no-observed-adverse-effect levels (NOAELs) and the application of uncertainty factors (UFs) to them. Other articles in this issue discuss the problems and limitations inherent in this approach (1,2). In particular, emphasizing a single dose level, the NOAEL, to set regulatory target doses ignores most of the information on the relationship between dose and response and does not take into account the variability inherent in the data. This article presents the authors' initial efforts to characterize and modify a doseresponse model that has been proposed as an alternate approach to current risk assessment in developmental toxicity.
Since models of carcinogenic risk cannot be used for data from developmental toxicity studies because they do not allow for the relatedness or nonindependence of observations from the same litter, new modeling approaches are needed that take these points into consideration. In particular, theoretical mathematical models are needed to provide estimates of risk levels at specified low doses and estimates of doses at specified low levels of risk that are not strongly dependent upon the choice of the lowest dose administered in a study and to provide estimates for which confidence (uncertainty) statements can be made.
A model proposed in 1985 by Rai and Van Ryzin (3) for characterizing dose response for dichotomous data in teratology studies was chosen for investigation because of its potential for adaptation and extension to reflect simple mechanistic aspects of the underlying biologic processes. This model can be described as a twostage probability model with flexible distributional assumptions. The model was designed to estimate the probability of teratological outcomes for individual fetuses. However, as originally proposed, the probability statements assumed teratogenicity contingent on maternal toxicity. Moreover, the data that were used to demonstrate the model application were from a dominant lethal study in which males were exposed to X-irradiation prior to mating. In the present paper, the model probability statements are reinterpreted to fit the typical developmental toxicity study with exposure to the dam during the period of major organogenesis and evaluation of the offspring at term. The model was characterized using data from developmental toxicity studies conducted by the National Toxicology Program and the U.S. EPA Health Effects Research Laboratory.

Model Description
Basically, the Rai and Van Ryzin (RVR) model is the product of two component probabilities-the first unconditional and the second conditional upon the first-that represent two related stages in the development of an adverse developmental effect (response) under study. The model estimates the probability that an individual fetus from a litter of specified size s, whose dam was exposed during the appropriate period to dose d, will exhibit the response of interest. The relatedness within litters (intralitter correlation) that is due to having the same dam is addressed by an effort to account for the variability among dams (litters) in the same dose group. This interdam or interlitter variability is represented by a variable for litter size, which is included in the second probability component of the model, structured upon the assumption that larger litter size indicates a healthier dam and hence a lower risk to the individual fetus.
Expression of the RVR model has been extended to the following general form: where P (d,s) is the probability of an adverse developmental effect on a fetus from a litter of size s, whose dam was exposed to dose d; X (d) is the probability of a predisposing change (disturbance) in the litter environment; and P[d,s; k(d)] is the conditional probability of an adverse developmental effect on a fetus from a litter of size s, whose dam was exposed to dose d, given that a predisposing change occurred in its litter environment.
As an initial approach, Rai and Van Ryzin suggested the one-hit model to represent k (d), the probability that the toxicant will produce a change in the litter environment that can lead to a specific adverse effect in the individual fetus. However, other models may be used to represent this probability of a litter effect.
The one-hit model assumes that a response can be induced after a single susceptible target has been hit by a single biologically effective unit of dose (4). The probability of this hit is taken to be proportional to the dose level, i.e., Pr(hit) = (d, for some positive value of (. If the number of hits is assumed to follow the Poisson distribution (a distribution used for small numbers of events or counts) with the parameter of the distribution (the mean number of hits) equal to (d, then the probability of a response at dose d equals the probability of at least one hit, which is one minus the probability of no hits, or 1 -exp(-Pd). This version of the one-hit model is often referred to as the linear nonthreshold dose-response model since at low doses (small values of fdi) the probability of a hit is approximately equal to Pd.
Background risk is incorporated into this model by adding an intercept, oc, to the linear expression in the exponent. The first component of the RVR model, the probability of a predisposing effect on the litter environment, is thus expressed as 1 -expl-(a + Pd) I. The tendency of the one-hit model toward linearity at very low doses produces higher estimates of risk, and hence more conservative safe dose estimates than risk models that have convex (concave upwards) shapes, such as the probit model. Rai and Van Ryzin expressed the second component of the model as an exponential function, with the exponent modified by (multiplied by) litter size, s. Although it could be any function of dose, they chose the exponent to be linear in dose, and again incorporated an intercept term, 01, to allow for a background risk, i.e., the conditional risk of a fetal response in the no-dose group. The exponent in the second component is thus expressed as (-s(01 + 02d)), with the coefficient 02 indicating the importance of dose level in the conditional risk to the fetus.
Since the second component of the model is also a probability and cannot be greater than 1, it reduces the primary probability, that of effect on litter environment, by modifying it through the influence of litter size and dosage to obtain the risk to the individual fetus. This formulation of the model, with the constraint that probability must be positive and no greater than 1, does not allow increased litter size to increase the risk to the fetus, but only to decrease the risk, or to have no effect. The complete model is given as P(d,s) = [1-exp{-(ca+f3d)}] exp{-s(01+02d)} [2] This form of the model estimates the risk to the individual fetus when the litter size is known. In order to generalize the model across all observed litter sizes, while maintaining the adjustment for litter size, a theoretical distribution for litter size must be postulated. The overall risk to the fetus at a specific dose is then determined by integrating the risks for each litter size at that dose over the assumed distribution of litter sizes. This is similar to taking the weighted mean of the risks for each litter size, with weights equal to litter size frequencies based on the assumed distribution. The observed litter size frequencies for a specific dose group are used to estimate the distribution parameters at that dose.
The litter size distribution that is incorporated into this version of the model is the Poisson distribution for counts, and in this case the counts are the numbers of littermates. This model version assumes that the Poisson distributions parameter, E(S), the expected (mean) litter size, changes with dose, but not linearly. Instead, the expected litter size is assumed to have the following nonlinear form in dose d E(S) = 4lexp (-42d) [31 where the parameter°l is the expected litter size in the controls, and the new parameter 02 represents the effect of dose on litter size. This formulation implies larger litter sizes at lower doses if 02 is estimated to be greater than zero, and smaller litters if its estimate is negative.
When the appropriate integration (summation here because litter size is a discrete variable) is carried out, the final model for estimating individual pup risk at dose d is [4] with the parameters, a, ,B 01, 0(2 p1, and 02 appropriately constrained to ensure that neither of the two component probability estimates is greater than 1 nor negative. It can be seen that, while linear functions of dose are employed in the exponents, the risk model itself is nonlinear. If, instead of the one-stage model, a multistage model were used for the first component (the probability of a predisposing change in the litter environment), then the dose function in the exponent would itself be nonlinear. In the model as formulated above, the background (d = 0) probability of disturbance in the litter environment is equal to 1-exp(-a). The overall background risk is expressed as P(O) = {1-exp(-a)lexp[-j{1 -exp(0j)I] [5] Thus, when exposure to the toxicant under study is zero, as in the controls, this model still allows for a nonzero probability of the predisposing litter disturbance and a nonzero conditional probability of an adverse effect on the fetus.

Parameter Estimation and Hypothesis Tests
Estimation of the model parameters was done using the maximum likelihood method that maximizes the fit of the model to the actual data set and assumes a binomial distribution for events within a litter, i.e., the number of affected fetuses out of a litter of size s. Thus, in this model, the variable for litter size functions in two capacities, one as a varying risk set (subsample) size, and another as a covariate to represent differences among litters.
Hypotheses on the parameters are tested by the con-ditional likelihood ratio test. In this version of the model, the hypotheses test the degree of exponential linearity of the dose effect in each of the two components. Testing the hypothesis that 3 = 0 indicates whether there is a relationship between dose and the probability of litter disturbance. Technically, testing 02 = 0 indicates whether the relative advantage to the fetus of being from a litter one size larger varies with dose. Since the model is constructed on the assumption that larger litter size is associated with reduced risk, this test is an indirect way to investigate dose effect on individual fetus risk, conditional upon a predisposing change in litter environment. The parameters a and 01 could also be similarly tested to obtain information on the significance of background risk, as described in the previous section.

Estimation of Low-Risk Doses
Use of a mathematical model enables the estimation of risk at any specified dose and the estimation of dose for any specified risk, as well as the confidence limits on the risk or dose estimates. In particular, a model allows those low doses that are associated with very low specified risks, such as 1 in 10,000, to be estimated, as well as their lower 95% confidence limits. For their model, Rai and Van Ryzin provide two different methods for approximating the lower 95% confidence limit on these low-risk doses: the first is dependent on the distribution of litter sizes, and the second is not. Both methods estimate the lower limit in terms of the added risk over background, since this is the risk of interest at low doses.
In the first method, a low level of the added risk, t*, is specified, say 10-4, and the following equation is solved for d to obtain the estimated dose that will produce no greater added risk: [61 with the maximum likelihood estimates of the parameters substituted into P(d) and P(O). The variance of the estimated dose is obtained from the Information matrix (the Information matrix is the negative inverse of the covariance matrix of the likelihood function). The approximate lower 95% confidence limit of the dose estimate is calculated by assuming that the dose estimate is normally distributed.
The second method uses the fact that the first probability component of the model is actually an upper bound on the model. P(d,s), the risk estimate for a fetus from a litter of size s, decreases as litter size s increases, and thus has a maximum value when s = 0, i.e., where P(d,s) = P(d,O) = X (d), its upper bound. Thus, a conservative method for interpolating to a low-risk dose is to consider the added risk over background only for the first component probability. The formulation is then t(d) = k(d) -k(0), which, for the one-hit model, is 231 ,A* = 74d) = P(d) P(O) = 10-4 approximately equal to (1d) exp(-o). The equation to be solved for d, the low-risk dose in this case, is: The approximate variance of the estimated dose is derived by taking its natural logarithm and applying the formula for the variance of a ratio. Again, the component variances are obtained from the Information matrix based on the likelihood function. The approximate lower 95% confidence limit is calculated for the estimated low-risk dose by assuming that its logarithmic transform is normally distributed.

Application of the Model to Data
The RVR model has the flexibility to allow the end point data to vary in accordance with the study design and the research goal. It can assess dose-response relationships when adverse effects are taken to be non-live implants, total or specific malformations, total adverse outcomes, etc. It requires only that the appropriate measure of litter size be used in each case, e.g., total implantations for non-live implants; number of live fetuses at term for malformations.
In all the NTP data sets analyzed, the risk for two categories of end points were considered: (a) non-live, i.e., the total number of deaths and resorptions out of the total number of implantations; and (b) mal, i.e., the number of fetuses with at least one malformation out of the total number of live fetuses. For the nitrofen data (perinatal toxicity study), a single adverse outcome was analyzed, i.e., the sum of the pre-and perinatal deaths and the cyanotic pups at birth out of the total number of implantations.

Results of Model Application
The model-estimated risk probabilities and their two components are displayed in Table 1 for both nonlive and mal in 10 of the data sets. The second column gives the first component, i.e., the probability of change in the litter environment. The next column gives the second component, i.e., the probability that a fetus displays the effect, given that the precipitating change in the litter environment has occurred. The last column is the overall or final (unconditional) probability of a fetus response and is the product of the two component columns. It should be noted that the two component risk probabilities are not determined independently, since the four parameter estimates upon which they are based are jointly (simultaneously) estimated in the process of maximizing the likelihood function.
In Table 1 a pattern emerges among these studies in which the risk to the litter environment tended to increase in small increments as dose increased, but the conditional risk to an individual litter member increased in sizeable increments only at the top dose(s). While the overall risk to the fetus tended to combine these two probability patterns into one with more moderate, but still notable, increases at the highest dose, in individual studies the final fetus risk pattern was closer either to the litter risk (e.g., EG, mice, mal) or to the conditional fetus risk (e.g., TGDM, mice, nonlive). The actual size of the final risk is a product of both components and will not necessarily be close to either one. However, one can think of the primary probability as being the risk to the litter environment, which is then modified (decreased) by the conditional probability of risk to an individual fetus in that litter.
The results of fitting the Rai and Van Ryzin model to these data sets can be observed more easily when graphed. Figures la-k show three point estimates at each dose in each study for the two end points modeled: one is the RVR model estimate of fetus risk, the second is the average proportion per litter of affected littermates, and the third is the average percentage of litters with at least one member affected. The second and third estimates are from calculations presented in the original reports on these studies (3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13). All three are estimates of risk probabilities. The first two estimate in different ways the probability at a specific dose that an individual fetus is affected. The third estimates the probability at a specific dose that a litter is affected with the criterion being that at least one litter member is affected.
The point estimates of risk from the model are connected in the graphs in order to represent the continuous property of the RVR model. However, the actual graph of the model would be represented more accurately by a curved line of the continuous function expressed by P(d). Since the first two point estimates named previously (RVR model estimate and average proportion per litter of affected littermates) express the risk to an individual fetus, and the third (average percentage of litters with at least one member affected) reflects the risk to a litter, the third point is markedly higher than the other two because it represents a broad range of risk scenarios for any one affected litter, from only one affected fetus to the entire litter affected. Only when every affected litter contains no unaffected fetuses   ship in litter and fetus risk for both end points in the DEHP studies was significant in mice, but only in the fetus risk for the non-live end point in rats.

Comparison of Low-Risk Dose Estimates with the NOAEL/UF Approach
Interpolations were done as part of this analysis in order to compare the model dose estimates at very low risks with the results of the method currently in use, i.e., the NOAEL/UF approach. The model-estimated low-risk doses were determined for a risk of 1 in 10,000 by two methods described earlier. A comparison of doses that are estimated to be acceptable by specified risk standards is given in Table 3 for the non-live end point and in Table 4 for malformations. The NOAEL in each case was determined from the original study reports to be the highest experimental dose tested that did not produce a statistically significant (p = 0.05) adverse effect when compared to controls. The NOAELs were then divided by a UF of 100, i.e., a 10-fold factor for interspecies variability and a 10-fold factor for withinspecies variability (16). In those cases when all doses showed a statistically significant effect compared to controls, the lowest dose was considered the lowestobserved-adverse-effect level (LOAEL) and was divided by an additional factor of 10.
The model-estimated doses were within an order of magnitude or were no more than one order of magnitude smaller than the uncertainty factor-adjusted NOAELs in all but three cases: TGDM mice, non-live by method 2; DYME mice, mal by method 2; and TGDM rabbits, mal by both methods. In the case of TGDM mice, non-live, there was no significant increase by pairwise comparison at any dose tested, so that it was not possible to determine a NOAEL accurately. However, this was not the case for the DYME mice, mal end point. The greatest discrepancy was between the NOAEL/UF and the model-estimated low risk doses for TGDM rabbits, mal. In this case, it can be seen from the graph (Fig. if)  is the individual fetus risk equal to the litter risk. To the extent that only a portion of fetuses in a litter are affected, the risk probabilities between litter and fetus will diverge.
The graphs show that the model reflected strong dose-response relationships, such as those observed for both categories of end points, non-live and mal (Fig. ic,   f, g, and h), for mal only in ethylene glycol-treated mice and rats (Figs. la and b), and for perinatal toxicity in nitrofen-treated rats (Fig. lk). The model also reflected the shallow dose-response relationships observed for nonlive in ethylene glycol-treated mice and rats (Figs.  1 a and b), for mal in DEHP-treated rats and theophylline-treated mice, and for both responses in mice following exposure to TGDM (Fig. le). Figure lj shows examples of instances where there was no doseresponse relationship and the data could not be modeled.

Goodness-of-Fit Test
The test proposed by Rai and Van Ryzin for determining the fit of the model to any particular set of data is appropriate only for large sample sizes. Their suggested remedy, to group litters of different but similar sizes, was not considered to be satisfactory, and a different test that did not require large samples was used. This goodness-of-fit test is based on the pure error obtained from replications, using the litters of the same size from the same dose group as the replicates. The difference between the total sum of squares of residuals (the deviations of the data points from their model predictions) and the replicates' sum of squares provides the sum of squares due to lack of fit. The ratio of the mean square due to lack of fit to the mean square for pure error has an F distribution with the appropriate degrees of freedom.
Of the 19 data sets, this version of the RVR model was found to fit 15; the fit for each of the remaining 4 data sets was rejected at the p = 0.01 significance level. These data sets were the nonlive response for DEHP mice and EG rats, and the mal response for TGDM rabbits and THEO mice (a data set with minimal dose response). However, the results of the model evaluations for these four data sets are included for comparison.
Further study of these data sets with respect to the deviations between the observed data points and their model predictions could provide insight into possible alternate versions of the RVR model that would better represent these four dose-response relationships. Interactive investigation of a hypothesized model and its actual fit to a specific data set can work toward both model improvement and increased understanding of the information in the data.

Hypothesis Tests of Dose-Response Relationships
The results of testing the significance of the parameters that estimate the relationship, as modeled, between the risk probabilities and dose in these data sets are given in Table 2. For the RVR model, this involves testing whether the parameters 3 and 02 could be significantly different from zero, with the significance level set at p = 0.05. If a significant difference from zero is found, it means that a relationship with dose is likely to exist. But if a significant difference from zero is not found, it means only that the relationship between dose and risk that is postulated in the model could not be confirmed by the data. There still may exist a dose-response relationship in the data that was not adequately modeled.
In Table 2 The closest agreement in identification of a significant dose-related effect was between the two estimates of individual fetus risk, i.e., the model estimate and the mean proportion affected per litter. In only three cases was there a disagreement between these estimates, i.e., non-live for EG mice, mal for DEHP rats, and mal for THEO mice. In all three of these cases the model-defined dose-response relationship was not found to be significant, although there was a slight though sufficient dose-related increase that could be identified by the linear trend test in the original study report. However, in none of these cases did the pairwise comparisons show a statistically significant increase in response above control values for any of the doses tested. When P is not significantly different from zero, but 02 is, the dose relationship is indicated to be stronger for the individual fetus than it is for the litter environment. This is seen clearly for the non-live end point in the modeled studies. A significant dose relationship was not often found in the modeled risk to litter environment (,B) for the non-live end point. In five cases where the dose relationship was significant in the conditional risk to the fetus (02), it was nonsignificant in the risk to the litter environment. This may be related to the fact that the background incidence of prenatal deaths is relatively high, and the doserelated effect must be relatively strong to represent a significant increase. For the mal end point, in every case except one (TGDM mice), when the dose relationship was significant in the conditional risk to the fetus, it was also significant in the risk to the litter environment. Conversely, this may reflect the fact that the background incidence of malformations is relatively low so that a relatively moderate increase with dose may be found to be significant.
Comparisons of the results in Table 2 with Figures la-k illustrate the roles of the dose relationships in the  dTaken from results of linear trend tests in original study reports; ND = not done. 'Model parameters were not estimable due to absence of consistent relationship between dose and response. cValue here is highest dose tested; no statistically significant LOAEL was found. True NOAEL could be much higher. dTests of significance for percentage of litters with response were not conducted in the original study. Values here are taken from tests done on the number of litters with response.  cValue here is lowest dose tested; and all doses were statistically significant when compared to control values. This value is assumed to be the LOAEL and an additional 10-fold UF (total of 1,000) was used. dTests of significance for percentage of litters with response were not conducted in the original study. Values here are taken from tests done on the number of litters with response.
eValue here is highest dose tested; no statistically significant LOAEL was found. True NOAEL could be much higher. fInsufficient dose-response relationship. mg/kg/day) was associated with a response level (percentage affected per litter) that was low compared to the next lower dose and was also much lower than the model estimate. If the response level had been as high as expected, based on the incidence at adjacent doses, then a lower NOAEL would have been chosen that, when divided by a UF of 100, would be closer to the modelestimated low risk dose.
Thus, based on comparisons with the RVR model, the current approach for developmental toxicity risk assessment (NOAEL/UF) in most cases results in the identification of acceptable exposure levels that compare with model dose estimates for 103 to 104 excess risk. This is not surprising, as the NOAEL is likely to be in the response range of 1 to 10% and a 100-fold UF would reduce the risk to between 1 in 1,000 to 1 in 10,000 (17).

Summary
Preliminary studies have been conducted to evaluate the applicability of a dose-response model for dichotomous responses in developmental toxicity. The flexibility of the RVR model has been discussed, as well as the potential extension of this model to reflect mechanistic aspects of biological processes. We have shown that multiple hypothesis testing is possible with this model to determine the relationships of risk to dose. An important strength of this model is that lowdose interpolation is carried out using information from individual fetuses and litters. In addition, confidence limits can be calculated both for the estimated risk at any dose, and for the estimated dose at any specified risk level.
Our studies have shown that the RVR model as described here closely models the data from standard developmental toxicity studies for the proportion of non-live implants and the proportion of malformationed fetuses. Although the model may not account for all the variation due to litter differences (18), it uses litter size to account for a portion. Future direction for these evaluations include characterization of additional data sets, the addition of covariates other than litter size, the incorporation of intralitter correlation, and the application of other modeling functions, e.g., multihit, multistage, to express litter or individual fetus risk.