Statistical analysis of Clewell et al. PBPK model of trichloroethylene kinetics.

A physiologically based pharmacokinetic model for trichloroethylene (TCE) in rodents and humans was calibrated with published toxicokinetic data sets. A Bayesian statistical framework was used to combine previous information about the model parameters with the data likelihood, to yield posterior parameter distributions. The use of the hierarchical statistical model yielded estimates of both variability between experimental groups and uncertainty in TCE toxicokinetics. After adjustment of the model by Markov chain Monte Carlo sampling, estimates of variability for the animal or human metabolic parameters ranged from a factor of 1.5-2 (geometric standard deviation [GSD]). Uncertainty was of the same order as variability for animals and higher than variability for humans. The model was used to make posterior predictions for several measures of cancer risk. These predictions were affected by both uncertainties and variability and exhibited GSDs ranging from 2 to 6 in mice and rats and from 2 to 10 for humans.

The recent development of a comprehensive physiologically based pharmacokinetic (PBPK) model of trichloroethylene (TCE) disposition and metabolism in mice, rats, and humans (1) offers us the opportunity to examine issues of variability and uncertainty for that solvent. In particular, uncertainties in prediction of various cancer dose metrics deserve to be computed, since they could be directly used as input for improved risk assessments.
PBPK modeling provides a strong mechanistic basis for prediction of disposition and metabolism of toxicants. Yet much uneasiness remains with the use of these models in toxicology (2). Similarly, as discussed in a recent review and an accompanying commentary, PBPK modeling has not seen the development it promised for therapeutic compounds (3,4). The reason for this essentially lies in the lack of statistical methods for calibrating these models. Because of individual variability and uncertainty, many parameters are difficult to measure accurately even for inbred animal strains. Using input parameters or presenting results in the form of a single value can therefore be very misleading (5). In the absence of rigorous statistical treatment, inference presented by PBPK modeling is largely empirical, hypotheses are left unvalidated, and predictions lack realistic measures of uncertainty. This state of affairs is unfortunate, when considering the consequences (for public health and national welfare) of the decisions made using these models.
Obviously, correct statistical treatment of PBPK models is difficult, since these are large nonlinear models with relatively small data sets and a high degree of uncertainty and biological variability (6). It is also essential to respect the fundamental specificity of PBPK models, i.e., their high prior information content, which they provide through the opportunity to use physiological information on parameter values. Yet, although several parameters have physiological meaning and a narrow range of possible values, others-often specific of the compound studied-lack such definition and need to be identified by the fitting of the model to concentration-time profiles. Finally, most of the time, prior physiological information is simply about population averages and is not directly applicable to any particular individual for which data were obtained. Fortunately, all these problems can be solved in a unified way though a Bayesian population toxicokinetic approach, which is worth implementing even in the case of small numbers of study subjects (7)(8)(9)(10). Bayesian statistics provides a natural way of merging a priori knowledge gained by implementing a physiological model, with the in vivo experimental data. A Bayesian numerical treatment can also deal efficiently with the multilevel error structure of pharmacokinetic data (11). This is achieved through the use of an explicit statistical model, describing the links between the various sources of variance (e.g., measurement errors, population variability) present in the data, in which the physiological model is imbedded as a deterministic component. These techniques are demonstrated here in the case of TCE PBPK modeling.

Methods Data
Mice. Similar to the study of Clewell et al. (1), data from six published reports were used. From the reported experiments, a total of 33 groups of animals were defined. Fisher and Allen (12) exposed groups of 3 or 4 male and female B6C3F1 mice each (body weight [bw] 30 g), by gavage to TCE at concentrations of 487, 973, and 1,947 mg/kg (males, groups 1-3; females, groups 4-6, respectively). Trichloroacetic acid (TCA) concentrations in venous blood were measured at various times in all groups, as well as the venous blood concentrations of TCE for dosing group 2.
Fisher et al. (13) exposed groups of 14 female B6C3F, mice each (bw 26.5 g) by inhalation to TCE in a closed chamber of 9.1 L, at concentrations of 300, 700, 1,100, 3,700, and 7,000 ppm (groups 7-11) and groups of 15 male mice (bw 31 g) each to 1,020, 1,800, 3,800, 5,600, and 10,000 ppm (groups [12][13][14][15][16]. The concentration of TCE in the air chamber was measured. The same study also exposed four groups of 3 or 4 male B6C3F, mice each to TCE (bw 31 g) for 4 hr at concentrations of 110, 297, 368, and 748 ppm (groups [17][18][19][20]; four groups of 3 or 4 female mice each to TCE for 4 hr at concentrations of 42, 236, 368, and 889 ppm (groups [21][22][23][24]. The venous blood concentrations of TCE and TCA were measured at various times. Larson and Bull (14) exposed groups of 4 male B6C3F, mice each (average bw 27 g) by gavage to TCA at concentrations of 20 and 100 mg/kg (groups 25 and 26). The venous blood concentrations of TCA and dichloroacetic acid (DCA) were measured. Given the analytical technique used, it is suspected that the DCA concentrations may be artifactually high (15). Larson  Rats. Ten experimental groups of rats were identified in a subset of the abovedescribed reports. Fisher et al. (13) exposed groups comprising 6 female F344 rats each (bw 186 g) by inhalation to 600 ppm TCE for 4 hr (group 1). The venous blood concentrations of TCE and TCA were measured at various times. Under similar conditions, groups comprising 6 male F344 rats each (bw 236 g) were exposed to 529 and 505 ppm TCE (groups 2 and 3). The venous blood concentrations of TCE were measured in group 2; the venous blood concentrations of TCA were measured in group 3. Larson and Bull (14) exposed groups comprising 4 male F344 rats each (bw 331 g) by gavage to TCA at concentrations of 20 and 100 mg/kg (groups 4 and 5). The venous blood concentrations of TCA and DCA were measured. For the same reasons as above, the DCA concentrations may be artifactually high. Larson and Bull (16) exposed groups comprising 5 or 6 male Sprague-Dawley rats each (bw 404 g) by gavage to TCE at concentrations of 1.5 mmol/kg (197 mg/kg), 4.5 mmol/kg (592 mg/kg), and 23 mmol/kg (3,024 mg/kg) (groups [6][7][8] (21) reported another set of experiments in which a group of volunteers (group 2-average weight, 80.2 kg; alveolar ventilation rate, 16.7 L/hr; fraction of weight as fat, 15%) were repeatedly exposed to 70 ppm TCE for 4 hr/day for 5 days. In addition to the variables measured in group 1, the cumulated quantity of trichloroethanol glucuronide (TCOG) excreted in urine was recorded at various times.
Muller et al. (22) exposed a group of humans (group 3) to 100 ppm TCE for 6 hr. The same variables as for group 2 were followed. Muller et al. (23) exposed a group of volunteers (group 4) to 50 ppm TCE for 6 hr/day for 5 days. The venous blood concentrations of free TCOH and TCA, and the cumulated quantities of TCA and TCOG excreted in urine were recorded. In the same article, Muller et al. report exposure of two groups (5 and 6) of volunteers to 100 ppm TCE for 6 hr. For group 5, the exhaled air concentrations and venous blood concentrations of TCE, and the venous blood concentrations of free TCOH and TCA were measured. For group 6, only the venous blood concentrations of free TCOH and TCOG are reported.
Finally, group 7 comprises volunteers that Stewart et al. (24) exposed to 198.3 ppm TCE, 7 hr/day (with a 30 min break in the middle) for 5 days. In this experiment, the exhaled air concentration and venous blood concentrations of TCE, and the cumulated quantities of TCA and TCOG excreted in urine were recorded.

Toxicokinetic and Statistcal Model
The description of the physiological model used can be found in Clewell et al. (1). The model equations were transcribed to a format suitable for MCSim (25). Three modifications were made to the model: a) One compartment was added to describe closedchamber exposures of mice by Fisher et al. (13). b) The volume of the poorly perfused compartment and c) the blood flow to the richly perfused compartment were computed by difference at each iteration so that the sum of the organ volumes equaled 82% of the body weight and the sum of organ flows equaled cardiac output. Given this reparameterization, the model has a total of 55 independent parameters. Only 45 of these were adjusted for mice and rats and 40 for humans because there is no information in the above described data about the remaining 10 or 15 parameters. Neither do these 10 or 15 parameters influence the fit to the data.
The statistical model describing uncertainties and variabilities was constructed using a hierarchical population approach (10,26), as illustrated in Figure 1. It has two major components: the group level and the species or population level. At the group level, various concentrations or quantities (y) were measured. The expected values of these measurements are a function (f) of exposure level (E), time (t), a set of physiological parameters of unknown values (9), and a set of measured, covariate parameters (q>) such as body weight. E, t, 0, and q are experiment specific. All animal or human subjects in an experiment were supposed to have behaved similarly from a toxicokinetic point of view. The function f is the pharmacokinetic model described above. The concentrations or quantities actually observed are also affected by measurement error and interindividual variability within the group. Since the data are aggregated at the group level, it is not possible to reliably disentangle the two sources ofvariability. The corresponding errors were assumed to be independent and log-normally distributed, with mean zero and variance 02 (on the log scale). This corresponds to a proportional error model commonly used in pharmacokinetic modeling (10,26,27). The variance vector (J2 had nine components for mice (since nine different variables were measured) and eight components for rats and humans.
At the species level each component of the 0 parameter set was assumed to be distributed log-normally, with species averages p and variances 12 (in log scale). Some a priori knowledge of p and 12 is available in the form of "standard" values for some parameters. Uncertainty in these averages and variances was acknowledged under the form of a priori log-normal distributions for the population means p (with hyperparameters M and S) and a standard inverse gamma distribution, with parameters a =1 and 12 = L: for the population variances 12 (see section below on a priori parameter values).
Three types of nodes are featured in Figure 1: * Square nodes represent variables for which the values were observed, such as y or 9p; were fixed by the experimenters, such as E and t; or were fixed by ourselves, such as the prior on p and 12. * Circle nodes represent unknown variables, such as a2, 0, p, or 12.
* The triangle represents the deterministic physiological model f. An arrow between two nodes indicates a direct statistical dependence between the variables of those nodes.

Prior Parameter Distributions
A major advantage of physiological modeling is to provide a priori information on several of the mean parameter values for a species, as well as some idea of the variability of the parameters across individuals. Values for the hyperparameters M were set on the basis of the parameter values used by Clewell et al. (1). For VMTC and KMT (the Michaelis-Menten parameters for the formation of DCA from TCA) Clewell et al. assumed null values for mice and humans. A low value, with large uncertainty, was assumed here. To specify S, the vector of a priori uncertainty (standard deviations [SDs]) on the average parameter values, a distinction can be made between the physiological parameters or partition coefficients (which are quite well known) and the other metabolic or pharmacokinetic parameters (which are model specific and little known a priori). For the first group of parameters, values of S corresponding to coefficients of variation (CVs) of 20-50% were assigned (8)(9)(10)). An exception is the volume of the tracheobronchial compartment, quite uncertain and for which S was set to correspond to 200% CV. For the second group of parameters, a "vague" distribution was assumed and S was set to correspond to 200% CV (quite uncertain) or 500% CV (very uncertain). For those parameters "the data were left to speak." All priors on p were truncated to ±2 x S or ±1.5 x S to avoid reaching unrealistic values. The prior SD, I:, on group variability, was set to 0.47 (corresponding to a CV of 50%) for all parameters. The square of that value was used as parameter P in the inverse-gamma distribution, a default choice for variance components in normal models (28), together with a a of 1, giving a vague shape to this prior. Table 1 gives the values of exp(M)-i.e. the geometric mean-and exp(S), which lie on the natural scale.
The standard no-informative prior distribution P(G12,. .. )an2) _ y 12 X... X an 2 was used for a2. To avoid the risk of overparameterization (a perfect fit could be obtained for some data sets, leading to implausible null estimates of variance), these variance components were constrained to be larger than 0. 29  Abbreviations: BW, body weight; duod., duodenum; Ke, elimination rate; Kf, formation rate; Km, Michaelis-Menten coefficient; ox, oxidative metabolism; TB, tracheobronchial; Vmax, maximal rate. &Scaled parameter = scaling formula x scaling coefficient. Units: weights in kg, flows in L/hr, volumes in L, kinetic parameters in mg/hr, mg/L or hr-1 11). For all parameters the scaling coefficients were assumed to be log-normally distributed with truncations at ± 2 SDs except where indicated. bMeasured values given in the data section were used when available. clruncation at -2 x SDs and +1.5 x SDs to respect summation constraints.
are the parameters of interest) is given by the experimental data and by the species parameters. The species parameters are determined by the 0 variables and their priors, which were set. The variances a2 are also estimated but are of lesser importance to us (however, high posterior variances may indicate a poor fit). From Bayes' theorem, the joint posterior distribution of the parameters to estimate, tO, a2, p, 121y, 9, E, t, M,Y , v4), is proportional to the likelihood of the data multiplied by the parameters' priors: Ro) 02 p) 21y) pyE) t M) S4) -P(yI0, a2, p, E, t)POIap, 12).Pa2) *lRPlM,S) *X2lS2)) [1] The likelihood term is given by the normal measurement model: log(y) -N(logf(0, 9p, E, t)a2), [2] As mentioned above, the prior distribu- The prior distribution of each component of 9 is an independent normal distribution: log(O) -N(,u, 12), [3] with truncation constraints. Finally each component of p, or 12 iS assigned an independent hyperprior distribution, p -N(M,S2) and 12 _ inversegamma(2,1;:), as described above. Current practice in Bayesian statistics is to summarize a complicated high-dimensional posterior distribution by random draws of the vector of parameters. This is currently the most effective way to perform high-dimensional numerical integration. Further simulations can then be performed to compute, under specified conditions, posterior distributions of quantities of interest, such as various measures more closely related to cancer risk than exposure to TCE. Because there are many parameters to estimate, a combination of Gibbs sampling and Metropolis-Hasting sampling was used to perform a random walk through the posterior distribution. These iterative sampling procedures are particularly convenient in the case of hierarchical models. They belong to a class of Markov chain Monte Carlo (MCMC) techniques that has recently received much interest (10,(29)(30)(31)(32)(33)(34). Three independent Markov chain Monte Carlo runs were performed for each species. Convergence was monitored using the method of Gelman and Rubin (35).

Posterior Distribution ofPredictions
The model was used to compute, a posteriori, several surrogate exposure metrics. For lung tumors, the dose metrics proposed are the lifetime average daily area under the chloral concentration-time curve (LAD-A UC, in mg.hr-L-1) in the tracheobronchial region, and the maximal chloral concentration achieved in the tracheobronchial region (Q,:, in mg.I-1); for kidney tumors, metrics computed are the lifetime average daily amount of cytotoxic metabolites (originating from dichlorovinylcysteine) generated by gram of kidney (LAD-A, in mg.g'1); for liver tumors, the LAD-AUCand Cmax of TCA and DCA are the proposed metrics.
To obtain the distribution of surrogate dose measures, several exposure scenarios were simulated for each species (either continuous exposure through inhalation or drinking water, inhalation exposure 8 hr per day, 5 days per week, inhalation exposure 7 hr per day, 5 days per week, or gavage once per day, every day). A population of 1,000 individuals was simulated by sampling for each one a random parameter vector from N(p,Z), for a random set of the final estimates of p and 1. This sampling accounts for covariance between values of the population parameters, since the 1,000 parameter sets are random draws from their joint (multivariate) distributions, not just from the marginal distributions. These simulations required sampling several parameters that could not be estimated with the data at hand. The sampling distributions of these parameters (summarized in Table 2) were defined on the basis of Clewell et al. estimates (1). In the absence of relevant information, no covariance was specified between those parameters.

Results
After a few thousand iterations, the trajectory of each parameter oscillates randomly around a mean value, and these oscillations have stabilized to a stationary regime. Remember that the simulations converge to a distribution, not to a point. For mice and humans, 7,000 iterations were necessary to reach convergence; for rats 12,000 had to be performed. One of every 5 of the last 5,000 simulations of three independent Markov chains were recorded, yielding 3,000 sets of parameter values from which the inferences and predictions presented in the following were made. Quality ofData Adjustment     (all data values are concentrations). Predictions were made with the parameter set of highest posterior density. For a perfect fit, all points would fall on the diagonal (equality of predicted and observed values). Such an adjustment is not expected given the analytical measurement errors in the data, but the deviations are small and the fit seems overall reasonable. The graph is presented on log-log scale, since the errors are assumed to be lognormally distributed and span a wide range. The residuals are evenly spread along the diagonal, in particular for mice. For rats, there seems to be one outlying point in the venous blood concentrations of free TCOH, the model fitting well all other data points in the same experiment. The fit to human data leaves four groups of points with high residuals. There seems to be one outlier in the TCA blood concentration data of Monster et al. (21), and the last points are very variable. More troublesome is the systematic underestimation, by a factor of 5, of TCE concentration in exhaled air during exposure for the experiments ofMiller et al. (22,23).
The posterior means of the intragroup SDs a (representing measurement error and intersubject variability) are mostly between 0.3 and 0.5 (corresponding to coefficients of variation of about 30-50%). Larger values are found in mice for the venous blood concentration of TCE (a = 0.55), caused by the "noisy" data of Fisher and Allen (12) and of Prout et al. (17), for the blood concentration of DCA (a = 0.60) due to the noise and underprediction of the data (14,16,18), and for the blood concentration of free TCOH (a = 0.65) in the data of Prout et  times. For rats, the blood concentration of DCA (a = 1.14) seems to be systematically underpredicted. For humans, as mentioned above, the concentration of TCE in exhaled air in the experiments of Muller et al. (22,23) is systematically overestimated and some noise exists in the Stewart et al. (24) data. This leads to high residual errors (a = 0.79). Figure 3 presents the fit to the mouse data of Fisher et al. (13). The residual error is small, albeit with some degree of autocorrelation. But, by their nature, these data are prone to exhibiting such dependency of errors and are quite difficult to model.
Fits to human data are crucial to risk assessment and human toxicology of TCE. Figures 4-7 show that very nice fits can be obtained with the model to a range of data. The model has been formally fitted and the adjustments are systematically better than those, already reasonable, obtained by Clewell et al. (1). However, as mentioned above, some data remain poorly fitted.  the TCE blood concentration data in the decay portion. The two "misfits" are certainly related. However, the venous blood TCA and TCOH concentrations from the same experiments are well fitted (data not shown). A similar situation is observed in mice and rats, for example, for the DCA data.

Posterior Parameter Distributions
The joint distribution of all parameters is obtained in output of the Markov chain Monte Carlo simulations. This allows consideration of marginal distributions (distributions of the parameters considered individually) but also of correlations of any order. Table 3 summarizes the distributions of the species-level parameter values obtained in the last 1,000 iterations of the three runs performed (results of the three runs are pooled, and the distributions are established with 3,000 values). The geometric means can be interpreted as representing the values for an "average" mouse, rat, or human. Note that the columns of geometric standard deviations (GSDs) represent group variability among the species. Means  Geo, geometric. aGeometric mean and GSD are in parentheses and are given for each estimated population parameter. The GSD, in that case, measures uncertainty in location or spread. could be given for each animal or human group defined in the data section above, but the tables are too large to be presented here. The following summarizes the information the simulations give on the various strains or individuals studied. Overall, the parameters retain physiologically plausible values.
Mice. For mice the posterior mean values for flows, volumes, and partition coefficients and most metabolic parameters are not very different from their prior mean. However, KM (oxidative affinity for TCE) is twice as high as a priori estimated. Noticeable differences are also found for KEHBC (biliary excretion rate of TCOG), which decreases by a factor of 7, KEHRC (enterohepatic recirculation rate of TCOH), VMTC and KMT (Michaelis-Menten parameters for the reduction of reduction of TCA to DCA), KUTC (urinary excretion of TCA), the last four parameters being higher by a factor of 2, KFC (production of DCVC from TCE), twice less important than a priori assumed, VMTBC (V,,,, for TCE in Clara cells) is also lower, but with a large uncertainty, and finally KTSD (transport rate from stomach to duodenum) posterior mean is somewhat higher than its prior estimate. Uncertainties (SDs) about all these geometric means range from a few percents (for physiological parameters) up to a factor of 2 for some metabolic parameters. These uncertainties are usually much lower than the prior uncertainties, showing that substantial information on about all parameters has been gained from the experimental data.
Variabilities, as estimated by the intergroup GSD (Table 3), range from 1.23, which corresponds to a CV of about 20% for some organ volumes, to 2.7 for KAD (intestinal absorption rate, from duodenum to liver). Metabolic parameters have intergroup variabilities of at least 1.34 (about 30% CV).
Differences between groups do not seem to be caused by differing sexes or exposure levels, as no such pattern emerges from examination of group means. Strain cannot be a factor, since only B6C3F1 mice were studied. Differences should therefore be ascribed to interindividual variability or to differences between laboratories (which could have unreported experimental differences, such as animal providers). Note that, a posteriori, there does not seem to be a particular problem with the DCA measurements in Larson and Bull (14,16) (14) may be too high (the model underpredicts them by a factor of 2 or 2.5), but this may be simply due to noise in the data.
Rats. For rats the posterior mean values for most parameters are close to their prior mean. The largest differences are found for the scaling coefficients of the volume of distribution of TCOH (VDBWC, increased by a factor of 1.4), and of DCA (VCDCAC, decreased by a factor of 1.4), the fractional split of TCE to TCA (PO) doubled, and the enterohepatic recirculation of parameter of TCOH (KHERC) tripled. SDs for these geometric means reach a factor of 3 and tend to be higher than those for mice (this can be explained by the smaller rat data set).
Variabilities, as estimated by the intergroup GSD, also tend to be higher for rats: they range from 1.32 to 3.6 for KAD (duodenum to liver absorption rate). Differences between groups, most apparent for KAD, the urinary excretion of TCA (KUTC), and the volume of distribution of TCA (VCTCAC), cannot be ascribed to sex, strain, exposure level, or laboratory, as no particular pattern emerges from examination of group means. Differences are therefore due to interindividual variability. Here also there does not seem to be a particular problem with DCA measurements in groups 4 and 5 (14). Metabolic parameters directly related to DCA are not systematically different for these groups, and the slight underpredictions of the model for early time points (at most 70%) could be due to measurement uncertainty.
Humans. The ratios of posterior to prior mean values for metabolic parameters in humans range from a factor of 0.2 for VMTC (the maximal rate of DCA formation from TCA) to a factor of 6 for KEHBC (the biliary excretion of TCOG). The difference for VMTC is actually a nice result: without imposing a priori a null value to this parameter, the resulting posterior is very low, indicating that according to the human data very little DCA is produced, even if this observation is indirect (i.e., this result is imposed by the implicit mass balance of the other metabolites in humans). SDs for these geometric means are also quite higher than for animals, and reach a factor of 3.2. This, as for rats, can be explained by the small human data set.
Interestingly, variabilities for physiological parameters are systematically higher than for animals but about the same for metabolic parameters. Differences between groups cannot be ascribed to sex, since only males were studied. They may be due to interindividual variability. The subjects studied by Muller et al. (22,23) have much higher blood over air partition coefficients than subjects in other studies. This is linked to the poor fit of the model for the same subjects. The model cannot accommodate the (apparently) differing blood-air partition coefficient observed during exposure and after exposure. This could be linked to an experimental peculiarity in exhaled air concentration measurements for those studies. Table 4 shows the highest covariances between pairs of parameters for human group  Table 4).
2 (21). Similarly, high correlations are obtained for every animal or human group. These covariances can be up to 0.81 (between VMOC and KMO, Figure 8) or even 0.94 (between the parameters VMRC and KMR). Any simulation neglecting to estimate these covariances will produce incorrect predictions, since these parameters cannot be sampled independently without producing highly improbable combinations and hence highly improbable predictions.

Preictions ofCancer Dose Metrics
Additional subjects were simulated by sampling parameter values from the estimated population distributions summarized in Table 3 and from the additional distributions given in Table 2. Sampling took into account parameter covariance for the parameters listed in Table 3, since it was made from 1,000 random samples of the MCMC runs, at equilibrium. Remember that these parameter sets are randomly drawn from their joint (multivariate) distribution not just from marginal distributions. Tables 5 to 8 summarize the posterior distributions of LDA-A UC and Cm,. for TCA and DCA in liver for several exposure scenarios. The results for lung, kidney were also computed (data not shown). The 95% posterior confidence intervals presented are defined as the interval between the 2.5th percentile and the 97.5th percentile. Figures 9 and 10 present histograms of the posterior distributions of TCA LDA-AUC and Cmax in the liver for humans exposed continuously to 1 ppm TCE. The impact of uncertainty and variability is large but not unrealistic; in these extrapolations geometric SDs correspond to factors from 1.8 to 9 (for chloral concentration in human lung when exposed through drinking water). Except for one case, the values found by Clewell et al.
(1) are all contained within the 95% posterior confidence intervals. The only area of disagreement is in the mouse response to low-dose drinking     Results of a stepwise multiple regression of TCA LDA-AUCwith respect to the model parameters in the case of humans continuously exposed to 1 ppm TCE indicates that many parameters condition the results (data not shown). Among them figure key metabolic constants but also physiological parameters or partition coefficients. The number of parameters conditioning TCA LDA-AUC explain the relatively large SDs for the risk estimates presented in Table 5. The same variables, in about the same order, influence TCA C. (data not shown). Discussion Data The grouping by studies is somewhat arbitrary but was imposed on us by the aggregate reporting of the data. It can still be justified, since heterogeneity in batches of animals and differences in laboratory procedures are expected. As a consequence, all individuals were supposed to behave similarly in a given experiment. This is likely to lead to a moderate underestimation of variability. Note also that other data sources could have been considered, in particular for humans (36)(37)(38)(39)(40)(41)(42)(43). These additional data sets might be useful for external validation of the model.

Method
The proposed methodology is gaining interest and is establishing itself for the calibration and validation of PBPK models (8)(9)(10)44,45). A Bayesian analysis allows us to combine two forms of information: a) prior knowledge from the scientific literature, and b) data from Monster's experiments, in the context of the physiological compartmental model. Neither source of information is complete. If prior knowledge were sufficient, experiments would not need to be performed, but data alone are insufficient to pin down all parameters to reasonable values. Our goal was to fit the data using scientifically plausible parameter values. The posterior (i.e., after fitting) uncertainty for such parameters is underestimated if all physiological parameters are considered perfectly known and set to predefined values, a practice too often adopted to alleviate computational burden. Prior uncertainty about physiological parameter values needs to be taken into account, unless it can be proven negligible by sensitivity analysis of the posterior parameter distributions (not by sensitivity of the final predictions to be made). However, such a sensitivity analysis of the fitting process itself, in the case of PBPK models, is more difficult to perform than simply considering all parameters uncertain. To obtain samples from the joint posterior distribution of all parameters, MCMC sampling was used. Figure 11 is an illustration of MCMC sampling compared to simple Monte Carlo sampling. In the former, the values drawn for each parameter start from the prior distribution and converge, as iterations progress, to a data-adjusted, or updated, posterior distribution. The posterior density corresponds to the product of the prior density by the data likelihood. In the case of simple Monte Carlo sampling, all values are drawn from the same distribution (equivalent to a nonupdated prior). Beyond improving the fit, the method used here also provides distributions of estimates directly usable as inputs for uncertainty analysis of cancer dose-response relationships. In addition, the posterior distributions of Table 3 can be taken as new priors in future studies. Finally, it can be checked a posteriori that strong correlations exist between parameters (

Results
Interindividual human variability of TCE metabolism, as estimated here, is not very large-GSDs of metabolic parameters range from a factor of 1.5 to 2. However, only small samples of young Caucasian males were analyzed. New data, developed by Dr. Fisher's group, include females and allow an assessment of potential sex differences (46), although animal data do not point to such differences. Metabolism in animals appears as variable as in young Caucasian males. SDs corresponding to factors of 1.5-3 are to be expected. It should also be noted that the analysis does not a posteriori point to problems with DCA concentrations in the Larson and Bull (14,16) or Templin et al. (18) experiments. There appears to be no conflict between those data and others. The only obvious misfit is in human exhaled air and blood levels. Yet, it is present only for Muller et al. (22,23) experiments and not for Monster et al. (20,21) data. It is one of the strengths of Bayesian PBPK modeling that although the number of parameters is large, overfitting is avoided and discrepancies in the data are left apparent (47). Unfortunately there is no obvious explanation for the discrepancy. The new animal and human data mentioned above may help us gain a better understanding of TCE inhalation kinetics. Despite the problem with Muller et al. (22,23)  for a continuous 1-ppm inhalation exposure (see Table   5). The spread of these model-predicted values is conditioned by uncertainty and variability. The smooth line represents the corresponding normal approximation [geometric mean: log(88) = 4.48; GSD: log(2.5) = 0.921. were not discarded from the data set. The model is not much affected by the misfit, and metabolite levels are well predicted. The posterior predictions for risk measures are affected by expectedly large uncertainties and some degree of variability. Uncertainties depend on species (conditioned by the amount of data available for that species), end points (still conditioned by the data), or exposure levels and patterns (since different parameters or combinations thereof condition outcome in different situations). Variabilities seem to be of about the same magnitude in small groups of humans and animals. For the modeling of animal cancer bioassay internal exposures or human population exposures, the variability found here Cmax TCA liver human 1 ppm Figure 10. Posterior distribution histogram (n= 1,000) of the natural logarithm of TCA Cmax in human liver, for a continuous 1-ppm inhalation exposure (see also Table  5). The spread of these model predicted values is conditioned by uncertainty and variability. The smooth line represents the corresponding normal approximation (geometric mean: log(3.9) = 1.36; GSD: log(2.6) = 0.96).
would be damped by averaging effects. Uncertainty would therefore dominate, and variability could be neglected, in a first approximation. This would require conditioning internal exposure estimates at various dose levels on the same parameter vectors; exposure groups would not be simulated independently.
Finally, one of the challenges to modeling in toxicology is the full exploitation of the numerous data sets collected during epidemiological or occupational hygiene studies, and generally in settings where exposure levels are unknown. Most of the times very simplistic analyses of such data are performed because Iteration Figure 11. Illustration of Markov chain Monte Carlo (MCMC) sampling, compared to simple Monte Carlo sampling. In MCMC sampling, the values drawn for a parameter 8 (circles) start from the prior distribution and converge, as iterations progress, to a data-adjusted, or updated, posterior distribution. The posterior density corresponds to the product of the prior density by the data likelihood. In the case of simple Monte Carlo sampling (crosses), all values are drawn from the same distribution (equivalent to a nonupdated prior). of lack of experience in more powerful methodologies. There is no difficulty, in the above statistical framework, in considering exposure as a parameter or function to estimate. A major problem, however, resides in accounting fully for the uncertainties stemming from unknown time-varying exposures. The impact of particular functional forms for the time evolution of exposure has not yet been thoroughly studied and validated. As progress is made on such questions, toxicokinetic modeling will become a more powerful and widespread tool for drug development and toxicity assessment. A publicly accessible database of individual animal and human data on kinetics and metabolism of major solvents should be gathered and offered to public access. This would allow a standardization of analyses and their improvement.