Application of three approaches for quantitative AOP development to renal toxicity

While hazard assessment of chemicals can make direct use of descriptive adverse outcome pathways (AOPs), risk assessment requires quantitative relationships from exposure to effect timing and magnitude. To seamlessly integrate the data generated by alternative methods or in vivo testing, quantitative AOPs (qAOPs) providing dose-time-response predictions are more valuable than qualitative AOPs. Here, we compare three approaches to qAOP building: empirical dose-response modeling, Bayesian network (BN) calibration, and systems biology (SB) modeling. These methods were applied to the quantification of a simplified oxidative stress induced chronic kidney disease AOP, on the basis of in vitro data obtained on RPTEC/TERT1 cells exposed to potassium bromate. Effectopedia was used to store the experimental data and the developed models in a unified representation so they can be compared and further analyzed. We argue that despite the fact that dose-response models give adequate fits to the data they should be accompanied by mechanistic SB modeling to gain a proper perspective on the quantification. BNs can be both more precise than dose-response models and simpler than SB models, but more experience with their usage is needed.


Introduction
Adverse outcome pathways (AOPs) have become an organizing framework to facilitate the development and integration of alternative test methods for assessing hazard of chemicals to human health and the environment. A dedicated program is currently running under the auspices of the Organisation for Economic Co-operation and Development (OECD). AOPs are intended "to outline and capture existing knowledge concerning the biologically plausible and empirically supported foundations for predicting apical toxicity from mechanistic data" [1]. Practically, an AOP is a chemical-independent description of a linear path from a molecular initiating event (MIE) to an eventual adverse outcome (AO) at the organism or population level [2] (see Fig. 1). In between, there can be any number of intermediate critical and measurable key events (KEs) connected by key events relationships (KERs) [3,4]. In typical AOP diagrams, KEs are represented by boxes and KERs by single one-directional arrows connecting them (Fig. 1).
feed-forward loops between two consecutive KEs can simply be indicated by a symbol and included in the KER). Thus, according to graph theory, in the absence of loops within KERs, AOPs are acyclic directed graphs [5].
AOPs can support the development of integrated testing strategies (ITS) and their application in risk assessment [6,7]. In case of ITS building, the data generated by alternative methods (i.e., in silico, in chemico, in vitro), when combined with existing animal data, are used and assessed by means of a fixed data interpretation procedure [1,8]. For this purpose, quantitative AOPs (qAOPs) that provide dose-response and time-course predictions [9] are likely to be more valuable for ITS construction than qualitative AOPs. Parameter values for a qAOP can be either obtained from legacy data or new targeted experimental work, or by optimizing the fit of model predictions to data [2]. So far, the few published qAOPs use either empirical dose-response models to quantify KERs [e.g., [11]], or are based on an underlying systems biology (SB) model [e.g., [10]]. We propose here the use of Bayesian networks (BNs). Unlike SB models which may contain feedback and feed-forward loops, but like AOPs, BNs use acyclic directed graphs as their organizational structure [11]. The links between their nodes correspond to simple statistical dependencies. Thus, BNs can be viewed as an intermediate approach between empirical models and SB models. They have already been applied to AOPs in the area of skin sensitization to facilitate potency assessment for classification purposes and to support hazard characterization in a semi-quantitative way [12,13]. Here, we demonstrate the application of (dynamic) BNs to AOP quantification. Moreover, we compare the results obtained by using the three modeling approaches mentioned above (i.e., linked statistical dose-response relationships, dynamic BNs, and SB models) to quantify a chronic kidney disease (CKD) AOP.
CKD is a multifactorial progressive syndrome, of which aging, cardiac insufficiency, diabetes and chemical-induced nephrotoxicity are both initiating and accelerating factors. All of these factors have an oxidative stress component: Nrf2 activation and its downstream genes, including Heme Oxygenase-1, are markers of CKD [14,15]. Indeed, pharmaceutical activation of Nrf2 via bardoxolone methyl has been shown to slow CKD progression [16]. In order to develop the BN framework we have taken a relatively simple in vitro model, i.e. differentiated RPTEC/TERT1 cells treated with the oxidant and renal carcinogen potassium bromate (KBrO 3 ) [17,18]. Previous evidence suggests that KBrO 3 is a thiol reactive oxidant and a strong inducer of Nrf2 and mitochondrial injury [17,19]. Mitochondrial perturbation through increased production of reactive oxygen species is a well-described phenomenon [20]. Also we have shown using the same biological model that several compounds both activate Nrf2 and increase glycolysis rates and subsequent lactate production. These compounds include cadmium chloride, chloroacetaldehyde, cidofovir, cisplatin and cyclosporine A [21][22][23][24]. Thus enhanced lactate production is a good surrogate for decreased mitochondrial capacity in RPTEC/TERT1 cells, where HIF-1 alpha is not suspected to be activated (which would also lead to increased lactate production).
Finally, we present the implementation of the developed qAOP in Effectopedia, an OECD software tool that aims to gather experimental data and models in a unified representation, so that they can be compared, further analyzed, and used for hazard and risk assessment purposes. Effectopedia is compliant with the OECD guidance document for the development and assessment of AOPs, according to its supplemental Users' Handbook [4,25].

Chronic kidney disease AOP
The proposed AOP ( Fig. 1) links thiol oxidation to CKD via oxidative and mitochondrial stress. Within the nephron, the proximal tubule is especially susceptible to injury from oxidative chemicals, as they can cause mitochondrial damage, which in turn can result in impairment of active and secondary transport, as well as in cell death. CKD is characterized by a progressive loss of renal function, the onset of which is initiated and/or accelerated by other factors such as diabetes, high blood pressure or exposure to nephrotoxic chemicals [21,26]. Given its high energy demand for active transport, the proximal tubule is especially susceptible to injury from oxidative chemicals and mitotoxins [27]. This AOP fulfills several, but not all, Bradford Hill criteria for weight-of-evidence assessment [6]. It should therefore be considered as preliminary, rather than definitive: -The experimental dose-response relationships corresponding to the various KERs are consistent with the expected effects (depletion of GSH is associated with increased DCF, and increased lactate production, see the data in Figs. 5-7 for example). -There is temporal concordance among the key events and adverse outcome: GSH depletion is fast, DCF production slower, lactate response even slower (Figs. 5-7), and CDK takes years to develop. -The biological plausibility, causal coherence and consistency of experimental evidence for this AOP is strong (note that a full documentation of this evidence would extend beyond scope and length of this paper). This AOP is being ported to the AOPWiki (https:// aopwiki.org) and more supporting information should be available there soon. -Alternative mechanisms are possible, as further discussed in this paper, but they would not exclude the mechanism postulated by the AOP. -A major data gap is that we do not yet have data on a sufficient number of chemicals to assess the strength, consistency and specificity of the association of the AO to the MIE or the potential impact of alternative mechanisms. Consequently, uncertainties are quite high, as will be shown by the SB model, and the proposed AOP is not yet very well characterized.
Here, we quantified this AOP up to the initiation of cell death following induction of oxidative stress, since our analysis is based on in vitro data obtained in human proximal tubule (RPTEC/TERT1) cells exposed to KBrO 3 . The link from cell death to kidney function impairment therefore cannot be modeled based on the available data and we will focus on a set of core early KEs leading to proximal tubule damage.
Oxidative stress was quantified using the Invitrogen™ Carboxy-H2DCFDA test (catalog #C400). In brief, the cell permeant reagent 6carboxy-2′,7′-dichlorodihydrofluorescein diacetate (cH2DCFDA) is first introduced in the culture medium. After diffusion into cells, it is deacetylated by cellular esterases to 6-carboxy-2′,7′-dichlorodihydrofluorescein (cH2DCF), which remains trapped in the cell and is oxidized by hydroxyl, peroxyl radicals and other reactive oxygen species (ROS) to 6-carboxy-2′,7′-dichlorofluorescein (cDCF), which is highly fluorescent. RPTEC/TERT1 cells were grown as described by Aschauer et al. [28] and exposed to various concentrations of KBrO 3 (control, 0.75, 1.5. 3, and 6 mM) as described by Limonciel et al. [17]. Briefly, cells were grown and matured into a mature monolayer in 96-well cell culture plates kept at 37°C/5% CO 2 and were fed fresh medium 24 h before chemical exposure. Cells were incubated with 40 µM cH2DCFDA 4 h before washing out the excess extracellular dye and starting exposure to KBrO 3 dissolved in culture medium. cDCF production was measured over time (approximately every 15 min, up to 24 h, in 8 replicates) as relative fluorescence units (RFU) by fluorescence spectroscopy using a Tecan Pro M200 microplate reader.
Mitochondrial injury was estimated by lactate concentration in collected RPTEC/TERT1 cell culture supernatants, measured in quadruplicates at the start of the experiments and then every 24 h, up to 3 days, following exposure to various KBrO 3 concentrations (control, 0.25, 0.5, 1, 2, and 4 mM) (see Fig. 6). Supernatant lactate production rate is a measure of glycolysis rate, and increased glycolysis can be due to a decrease in mitochondrial respiration [29]. The culture medium, with the given KBrO 3 concentrations, was changed every day after an aliquot was taken for lactate measurement using the absorbance-based assay described in Limonciel et al. [14].

Dose-response based qAOP
In the empirical dose-response approach, dose(-time)-response equations were fitted to data on the effect of KBrO 3 on GSH, cH2DCF, and lactate. With such data, linking chemical exposures to KEs, the corresponding equations need to be mathematically inverted to obtain chemical-independent KERs. Only the exposure to MIE relationship can be used as is. For example, if we have three data sets for the activity at dose D X of chemical X on each KE of an AOP, we need to fit three doseresponse equations: The relationship between KE 1 and D X is given directly by Eq. (1) However, the relationship between KE 1 and KE 2 needs to be derived from Eqs. (1) and (2): where f −1 denotes the inverse function of f. Similarly, for the relationship between KE 3 and KE 2 we have: For dose-time-response relationships, the principle is the same, with time as an extra variable in the above functions. However, in some cases the function may not be monotonic and therefore will not be invertible.
The relationship between the concentration of KBrO 3 (C KBrO 3 ) and the percentage of GSH (Pct GSH ) remaining in vitro after one hour, representing the MIE, was modeled with a modified exponential decrease equation (Eq. (6)): Its parameters are the GSH degradation rate constant k, and power b (which increases the degradation rate if b > 1).
The inverse of Eq. (6) is: The relationship between C KBrO 3 , time t and the quantity of cDCF formed (Q cDCF , reflecting the amount of oxidative stress) was modeled empirically by Eq. (8): Its parameters are A (baseline response), B (maximum increase above baseline), δ (maximum increase modulation by dose), The solution of Eq. (8) for C KBrO3 is: Replacing C KBrO3 in Eq. (8) by the expression given in Eq. (7), we obtain the following KER between Pct GSH and Q cDCF : To model the C KBrO 3 -time -lactate concentration (C lac ) relationship, we used a polynomial equation which adequately fitted the data: A relationship (even more complex) between GSH and lactate concentration could be obtained by replacing Q cDCF by Pct GSH , using Eq. (10).

Bayesian network qAOP
A BN is a probabilistic model whose underlying structure is a graph (equivalently, a network) where each node represents a variable of the problem (i.e., for an AOP: chemical substance, MIE, KEs and AO), and each arc between two nodes represents a direct dependency (ideally, a causal relationship) [30]. The AOP shown on Fig. 1 can be taken as a BN structure. Within such a BN, a probabilistic relationship (specifically, a component of a conditional distribution function) is defined by each arc linking two variables. For example, if an arc joins variables A and B, a relationship such as "A is distributed normally around k⨯B, with a variance equal to s 2 " has to be defined. As a result, every node of the network has a probability distribution conditioned by other network variables. This implies that a variable cannot depend upon itself, even indirectly, and therefore cycles are not allowed in BNs. Evidence on a set of nodes (for example, measurement of some KEs) affects the probability distributions of all their dependent nodes [31]. Training a BN from data means that one searches for those dependencies (and associated distributions) between variables that best explain the data. On the other hand, calibrating a BN implies estimating the parameters of the distribution functions that link variables. In our case, we do not need to learn our BN's structure, since it is given by the AOP in Fig. 1, but we need to calibrate it.
However, standard BNs do not provide a direct mechanism for representing temporal dependencies. Given that we have dose-time-response data on cDCF and lactate production, and that their time evolution is progressive rather than instantaneous, it is natural to use a dynamic Bayesian network (DBN) to integrate those data [32]. DBNs, typically, replicate an underlying structure at several (discrete) times corresponding to measurement time points. Fig. 2 shows the DBN we constructed to quantify the chronic kidney disease AOP. Each node of a given time slice may depend on nodes in the previous time slice and on nodes in the same time slice [33]. In this way, the value of a node at time t i may depend on its own value at time t i-1 , without introducing a loop in the graph. For example, in Fig. 2, the cDCF readout at a given time point depends on its previous value (indeed, in the in vitro system cDCF accumulates with time). The same applies to the lactate concentration. There are also some instantaneous or constant dependencies: We considered that C KBrO 3 was constant with time throughout the experiments (note that this is an approximation, but we have no information on the kinetics of KBrO 3 in the in vitro system). The thiol depletion readout (GSH level remaining after 1 h) is simply an indicator of KBrO 3 potency and was also taken to be constant. The DBN structure being defined, we now turn to the form of the conditional distributions linking the nodes.

Node to node relationships
For the dependence of observed Pct GSH on C KBrO 3 we use a simplified probabilistic version the dose-response based qAOP (cf. Eq. (6)): with depletion rate constant k GSH and variance σ 2 GSH . Note that for simplicity we set parameter b to 1.
The conditional distribution of Q cDCF observations at time t, given Pct GSH and the Q DCF observation at the previous time t − h, is given by an extension of the standard DBN model in which Pct GHS,t influences the equilibrium value (E cDCF,t ) for Q DCF,t to which it converges over time at exponential dampening rate ν: = where E cDCF,t is the equilibrium value of Q cDCF (a linear function of Pct GSH ), h is the (positive) time interval between two consecutive observations, ν cDCF (positive), β 0,cDCF , β cDCF , and variance σ 2 cDCF are the parameters to estimate. The cDCF RFU value at time zero, cDCF 0 , was not measured, but it should be different from zero given the 4-hour pretreatment phase in the protocol and was therefore also estimated. Positive values of ν and h ensure that e −νh is bounded between 0 and 1.
A similar relationship was used for lactate by replacing Q cDCF,t by C lac,t , and Pct GHS by Q cDCF,t in Eqs. (14) and (15). Given the recurrent experimental change of medium during the experiment, lactate concentration was set to zero at the start of the experiment and reset to that value every 24 h.

Systems biology model
We used a SB model to analyze the oxidative stress (cDCF) data. The model does not describe lactate formation and hence we did not consider the lactate data in this approach. It focuses on the control of the oxidative stress by Nrf2 and glutathione, one of the major toxicity pathways studied in systems toxicology [29,34,35]. Therefore, we used it only to study the relationship between KBrO 3 exposure, time, and cDCF fluorescence in detail (the model is in fact a detailed representation of the KER linking the MIE and first KE of the AOP in Fig. 1).
The Nuclear Factor (Erythroid-derived 2)-Like 2 NFE2L2 pathway, commonly known as Nrf2, is an important adaptive response to oxidative stress [36]. In homeostatic conditions, Nrf2 is mostly bound to the cytoskeleton-associated Kelch-like-ECH-Associated Protein 1 (Keap1) in an inactive complex within the cytoplasm, which facilitates Nrf2 degradation. Upon oxidative stress, Keap1 is oxidized and the complex dissociates, and Nrf2 can migrate to the nucleus [37], where it activates the transcription of a set of target genes implicated in the metabolism and transport of xenobiotics, and ROS scavenging by GSH [38]. When the intra-cellular level of ROS exceeds the capacity of this defense system to replenish GSH through new synthesis, GSH depletion occurs and the remaining ROS cause extensive cellular damage, cell death, nephron attrition and CKD. Fig. 3 shows the SB model we developed to study the transcriptional regulation of the GSH pathway by the Nrf2 -Keap1 complex, which merges variants of the Hamon et al. model for RPTEC/TERT1 cells [39] and a model developed by Geenen et al. [40]. Whereas Hamon's model is more elaborate with respect to genes transcription, the Nrf2-Keap1 interaction, and the role of ATP, Geenen's model details GSH synthesis. Coupling the two models greatly improves the description of the regulation of oxidative stress in RPTEC/TERT1 cells. When merging the two models, we simplified the description of the transcription/translation process, without loss of precision in the predictions: We replaced a cascade of differential equations operating at quasi-steady-state (for gene/receptor binding/unbinding, transcription induction by activator (s), translation and degradation of mRNA) by a single Hill equation. We also simplified the folate cycle (which was at steady-state), replacing it by a constant tetrahydrofolate concentration. We removed acetaminophen-and cyclosporine-specific reactions as these were irrelevant for the current application. We finally included the description of the dynamics of HMOX1 and SRXN1 genes, which are often used as activation markers for the Nrf2 pathway. The simplification protocol, its validation and the resulting Hill-based equations are presented in Supplemental Material (Fig. S1, Supplemental Protocol and Table S8).
In order to calibrate the model with our experimental data on the effect of KBrO 3 on GSH and cH2DCF, we added several first order reactions to the model (Fig. 4): a. Action of KBrO 3 on extra-cellular GSH (parameter k GSHe,KBrO 3 ); b. Formation of cDCF from cH2DCF by ROS-mediated oxidation (parameter k cDCF,ROS ); c. MRP-driven efflux [41,42] or passive bleaching of cDCF (the same parameter, k bl , can describe the two processes); d. Formation of cDCF from cH2DCF by direct action of KBrO 3 (parameter k cDCF,KBrO 3 ); e. Action of KBrO 3 on intra-cellular GSH (parameter k GSHc,KBrO 3 , which is multiplied by k GSHe,KBrO3 to yield the reaction rate constant, and is in fact the ratio of the external to internal reaction rate constants).
The complete model code (with 57 differential equations and 335 parameters) is given as Supplemental Material.

Parameter estimation
Parameter calibrations for the three types of qAOP investigated were done in a Bayesian statistical framework, using Markov chain Monte Carlo (MCMC) simulations [43,44], or Hamiltonian MCMC [45]. Basically, for each parameter to calibrate, a prior distribution summarizing existing knowledge was updated on the basis of the likelihood of the current data to yield a posterior distribution. Those distributions were obtained by random sampling from several simulated Markov chains. The convergence of the simulated chains was checked using the R hat criterion of Gelman and Rubin [46]. The complexity of the various qAOP models differed and slightly different sampling strategies were used. For the dose-response based qAOP, we used a Metropolis-Hastings MCMC algorithm, as implemented in the GNU MCSim software [47]. Two Markov chains of 50,000 iterations were run in parallel, keeping one in four of the last 40,000 iterations. For each estimated parameter, non-informative uniform prior distributions were used (note that the boundaries of those prior distributions were never reached) (see Table S1 in Supplemental material). As usually done for measurements at different concentrations, the data were considered to be log-normally distributed with geometric means given by the corresponding model predictions and geometric standard deviations (σ GSH , σ cDCF , and σ lac ), sampled from half-normal distributions (with a priori about 5%, 20% and 20% precision respectively, see Table S1 in Supplemental material). Note that in this qAOP, the statistical error model (i.e., the likelihood of the data) is clearly separated from the structural equations.
For the BN qAOP, posterior parameter distributions were obtained by Hamiltonian MCMC, using the Stan software [48]. Three simulated Markov chains were run in parallel for 12,000 iterations, keeping the last 6,000 iterations. Non-informative uniform prior distributions were used for each parameter except for the parameters in the cDCF -timelactate portion of the model where weakly informative Gaussian priors were used to stabilize inference (see Table S2 in Supplemental material). In this qAOP model, the data likelihood is embedded in the model formulation. There is one clear constraint for this model: time and exposure conditions must match for all the variables describing a particular node to node relationship. For example, lactate was measured every 24 h and depends on cDCF, which was measured every 15 min, but for different KBrO 3 concentrations. Therefore we need to statistically "impute" (randomly draw from their conditional distribution) the expected cDCF values at the concentrations used in the lactate experiment. Note that the alternative solution of describing the cDCF dynamics only at time points zero and 24 h would discard most of the cDCF data and is thus unsatisfactory. To reduce the number of data points to be imputed, we chose to use only one in four cDCF data points (one per hour).
For the SB model, parameter calibration was done with Metropolis-Hastings MCMC with GNU MCSim. Three Markov chains of 30,000 iterations were run in parallel, keeping the last 5,000 iterations. For each estimated parameter, non-informative uniform prior distributions were used (see Table S3 in Supplemental material). The data were considered to be log-normally distributed with geometric means given by the corresponding model predictions and geometric standard deviations σ cDCF (see Table S3 in Supplemental material). The data likelihood is clearly separated from the structural equations. To calibrate the model with our experimental data on the effect of KBrO 3 on GSH and cH2DCF, we proceeded step by step, increasing the complexity of the model by introducing reactions according to the following schedule: To compare the eventual improvement in fit brought about by the various model refinements we used various measures of model fit to the data: the data log-likelihood, the residual geometric standard deviation (GSD), the Akaike information criterion (AIC) (twice the difference between the number of parameters and the data log-likelihood), the Bayesian information criterion (BIC), and the Deviance information criterion (DIC) [49].

Uncertainty propagation
The output of MCMC simulations is a sample of parameter sets (or parameter vectors) drawn from their joint distribution. Those sets of parameter values were used to rerun the corresponding model to make predictions for unobserved values. This is a type of Monte Carlo simulations in which the MCMC sampler acts as a random parameter values generator. We obtained distributions of predicted values that reflects the uncertainty of all parameter values. For example, when using Eq. (12) to compute a lactate concentration, the uncertainty of all parameters entering the equation was convolved by Monte Carlo sampling and their uncertainty was fully propagated to the result. The same applies to the other models we used.

Dose-response based qAOP
The empirical dose response models given by Eqs (6), 8, and 11 described the KBrO 3 -GSH, KBrO 3 -time-cDCF, and KBrO 3 -time-lactate relationships reasonably well (see Fig. 5 and Fig. 6, top row). Equivalent 2D representations of the time course of cDCF and lactate at the various KBrO 3 concentrations are given in Supplemental material Figs. S2 and S3, respectively. The uncertainty of the model predictions is low for GSH (Fig. 5), and it amounts to about 0.5% to 1.5% for cDCF and 5% to 12% for lactate (this cannot be usefully visualized in Fig. 6 for reasons of readability). Residual uncertainty (an estimate of measurement error) is about 22% for GSH, 20% for cDCF and 30% for lactate. Table 1 summarizes the posterior distributions of the parameter values obtained by Bayesian calibration.
By inversion of the empirical models, we can deduce the relationship between GSH, time, and cDCF or GSH, time, and lactate production (Fig. 6, bottom row). These relationships should, in theory, be independent of the thiol-reactive chemical. They can be used to make predictions, including full parametric uncertainty propagation since we used a Bayesian statistical framework for parameter inference. For example, a dose causing 80% reduction of GSH after 1 hr (i.e., 20% GSH left), in the test conditions described in Methods, should lead to a lactate concentration of 4.6 ± 0.3 [4.1, 5.1] mM (mean, SD, 5 and 95 percentiles) after 3 days of exposure.

Bayesian network qAOP
The fit of the DBN qAOP to GSH, cDCF, and lactate data is shown on Figs. 5 and 7. Equivalent 2D representations are given in Supplemental material Figs. S4 and S5. The fits for GSH and cDCF are less good than those of the empirical models. The fit to the lactate data (Fig. 7) looks very different for the DBN model, compared to the empirical model,  Table 1  is about 50% for GSH, 25% for cDCF and 10% for lactate. The error model, however, is different (normally distributed residuals, rather than log-normally distributed as in the empirical model). Table 2 summarizes the posterior distributions of the parameter estimates obtained. The model parameters have some physical interpretation: Parameter ν controls the speed at which plateaus are reached in Fig. 7.  Table 2 were used to draw the figures.  The β parameters condition the height of the plateaus. However, there is a subtle interplay between convergence speed, plateau level, time and dose, as can be seen in Fig. S5. All parameters are significantly different from zero. The DBN qAOP model does not need mathematical inversion to produce chemical-independent predictions of the levels of cDCF and lactate as a function of GSH depletion and time, because they can be directly simulated (Fig. 7, bottom row). The resulting relationship for cDCF is quite similar to that obtained with the previous qAOP, except for the linearity of the GSH -cDCF relationship. However, the GSHlactate relationship is very different, even though constant exposures to KBrO 3 are simulated in both cases (the simulation is now considering a single medium change at time point zero). Note that lactate starts at zero to reach a plateau in about three days. The relationship between GSH and lactate is predicted to be linear by the DBN model, instead of being strongly nonlinear in the empirical qAOP. As before, predictions with uncertainty estimates can be easily made. For example, the DBN qAOP predicts that a chemical dose causing 80% reduction of GSH after 1 h (i.e., 20% GSH left) leads to a lactate concentration of 5.8 ± 0.4 [5.2, 6.5] mM (mean, SD, 5 and 95 percentiles) after 3 days of exposure. This is significantly different from the prediction of the empirical qAOP.

Systems biology model
The fit of the SB model to the GSH data (calibration step 1) is shown in Fig. 5 (red line). It is better than the fit of the DBN qAOP (residual uncertainty for the GSH data is about 40%), despite the fact that both use the same decreasing exponential relationship between KBrO 3 and GSH. However, k GSHe,KBrO 3 was calibrated to the data independently of the others parameters and its fit is not constrained by the cDCF data. The fits obtained for the KBrO 3 -time-cDCF data at the various model calibration steps (parameters were re-calibrated at each step) are shown on  Table S4. Note that the model takes into account the 4 h of cells pre-incubation with cH2DCFDA, and the simulation therefore starts already before exposure to KBrO 3 (which is defined to occur at time point zero). During that period of time, ROS already starts forming cDCF, explaining the relatively high level of fluorescence at time point zero. At step 2, with just a depletion of extra-cellular GSH by KBrO 3 and the formation of cDCF by ROS the model is unable to explain the data (Fig. 8A). The depletion of extra-cellular GSH has only a minor effect on the intra-cellular GSH level (Supplemental Material Fig. S6). Therefore, only background cellular ROS produces cDCF, at a constant rate, and the accumulation of cDCF is predicted to be linear (according to the experimental protocol cH2DCFDA is expected to be in excess, and not depleted). Allowing cDCF efflux or bleaching offers an explanation for the leveling off of its fluorescence, yet the effect of KBrO 3 is still not explained satisfactorily and the data fit is very poor (Step 3, Fig. 8B), despite the fact that efflux is mediated by MRP, which is under Nrf2 control (the MRP level stays at its background level and is not affected by KBrO 3 , since intra-cellular GSH itself is not significantly lowered). Adding the possibility that KBrO 3 directly oxidizes cH2DCF improves the fit markedly (Step 4a, Fig. 8C), and the residual error σ cDCF goes down to about 20% (see Table 3). However, the effect of KBrO 3 is linear, which is not what the data show. In this case, also, the effect of KBrO 3 on ROS is close to zero. Instead of a direct oxidation of cH2DCF by KBrO 3 , we also tested the possibility that KBrO 3 acts on intra-cellular GSH (Step 4b, Fig. 8D). This has a clear effect on cDCF production, but it is extremely nonlinear and does not lead to a reasonable fit to the data. Finally, in step 5, we put all the above parameters in the model and re-calibrated them. This did not lead to an improvement compared to step 4a (see Supplemental Material Table 4) and the effect of KBrO 3 on intra-cellular GSH was estimated to be nearly absent (data not shown). Table 3 lists the best value (maximum posterior), the mean, the standard deviation and the confidence interval [2.5 percentile, 9.75 percentile] of each of the four parameters calibrated at step 4a (yielding the best and most parsimonious model). The values of the parameters directly related to cDCF do not have an explicit biological interpretation because cDCF is measured in RFU (which should be proportional to concentration, but with an unknown proportionality constant). Note that the cDCF efflux/bleaching rate constant corresponds to a half-life of about 6 h. The SB model can also be used to make predictions, with full uncertainty propagation. For example, a 4 mM concentration of KBrO 3 is predicted to lead to a cDCF fluorescence of 16600 ± 250 [16200, 17100] RFU (mean, SD, 5 and 95 percentiles) after 24 h.

Effectopedia implementation
Effectopedia provides a graphical user interface to build an AOP diagram, which in turn gives easy access to relevant descriptions, data and models. In addition to a qualitative description of the AOP, Effectopedia provides structure for representation of test methods, collected data and executable models implemented in the supported programming languages (R, MATLAB, Java). Effectopedia was used to create several iterations of the AOP diagram. Initially, the sequence of KEs included relevant feedback mechanisms or parallel processes (branches). However, in the following step of identification of measurement methods, some of these events did not have a separate method of observation and were therefore combined into a single KE. Other events were determined to be modification factors rather than being causally related to the AO and were removed from the pathway diagram. The current version of the AOP diagram implemented in Effectopedia is shown in Supplemental Material Fig. S10. Each of the elements in the diagram can be expanded and details can be added to their description. Models were implemented in R and their source code contributed to the description of the in silico models, allowing other users to execute them with the same and/or different data and model parameters.

Discussion
In this paper, we explored various options for quantifying an AOP and deriving chemical independent KERs. Quantitative AOPs have been described previously [9,10], but we strove for a rigorous statistical treatment of the models, which is particularly important for quantifying uncertainties associated with predictions and extrapolations. For that purpose, we used MCMC simulations in a Bayesian framework [43]. Dealing with dose-time-response data significantly complicates the problem and very few off-the-shelf software provide adequate tools and models for such data, despite the fact that time is a key variable in disease progression. While spatial structure is evident in AOP representations (from molecules to cells, tissues etc.), time is implicit, masking large time-scale differences: Molecular reactions typically take seconds, cells respond in hours, tissues in a matter of days, and the whole body can take years to be significantly affected. A qAOP considering only dose and assuming instantaneous or fixed-delay effects would be of limited usefulness for risk assessment. This is particularly true for chronic renal disease, as humans have a large renal functional reserve and ill health is only apparent when that reserve is breached. The time-course of exposure to stressors is also important and should be considered during qAOP calibration, because in vitro cellular concentrations of test chemicals are usually different from the nominal medium concentrations and change with time [52]. To that purpose, qAOPs can be linked with pharmacokinetic models, but only if they are time-consistent. Nevertheless, in the absence of kinetic data on KBrO 3 concentrations in vitro, we considered here the nominal KBrO 3 concentrations to be an adequate measure of exposure. Note also that the AOP we used is illustrative and not OECD approved. We deliberately focused on a short sequence of KEs to demonstrate what can be achieved with different modeling approaches. The link to cell death and the subsequent link to kidney function impairment have not been included in our models given the absence of data on these downstream KEs. A final general comment is that for a complete AOP quantification, data on the effects of several chemicals on the KEs should be studied, in order to make sure that the KERs derived are fully chemical-independent. Such data are currently not available and our qAOP thus only serves to demonstrate quantification methods. Table 4 summarizes the principle, as well as the pros and cons of the three approaches taken. All three require proper statistical treatment to propagate the uncertainty implied by imprecise data measurements through the AOP. An excellent way to propagate uncertainty, and translate it to risk assessment in the form of Monte-Carlo samples, is to use Bayesian model calibration [43,44,49]. In any case, relatively complex and specific software is required. It will be interesting to follow the development of Effectopedia, as it offers a user-friendly and toxicology-specific AOP quantification environment. Conditional on proper statistical treatment and user-defined modeling assumptions, all three methods can describe the data well, albeit with different constraints (discussed below in more detail). Note that consistent dosetime-response relationships found by any of the three methods do support causality and concur a posteriori to other Bradford-Hill criteria. Therefore, qAOP modeling could provide further validation of those criteria.
Dose-response based qAOPs may seem the easiest to develop, as most toxicologists understand what is dose-response modeling. However, such modeling is less user-friendly than it seems. It requires either modeling skills (to find "good" dose-time-response equations) or black box curve-fitting approaches. Given the immense number of possible choices, finding the "best" model given the data is a very difficult task, and the question of structural model uncertainty is acute.   [1][2][3][4][5][6][7][8][9][10][11][12][13] This is compounded by the fact that the data are taken at "face value" and cannot be critically evaluated (except through residual analysis such as outlier detection, but these depend on the model adopted, which is still arbitrary). All this makes the domain of application of empirical qAOPs strictly limited to the time and dose range of the data, and strongly dependent on how relevant the experimental protocol is towards the actual disease process, without providing any indication of that relevance. Furthermore, for correct statistical inference and chemical-independent KERs, some or all dose-time-response relationships fitted must be mathematically inverted. Simply chaining such relationships (that is, using the best predictions for one KE as input to the next KER, as it is often done) does not account for uncertainties in the "independent" variable at each step and does not correctly propagate uncertainty through the AOP. The result would be a largely over-optimistic precision for predictions. DBN qAOPs offer an automatic or standardized way to develop semi-empirical qAOPs, while tuning simply the complexity of the KERs. They can nicely describe complex time dependencies in the data, e.g., they successfully modeled a fairly complex time-dose-relationship for the lactate readout (cf. Fig. 7). The end-results differ visually from those of the dose-response qAOP, because in our DBN the KER links for cDCF and lactate are linearly related to GSH levels (we are currently working on nonlinear extensions of the DBN model). That DBN qAOP is, to our knowledge, the first attempt to use such a model for a continuous dosetime-response predictive model. To accommodate the time-dependency of the data, we used a special formulation of the DBN where time enters the KERs. The work of Jaworksa et al. [12,31,53] pioneered the application of BNs for qualitative (i.e., hazard) assessment of chemicals, and we extend it here to qAOPs and risk assessment. The largest constraint for (D)BNs lies in the design of the experiments providing the data. The same doses and observation times should be used as much as possible. Otherwise, statistical imputation has to be used to obtain uniform dose and time schedules across experiments, and the statistical estimation problem may become overwhelming. From an experimental point of view, however, it might not be feasible to observe the different KEs with the same time frame and on the same time scale, even though it might be possible to simplify time dependencies by considering some effects to be instantaneous in comparison to others. Finally, it is possible to couple PK models with DBN models, either by pre-computing the value of the dose nodes in the DBN with a pharmacokinetic model, or by extending the DBN to simulate the pharmacokinetic data available.
The SB model we developed addresses only part of the CKD AOP: It does not include mitochondrial injury. But, for chemicals causing oxidative injury in the renal proximal tubule, it describes the links between GSH, the control of oxidative stress by Nrf2 and the formation of fluorescent cDCF in a very detailed way. The model is complex though, with 57 differential equations and 335 parameters. However, since it has been already parameterized for RPTEC/TERT1 cells, only the five parameters specific to KBrO 3 and cDCF reactions needed to be calibrated with the current data. We essentially found that a reasonable fit could be obtained if KBrO 3 acts directly on cH2DCF, and that cDCF is transported out of the cells or bleaches significantly with time. We also found that modeling the pre-incubation period gives important information about the cellular background rate of oxidative stress. Such informative modeling is easy to do with a mechanistic model and impossible to do with the previous two approaches. The non-linearity of the effect of KBrO 3 on cH2DCF is not well explained by a first-order reaction, but we did not introduce ad hoc equations or further hypotheses, because the mismatch already leads to the following point of discussion: According to our SB model, neither action on extra-cellular nor on intra-cellular GSH can entirely explain the cDCF data. This questions the application of the GSH readout as a measure of KBrO 3 effect in this AOP. While it is well accepted that thiol depletion can induce oxidative stress, the model suggests that this may not be the main mechanism of action of KBrO 3 in the readout test. Thus, KBrO 3 may not be well suited to quantify our AOP, which also calls into question the results obtained with the other two models. SB models force us to think mechanistically about the data, asking which biochemical reactions could explain them. They can also simulate particular details of the experimental protocols and background cellular processes, improving our understanding of the biology and of the tests themselves. However, we cannot entirely exclude that the model may be misleading, because its many parameters have not all been calibrated perfectly. SB models can also naturally integrate pharmacokinetic models, since those are based on the same principles and same mathematical objects. Therefore, complicated SB models should be seen as investment for the future rather than as quick answer to urgent questions. An Effectopedia implementation of both BN and SB models faces challenges, of which the most important is matching the internal structure of the models to the conceptual structure provided by the AOP. Currently, Effectopedia allows "global models" in which one BN or SB model can cover several KEs. Such models need to have specific outputs matching the AOP KEs. A problem in that approach is the derivation of reusable KERs. If the global model contains complex time or variable dependencies between non-adjacent KEs, they need to be explicitly represented in the AOP as feedbacks, feed-forwards or modifying factors. However, extracting such dependencies is non-trivial. Alternatively, the AOP can be re-designed if the global model indicates that some tightly coupled KEs can be merged.

Conclusion
The three approaches tested have different advantages. Dose-response based qAOPs may seem the easiest to develop at first sight, but they have very limited extrapolation and explanatory power. Bayesian networks are in fact easier to develop, once the technology is mastered, but they impose either strong constraints on experimental design (fixed dosing and observation schedules) or require complex statistical treatment (imputation). Systems biology models are more complex to develop, but one can strive for parsimony, as when we simplified the gene regulation part of our model. Importantly, they offer insight in the data relevance and biology that the other approaches cannot afford. In any case, the three approaches we presented can all fully propagate uncertainty about qAOP predictions, which is essential for proper risk assessment. The contrasted results we obtained demonstrate that the choice of approach is not neutral. They also emphasize the importance of data collection on: -In vitro kinetics, to understand and take into account the fate of the chemicals in the test system; -Baseline behavior of the cells, in the absence of chemical exposure.
For that, raw experimental data should be delivered to modelers without pre-processing such as normalization to background values. If such normalization had been applied to our cDCF data, for example, we would have lost important information on background ROS production. Correcting for background may impair essential mechanistic understanding of AOPs, which are as much about the underlying biology as about the effects of stressors; -Different readouts, to select the most relevant one for the underlying KE or to better understand a complex KE (such as oxidative stress); -Other chemicals to check whether the parameterized KERs are robust and really chemical-independent.
To avoid pitfalls in qAOP development, we suggest to take at least two approaches in parallel: First, a mechanistic modeling path, able to help test hypotheses, design experiments and deeply understand the results; Second, because we cannot always wait to have a fully mechanistic model developed, a lighter statistical approach. At the moment dose-response based modeling is the simplest, but we hope that we can contribute to a more wide-spread dissemination of dynamic Bayesian networks in this area. In this spirit, one of the goals of the Effectopedia platform is to facilitate the creation of qAOPs by integrating and comparing the results brought by various modeling approaches.