Validation of in vitro methods for human cytochrome P450 enzyme induction: Outcome of a multi-laboratory study

CYP enzyme induction is a sensitive biomarker for phenotypic metabolic competence of in vitro test systems; it is a key event associated with thyroid disruption, and a biomarker for toxicologically relevant nuclear receptor-mediated pathways. This paper summarises the results of a multi-laboratory validation study of two in vitro methods that assess the potential of chemicals to induce cytochrome P450 (CYP) enzyme activity, in particular CYP1A2, CYP2B6, and CYP3A4. The methods are based on the use of cryopreserved primary human hepatocytes (PHH) and human HepaRG cells. The validation study was coordinated by the European Union Reference Laboratory for Alternatives to Animal Testing of the European Commission's Joint Research Centre and involved a ring trial among six laboratories. The reproducibility was assessed within and between laboratories using a validation set of 13 selected chemicals (known human inducers and non-inducers) tested under blind conditions. The ability of the two methods to predict human CYP induction potential was assessed. Chemical space analysis confirmed that the selected chemicals are broadly representative of a diverse range of chemicals. The two methods were found to be reliable and relevant in vitro tools for the assessment of human CYP induction, with the HepaRG method being better suited for routine testing. Recommendations for the practical application of the two methods are proposed.


Introduction
The toxicity profile of an exogenous chemical (xenobiotic) to which the body is exposed depends not only on the toxicity of the parent compound, but also on any toxicologically relevant metabolites that may be formed during metabolism and on the xenobiotic's ability to induce biotransformation enzymes that affect its rate of metabolism (Tsaioun et al., 2016). Information on metabolism, including metabolic activation by CYP induction, is useful in toxicity testing strategies, for example to support in vitro to in vivo extrapolation (Coecke et al., 2005;Coecke et al., 2006;Wilk-Zasadna et al., 2015).
Over the last two decades, considerable progress has been made in developing in vitro metabolism methods based on human test systems (Coecke et al., 2013;Donato et al., 2008;Vermeir et al., 2005;Vinken and Hengstler, 2018). However, there were no formally validated test systems based on intact functional human hepatic cells capable of maintaining key metabolic activity functions for up to 3 days in culture.
Of all xenobiotic-metabolising enzymes, the cytochrome P450 (CYP) enzymes are of particular importance due to their abundance and functional versatility (Raunio et al., 2015). They may transform a xenobiotic into a harmless metabolite (detoxification) or, conversely, a nontoxic parent compound into a toxic metabolite. Besides detoxifying xenobiotics, CYP enzymes play a key role in the biosynthesis of endogenous substrates such as steroid hormones, prostaglandins and bile acids. Therefore, xenobiotic CYP enzyme induction may cause dysregulation of normal metabolism and homeostasis, with potential toxicological effects (Staudinger et al., 2013; Amacher, 2010).
https://doi.org/10.1016/j.tiv.2019.05.019 Received 6 March 2019; Accepted 29 May 2019
CYP enzyme induction has been selected as the biological endpoint to validate cryopreserved primary human hepatocytes and the cryopreserved human HepaRG cell line (hereafter referred to as PHH and HepaRG cells, respectively) as reliable hepatic metabolically competent test systems, as it requires the whole molecular machinery (i.e. receptor and transporter expression, transcription, translation and expression of functional CYP enzymes) to be present and functional in the test system.
At the molecular level, CYP enzyme induction is a rather slow process, controlled by a set of nuclear receptors associated with downstream signal transduction pathways. The process is initiated by the binding of endogenous or exogenous ligand(s) to specific nuclear receptors/transcription factors¹, namely the aryl hydrocarbon receptor (AhR), the constitutive androstane receptor (CAR), and the pregnane X receptor (PXR). AhR, PXR and CAR are primarily responsible for inducing transcription of the CYP1A, CYP3A and CYP2B families, respectively (Hakkola et al., 2018). In addition to mediating detoxification, CAR, PXR and AhR have been implicated in the regulation of a broader range of physiological functions (Kretschmer and Baldwin, 2005; Sueyoshi et al., 2014; Wang and Tompkins, 2008), where dysregulation can lead to adverse effects (Hakkola et al., 2018) such as inflammation (Rubin et al., 2015; Christmas, 2015), cholestasis, steatosis (Gómez-Lechón et al., 2009), hepatotoxicity (Woolbright and Jaeschke, 2015), carcinogenesis (De Mattia et al., 2016; Pondugula et al., 2016; Fucic et al., 2017), and thyroid disruption (Patrick, 2009). Thus, in these cases, CYP induction serves as a biomarker for key events associated with adverse health effects.
In the area of regulatory toxicology, validation plays an indispensable role by independently establishing the relevance and reliability of a method for a specific purpose (Hartung et al., 2004), thereby promoting its regulatory acceptance. With a view to promoting the uptake of CYP induction methods in the regulatory assessment of chemicals, the European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM, 2014) of the European Commission's Joint Research Centre (JRC) organised a multi-laboratory validation study to assess the ability of two human in vitro metabolically competent test systems, namely PHH and HepaRG cells, to reliably predict the induction status of CYP1A2, CYP2B6 and CYP3A4 upon exposure to selected chemicals (i.e. known human in vivo inducers and non-inducers, see Table 1). The selected CYP enzymes are globally accepted as alternative biomarkers of CYP induction in the regulatory guidelines of pharmaceutical agencies (FDA, 2017;EMA, 2012). They are also expressed in human liver and human intestine (Lindell et al., 2003) and are inducible by well-established reference chemicals (Lehmann et al., 1998;Gibson et al., 2002;Chen et al., 2004;Wang et al., 2004;Sueyoshi et al., 1999; ). This paper describes the organisation and execution of the EURL ECVAM validation study and presents the results obtained. The validation study was conducted in accordance with several reliability considerations (e.g. serum free medium, inclusion of reference items) detailed in the recently published OECD guidance document on good in vitro method practices (GIVIMP) (OECD, 2018).

Choice of test systems for in vitro metabolism methods
The in vitro test systems available for human metabolism studies include human intact cells (tissue slices, isolated and cultured hepatocytes, liver cell lines) and subcellular fractions (microsomes, recombinant enzymes) (Coecke et al., 2006; Donato et al., 2008; Vermeir et al., 2005). Among these possible test systems, PHH and HepaRG cells were identified as the most promising to include in the validation study (Mandenius et al., 2011; Vitrocellomics project https://cordis.europa.eu/project/rcn/85218/reporting/en). PHH contain all drug/xenobiotic metabolising enzymes and cofactors, and are considered relevant for a variety of toxicokinetic and toxicodynamic applications where it is important to account for inter-individual variation (Godoy et al., 2013). Similarly, the HepaRG cell line maintains metabolic capacity comparable to human hepatocytes, including expression of liver metabolising enzymes, nuclear receptors, and hepatic xenobiotic transporters (Aninat et al., 2006; Le Vee et al., 2006; Turpeinen et al., 2009; Andersson et al., 2012).
In addition to biological relevance, practical considerations were also important in the choice of test systems. Recent developments in cell cryopreservation and optimisation of seeding conditions have facilitated continuity of commercial supply for both PHH and HepaRG cells. The responses of both cryopreserved PHH and HepaRG to specific inducers are similar to those of freshly isolated PHH (Abadie-Viollon et al., 2010;Aninat et al., 2006;Alexandre et al., 2012;Andersson et al., 2012). Availability of chemically defined culture media allows these test systems to be maintained without the use of foetal calf serum thereby avoiding undefined media components and increasing the reproducibility of culture conditions (OECD, 2018).

Organisation of the validation study
The validation study was conducted as a ring trial among six laboratories. Four were already technically proficient in one of the methods (KaLy-Cell and AstraZeneca for PHH; Pharmacelsus and Janssen Pharmaceutica for HepaRG cells). In addition, EURL ECVAM participated for both methods, effectively acting as two laboratories (Fig. 1). Independent experts with a supervisory role formed a validation management group (VMG), in which international organisations for the validation of alternative methods were also represented [NICEATM/ICCVAM (USA) and JaCVAM (Japan)]. Management of the set of validation chemicals (i.e. acquisition, coding and distribution) and the statistical data analysis were carried out by EURL ECVAM.
Following the modular principles of validation (Hartung et al., 2004;OECD, 2005), the scope of the study included test definition, within-laboratory reproducibility (WLR), transferability, and between laboratory reproducibility (BLR). The methods were first evaluated by the lead laboratories (KaLy-Cell for PHH, and Pharmacelsus for HepaRG cells) to confirm standard operating procedures (SOPs). The subsequent training and transfer to the other participating laboratories was then followed by testing a validation set of 13 selected chemicals by all six laboratories for BLR. For both PHH and HepaRG cells, three biological replicates (different cell batches) were run by each laboratory.

Chemical selection
To provide insight into the predictive capacities of the evaluated in vitro methods, a principal requirement for chemical selection was the availability of adequate (relevant and reliable) human in vivo data sets. Key data sources used were comprehensive review articles (Pelkonen et al., 2008;Hukkanen, 2012) and a database hosted by Washington University (https://www.druginteractioninfo.org/). Original references cited in the review articles were also compiled for the record.
An important practical consideration was that there was only a limited set of pharmaceuticals with adequate in vivo human reference data for each of the CYP isoforms investigated. This is the reason why the chemical selection was limited to pharmaceuticals with in vivo human data available from clinical monitoring. For the purpose of the CYP induction validation study, a validation set of 13 selected chemicals (Table 1) was used to assess the predictive capacity of the two in vitro methods. Furthermore, specific reference items with known induction potential for CYP1A2, CYP2B6, and CYP3A4 are indicated in Table 2. These reference items are used to calculate the response to the blind coded validation set of chemicals. For both CYP induction methods, acceptance criteria for the reference items (acting also as positive control items) were established to provide evidence that the PHH and the HepaRG cells are responsive under the actual conditions of the two in vitro methods (OECD, 2004, 2018; https://tsar.jrc.ec.europa.eu/test-method/tm2009-13 and https://tsar.jrc.ec.europa.eu/test-method/tm2009-14).
¹ PXR, CAR and AhR are often called 'nuclear' receptors, although AhR actually belongs to the family of basic-helix/loop/helix (bHLH)-receptors (bHLHe76) and only PXR and CAR belong to the family of nuclear receptors (NR1I2 for PXR and NR1I3 for CAR).

Chemical space analysis
The chemical space covered by the validation set of 13 chemicals and the 3 reference items (with rifampicin being both a blind coded validation chemical and a reference item) is represented by showing the position of these in a similarity space formed by other chemicals found in relevant lists (e.g. REACH registered substances downloaded from ECHA (2017), approved drugs from the Drugbank database (2018), and Tox21 chemicals NIH (2018)).
To determine chemical similarity, Tanimoto similarity analysis (Bajusz et al., 2015) was applied. The Tanimoto similarity metric between two chemicals is based on the number of 2D structural features they have in common compared with the total count of structural features that are used by the selected fingerprints that represent the molecules.
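As a minimal sketch of the metric described above, the Tanimoto similarity can be computed on sets of structural features as the number of shared features divided by the number of features present in either molecule. The feature names below are hypothetical placeholders, not the fingerprints used in the study, and the value returned for two empty fingerprints is a convention assumed here.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity between two collections of structural features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical feature sets, for illustration only.
fp_x = {"aromatic_ring", "amide", "hydroxyl", "methyl"}
fp_y = {"aromatic_ring", "amide", "chloro"}

# Two shared features out of five distinct features in total.
similarity = tanimoto(fp_x, fp_y)
```

In practice the features would come from a cheminformatics fingerprint rather than hand-written labels; the set arithmetic stays the same.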
The chemical space analysis was performed post hoc and did not affect the CYP induction validation data analysis or the in vitro classification of the validation set chemicals.

The human CYP enzyme induction in vitro method
The Standard Operating Procedures (SOPs) for the two methods are available from the EURL ECVAM Tracking System for Alternative methods towards Regulatory acceptance (TSAR), where they are indicated as in vitro method No. 193 (https://tsar.jrc.ec.europa.eu/test-method/tm2009-13 for PHH) and No. 194 (https://tsar.jrc.ec.europa.eu/test-method/tm2009-14 for HepaRG cells). TSAR provides an overview of alternative (non-animal) methods that have been proposed for regulatory safety or efficacy testing.
The validation set of 13 selected chemicals and reference item stock  For each chemical of the validation set, only soluble and non-cytotoxic concentrations were tested in triplicate in the CYP enzyme induction experiments. Therefore, before assessing CYP enzyme induction potential, solubility and cytotoxicity were separately and independently assessed by the six laboratories (with EURL ECVAM representing two test facilities for HepaRG cells and PHH).

Solubility
The SOPs prescribed 40 mg/ml as the initial concentration for determination of solubility and relied on visual inspection for solubility observation. In case of apparent insolubility in DMSO or precipitation in medium, dissolution was attempted by incremental twofold dilution (20, 10, 5 mg/ml) as necessary. The absence of precipitation in the medium was checked pre- and post-incubation (24 hours) by centrifugation of the sample and observation of any precipitation (pellet residue).
In addition to the visual inspection performed by the validation laboratories as described in the SOP, EURL ECVAM introduced nephelometry for systematic solubility determination of the 13 validation set chemicals. Nephelometry uses a laser beam and the principle of Tyndall effect light scatter to detect turbidity due to insolubility. The nephelometer method used formazin as reference item (1, 5, 10 and 20 nephelometric turbidity units (NTU)) to calculate the relative turbidity (RTU) of the validation set of chemicals. To the naked eye, 20 NTU is perceptible, while the nephelometer was sensitive to 5 NTU, with 1 NTU equivalent to background (solvent/medium blank).
A definition for solubility was adopted by setting 5 and 10 NTU formazin reference items as turbidity thresholds. Effectively, for stock solutions and medium dilutions, relative turbidity equivalent to < 10 NTU was defined 'soluble' while > 10 NTU was defined as 'insoluble'. Considering instrument sensitivity, turbidity between 5 and 10 NTU was refined as 'solubility limit' (still effectively 'soluble').
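The turbidity thresholds above can be expressed as a small classification rule. The following is a sketch only: the function name is hypothetical, and the handling of readings exactly equal to 5 or 10 NTU (assigned here to 'solubility limit') is an assumption, since the text does not specify the boundary cases.

```python
def classify_turbidity(rtu_ntu):
    """Classify a relative turbidity reading (in formazin NTU equivalents)."""
    if rtu_ntu < 0:
        raise ValueError("turbidity cannot be negative")
    if rtu_ntu > 10.0:
        return "insoluble"
    if rtu_ntu >= 5.0:
        # Between the 5 and 10 NTU references: still effectively soluble.
        # Assumption: readings of exactly 5 or 10 NTU fall in this band.
        return "solubility limit"
    return "soluble"


# Hypothetical readings, for illustration only.
calls = [classify_turbidity(v) for v in (1.0, 7.5, 22.0)]
```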
However, since the nephelometry was an extension to the project, with definitive results only available at a later stage, solubility of the validation set of chemicals for the in vitro experiments was concluded only from the visual inspections done by the validation laboratories.

Cytotoxicity
Potential cytotoxicity of the validation set chemicals for PHH and HepaRG cells was determined starting from the highest soluble concentration, followed by a 1:1 or 1:3 dilution for PHH and HepaRG cells, respectively. The incubation time reflected the conditions used for the induction assays (72 hours and 48 hours of incubation for PHH and HepaRG cells, respectively). The cytotoxicity assay was based on the conversion of redox dye resazurin to fluorescent resorufin by living cells. Non-viable cells, without metabolic capacity, yield no fluorimetry signal.
Results were expressed as fractional survival (FS %) with respect to untreated controls and were calculated based on measured relative fluorescent units (RFU), corrected for the background signal:

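$$\mathrm{FS}\,(\%) = \frac{\mathrm{RFU}_{\text{treated cells}} - \mathrm{RFU}_{\text{background}}}{\mathrm{RFU}_{\text{untreated cells}} - \mathrm{RFU}_{\text{background}}} \times 100$$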
Cytotoxicity was evaluated from the dose-response curve, where the mean FS (%) of three technical replicates was plotted versus the corresponding concentration.
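The background-corrected fractional survival and its mean over technical replicates can be sketched as follows. The function names and the RFU values are hypothetical; the calculation assumes a single background reading and a single untreated-control reading per plate, which is a simplification of whatever plate layout the SOPs prescribe.

```python
from statistics import mean


def fractional_survival(rfu_treated, rfu_untreated, rfu_background):
    """Background-corrected fractional survival (FS, %) for one treated well."""
    denominator = rfu_untreated - rfu_background
    if denominator <= 0:
        raise ValueError("untreated control must exceed background")
    return 100.0 * (rfu_treated - rfu_background) / denominator


def mean_fractional_survival(treated_replicates, rfu_untreated, rfu_background):
    """Mean FS (%) over the technical replicates at one concentration."""
    return mean(
        fractional_survival(rfu, rfu_untreated, rfu_background)
        for rfu in treated_replicates
    )


# Hypothetical RFU readings, for illustration only.
fs_mean = mean_fractional_survival([550.0, 600.0, 500.0], 1050.0, 50.0)
```

Plotting `fs_mean` against concentration for each dilution gives the dose-response curve from which cytotoxicity was judged.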

CYP enzyme induction assay
The CYP induction assays involved exposure to the validation set of chemicals at 6 serial dilutions (1:3 ratio) over 72 hours (PHH) or 48 hours (HepaRG cells) with medium renewal every 24 hours. The HepaRG cells and PHH assays included three technical replicates (triplicate of validation set chemicals) repeated with three biological replicates (different cell batches or donors). Parallel assay of the reference items at appropriate concentrations (Table 2) provided experimental positive controls. Cells exposed to solvent (i.e. 0.1 % DMSO) diluted in medium served as the negative control.
CYP enzyme activity was determined by applying fresh medium containing a combination ("cocktail") of the CYP-selective probe substrates phenacetin (CYP1A2), bupropion (CYP2B6) and midazolam (CYP3A4)³ (Fig. 2). Plate formats (48-well for PHH and 96-well for HepaRG cells) and exposure times (72 h for PHH and 48 h for HepaRG cells) were previously optimised for sensitivity of the test systems to potential inducers. The plate layouts allowed triplicate testing of two validation set chemicals at six concentrations.
For quantitative analysis of CYP enzyme activity, the formation of specific products by the respective isoenzyme (Table 2), namely acetaminophen (CYP1A2), hydroxybupropion (CYP2B6) and 1-hydroxymidazolam (CYP3A4), was quantified by liquid chromatography-mass spectrometry (LC/MS) analysis. Different LC/MS systems (e.g. Varian, Thermofisher, Waters) were used by the participating laboratories. Prior to routine operation, the different LC/MS instruments in use by the laboratories were to be confirmed as compliant with the required performance criteria. LC/MS analytical method protocols for metabolite quantification were validated for accuracy, precision, lower and upper limits of quantitation (LLOQ and ULOQ, respectively) and method linearity, consistent with guidelines of the European Medicines Agency (EMA, 2011), the updated guidelines of the Food and Drug Administration (FDA, 2018) and the OECD (OECD, 2018). LLOQ of 2.30 nM for acetaminophen, 1.15 nM for hydroxybupropion and 1.15 nM for 1-hydroxymidazolam were required before proceeding with sample analysis. Griseofulvin or 5,5-diethyl-1,3-diphenyl-2-iminobarbituric acid were included as reference items allowing correction for any loss of analyte during sample preparation and sample injection.
Analytical assay acceptance criteria were adapted from FDA Guidance for Industry Bioanalytical Method Validation (2001) and from Shah et al. (2000) and Viswanathan et al. (2007).
Quantitative analytical data for the specific products were normalised to the protein content per well. Protein was quantified by the Pierce bicinchoninic acid method. The protein concentration of the tested sample was interpolated from a bovine serum albumin standard curve in 0.1 M NaOH by linear regression, where the standard curve is a plot of the average blank-corrected absorbance for each standard vs. its concentration in mg/ml. Interpolated data were accepted as long as the coefficient of determination (R²) for the linear regression was equal to or greater than 0.9, in accordance with the SOP.
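The standard-curve interpolation and its acceptance check can be sketched with an ordinary least-squares fit. This is an illustration under stated assumptions: the function names and the standard concentrations are hypothetical, and only the R² ≥ 0.9 acceptance limit is taken from the text.

```python
def fit_line(x, y):
    """Ordinary least-squares line through (x, y); returns slope, intercept, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("standards must vary in concentration and absorbance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def interpolate_protein(sample_abs, std_conc_mg_ml, std_abs, r2_min=0.9):
    """Protein concentration (mg/ml) of a sample from a BSA standard curve."""
    slope, intercept, r_squared = fit_line(std_conc_mg_ml, std_abs)
    if r_squared < r2_min:
        raise ValueError("standard curve rejected: R^2 below acceptance limit")
    return (sample_abs - intercept) / slope


# Hypothetical blank-corrected standards, for illustration only.
std_conc = [0.0, 0.25, 0.5, 1.0, 2.0]
std_abs = [0.0, 0.125, 0.25, 0.5, 1.0]
protein = interpolate_protein(0.4, std_conc, std_abs)
```

The resulting concentration would then serve as the denominator when expressing metabolite formation per mg protein.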

Data analysis
The inclusion of relevant reference and control items, and setting of acceptance criteria for performance on the basis of historical data, is essential for regulatory applicability of in vitro methods (OECD, 2018). The acceptance criteria used in the validation study are explained in detail in the specific method SOPs (https://tsar.jrc.ec.europa.eu/test-method/tm2009-13 and https://tsar.jrc.ec.europa.eu/test-method/tm2009-14).
² HepaRG: GlutaMAX™ with serum-free supplement; PHH: HMM (hepatocyte maintenance medium).
³ Probe substrates are not 100% selective for the assigned CYP enzyme, but each CYP enzyme is the primary and major catalyst of the probe reaction.
C. Bernasconi, et al. Toxicology in Vitro 60 (2019) 212-228

Evaluation of CYP enzyme induction
Each induction plate included wells for the measurement of basal CYP1A2, 2B6 and 3A4 activities (i.e. cells exposed to the negative control (0.1% DMSO)) and of reference item-induced activities. The CYP enzyme induction results were expressed as CYP activities in pmol of specific product/min/mg protein.
The induction potential of the validation set chemicals and the reference items was calculated as the n-fold increase relative to the negative control (0.1% DMSO), averaged over the three replicates:

$$n\text{-fold CYP induction} = \frac{\text{CYP activity of validation set chemical or reference item}}{\text{CYP activity of negative control}}$$

One key acceptance criterion for each assay was that reference items (positive controls) were required to produce ≥ 2-fold CYP induction with respect to enzyme basal activity. A validation set chemical with ≥ 2-fold induction potential has been described as an in vitro positive inducer. However, based on the validation study results, and to ensure consistency (avoiding false positives), at least two consecutive concentrations in the dose-response generating ≥ 2-fold induction were also required to classify a test item as an in vitro positive inducer.
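The fold-induction calculation and the two-consecutive-concentrations rule can be sketched as below. The function names and the numbers are hypothetical, and taking the ratio of replicate means (rather than the mean of per-replicate ratios) is an assumption, since the text only states that the result is averaged over the three replicates.

```python
from statistics import mean


def fold_induction(treated_activities, control_activities):
    """n-fold induction: mean treated activity over mean negative-control activity.

    Activities are in pmol product/min/mg protein.
    Assumption: ratio of replicate means.
    """
    return mean(treated_activities) / mean(control_activities)


def is_in_vitro_inducer(folds_by_concentration, threshold=2.0, min_consecutive=2):
    """True if at least `min_consecutive` adjacent concentrations reach the threshold.

    `folds_by_concentration` must be ordered along the dilution series.
    """
    run = 0
    for fold in folds_by_concentration:
        run = run + 1 if fold >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical six-point dose-response series, for illustration only.
isolated_hits = [1.0, 2.1, 1.5, 2.3, 1.2, 1.0]    # two hits, not adjacent
consecutive_hits = [1.0, 1.5, 2.0, 2.4, 1.8, 1.1]  # two adjacent hits
```

With these inputs the first series is not classified as an inducer, because its two ≥ 2-fold points are separated, while the second is.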

Reproducibility
The capability of an in vitro method to provide reliable results is an important characteristic evaluated in validation studies. For the CYP enzyme induction method the focus was mainly on comparison of assigned classifications across different batches (between-batch reproducibility; BBR) and across laboratories (between-laboratory reproducibility; BLR).
For a given CYP, two reproducibility measures based on assigned classifications were evaluated. First, the reproducibility of results across three batches (BBR) was evaluated for each laboratory. Secondly, the reproducibility of results across three participating laboratories (BLR) for a given batch was assessed.
More precisely, we define the BBR_L and BLR_B measures as follows:
• BBR_L represents the percentage of validation set chemicals that have concordant classifications across three batches tested in laboratory L.
• BLR_B represents the percentage of validation set chemicals that have concordant classifications across three participating laboratories for batch B.
In addition to the measures above, the aggregated measures BBR and BLR are constructed as averages across the three laboratories and the three batches, respectively.
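Both measures reduce to the same operation: counting the chemicals whose classifications agree across a set of runs. The sketch below uses hypothetical chemical labels and calls; the data layout (a mapping from chemical to its list of classifications) is an assumption made for illustration.

```python
from statistics import mean


def percent_concordant(calls_by_chemical):
    """Percentage of chemicals with identical classifications in every run.

    For BBR_L the runs are the three batches tested in laboratory L;
    for BLR_B they are the three laboratories testing batch B.
    """
    if not calls_by_chemical:
        raise ValueError("no chemicals to evaluate")
    concordant = sum(
        1 for calls in calls_by_chemical.values() if len(set(calls)) == 1
    )
    return 100.0 * concordant / len(calls_by_chemical)


# Hypothetical classifications across three batches in one laboratory.
lab_calls = {
    "chemical_A": ["inducer", "inducer", "inducer"],
    "chemical_B": ["non-inducer", "non-inducer", "non-inducer"],
    "chemical_C": ["inducer", "non-inducer", "inducer"],
    "chemical_D": ["non-inducer", "non-inducer", "non-inducer"],
}
bbr_lab = percent_concordant(lab_calls)

# Aggregated BBR: average of the per-laboratory values (hypothetical numbers).
bbr_aggregated = mean([bbr_lab, 100.0, 50.0])
```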
The BBR was used as a proxy for within-laboratory reproducibility (WLR), which could not be directly evaluated in certain cases. Particularly in the case of PHH, batches were provided only once. However, this is not considered to be a shortcoming, since the BLR can be regarded as the more conservative (lower) estimate of reproducibility.

Relevance (predictive capacity)
Comparison of the study results to human CYP induction used a ratio of in vivo plasma concentrations (Cmax) to in vitro concentrations producing 2-fold induction (F2 values) (Weiss and Haefeli, 2006; Grime et al., 2010). A ratio of > 0.5 was the criterion used to predict an in vivo CYP enzyme induction response. This is a rather conservative threshold, implying that an in vitro concentration resulting in 2-fold induction was significant at half the Cmax value. Alternatively, Cmax/EC50 values could be used, but for some of the 13 validation set chemicals a full dose-response curve for calculation of an EC50 was not attained. The human in vivo classifications (inducer, non-inducer) for the validation set chemicals were based on literature data (Table 1).
To classify each chemical as a positive in vitro inducer, a positive induction result in one donor (PHH) or batch (HepaRG cells) in each of the three laboratories was required. This criterion is just a very cautious interpretation of FDA guidance (FDA, 2017).
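The two decision rules in this section can be sketched together. The function names, laboratory labels and values are hypothetical; treating a chemical that never reaches 2-fold induction (no F2 value) as predicted negative is an assumption, as is requiring Cmax and F2 to share the same concentration units.

```python
def predicts_in_vivo_induction(cmax, f2, ratio_threshold=0.5):
    """Predict in vivo induction when Cmax/F2 exceeds the threshold.

    `cmax` and `f2` must be in the same concentration units.
    Assumption: `f2` is None when 2-fold induction was never reached.
    """
    if f2 is None:
        return False
    if f2 <= 0:
        raise ValueError("F2 must be a positive concentration")
    return cmax / f2 > ratio_threshold


def positive_in_all_labs(calls_by_lab):
    """True if every laboratory saw induction in at least one donor or batch."""
    return bool(calls_by_lab) and all(any(calls) for calls in calls_by_lab.values())


# Hypothetical per-donor (or per-batch) results from three laboratories.
calls = {
    "lab_1": [True, False, False],
    "lab_2": [True, True, False],
    "lab_3": [False, False, True],
}
overall_positive = positive_in_all_labs(calls)
```

Note that a ratio threshold of 0.5 is arithmetically the same as requiring F2 to be below twice the Cmax.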

Solubility
The six laboratories uniformly reported 40 mg/ml in DMSO stock solution as soluble for 12 of the 13 validation set chemicals. The exception, phenytoin, was soluble at 40 mg/ml using a 1:1 blend of DMSO with water. The stock solution observations were confirmed by nephelometry, where only background signals equivalent to solvent blank were measured.
For the dilutions in medium, 40 μg/ml was consistently observed among the laboratories to be stable for 8 of the validation set chemicals, while some discordance of solubility was reported for the others (Table 3). In particular, nephelometry detected insoluble suspensions for indole carbinol, efavirenz and phenytoin, illustrated by turbidity graphs for the two media (Fig. 3). Precipitation of indole carbinol, notably intense in HepaRG medium at 40 μg/ml, was also persistent at 20 μg/ml, and even perceptible at 10 μg/ml in both media based on nephelometer measurements. Relative turbidity at 10 μg/ml was 5 NTU < RTU value < 10 NTU equivalents.
At 40 μg/ml significant precipitation was also observed for efavirenz in HepaRG medium and phenytoin in PHH medium, although only initially (time zero). Repeat measurements indicated borderline solubility limit for efavirenz at 40 μg/ml in PHH (post-incubation). Also in PHH, trace turbidity was evident for phenytoin at 20 and 10 μg/ml (pre-incubation). Based on the nephelometry measurements, efavirenz was concluded soluble at 20 μg/ml in HepaRG medium with a solubility limit at 40 μg/ml in PHH medium. Conversely, phenytoin was concluded soluble in HepaRG medium at 40 μg/ml and in PHH medium at 20 μg/ml.
Based on the visual inspection observations available at the time of the in vitro method implementation, the VMG excluded indole carbinol from the testing program due to uncertain solubility (Table 3).

Cytotoxicity
The maximum soluble and non-cytotoxic concentrations applicable to the CYP induction in vitro methods for all 13 validation set chemicals are shown in Table 4. For HepaRG cells, rifabutin and efavirenz were cytotoxic based on the SOP acceptance criteria and therefore excluded from the CYP enzyme induction assay. For PHH, rifabutin, bosentan and efavirenz were tested for in vitro CYP induction at starting concentrations of 20, 10, and 2.5 μg/ml, respectively.

Chemical space analysis
The chemicals used in the CYP induction validation study are drugs. However, these are not the only type of chemical that may interact with CYP receptors, as it is not the chemical use that gives them the "ability" to interact with CYP receptors but their chemical structure. The chemical space covered by the 13 test chemicals and the 3 reference chemicals (with rifampicin being both a blind coded (test) chemical and a reference chemical) used in the validation study (Fig. 4) is represented by showing the position of these in the similarity space formed by other chemicals found in relevant lists (i.e. REACH, Drugbank, and Tox21). Once duplicates and chemicals without a defined structure were filtered out, the total number of chemicals forming the similarity space was 7461.
The chemical space in Fig. 4 shows the chemicals of the lists mentioned above positioned with respect to their structural similarity. The axes of the plot correspond to the first 2 principal components of the similarity matrix calculated using the atomic pairs fingerprints used to describe the chemicals (Landrum G. RDKit: Open-source informatics. 2015. http://www.rdkit.org). In Fig. 4, structurally similar chemicals are placed next to each other and chemicals that are increasingly different are placed further from each other. Indole-3-carbinol and rifabutin, for instance, are placed at the top and bottom of the chemical space. The rest of the validation set chemicals are distributed between these two. This indicates that the structural diversity of the validation set chemicals is high. Chemicals in Fig. 4 have been coloured by their list of origin. In order to facilitate the visualisation, chemicals that were not similar to any of the CYP induction validation study chemicals, i.e. Tanimoto similarity < 0.5, were plotted in grey, regardless of the chemical list to which they belonged.

Evaluation of CYP induction potential
Based on the solubility and cytotoxicity acceptance criteria, 10 chemicals were further tested with HepaRG cells and 12 with PHH (Table 4). A validation set chemical was classified as an in vitro inducer if the CYP induction was ≥ 2-fold at two or more consecutive concentrations tested. The assigned classifications are reported in Table 5a for HepaRG cells and in Table 5b for PHH.

Reproducibility
Results for between-batch reproducibility BBR_L and between-laboratory reproducibility BLR_B are summarised in Tables 6a-6d. For all three CYP enzymes, a consistently higher reproducibility for both BBR_L and BLR_B was obtained for HepaRG cells compared to PHH. This is likely due to the single donor source of the HepaRG cell batches, while PHH originated from three different donors.
For HepaRG cells BBR_L values are similar for a given batch except in the case of CYP1A2. The BBR_L for CYP3A4 is 90-100%, for CYP2B6 60-70%, and for CYP1A2 between 50% and 100% (Table 6a).
The lower BLR for CYP1A2 in the case of PHH may reflect its higher variation in expression across individuals coupled with the 2-fold threshold definition for induction. The use of a higher cutoff value for induction (e.g. 5-fold) would decrease sensitivity to background noise and probably increase reproducibility for this enzyme.

Predictive capacity
An overview of the predicted and reference classifications for CYP1A2, CYP2B6 and CYP3A4 in PHH and HepaRG cells is presented in Tables 7-12. Omeprazole is not a CYP1A2 inducer in vivo at normal doses (20-40 mg) while it has been found to be a weak inducer at 120 mg doses or in poor metabolizers (40 mg dose) reaching high plasma levels (Andersson, 1996). The Cmax values used in Tables 7 through 12 refer to a normal dose. In vitro omeprazole has been used in other studies as a positive control at concentrations not relevant for a normal dose.
HepaRG cells predicted human in vivo CYP1A2 induction for the four true positives (Table 7). Four in vivo negatives were also correctly classified. For bosentan and artemisinin, in vivo data were lacking. For bosentan CYP1A2 induction would be expected from the Cmax/F2 ratio (0.45-0.74) > 0.5, but artemisinin would not be expected to induce CYP1A2 in vivo.
PHH correctly predicted CYP1A2 induction for one of the four in vivo inducers: phenytoin (Table 8). All five of the in vivo negatives were correctly predicted.
For artemisinin, PHH predicted no CYP1A2 induction. However, the correctness of the prediction cannot be evaluated due to the in vitro variability and absence of human in vivo data. Clinical studies were also lacking for bosentan and efavirenz (both indicated as non-inducers by PHH), precluding verification of in vitro predictive capacity for CYP1A2 induction. PHH misclassified rifampicin as a non-inducer of CYP1A2, contrary to published in vivo data (Köhle and Bock, 2009; Hoffmann et al., 2014; Derungs et al., 2016). However, rifampicin induction of CYP1A in vivo is weak and difficult to capture in PHH (Moscovitz et al., 2018; Rae et al., 2001). The discrepancy may also be related to apparent variability of individual hepatocyte batches (Abadie-Viollon et al., 2010; Yajima et al., 2014). PHH also misclassified sulfinpyrazone and carbamazepine as non-inducers of CYP1A2, contrary to published in vivo data.
For CYP2B6, the four inducers carbamazepine, phenytoin, artemisinin, and rifampicin were correctly classified by HepaRG cells; the three in vivo negatives (penicillin, metoprolol and sotalol) were also correctly predicted (Table 9).
Human in vivo data on CYP2B6 induction are lacking for sulfinpyrazone and bosentan; both were predicted as inducers by HepaRG cells at clinically relevant doses. In HepaRG cells, omeprazole was predicted as a non-inducer of human CYP2B6, consistent with observations in vivo at clinically relevant doses. CYP2B6 induction by sulfinpyrazone has been demonstrated in human hepatocytes, supporting the positive prediction by HepaRG cells. Assuming validity of the in vitro results for omeprazole and sulfinpyrazone, the positive outcome for bosentan may similarly be inferred as true.
The results for CYP2B6 induction by PHH are the same as for HepaRG. Rifabutin (only tested in PHH) also induced CYP2B6, although the absence of in vivo human data precluded direct comparison (Table 10). The positive predictions for carbamazepine, phenytoin, artemisinin, efavirenz and rifampicin, and the negative results for
Chemicals depicted in "grey," regardless of the list to which they belong, correspond to chemicals with a Tanimoto similarity < 0.5 with respect to the validation study chemicals.
Predictions of CYP3A4 induction were correct for both HepaRG cells and PHH, except for artemisinin (Tables 11 and 12). The four non-inducers were also correctly predicted by both test systems.
Although artemisinin was indicated as a non-inducer in vitro by HepaRG cells and as an inducer in vitro by PHH, the latter response was observed only at concentrations above the human in vivo Cmax. On this basis, artemisinin was predicted to be a non-inducer. Variability is also evident in clinical studies on CYP3A4 induction by artemisinin: a study of the midazolam metabolite/parent ratio indicated CYP3A4 induction (Asimus et al., 2007), whereas no CYP3A4 induction was reported on the basis of omeprazole sulfone formation and the cortisol metabolic ratio (Svensson et al., 1998).

Discussion
CYP induction, requiring de novo protein synthesis, is a sensitive biomarker for phenotypic hepatic metabolic competence. For the first time, PHH and HepaRG cells have been formally validated as metabolically competent test systems for the functional assessment of CYP1A2, CYP2B6 and CYP3A4 induction. The measurement of functional CYP enzyme induction (i.e. catalytic activity) is considered more informative than measurements of mRNA, since correlations between the CYP-selective activity and the specific CYP mRNA level are frequently poor or lacking (Abass et al., 2012;Choi et al., 2013;Mwinyi et al., 2011;Nakajima and Yokoi, 2011;Surapureddi et al., 2011).
The ring trial results show adequate within- and between-laboratory reproducibility (Tables 6a-6d), demonstrating that both methods are transferable to laboratories experienced in cell culture techniques and analytical chemistry. The design and conduct of the validation study followed best practices. For example, the methods avoided the use of serum, which has a complex composition and introduces undefined components into the medium, thereby affecting reproducibility of results. Provisions such as this are now explicitly documented in the recently published OECD guidance on Good In Vitro Method Practices (GIVIMP; OECD, 2018).
The results also show that the two in vitro methods provide reasonable predictions of the in vivo CYP enzyme induction of chemicals (Tables 6a-6d), allowing the choice of test system to depend upon the assessment context (discussed further below). Both test systems responded correctly to the reference inducers (BNF, PB, and RIF) and correctly predicted in vivo human CYP induction for all the blind-coded chemicals tested, except for carbamazepine, sulfinpyrazone and rifampicin in PHH (Table 13). In some cases, the absence of adequate human data (e.g. the available in vivo data for CYP3A4 induction by artemisinin were inconsistent) precluded an assessment of predictivity (yellow boxes in Table 13).
Although the validation set of chemicals was of limited size due to the availability of human data, the chemical space analysis shows that these chemicals span a relatively large area of the chemical space formed by REACH registered substances, Drugbank approved drugs and some Tox21 chemicals. This suggests that the CYP induction methods may be applicable to a structurally diverse range of chemicals.
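The similarity cut-off used in the chemical space figure (Tanimoto similarity < 0.5 with respect to the validation study chemicals) can be illustrated with a short sketch. The fingerprint type used in the analysis is not stated here, so the example represents each fingerprint as a hypothetical set of "on" bit positions; the function names and the toy fingerprints are illustrative assumptions, not the study's implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def is_covered(candidate, validation_set, threshold=0.5):
    """True if the candidate reaches the threshold against any validation chemical."""
    return any(tanimoto(candidate, ref) >= threshold for ref in validation_set)

# Toy fingerprints (hypothetical bit positions, not real chemicals).
validation_fps = [{1, 4, 7, 9}, {2, 3, 5, 8}]
near = {1, 4, 7, 10}  # shares 3 of 5 distinct bits with the first reference
far = {11, 12, 13}    # shares no bits with either reference

print(is_covered(near, validation_fps), is_covered(far, validation_fps))
```

In this reading, a chemical would be shown in grey when `is_covered` returns `False`, that is, when its similarity to every validation chemical is below 0.5.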
The mechanistic relevance (metabolic competence) of the in vitro methods is based on the fact that the entire catalytic machinery
Table 5a HepaRG. Assigned classifications (1=Positive=inducer, 0=Negative=non-inducer). The batch is identified by two digits reported above.
In general, the functional measurement of CYP induction should be sufficient. However, the parallel measurement of mRNA might be warranted in some specific cases, for example when the chemical is both a CYP inhibitor and inducer (Einolf et al., 2014). Xing et al. (2012) describe the auto-induction phenomenon for artemisinin, having observed that CYP3A4 transcripts were induced more strongly than CYP3A4 activity as the artemisinin concentration increased. This supports the hypothesis that, at higher artemisinin concentrations, weak or slow inactivation may dampen the increase in CYP3A4 activity relative to mRNA transcripts.
CYP induction is a concentration-dependent process. Therefore, the assessment of the predictive capacity of human in vitro methods needs to take into account realistic human in vivo concentrations of a chemical. This should preferably be the concentration at the site of action, but plasma concentration generally serves as a more convenient and suitable exposure metric. A case in point was omeprazole, indicated by PHH and HepaRG cells as an in vitro inducer of all three CYP isoforms. However, the concentrations producing 2-fold induction significantly exceeded the clinical Cmax, and consequently omeprazole was concluded to be a non-inducer. Omeprazole has been demonstrated to induce CYP1A2 in humans, measured by caffeine metabolism or phenacetin clearance, but only at elevated doses above the clinical norm, or in subjects with poor omeprazole metabolism (Rost et al., 1994 and Rost et al., 1999). Nevertheless, the omeprazole example illustrates the need for a rational comparison of in vitro and in vivo concentrations. With chemicals other than drugs, concentrations could be obtained from human biomonitoring studies or, if an actual measured concentration is not available, from a calculation based on external exposure assumptions, for example by using physiologically based biokinetic (e.g. PBPK) models (Bessems et al., 2014).
For the three validation set chemicals acting as negative controls in vivo, i.e. penicillin, metoprolol, and sotalol, in vitro results were concordant, although metoprolol demonstrated some response in isolated cases with PHH. Although concentration-response curves were consistent in these isolated cases, the concentrations required for induction were much higher than clinical Cmax concentrations. Therefore, metoprolol was predicted to be a non-inducer in vivo.
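The comparison between in vitro inducing concentrations and clinical exposure can be sketched as a simple ratio check. The example assumes that F2 denotes the lowest in vitro concentration giving 2-fold induction and applies the Cmax/F2 > 0.5 criterion mentioned in the results; the function name and all numerical values are hypothetical and serve only to illustrate the logic.

```python
def predicted_in_vivo_inducer(cmax_um, f2_um, ratio_threshold=0.5):
    """Predict in vivo induction from the Cmax/F2 ratio.

    cmax_um: plasma Cmax (µM).
    f2_um:   in vitro concentration giving 2-fold induction (µM),
             or None if 2-fold induction was never reached.
    """
    if f2_um is None:
        return False  # no in vitro induction at any tested concentration
    if f2_um <= 0:
        raise ValueError("F2 must be positive")
    return (cmax_um / f2_um) > ratio_threshold

# Hypothetical case: induction only far above Cmax (ratio 0.05).
only_above_cmax = predicted_in_vivo_inducer(cmax_um=1.0, f2_um=20.0)

# Hypothetical case: induction at concentrations close to Cmax (ratio ~0.83).
near_cmax = predicted_in_vivo_inducer(cmax_um=10.0, f2_um=12.0)

print(only_above_cmax, near_cmax)
```

The first case mirrors the reasoning applied to omeprazole and metoprolol: an in vitro response that appears only well above Cmax does not translate into a predicted in vivo inducer.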
PHH failed to predict in vivo CYP1A2 induction by carbamazepine, sulfinpyrazone and rifampicin, possibly owing to variability of individual hepatocyte batches. The choice of PHH or HepaRG cells as test system is largely dependent on the application. PHH have long been limited by availability. With the current successes in cryopreservation (Abadie-Viollon et al., 2010;Alexandre et al., 2012), progress has been made, allowing quality control and generation of test system characterisation data by commercial providers. Inevitably, PHH are subject to variability among individual donors (Costa et al., 2014), which might be desirable for certain applications where population variability data are necessary. For other routine chemical testing applications, immortalised cell lines have been proposed. Among these, HepaRG cells provide an immortalised human hepatocyte cell line with relevant in vivo functions (Aninat et al., 2006;Le Vee et al., 2006) and batch-to-batch consistency. The biotransformation enzyme composition of HepaRG cells can be sustained over weeks (Guillouzo et al., 2007).
Human-derived metabolically competent test systems are of particular relevance for human safety assessment since there are well described species differences in Phase I enzyme induction and metabolism (Martignoni et al., 2006;Kedderis and Lipscomb, 2001), metabolic stability and metabolite identification, and in CAR, PXR and AhR receptor activation (Kretschmer and Baldwin, 2005;Kiyosawa et al., 2008;Köhle and Bock, 2009;Abass et al., 2012 and Fujiwara et al., 2012).
Colour shading key: green: correct prediction (true positive or true negative); yellow: unconfirmed (no or unreliable or inconsistent in vivo data) or ambiguous; red: incorrect prediction. C. Bernasconi, et al. Toxicology in Vitro 60 (2019) 212-228

Conclusions and recommendations
The present validation study shows that cryopreserved PHH and cryopreserved HepaRG cells are reliable and relevant in vitro methods for the assessment of human CYP enzyme induction. These methods may play a role in regulatory risk assessment by contributing information on metabolism, thyroid disruption, or as indicators of nuclear-receptor mediated dysregulation of biochemical pathways. Assessing the toxicological relevance of the two methods, and in particular the more standardised HepaRG cell method, in specific regulatory assessment contexts was not within the scope of the present validation study. This should, however, be the focus of further investigations.
CYP induction is a nuclear receptor-mediated process and, following AhR, PXR and CAR activation, xenobiotics may dysregulate an array of fundamental cell functions (Sueyoshi et al., 2014;Dingemans et al., 2016;Ovchinnikov et al., 2018;Hakkola et al., 2018;Sanders et al., 2005). CYP induction may therefore serve as a biomarker of nuclear receptor activation. The induction of Phase I enzymes in the liver is considered a potential key event in endocrine (thyroid) disruption in the recently published ECHA/EFSA Guidance (ECHA and EFSA, 2018). In particular, when there is evidence that these receptors are involved in pathways for which specific measurement methods are lacking (e.g. induction of Phase II enzymes for glucuronidation and sulfation; Kodama and Negishi, 2013), the two validated human CYP induction in vitro methods could be used as surrogates (EFSA, 2019). To study thyroid hormone metabolism following chemical exposure, a battery of validated in vitro methods is needed to investigate the effect of inhibition and induction of Phase I and Phase II biotransformation enzymes and the clearance levels of thyroid hormones (OECD, 2014;OECD, 2017 and Fig. 5).
Following the analysis of the data generated and additional peer-reviewed evidence, the following recommendations are proposed for the practical conduct of CYP enzyme induction studies: a) CYP induction can be measured at a phenotypic level (i.e. enzyme activity); b) CYP enzyme induction should be measured in human-derived metabolically competent test systems; c) cryopreserved HepaRG cells are comparable to cryopreserved PHH in predicting CYP enzyme induction, representing a substitute/complementary in vitro system for CYP induction studies; d) 2-fold induction is an acceptable threshold for positive identification of in vitro CYP inducers; and e) to reduce the risk of false positives, a concentration-dependent response (i.e. at least two consecutive concentrations generating a 2-fold induction response) should be observed to classify a compound as an in vitro inducer.
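Recommendations d) and e) can be read as a simple decision rule. The sketch below is one possible reading, assuming that fold induction values are ordered by increasing test concentration and that "generating 2-fold induction" means a fold change of at least 2 relative to the vehicle control; the function name and the example data are hypothetical.

```python
def is_in_vitro_inducer(fold_inductions, threshold=2.0, min_consecutive=2):
    """Return True if at least `min_consecutive` adjacent concentrations
    (ordered from lowest to highest) reach the fold-induction threshold."""
    run = 0
    for fold in fold_inductions:
        run = run + 1 if fold >= threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical fold-induction series over increasing concentrations.
dose_dependent = is_in_vitro_inducer([1.0, 1.3, 2.4, 3.1, 4.0])
isolated_hit = is_in_vitro_inducer([1.1, 2.2, 1.4, 1.2, 1.9])

print(dose_dependent, isolated_hit)
```

Requiring adjacent concentrations, rather than any two concentrations, is what filters out isolated responses such as the single value above the threshold in the second series.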

Disclaimer
The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the European Commission.