Overview of an interlaboratory collaboration on evaluating the effects of model hepatotoxicants on hepatic gene expression.

DNA microarrays and related tools offer promise for identification of pathways involved in toxic responses to xenobiotics. To be useful for risk assessment, experimental data must be challenged for reliability and interlaboratory reproducibility. Toward this goal, the Hepatotoxicity Working Group of the International Life Sciences Institute (ILSI) Health and Environmental Sciences Institute (HESI) Technical Committee on Application of Genomics to Mechanism-Based Risk Assessment evaluated and compared biological and gene expression responses in rats exposed to two model hepatotoxins, clofibrate and methapyrilene. This collaborative effort provided an unprecedented opportunity for the working group to evaluate and compare multiple biological, genomic, and toxicological parameters across different laboratories and microarray platforms. Many of the results from this collaboration are presented in accompanying articles in this mini-monograph, whereas others have been published previously. In vivo studies for both compounds were conducted in two laboratories using a standard experimental protocol, and RNA samples were distributed to 16 laboratories for analysis on six microarray platforms. Histopathology, clinical chemistry, and organ weight changes were consistent with reported effects. Gene expression results demonstrated reasonable agreement between laboratories and across platforms. Discrepancies in expression profiles of some individual genes were largely due to platform differences and approaches to data analysis rather than to biological or interlaboratory variability. Despite these discrepancies there was overall agreement in the biological pathways affected by these compounds, demonstrating that transcriptional profiling is reproducible between laboratories and can reliably identify affected pathways necessary to provide mechanistic insight. This effort represents an important first step toward the use of transcriptional profiling in risk assessment.

The liver is the major organ involved in drug and toxicant metabolism and as such is one of the main targets for the adverse effects of such substances (Mitchell et al. 1976; Zimmerman 1976). Hepatic toxicity is the most common single adverse effect leading to label warnings, use restrictions, and market withdrawals for pharmaceuticals (CDER-PhRMA-AASLD 2000). Consequently, evaluating the toxicity of new drugs and chemicals involves assessment of the livers of exposed animals for adverse end points. These end points are biological markers (biomarkers), such as histopathology and serum chemistry parameters, and are used to quantify any deleterious effects of a toxicant on an organism or individual. Although traditional experimental approaches and biomarkers used for preclinical safety assessment can reveal dose-dependent hepatic toxicity in animal models, they rarely provide an indication of toxic mechanism and are equivocal predictors of potential human responses.
As a result of recent technological advances in molecular biology, one of the more important developments in the biomarker field has been the realization that gene expression profiles obtained using microarrays may be useful biomarkers for evaluating toxicity in animal models. To date, specific gene expression profiles have been evaluated for several types of toxicological studies including a) identifying exposure to specific chemical classes such as aryl hydrocarbon receptor agonists, cytotoxic anti-inflammatory agents, DNA-damaging agents, enzyme inducers, hypoxia-inducing agents, noncoplanar polychlorinated biphenyls (PCBs), and peroxisome proliferators (Burczynski et al. 2000; Hamadeh et al. 2002a; Thomas et al. 2001); b) identifying toxic end points, for example, histopathology and clinical chemistry (Waring et al. 2001b); c) predicting or classifying exposures that later produce a toxic outcome (Kier and Nolan 2003); and d) identifying mechanisms of toxicity (Waring et al. 2001a). Thus, it is apparent that gene expression profiles can play a key role in the risk characterization component of the risk assessment and management process.
The use of genomic biomarkers to evaluate toxic effects has been termed "toxicogenomics." Although originally viewed as the use of genomic data to interpret and understand toxicological findings, the definition of toxicogenomics has gradually evolved to encompass other fields. One commonly used definition of toxicogenomics is as follows:
"a scientific field that elucidates how the entire genome is involved in biological responses of organisms exposed to environmental toxicants/stressors. Toxicogenomics combines information from studies of genomic-scale mRNA profiling (by microarray analysis), cell-wide or tissue-wide protein profiling (proteomics), genetic susceptibility, and computational models to understand the roles of gene-environment interactions in disease." (National Center for Toxicogenomics 2000)
To identify and address some of the issues, challenges, and opportunities encountered by the use of toxicogenomics in the context of risk assessment, the Health and Environmental Sciences Institute (HESI) of the International Life Sciences Institute (ILSI, http://hesi.ilsi.org/) formed a project committee to develop a collaborative scientific program to study these areas [see Pennie et al. (2004)

of Genomics to Mechanism-Based Risk Assessment was divided into several work groups conducting large-scale cross-laboratory studies in hepatotoxicity, nephrotoxicity, and genotoxicity.
The Hepatotoxicity Working Group (HWG) comprised representatives from 25 companies or subsidiaries, the U.S. Environmental Protection Agency, the U.S. Air Force, the National Institute of Environmental Health Sciences (NIEHS), and the University of Surrey in England (Table 1).
The goal of the HWG was to evaluate and compare biological, gene, and protein expression responses in rats exposed to well-studied hepatotoxins, using a standard experimental protocol, and to address the following questions:
• How comparable are the biological and gene expression data from different laboratories conducting in vivo studies?
• How reproducible are the data generated across laboratories using the same microarray platforms?
• How do data compare using different microarray platforms?
• How do data compare using RNA from pooled and individual animals?
• Do the gene expression changes demonstrate time/dose-dependent responses that correlate with known biological markers of toxicity?
To this end, members of the HWG developed standard experimental protocols for two prototypical hepatotoxicants: methapyrilene and clofibrate. Methapyrilene, a histamine H1 receptor blocking agent (Noguchi 1992), produces periportal necrosis in rats (Steinmetz et al. 1988) and is a nongenotoxic hepatocarcinogen in rats (Mirsalis 1987). The hypolipidemic agent clofibrate signals through the peroxisome proliferator-activated receptor-alpha (PPAR-α) to induce the proliferation of hepatic peroxisomes [reviewed by Greene (1995)]. Clofibrate produces hepatomegaly and is a nongenotoxic hepatocarcinogen in rodents (Reddy and Qureshi 1979; von Daniken et al. 1981).

Experimental Overview
Pilot study. Before conducting full-scale studies, we conducted a pilot study to estimate interlaboratory variation in results from the same RNA sample using a single microarray platform. A reference RNA sample was generated from the liver of male Sprague-Dawley rats treated with the high dose of clofibrate (250 mg/kg/day) or vehicle control for 3 days at Abbott Laboratories (Abbott Park, IL). RNA samples from three individual animals from each group were pooled and distributed to five laboratories for analysis using Affymetrix (Santa Clara, CA) microarrays. Analysis of the data using multidimensional scaling indicated that the results tended to segregate by site of analysis (Figure 1). Nevertheless, profiles within each laboratory were always quite distinct between treated and control samples and, on the whole, data from control samples tended to segregate away from the data from treated samples. This pilot study served as a basis for the full-scale studies.
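As an illustration of the kind of analysis described for the pilot study, the following is a minimal sketch of multidimensional scaling applied to expression profiles. The text does not state which variant of multidimensional scaling or which distance measure was used; classical (Torgerson) scaling on Euclidean distances is assumed here, and the sample names and values are invented for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def classical_mds(profiles, n_dims=2, n_iter=500):
    """Embed samples in n_dims dimensions by classical (Torgerson) MDS.

    Leading eigenvectors of the double-centered squared-distance matrix are
    found by power iteration with deflation (standard library only).
    """
    n = len(profiles)
    d2 = [[euclidean(p, q) ** 2 for q in profiles] for p in profiles]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        v = [float(i + 1) for i in range(n)]  # fixed start vector, for repeatability
        eigval = 0.0
        for _ in range(n_iter):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to explain along this axis
                v, eigval = [0.0] * n, 0.0
                break
            v = [x / norm for x in w]
            # Rayleigh quotient of the unit vector v estimates the eigenvalue.
            eigval = sum(x * y for x, y in zip(v, matvec(b, v)))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical log-ratio profiles (three genes) from two sites; values are invented.
samples = {
    "site_A_control": [0.0, 0.0, 0.0],
    "site_A_treated": [0.0, 3.0, 0.0],
    "site_B_control": [4.0, 0.0, 0.0],
    "site_B_treated": [4.0, 3.0, 0.0],
}
embedding = dict(zip(samples, classical_mds(list(samples.values()))))
```

In this toy data set one embedded axis separates the two sites and the other separates control from treated samples, which mirrors the pattern reported for the pilot study: segregation by site, with a consistent treatment shift within each site.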
Figure 1. Cont, pooled RNA from control animals; Trt, pooled RNA from treated animals. In a pilot study, microarray data using Affymetrix microarrays were generated at five different laboratory sites using RNA pooled from three control rats compared with RNA pooled from three clofibrate-treated rats. This figure represents a single multidimensional scaling analysis of these data and illustrates clustering (dashed ovals) of the control and treated samples by laboratory. Clustering in this manner was thought to be due to technical variations between sites rather than biological differences and provided a preliminary look at the difficulties to be encountered in the full-scale studies. However, within every site cluster there was a clear distinction between RNAs derived from control and treated animals, and all sites showed the same biological effect of treatment in the form of a shift down the y axis (arrows).
Full-scale studies. The in vivo dosing phase for the full-scale studies was conducted in two different laboratories for each test compound, using identical protocols and test compound. Further details of these studies are provided in the articles by Baker et al. (2004), Chu et al. (2004), and Waring et al. (2004) elsewhere in this mini-monograph. Methapyrilene was administered at 0, 10, or 100 mg/kg/day by gavage for 1, 3, and 7 days (Abbott Laboratories; Boehringer-Ingelheim Pharmaceuticals, Inc., Ridgefield, CT). Clofibrate was administered at doses of 0, 25, or 250 mg/kg/day by gavage for 1, 3, and 7 days [Abbott Laboratories; GlaxoSmithKline (GSK), Hertfordshire, UK]. For both compounds, the high dose levels were selected based on reports indicating that they would elicit hepatotoxicity without the introduction of secondary pathologies, such as inflammation, that might severely influence gene expression data (clofibrate, Karbowski et al. 1999; methapyrilene, Graichen et al. 1985). The low doses were selected as one-tenth of the hepatotoxic dose, a level at which no gross hepatotoxic effect was anticipated. Total RNA samples from treated and control animals [both pooled (n = 4) and individual] were distributed to members of the HWG for analysis on different microarray platforms (Table 2). Clofibrate samples were analyzed at 16 sites using six different platforms; methapyrilene samples were analyzed at 12 sites using six different platforms. Because most research groups analyze microarray data differently using different algorithms, each site was requested to analyze data according to its own in-house procedures.
qRT-PCR analysis. To investigate the source of certain discrepancies in direction of gene change between two different microarray platforms [Affymetrix and Incyte (Palo Alto, CA)], quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) and in silico sequence analyses were performed for selected genes. Further details are provided in the article by Goodsaid et al. (2004) in this mini-monograph.
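The dose-by-duration design of the full-scale studies can be written out compactly. The sketch below enumerates the treatment groups from the dose levels and durations stated in the text; the assumption that every dose was crossed with every duration is inferred from the wording, and the names `DOSES`, `DURATIONS_DAYS`, `treatment_groups`, and `low_to_high_ratio` are illustrative only.

```python
from itertools import product

# Dose levels (mg/kg/day) and durations (days) as stated in the text.
DOSES = {
    "methapyrilene": (0, 10, 100),
    "clofibrate": (0, 25, 250),
}
DURATIONS_DAYS = (1, 3, 7)


def treatment_groups(compound):
    """List (compound, dose, days) for every dose-by-duration combination.

    Assumes a fully crossed design, which is inferred rather than stated.
    """
    return [(compound, dose, days)
            for dose, days in product(DOSES[compound], DURATIONS_DAYS)]


def low_to_high_ratio(compound):
    """Ratio of the low dose to the high dose for a compound."""
    _, low, high = DOSES[compound]
    return low / high


clofibrate_groups = treatment_groups("clofibrate")
methapyrilene_groups = treatment_groups("methapyrilene")
```

Under these assumptions each compound has nine groups (three dose levels at three durations), and the low dose is one-tenth of the high dose for both compounds.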

Overview of Findings
The specific details of one aspect of the HWG study (Hamadeh et al. 2002b) have already been published, whereas several others appear in this mini-monograph (Baker et al. 2004; Goodsaid et al. 2004; Chu et al. 2004; Waring et al. 2004). It is anticipated that additional reports will also be generated from our consortium when final experiments and data analyses have been conducted. In reviewing the experimental data published previously and in this mini-monograph, we can make a number of general comments and conclusions regarding the outcome of the work group's studies.
In vivo studies. For clofibrate, study results measured by traditional parameters were as expected and in agreement with those already published, although differences in the levels of biological response induced in the two separate in vivo studies for clofibrate were suggested by variance in the percent liver weight increases. At day 3 the increases in liver weight of treated animals compared with those in control animals were 15 and 3%, and at day 7 they were 31 and 11%, for the GSK and Abbott Laboratories studies, respectively. Differences in biological response were also supported by observed differences in clinical chemistry parameters and in the level of upregulation of two well-characterized markers of clofibrate exposure, namely, cytochrome P450 (CYP)4A1 and acyl-CoA oxidase, as measured on the Affymetrix platform. However, other markers, such as acyl-CoA hydrolase, acyl-CoA transferase, and ApoA-IV, were similarly regulated in the two studies, and enzymatic content of acyl-CoA oxidase was increased (though to somewhat different extents) in both studies (Figure 2).
Histopathological and clinical chemistry observations on the methapyrilene-exposed animals were consistent with expected toxicities associated with methapyrilene treatment. Specifically, significant increases of enzyme levels (aspartate aminotransferase, alanine aminotransferase, sorbitol dehydrogenase) were observed in the high-dose groups (100 mg/kg/day) at all the time points (1, 3, and 7 days), and drug-related microscopic changes, including portal mononuclear infiltrate, periportal hydropic degeneration, periportal hepatocellular necrosis, and bile duct hyperplasia, were observed in the livers of male rats treated with methapyrilene.
Microarray gene expression analysis: clofibrate studies. Differences in the probe sets and annotations across platforms made precise cross-platform comparisons challenging. However, changes in similar biological pathways were indicated on each of the platforms run by the working group members. In general, a dose-related response was observed: the low dose showed little deviation from control levels of gene expression, whereas pooled liver samples from rats treated with high-dose (250 mg/kg/day) clofibrate for 1, 3, or 7 days showed typical PPAR-α-mediated responses consistent with the established literature for this class of compound (Amacher et al. 1997; Corton et al. 2000; Gerhold et al. 2001; Latruffe et al. 2001; Reddy et al. 1986). Affected pathways included fatty acid metabolism (e.g., acyl-CoA oxidase), cell proliferation (e.g., topoisomerase II-α), and fatty acid oxidation (e.g., CYP4A1). Gene expression at the high dose supported the interpretation of a burst of cell proliferation and DNA repair during the first 3 days of high-dose exposure, followed by a low level of oxidative stress associated with increased β-oxidation of fats (mitochondrial and peroxisomal), increased peroxisome proliferation, and protein folding stress responses that partially but not completely subside with continued dosing.

The upregulation of a variety of cell proliferation-associated genes began on or before day 1 and peaked at some point between days 3 and 7. By day 7, cell proliferation genes were downregulated. The chronology of these gene expression changes agrees well with the histologic diagnoses of mitotic figures in the tissue [see Baker et al. (2004) in this mini-monograph].
The Affymetrix microarray platform (specifically the RGU34A expression probe array containing 8,799 probe sets from the rat genome) was the most commonly used platform for analyzing the clofibrate RNA samples. Comparisons were thus conducted for the Affymetrix data a) between the RNA samples generated from a single in-life study, and b) across all the RNA samples. Across all probe sets, agreement between contributors and within samples from a single in vivo study was good, with greater than 92% concordance for samples originating from one of the in vivo study laboratories, and greater than 96% for samples from the other laboratory. To be considered "concordant," a probe set showed either no change or no discrepancy in the direction of change for all data sets. Less apparent agreement was observed for changes above or below a certain threshold of fold-change for all data sets. For example, only four probe sets were identified as upregulated > 5-fold in all data sets on samples from a single in vivo laboratory. This result may be explained in part by the use of different data capture algorithms by the contributors. It can also be attributed to the somewhat arbitrary nature of selecting a particular fold-change cutoff level where pooled samples were used. This is an inescapable outcome when data are derived from pooled sample comparisons, as the number of biological replicates in a pool is effectively one, making statistical analysis impossible. In addition, the cutoff criterion might be too stringent, as a larger number of probe sets were regulated in the same direction across the data sets regardless of thresholds.
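The concordance criterion and the effect of a fold-change cutoff can be made concrete with a short sketch. It assumes one reading of the definition above: a probe set is discordant only when at least one data set reports an increase and another reports a decrease. The function names, probe-set identifiers, the 2-fold cutoff used to make calls, and all ratios are hypothetical and are not taken from the study.

```python
def call_from_ratio(ratio, cutoff=2.0):
    """Return +1 (increased), -1 (decreased), or 0 (no change) for a treated/control ratio."""
    if ratio > cutoff:
        return 1
    if ratio < 1.0 / cutoff:
        return -1
    return 0


def is_concordant(calls):
    """Discordant only if one data set calls the probe set up and another calls it down."""
    return not (1 in calls and -1 in calls)


def percent_concordance(call_sets):
    """call_sets: list of dicts mapping probe-set ID -> call (-1, 0, or +1)."""
    shared = set(call_sets[0]).intersection(*call_sets[1:])
    if not shared:
        raise ValueError("no probe sets are shared by all data sets")
    n_concordant = sum(is_concordant([cs[p] for cs in call_sets]) for p in shared)
    return 100.0 * n_concordant / len(shared)


def consistently_up(ratio_sets, cutoff):
    """Probe sets whose ratio exceeds the cutoff in every data set."""
    shared = set(ratio_sets[0]).intersection(*ratio_sets[1:])
    return sorted(p for p in shared if all(rs[p] > cutoff for rs in ratio_sets))


# Hypothetical treated/control ratios reported by three sites for the same RNA.
ratio_sets = [
    {"probe_a": 6.0, "probe_b": 5.5, "probe_c": 1.1, "probe_d": 0.4},
    {"probe_a": 7.2, "probe_b": 3.9, "probe_c": 0.9, "probe_d": 2.5},
    {"probe_a": 5.1, "probe_b": 4.8, "probe_c": 1.0, "probe_d": 0.3},
]
call_sets = [{p: call_from_ratio(r) for p, r in rs.items()} for rs in ratio_sets]
concordance = percent_concordance(call_sets)
up_over_5 = consistently_up(ratio_sets, 5.0)
up_over_2 = consistently_up(ratio_sets, 2.0)
```

In this invented example, `probe_b` is upregulated at every site but passes a 5-fold cutoff at only two of them, so it drops out of the consistently upregulated list even though its direction of change never conflicts. This is the same effect the paragraph above attributes to a stringent, somewhat arbitrary cutoff.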
Microarray gene expression analysis: methapyrilene studies. In the comparison of RNA samples from methapyrilene-treated rats, analyses were performed at five sites using an Affymetrix platform and at five sites using either a membrane or glass slide cDNA microarray platform. Agreement was found across all platforms with reasonable but varying degrees of congruence in the results. These findings are discussed in more detail in an article published on cDNA platform results (Hamadeh et al. 2002b) and in this mini-monograph in an article on Affymetrix array results. In general, these two articles show that analysis of RNA samples from separate studies conducted in different laboratories and analyzed by different methods ultimately revealed the same affected biological pathways and hence the same risk determinants.
Results of gene expression analysis on the NIEHS cDNA platform of RNA from livers of rats treated with the high dose of methapyrilene for 7 days at both of the in vivo study sites showed good agreement. Scientists from NIEHS also reviewed the microscopic observations resulting from methapyrilene treatments and collected additional data, such as serum enzyme levels and body and organ weights. The microscopic data were entered into a numerical model, the output of which was compared with the clustering outcomes of the gene expression data showing remarkable agreement between the outputs of the two analyses. Finally, analysis of results generated from Affymetrix arrays [see Waring et al. (2004) in this mini-monograph] showed that low-dose-treated animals could be distinguished from high-dose and control animals, a differentiation that could not be made based on histology.
qRT-PCR studies. In a small number of cases, contradictory data were obtained from different platforms running samples from the clofibrate study. Two of these discrepancies (Caldesmon and Pctaire) were investigated further using in silico sequence analysis and qRT-PCR analyses of the genes. In both cases, cDNA platform results showed decreases in expression levels for the 7-day high-dose pooled samples compared with those for controls. In contrast, the Affymetrix platform showed increases in expression levels compared with those for controls. The outcome of the studies, which are described in detail by Goodsaid et al. (2004) elsewhere in this mini-monograph, confirmed that the source of the discrepancies was errors in the sequences of the gene as defined by UniGene (http://www.ncbi.nih.gov/entrez/query.fcgi?db=unigene).

Commentary
The Hepatotoxicity Working Group selected two well-characterized chemicals for analysis, as this would permit us to determine whether the in vivo exposures had been effectual and allow the group to corroborate clinical, histopathological, and gene expression data by comparison with published studies. Histopathology, clinical chemistry, and organ weight changes were indeed consistent with previously reported effects of these agents, albeit with some differences between sites, as evidenced by discrepancies in liver weights and clinical chemistry parameters. Analysis of gene expression also indicated that changes in mRNA levels for known transcriptional reporters for these compounds had occurred, although there were discrepancies in some individual genes [as discussed, for example, in the accompanying article by Waring et al. (2004)]. The discrepancies between sites and between platforms observed for individual transcripts could be attributed mostly to differences in RNA labeling protocols, sequence differences on different platforms, and statistical analysis tools. qRT-PCR studies demonstrated that selection and characterization of correct probes on microarrays are essential if microarray data are to be perceived as being robust and consistent [as discussed in the accompanying article by Goodsaid et al. (2004)]; such discrepancies will be alleviated with time as the accuracy of gene sequences and annotations evolves. It will also be important to understand the relationship between transcriptional events and changes in protein expression; proteome analysis on samples derived from this collaboration is currently in progress. Perhaps more important than individual genes, however, is the accurate identification of affected biological pathways. This outcome, which was consistent between laboratories, is essential for the application of toxicogenomics in the understanding of toxic mechanisms.
Thus, to the extent that data analysis has been completed, this collaboration has revealed that RNA samples generated from rat studies conducted for the same compound in different laboratories and analyzed using different microarray platforms can yield comparable results regarding the affected biological pathways and key pathway-associated genes. It is not yet clear, however, how these data can be applied to risk assessment; this is the ultimate aim of the HESI Technical Committee on the Application of Genomics to Mechanism-Based Risk Assessment.