Profiling Chemicals Based on Chronic Toxicity Results from the U.S. EPA ToxRef Database

Background: Thirty years of pesticide registration toxicity data have been historically stored as hardcopy and scanned documents by the U.S. Environmental Protection Agency (EPA). A significant portion of these data have now been processed into standardized and structured toxicity data within the EPA’s Toxicity Reference Database (ToxRefDB), including chronic, cancer, developmental, and reproductive studies from laboratory animals. These data are now accessible and mineable within ToxRefDB and are serving as a primary source of validation for U.S. EPA’s ToxCast research program in predictive toxicology. Objectives: We profiled in vivo toxicities across 310 chemicals as a model application of ToxRefDB, meeting the need for detailed anchoring end points for development of ToxCast predictive signatures. Methods: Using query and structured data-mining approaches, we generated toxicity profiles from ToxRefDB based on long-term rodent bioassays. These chronic/cancer data were analyzed for suitability as anchoring end points based on incidence, target organ, severity, potency, and significance. Results: Under conditions of the bioassays, we observed pathologies for 273 of 310 chemicals, with greater preponderance (> 90%) occurring in the liver, kidney, thyroid, lung, testis, and spleen. We observed proliferative lesions for 225 chemicals, and 167 chemicals caused progression to cancer-related pathologies. Conclusions: Based on incidence, severity, and potency, we selected 26 primarily tissue-specific pathology end points to uniformly classify the 310 chemicals. The resulting toxicity profile classifications demonstrate the utility of structuring legacy toxicity information and facilitating the computation of these data within ToxRefDB for ToxCast and other applications.


Research
The U.S. Environmental Protection Agency (EPA) and other regulatory agencies are investigating novel approaches to predict chemical toxicity, with the major goals being to enable the rapid screening of thousands of chemicals that have not previously been characterized, to increase mechanistic understanding of chemical toxicity, and to reduce the number of animals required for toxicity testing. All of these goals initially require high-quality in vivo toxicity data in order to test and validate these new approaches. To support U.S. EPA's ToxCast effort (Dix et al. 2007), we have created the structured and curated Toxicity Reference Database (ToxRefDB) to tabulate information from guideline in vivo toxicity studies. ToxRefDB and related databases will help support computational analysis and modeling of the links from molecular interactions through cellular and organ phenotypes all the way to whole-animal toxicity. This transformation of existing toxicity data will facilitate a transition to the National Research Council's (NRC) vision for Toxicity Testing in the 21st Century (Collins et al. 2008; NRC 2007). The NRC envisions a focus on toxicity pathways that will link molecular assays to toxicity outcomes in humans and ecological species.
Traditional toxicity testing for risk assessment of single compounds or limited groups of compounds can cost millions of dollars per chemical and years of effort. Since 1970, the U.S. EPA has accumulated a vast store of high-quality regulatory toxicity information on thousands of compounds, most of which has been inaccessible for computational analyses. The curation and structuring of chemical toxicity information into the readily accessible ToxRefDB have created a valuable resource for both retrospective and prospective toxicologic studies. ToxRefDB initially focused on capturing developmental rat and rabbit, multigeneration reproduction rat, and chronic/cancer rat and cancer mouse studies. In addition to the data model, we developed a detailed toxicity-based controlled vocabulary for all the study types spanning clinical chemistry, pathology, reproductive, and developmental effects.
An important initial application of ToxRefDB is to provide anchoring of in vivo toxicity data for the U.S. EPA's ToxCast research program, which has been designed to address the agency's needs for chemical prioritization by using state-of-the-art approaches in high-throughput screening (HTS) and toxicogenomics (U.S. EPA 2008b). Nearly all of the ToxCast phase I chemicals are food-use pesticide active ingredients that have undergone a full suite of mammalian toxicity tests, creating an unparalleled reference set of toxicologic information. The complete and highly standardized data set provided by ToxRefDB facilitates analysis of the ToxCast phase I chemicals across chemical, study type, species, target organ, and effect. Additionally, ToxRefDB serves as a model for other efforts to capture quantitative, tabular toxicology data from legacy and new studies and to make these data useful for cross-chemical computational toxicology analysis.

Methods
Data characteristics. We collected reviews of registrant-submitted toxicity studies, known as data evaluation records (DERs), for roughly 400 chemicals from the U.S. EPA's Office of Pesticide Programs (OPP) within the Office of Pollution Prevention and Toxic Substances (OPPTS). The file types of the DERs include TIFF, Microsoft Word, WordPerfect, and PDF formats, some of which are not directly text-readable. We indexed every DER file based on a file name convention that consisted of the pesticide chemical (PC) code, study identification number (MRID), study type identification number [based on 870 series OPPTS harmonized health effect guidelines (U.S. EPA 1996)], species code, review identification number (TXR), and a review version code. The latter code identified the review as a primary review, secondary review, supplemental review, updated executive summary, or a deficient review.
For the initial build of ToxRefDB, we collected and indexed a total of 4,620 DERs from OPP. These included five types of studies from a variety of species: developmental in rat and rabbit, reproductive in rat, subchronic in mouse and rat, and chronic or cancer in rat and mouse. Approximately 1,000 DERs provided chronic and cancer data, and we selected a subset of these for curation into the database to yield data on 310 unique chemicals: rat chronic/cancer studies on 283 chemicals, and mouse cancer studies on 267 chemicals. Each study assessed a single technical-grade chemical's toxicity potential in a single species and study type. The first portion of the DER outlines the test substance, purity, lot/batch numbers, MRID, study citation, OPPTS test guideline, and reviewers of the study. The executive summary captures all of the basic study design information, including species and strain, doses, number of animals per treatment group, and any deficiencies in study protocol.
Dose levels are listed in parts per million and, through food consumption and body weight calculation or standard conversion, as milligrams per kilogram body weight per day. Where possible, dose levels were listed as milligrams per kilogram body weight per day in ToxRefDB. The executive summary also describes adverse effects observed at all dose levels in the study. No observed adverse effect level (NOAEL) and lowest observed adverse effect level (LOAEL) are established based on adverse effects. The adverse effects used to derive NOAEL and LOAEL are referred to as "critical effects" in this article, regardless of their role in establishing reference dose levels in regulatory determinations for a chemical.
The body of the DERs provides detailed test material, animal information, and full dose-response data in text and tables for a variety of "effect types," including mortality, clinical signs, clinical chemistry, hematology, urinalysis, gross pathology, nonneoplastic pathology, and neoplastic pathology. For each effect type, we also specified an "effect target" (e.g., liver as target organ) and "effect description" (e.g., hypertrophy).
ToxCast phase I chemicals also included nonpesticidal chemicals such as perfluorinated compounds, phthalates, and other industrial chemicals. Although DERs and pesticide registration studies were not available for these chemicals, there were often high-quality and standardized chronic and other types of toxicity studies available from the National Toxicology Program, peer-reviewed literature, or other sources. We organized and evaluated data from these study reports and publications consistent with the information from the DERs.
Information on chemical identity and structure was provided by the U.S. EPA DSSTox (Distributed Structure-Searchable Toxicity) program (U.S. EPA 2007). ToxRefDB outputs are linked to information from other sources through the U.S. EPA ACToR (Aggregated Computational Toxicology Resource) database (Judson et al. 2008b; U.S. EPA 2008a). ACToR will also serve as the primary portal for public access to ToxRefDB and related outputs. ACToR stores the HTS data being generated by the ToxCast program and will link these HTS data with traditional toxicity data from ToxRefDB and other sources.
Relational model. In the development of ToxRefDB, a relational model approach was taken with input from other toxicity database standards, including ToxML (Yang et al. 2006). The resulting data model is semi-hierarchical in nature: a single compound can be tested in multiple studies, each study can contain multiple treatment groups, and multiple effects can be observed in each treatment group. The data model is organized from a chemical-centric viewpoint to allow data integration and exchange with other data sources and to facilitate the linkage of the reference toxicity information to chemical-specific data generated using in vitro technologies (i.e., ToxCast). The relational model was then implemented into a table structure with established relationships that ensure data integrity, updatability, and standardization [see Supplemental Material, Figure 1 (http://www.ehponline.org/members/2008/0800074/suppl.pdf)].
Development of a toxicity-based controlled vocabulary. The development of a controlled vocabulary within ToxRefDB was necessary for the standardization of data captured across various studies and study types performed over roughly 30 years. The nonredundant list of terms across various information domains provided data integrity and searchability. We based study type terminology on the unique study types harmonized by the Organisation for Economic Co-operation and Development and the OPPTS (U.S. EPA 1996). Specific standardized terminology for study design was established for species/strain, method/route of administration, and units for dose and dosing duration. Treatment group-related vocabularies were developed to establish the generation, gender, and dosing period.
A primary goal in evaluating the registrant-submitted toxicity studies is to establish NOAEL and LOAEL values for a variety of categorical end points, including systemic, offspring, maternal, parental, developmental, and reproductive toxicity across the various study types. These categorical end points are captured and normalized across studies for each effect responsible for deriving the NOAEL/LOAEL. The development of a toxicologic effect vocabulary was approached in a domain-specific manner. For example, we derived clinical pathology terms from OPPTS guidelines and collected clinical pathology laboratories and organ pathology terms from various public resources, including the National Toxicology Program's Pathology Code Tables (2007). The vocabulary underwent further standardization by mapping all synonymous terms to a single nonredundant value. We took a taxonomical approach for establishing the finalized effect vocabulary based on a three-tiered hierarchical model, with the effect type at the top, followed by effect target and then effect description. Examples of effect type include clinical chemistry, hematology, urinalysis, body weight, mortality, gross pathology, nonneoplastic pathology, neoplastic pathology, and developmental and reproductive effects. Subclasses of these types include specific target organs (e.g., liver, lung, spleen) or measured analytes (e.g., alanine aminotransferase, aspartate aminotransferase, cholesterol). The specific combinations of effect type and target are then further subclassed based on a nonredundant descriptive term (e.g., increase, decrease, hypertrophy, atrophy). For organ pathology terms, each target organ has a set of regions, zones, and cell types that characterize the site of toxicity. The full effect vocabulary is available on the ToxRefDB home page (U.S. EPA 2008c).
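The semi-hierarchical data model (chemical, study, treatment group, effect) and the three-tiered effect vocabulary described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ToxRefDB schema: the class names, field names, and the entries in the `SYNONYMS` table are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical synonym table: maps free-text terms to one nonredundant value.
SYNONYMS = {
    "hepatic": "liver",
    "hepatocellular hypertrophy": "hypertrophy",
}


def normalize(term: str) -> str:
    """Return the controlled-vocabulary value for a term (lowercased if unmapped)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


@dataclass(frozen=True)
class Effect:
    effect_type: str         # tier 1, e.g. "nonneoplastic pathology"
    effect_target: str       # tier 2, e.g. "liver"
    effect_description: str  # tier 3, e.g. "hypertrophy"


@dataclass
class TreatmentGroup:
    dose_mg_kg_day: float
    sex: str
    effects: List[Effect] = field(default_factory=list)


@dataclass
class Study:
    study_type: str
    species: str
    groups: List[TreatmentGroup] = field(default_factory=list)


@dataclass
class Chemical:
    name: str
    studies: List[Study] = field(default_factory=list)


def make_effect(effect_type: str, target: str, description: str) -> Effect:
    """Build an Effect after normalizing each tier against the vocabulary."""
    return Effect(normalize(effect_type), normalize(target), normalize(description))
```

Normalizing every tier before an `Effect` is created means that synonymous entries compare equal, which is the property the nonredundant vocabulary is meant to provide.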
Data input. The ToxRefDB Data Entry Tool was developed with Microsoft Access providing the user interface for all initial data input and is also available at the ToxRefDB home page (U.S. EPA 2008c). After the initial quality control (QC) steps discussed below, the data are migrated to ToxRefDB, which is implemented using the open-source MySQL platform. Data entry followed a series of protocols outlined in the ToxRefDB Standard Operating Procedure (SOP) documents that define mapping of toxicologic information to standardized fields, use of a standardized vocabulary, and extraction of biologically and statistically significant treatment-related effects.
Environmental Health Perspectives, volume 117, number 3, March 2009
Data QC and management. QC consisted of 100% cross-checking of studies, systematic updates of ToxRefDB to ensure consistency across the studies, expert review of data outputs, and external review by stakeholders. All data entered into ToxRefDB have undergone cross-checking, which entailed a second person validating each entered value based on the source information (primarily DERs). Systematic QC involved querying the database for potential inconsistencies (e.g., male-only effects being assigned to female treatment groups, or systemic LOAEL being set at multiple dose levels) along with updating vocabularies and related records. Expert review was performed on data outputs of the chronic/cancer rat or mouse studies, including all of the end points captured in the data tables of this publication. In addition to internal QC, an ongoing process allowing stakeholders the opportunity to review ToxRefDB records is in place. The companies or registrants that sponsor the data or support the registration of the chemical are reviewing the accuracy of the data relative to DERs and other risk assessment documents. To date, studies on 235 chemicals have been reviewed by registrants, and comments from these reviews indicate greater than 99% accuracy in capturing treatment-related effects from DERs. The stakeholder review process has also allowed information from additional studies, DERs, and other risk assessment documents to be collected and entered into ToxRefDB.
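The two systematic QC checks given as examples above (male-only effects assigned to female treatment groups, and a systemic LOAEL set at more than one dose level) can be illustrated with a small sketch. The record layout, the field names, and the list of male-only targets are assumptions made for this example; the actual checks were run as database queries whose details are not given here.

```python
from collections import defaultdict

# Assumption: an illustrative list of targets that can only occur in males.
MALE_ONLY_TARGETS = {"testis", "prostate", "epididymis"}


def male_effects_in_female_groups(effects):
    """Return effect records whose male-only target is attached to a female group."""
    return [
        e for e in effects
        if e["target"] in MALE_ONLY_TARGETS and e["group_sex"] == "F"
    ]


def studies_with_multiple_loaels(loaels):
    """Return {study_id: sorted doses} for studies with more than one systemic LOAEL dose."""
    doses = defaultdict(set)
    for row in loaels:
        doses[row["study_id"]].add(row["dose"])
    return {sid: sorted(d) for sid, d in doses.items() if len(d) > 1}
```

Records flagged by either function would be candidates for manual review against the source DER rather than automatic correction.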
Data output and analysis. The structured toxicity information stored within ToxRefDB can be extracted in various formats using MySQL queries. For the purpose of providing computable outputs, that is, quantitative outputs amenable to statistical analysis, we used a consistent data output. The cross-tabulated data output consisted of rows of chemical information (e.g., CAS registry number and chemical name) and columns of end points or effects, with the cross section being the lowest dose at which the effect or end point was observed, that is, lowest effect level (LEL) in mg/kg/day. Even though NOAEL/LOAEL values can be queried from the database, the current analysis uses LELs, which do not reflect the NOAEL/LOAEL regulatory determinations derived from the studies and refer only to the minimum dose at which a specific effect or group of effects occurs. We used administered dose levels rather than molar concentrations to represent the chemically induced effects and end points, because uncertainties in the pharmacokinetics linking administered dose to tissue concentrations mean that molecular weight alone cannot substitute for dosimetry. Additional transformation of the dosing information was performed, including log-based and binning methods for potency. For example, we developed a binning method for illustrating relative potency to provide insight into the sensitivity of the end point from the perspective of treatment dose. To derive nonarbitrary dosing intervals, LELs for body weight changes were analyzed and separated into equivalent quintile bins (data not shown). The resulting bins, ≤ 15, ≤ 50, ≤ 150, ≤ 500, and > 500 mg/kg/day, were then applied to all end points. For instance, a chemical that caused liver hypertrophy at 5 mg/kg/day would be assigned a 5, at 25 mg/kg/day a 4, and so on. If the effect was not observed, then a zero was assigned.
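The LEL cross-tabulation and the relative potency bins described above can be sketched as follows. The function names and the tuple layout of the input are hypothetical; the bin boundaries and the scores for 5 and 25 mg/kg/day come from the text, and the scores 3, 2, and 1 for the remaining bins follow from its "and so on".

```python
def lowest_effect_levels(observations):
    """Map (chemical, end point) to the lowest dose at which the end point was seen.

    `observations` is an iterable of (chemical, end_point, dose_mg_kg_day) tuples.
    """
    lel = {}
    for chemical, end_point, dose in observations:
        key = (chemical, end_point)
        if key not in lel or dose < lel[key]:
            lel[key] = dose
    return lel


# (upper bound in mg/kg/day, relative potency score), most potent bin first.
BINS = [(15, 5), (50, 4), (150, 3), (500, 2)]


def relative_potency(lel):
    """Return 0 if the effect was not observed (lel is None), else a score from 1 to 5."""
    if lel is None:
        return 0
    for upper, score in BINS:
        if lel <= upper:
            return score
    return 1  # > 500 mg/kg/day
```

For example, `relative_potency(5)` gives 5 and `relative_potency(25)` gives 4, matching the liver hypertrophy example in the text.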
Additionally, log-transformed potency values were derived using -log2(LEL). We used log2 to reflect the minimal dose spacing, that is, doubling, typically used for in vivo toxicology studies. A constant value of 12 was then added to zero-center the data, allowing for zero to represent no observed effect. Therefore, a value of 1 would be equivalent to an effect at 2,048 mg/kg/day and 18 would be equivalent to 0.015625 mg/kg/day. The resulting data formats are highly amenable to statistical data analysis, including descriptive and predictive data-mining algorithms.
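The log transformation above amounts to potency = -log2(LEL) + 12, with 0 reserved for "no observed effect". A minimal sketch (the function name and the use of `None` for an unobserved effect are assumptions of this example):

```python
import math


def log_potency(lel_mg_kg_day, offset=12):
    """Return -log2(LEL) + offset, or 0.0 when the effect was not observed (None)."""
    if lel_mg_kg_day is None:
        return 0.0
    return -math.log2(lel_mg_kg_day) + offset
```

With this definition, `log_potency(2048)` evaluates to 1.0 and `log_potency(0.015625)` to 18.0, in agreement with the worked values in the text. Note that an LEL of 4,096 mg/kg/day or more would yield a value of zero or below under this formula, so a real data set would need to confirm that no LEL reaches that range.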
We carried out unsupervised two-way hierarchical clustering across all chemicals of all effects with incidence greater than 5, as well as selected end points, based on log-transformed potency values using Pearson's dissimilarity measure for both chemicals and effects. This analysis used Ward's method for linkage (Ward 1963) and the agglomerative clustering method as implemented in the Partek Discovery Suite (Partek Inc., St. Louis, MO). In order to assess statistically significant species concordance across different effects, a permutation study was carried out. For each effect, the association between chemical and effect for the corresponding rat and mouse study was randomly permuted 1,000 times. We recorded the cross-species concordance for all simulations (permutations) and compared it with the observed concordance, thus giving an estimate of the concordance due purely to chance. Analyses were carried out using R version 2.6.1 (Ihaka and Gentleman 1996).
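The permutation study described above can be illustrated with the sketch below. It is written in Python rather than the R used for the original analysis, and it rests on two assumptions that the text does not spell out: concordance is computed as the number of chemicals positive in both species divided by the number positive in either species, and the p-value uses the common "add one" estimator. Function and parameter names are hypothetical.

```python
import random


def concordance(rat, mouse):
    """Fraction of chemicals positive in both species among those positive in either.

    `rat` and `mouse` are equal-length sequences of 0/1 calls, one per chemical.
    """
    both = sum(1 for r, m in zip(rat, mouse) if r and m)
    either = sum(1 for r, m in zip(rat, mouse) if r or m)
    return both / either if either else 0.0


def permutation_test(rat, mouse, n_perm=1000, seed=0):
    """Return (observed concordance, mean chance concordance, permutation p-value)."""
    rng = random.Random(seed)
    observed = concordance(rat, mouse)
    shuffled = list(mouse)
    null = []
    for _ in range(n_perm):
        # Break the chemical-to-effect association in one species.
        rng.shuffle(shuffled)
        null.append(concordance(rat, shuffled))
    exceed = sum(1 for c in null if c >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, sum(null) / n_perm, p_value
```

The mean of the permuted concordances estimates the concordance expected purely by chance, which is the quantity the observed value is compared against.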
An initial 10% incidence cutoff was used to filter out individual and groups of effects for potential use in predictive modeling. This cutoff was chosen following the results of a related simulation study that demonstrated high levels of sensitivity and specificity for various machine learning methods on data with at least a 10% hit rate for predicted end points (Judson et al. 2008a). For other applications, it may be useful to add less frequently occurring effects and end points.

Summary profiles of the ToxRefDB chronic/cancer data set.
To date, ToxRefDB has captured in vivo mammalian toxicity study information from DERs for 411 conventional pesticide active ingredients. This present analysis focuses on the systemic toxicity and cancer end points culled from chronic/cancer rat or mouse studies on 310 of the chemicals entered into ToxRefDB. ToxRefDB enabled analysis to be performed along toxicologically related axes, including by chemical, study type, species, and effect. Study duration, dosing methods, data quality, guideline adherence, and sex were additional parameters for data filtering. In looking across all chronic/cancer rat and mouse studies, we assigned 19,537 effects to 3,082 different treatment groups in a total of 577 studies on 310 chemicals (Table 1). Effects are a combination of study type, species, effect type, effect target, and effect description for a given chemical, for example, chronic/cancer, rat, neoplastic pathology, liver, and adenoma. Across the 19,537 effects, 1,135 unique effects were observed, of which 484 were deemed critical effects, that is, criteria for establishing NOAEL/LOAEL, in at least a single study.
The ToxRefDB chronic/cancer data set on 310 chemicals contained approximately 20,000 observed effects in rat or mouse studies. We obtained a high-level view of a subset of these data, and the relationships among chemical, effect, and potency, by unsupervised two-way hierarchical clustering of 207 rat (Figure 1A) and 112 mouse (Figure 1B) effects. For the rat, the 283 chemicals separated into seven distinct clusters or classes of the chemicals based on these toxicity profiles. Approximately 70 chemicals formed a cluster with an overall low incidence of toxicity, whereas the remaining chemicals displayed a unique set of toxicologic properties. More than 80 chemicals clustered as hepatotoxicants, and a subset of these also caused thyroid toxicity. Ten of the 15 conazole fungicides analyzed were in this hepatotoxicity cluster. Clusters of chemicals exhibiting kidney, spleen/anemia, or testicular toxicities were not enriched for a specific chemical structural class. Cholinesterase inhibitors clustered separately from other chemicals and were enriched for organophosphates. In mouse, the 267 chemicals included clusters of cholinesterase inhibitors, spleen/anemia toxicants, and hepatotoxicants comparable with those observed for rat. Of the 112 total effects clustered in the mouse, 28 were liver toxicities, demonstrating the predominance of the liver as a target organ in the mouse. The unsupervised clustering of rat and mouse effects identified concentrations of effects and chemicals that were emphasized in subsequent, expert-driven approaches to chemical classification.
Toxicity-based classification of chemicals. The distribution of effects across effect types (Figure 2A) revealed that nonneoplastic pathologies dominate determination of systemic NOAEL/LOAEL, demonstrating the potential importance of this class of effects or end points to chemical regulation. The percentage of chemicals positive for an end point in both rat and mouse, over the total positive for the same end point in only the rat or mouse, was defined as "species concordance." Species concordance for nonneoplastic pathology was 68%. Of the 167 chemicals that caused neoplastic lesions in rat or mouse chronic/cancer studies, 35% caused neoplastic lesions in both rat and mouse. We observed one or more pathologies in 273 of the 310 chemicals. The incidence of pathologic response, analyzed by target organ and species, was used to identify target organs for further investigation (Figure 2B). More than 90% of those 273 chemicals caused pathologies in the liver, kidney, thyroid, lung, testis, or spleen.
Whereas individual effects relating to highly detailed pathologic outcomes would provide classifications with the highest biological specificity, the limitations of classifying chemicals based solely on specific individual effects were apparent early in the analysis of ToxRefDB data. Only 11 specific, individual pathologic effects were observed for more than 10% of the chemicals (Table 2). Liver hypertrophy is the only common effect across both species based on a 10% incidence cutoff. In addition to low incidences of detailed pathologic effects, biases based on study design and pathology nomenclature limited the overall ability to compare chemical toxicities when we used individual effects. Grouping or aggregating related or near-synonymous terms, such as liver adenoma, combined adenoma/carcinoma, and carcinoma, resulted in more informative and statistically powerful sets of effects. Thus, the limitations of classifying chemicals based solely on specific individual effects were addressed by creating biologically related groupings of effects.
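The effect of grouping near-synonymous terms on incidence can be shown with a toy sketch. The chemical identifiers, counts, and function names below are invented for illustration and are not ToxRefDB data; the cutoff is applied as "at least 10%", which is one reading of the incidence cutoff used in this article.

```python
def aggregate_effects(effect_to_chemicals, groups):
    """Union the chemical sets of the individual effects that make up each group."""
    return {
        name: set().union(*(effect_to_chemicals.get(e, set()) for e in members))
        for name, members in groups.items()
    }


def passing_cutoff(effect_to_chemicals, n_chemicals, cutoff=0.10):
    """Return the effects whose incidence is at least `cutoff` of the tested chemicals."""
    return {
        effect for effect, chemicals in effect_to_chemicals.items()
        if len(chemicals) / n_chemicals >= cutoff
    }


# Toy data: three individual liver tumor terms, each below a 10% cutoff.
individual = {
    "liver adenoma": {"chem1", "chem2"},
    "liver carcinoma": {"chem2", "chem3"},
    "liver combined adenoma/carcinoma": {"chem4"},
}
grouped = aggregate_effects(individual, {"liver tumor": list(individual)})
```

With 30 chemicals tested, none of the three individual terms reaches the cutoff, whereas the aggregated "liver tumor" group (4 of 30 chemicals) does.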
Grouping tumor end points and extending to include proliferative lesions. This aggregative approach was illustrated by creating groups of neoplastic end points and the extension of these groups to include nonneoplastic proliferative lesions. The aggregation of neoplastic effects for each target organ resulted in an increase in the number of useful groupings beyond the individual mouse liver tumor effects shown in Table 2. However, the end points were still limited to mouse liver and rat thyroid neoplasia, based on an initial > 10% incidence cutoff. Associating the neoplastic end points with proliferative lesions increased the number of target organs to include liver, kidney, thyroid, lung, and testes. In general, only neoplastic lesions are considered indicative of rodent carcinogenicity. However, including nonneoplastic proliferative lesions provides a conservative model for assessing and predicting rodent tumorigenic potential, based on the assumption that prolonged proliferative response leads to eventual tumor formation. A simulation study was performed to assess whether the concordance between rat and mouse effects occurred at a rate greater than chance across neoplastic and proliferative classifications. Extending tumorigenicity groupings to include proliferative lesions significantly increased species concordance across numerous target organs, including the liver and kidney [see Supplemental Material,

Mapping of toxicity end points to a cancer progression schema.
Relationships between effects and the relative severity of those effects are not inherent to the database structure. Figure 3A presents a conceptualization of the end-point progression schema in which chemicals were scored from 0 to 5 for each target organ, based on the severity of the effect, ranging from no observed pathology (0) to neoplastic lesions (5). End-point progression scoring reduced the possible chemical classifications to a single ordinal score (i.e., scores 0-5) for each target organ. Figure 3B presents the distribution of end-point progression scores for rat and mouse, liver and kidney. Examples of the impact of this scoring system include resmethrin, which caused treatment-related increases in a preneoplastic lesion (i.e., hyperplastic nodules) in the liver without progressing to a tumor. In contrast, metaldehyde caused treatment-related increases in liver tumors but was not identified as causing any preneoplastic lesions, even though preneoplastic lesions can be assumed to have occurred as a precursor event to liver tumor formation. Using the end-point progression scoring system allowed reasonable comparison of these two chemicals, if desired, by linking the preneoplastic score of 4 for resmethrin to the neoplastic score of 5 for metaldehyde along the continuum of end-point progression. The incidence of liver pathology between rats and mice was comparable when we grouped end-point progression scores. More than 50% of the chemicals tested resulted in a range of nonneoplastic to neoplastic lesions (i.e., scores 2-5). However, the relative severity for liver pathologic progression in mice was higher than in rats: 25 chemicals caused rat liver tumors, whereas 80 chemicals caused mouse liver tumors.
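The end-point progression score can be sketched as taking the maximum severity observed within a target organ. Only three levels are stated explicitly in the text (0 for no observed pathology, 4 for preneoplastic lesions, 5 for neoplastic lesions); the category names and the assignments for scores 1 to 3 below are placeholders assumed for illustration and should be checked against Figure 3A.

```python
SEVERITY = {
    "clinical chemistry": 1,       # assumption, not stated in the text
    "nonproliferative lesion": 2,  # assumption, not stated in the text
    "proliferative lesion": 3,     # assumption, not stated in the text
    "preneoplastic lesion": 4,     # stated: resmethrin, hyperplastic nodules
    "neoplastic lesion": 5,        # stated: metaldehyde, liver tumors
}


def progression_score(observed_categories):
    """Return the maximum severity among the observed categories, or 0 if none."""
    return max((SEVERITY[c] for c in observed_categories), default=0)
```

Under this sketch a chemical with only a preneoplastic liver lesion scores 4 and one with a liver tumor scores 5, so the two sit next to each other on the same ordinal scale.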
Selected end points for predictive modeling. In addition to end points specific to various target organs, chemicals were classified with respect to multigender, multisite, or multispecies tumorigenicity (Table 3). Of the 310 chemicals in the chronic/cancer data set, of which 240 were tested in both species, 167 chemicals were classified as tumorigens; 109 of those chemicals were multigender, multisite, or multispecies tumorigens. Of the 283 chemicals tested in the rat, 42 chemicals were classified as multigender and multisite tumorigens. Of 267 chemicals tested in the mouse, 57 and 25 chemicals were classified as multigender and multisite tumorigens, respectively. Of the 240 chemicals tested in both species, 49 chemicals were classified as multispecies tumorigens. The distribution of relative potency values indicated that the rat was commonly more sensitive than the mouse for multigender and multisite tumorigenicity. In the rat, 38% of the multigender and 45% of the multisite incidences were at ≤ 50 mg/kg/day (i.e., relative potency values of 4-5), compared with 23% and 28% in the mouse. Conversely, 39% of multigender and 28% of multisite tumorigenicity occurred in the mouse at > 500 mg/kg/day (i.e., relative potency value 1), compared with 17% and 10% in the rat. Multispecies tumorigenicity was not achieved at doses ≤ 15 mg/kg/day, and 41% of incidences occurred at > 500 mg/kg/day. Unsupervised and expert-driven approaches to end-point selection and subsequent chemical classification yielded near-identical sets of target organs from which to select specific effects or aggregated effects. Based on incidence, severity, potency, and significance, 25 end points from chronic/cancer rat and mouse studies were selected for subsequent ToxCast predictive modeling (Figure 4A). The addition of multispecies tumorigens raised the total to 26 end points, each caused by 20 or more chemicals.
Besides the multispecies tumorigen end point, 16 of the end points were from rat studies and 9 end points were from mouse. The same four end points were characterized in both rat and mouse liver, affording direct comparisons across species for tumors, proliferative lesions, apoptosis/necrosis, and hypertrophy. The only other frequent target organ common to both species was the kidney. Frequent rat-specific target organs included thyroid, testis, and spleen, whereas the only target organ specific to mouse was the lung. Unsupervised hierarchical clustering of the 16 rat end points (Figure 4B) and the 9 mouse end points (Figure 4C) displayed the relative distribution of the selected end points and chemicals. Of the 283 chemicals with a rat chronic/cancer study, 218 were positive in at least one of the selected end points, whereas 155 of 267 chemicals with a mouse cancer study were positive in at least one selected end point. Rat and mouse end points clustered primarily by target organ, with distinct clusters of thyroid, spleen, kidney, and liver toxicants in the rat. The high incidence of liver tumorigens in the mouse drives chemical groupings. However, chemicals causing or not causing liver hypertrophy and necrosis appear to segregate into two large groups of liver toxicants. In both species, the selected chronic/cancer end points represent the robust patterns of toxicologic response shown in Figure 2A and B. A full listing of the chronic/cancer end points derived from ToxRefDB for ToxCast predictive modeling, with their associated LELs, log-transformed potency, and relative potency values, is available on the ToxRefDB home page (U.S. EPA 2008c).
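The multigender, multisite, and multispecies tumorigen classes discussed above can be expressed as simple set logic. The definitions used here are an assumed reading, since the text does not define the terms formally: multigender means tumors in both sexes of a species, multisite means tumors in more than one target organ of a species, and multispecies means tumors in both rat and mouse. The function name and input layout are hypothetical.

```python
def classify_tumorigen(findings):
    """Classify one chemical from its tumor findings.

    `findings` is an iterable of (species, sex, organ) tuples, one per tumor
    observation. Returns the species in which the chemical is multigender or
    multisite, and whether it is a multispecies tumorigen.
    """
    by_species = {}
    for species, sex, organ in findings:
        sexes, organs = by_species.setdefault(species, (set(), set()))
        sexes.add(sex)
        organs.add(organ)
    return {
        "multigender": {sp for sp, (sexes, _) in by_species.items() if len(sexes) > 1},
        "multisite": {sp for sp, (_, organs) in by_species.items() if len(organs) > 1},
        "multispecies": len(by_species) > 1,
    }
```

For instance, a chemical with liver tumors in male and female mice and thyroid tumors in male rats would be multigender in the mouse, multisite in neither species, and a multispecies tumorigen.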

Discussion
Advancing alternative testing methods for assessing chemical safety requires an informed transition from the current toxicity testing to systems that are higher throughput, more predictive, and not as dependent on the extensive use of animals. To support this transition, we created ToxRefDB to capture a rich set of existing in vivo laboratory animal toxicity data on a group of environmentally relevant, well-studied chemicals. Pesticide active ingredients have comprehensive toxicity profiles that are opportune data sets for creating a bridge from in vivo to in vitro toxicology. ToxRefDB digitizes and stores toxicity data in a structured and searchable format, and using structured data-mining methods makes these data a computable resource for predictive toxicology efforts such as the U.S. EPA's ToxCast program (U.S. EPA 2008b). Individual toxicity effects based on unique type, target, and description yielded only a small number of in vivo end points shared by enough chemicals to support robust predictive modeling. However, grouping effects by effect type and target often collapsed hundreds of individual effects into a single end point common to dozens of chemicals. The goal was to strike a balance between maintaining biological specificity across a group of related effects and increasing total incidence for effects across a critical mass of chemicals.
For example, extending tumor end points to include proliferative lesions increased not only total incidence but also species concordance and thus increased confidence in characterizing a chemical's potential toxicity. Grouping proliferative lesions also addressed other potential factors, such as changes in pathology nomenclature over time (Wolf and Mann 2005) and reporting inconsistencies. Deriving end points based on groups of effects yielded organ- and species-specific end points in the liver, kidney, thyroid, testis, spleen, and lung in rats or mice with a high enough incidence across ToxRefDB chemicals to support predictive modeling.
Figure caption fragment (figure number and panel A text missing from the source): This schema was used to derive a severity score for each chemical based on the maximum value within a target organ. (B) Based on end-point progression, 310 chemicals were scored for liver and kidney pathology in rat and mouse chronic/cancer studies. Clinical chemistry used in this analysis is limited to target-organ-specific analytes (e.g., alanine aminotransferase for liver, and urea nitrogen for kidney). Axis label: No. of chemicals.
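The grouping of individual effects by species, target, and effect type discussed in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used to build the ToxRefDB end points: the record layout, function name, and example effects are hypothetical. The default threshold of 20 reflects the statement that each selected end point was caused by 20 or more chemicals.

```python
from collections import defaultdict

MIN_CHEMICALS = 20  # each selected end point was caused by 20 or more chemicals


def aggregate_end_points(effects, min_chemicals=MIN_CHEMICALS):
    """Collapse individual effects into (species, target, effect_type) end points.

    `effects` holds (chemical, species, target, effect_type, description)
    records; the description is deliberately ignored so that differently
    named but related effects fall into the same end point.
    """
    chemicals_by_end_point = defaultdict(set)
    for chemical, species, target, effect_type, _description in effects:
        chemicals_by_end_point[(species, target, effect_type)].add(chemical)
    # Keep only end points seen in enough distinct chemicals.
    return {
        end_point: chemicals
        for end_point, chemicals in chemicals_by_end_point.items()
        if len(chemicals) >= min_chemicals
    }


# Hypothetical records; a low threshold is used so the toy data pass the filter.
effects = [
    ("chem_A", "rat", "liver", "proliferative lesion", "hepatocellular adenoma"),
    ("chem_B", "rat", "liver", "proliferative lesion", "hepatocellular carcinoma"),
    ("chem_C", "rat", "liver", "proliferative lesion", "hepatocellular hyperplasia"),
    ("chem_A", "rat", "kidney", "proliferative lesion", "tubular adenoma"),
]
selected = aggregate_end_points(effects, min_chemicals=2)
for end_point, chemicals in selected.items():
    print(end_point, sorted(chemicals))
```

In the toy data, three distinct liver descriptions collapse into one rat liver end point shared by three chemicals, whereas the kidney effect, seen for a single chemical, is filtered out.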
Another approach for addressing the limitations of profiling chemicals based on individual toxicity effects was to compare the severity of these effects across a continuum of pathophysiology. Because the progression to cancer (Hanahan and Weinberg 2000) and organ-specific progression to tumorigenicity (Cohen and Arnold 2008) have been well characterized, we created a five-point severity scoring system to encode this progression. Using this approach, ToxRefDB provides a quantitative value associated with the key events in the progression to tumor formation and cancer. Incorporating additional information on the severity of in vivo effects in ToxRefDB may be fruitful in future modeling and predictive toxicology efforts. Additional data not currently in ToxRefDB, including incidence data, would have to be added for more detailed dose-response analyses and assessment of the magnitude of change for specific effects. Because many of the tumors caused by chemical exposure in ToxRefDB occur at high doses that are many orders of magnitude removed from potential human exposures, it is useful to also consider multigender, multisite, and multispecies tumorigenicity in the course of evaluating chemicals. Current U.S. EPA cancer risk assessments use multisite and multispecies tumorigenicity as indicators of increased significance for tumor findings (U.S. EPA 2005). Thus, the tumorigenic end points selected for ToxCast predictive modeling included multigender, multisite, and multispecies tumorigens. Additional analyses of these multiplicities in the tumorigenicity data of ToxRefDB are under way, with the goal of improving hazard assessments, chronic/cancer study protocols, and future data requirements.
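A severity score of this kind, taken as the maximum value observed within a target organ, can be sketched as follows. The five category names and their ordering below are hypothetical placeholders chosen for illustration; they are not the ToxRefDB scoring schema, and the chemical names are invented.

```python
# Hypothetical five-point scale: the category names and their ordering are
# illustrative assumptions, not the ToxRefDB scoring schema.
SEVERITY = {
    "clinical chemistry change": 1,
    "non-proliferative pathology": 2,
    "proliferative lesion": 3,
    "benign tumor": 4,
    "malignant tumor": 5,
}


def severity_scores(effects):
    """Return {(chemical, organ): highest severity value seen in that organ}.

    `effects` holds (chemical, organ, category) records.
    """
    scores = {}
    for chemical, organ, category in effects:
        key = (chemical, organ)
        scores[key] = max(scores.get(key, 0), SEVERITY[category])
    return scores


# Hypothetical observations for two chemicals.
observations = [
    ("chem_A", "liver", "clinical chemistry change"),
    ("chem_A", "liver", "proliferative lesion"),
    ("chem_A", "kidney", "non-proliferative pathology"),
    ("chem_B", "liver", "benign tumor"),
    ("chem_B", "liver", "malignant tumor"),
]
scores = severity_scores(observations)
print(scores)
```

Taking the maximum rather than a sum or mean means that the score reflects how far along the assumed progression a chemical's effects reach, not how many effects were recorded.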
Success in predicting target-organ-specific effects in ToxCast will depend on numerous factors, including the target, species, and dose response of the effects that are being predicted. In the present analysis of ToxRefDB, we identified effects in the liver, kidney, thyroid, testis, spleen, and lung in rats or mice that we will now attempt to predict using in vitro data from ToxCast. Because species concordance of the in vivo effects in ToxRefDB was fairly limited, success in predicting species-specific versus multispecies effects will be an interesting outcome of ToxCast. The dose responses for selected end points are also provided by ToxRefDB, including log-transformed potency values conducive to computational analysis, and relative potency values that facilitate comparisons across chemicals and end points. These quantitative data should facilitate development of new in vitro and in silico methods to predict in vivo chemical toxicity.
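The two potency representations can be illustrated with a short Python sketch. The text states only that doses ≤ 50 mg/kg/day correspond to relative potency values of 4-5 and doses > 500 mg/kg/day to a value of 1. The cut points at 15 and 150 mg/kg/day, the use of the negative base-10 logarithm of the LEL as the log-transformed potency, and the function names are assumptions made for this sketch.

```python
import math
from bisect import bisect_left

# Cut points in mg/kg/day. The values 50 and 500 follow from the text; 15 and
# 150 are ASSUMED here to complete a five-bin scale.
CUT_POINTS = [15, 50, 150, 500]


def relative_potency(lel_mg_per_kg_day):
    """Map a lowest effect level (LEL) to a 1-5 score; 5 is the most potent."""
    # bisect_left counts the cut points strictly below the LEL, so a LEL equal
    # to a cut point stays in the more potent bin (e.g., 50 falls in "<= 50").
    return 5 - bisect_left(CUT_POINTS, lel_mg_per_kg_day)


def log_potency(lel_mg_per_kg_day):
    """ASSUMED transform: negative log10 of the LEL, so larger means more potent."""
    return -math.log10(lel_mg_per_kg_day)


for lel in (10, 50, 120, 500, 1000):
    print(lel, relative_potency(lel), round(log_potency(lel), 2))
```

Binning places chemicals and end points on a common ordinal scale for comparison, whereas the log-transformed value keeps the continuous dose information for computational analysis.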
Although numerous studies have evaluated the use of biochemical, cell-based, and genomic assays to build predictive models of toxicity, these efforts have usually been limited to only a partial view of the complex biology underlying tissue, organ, or whole-animal toxicity. By probing such a broad spectrum of biology in the hundreds of ToxCast assays, the "toxicity signatures" will be optimally predictive and representative of a broad range of in vivo toxicity end points. A variety of statistical techniques and machine learning approaches will be used to mine this complex data set for toxicity signatures with high sensitivity and specificity. These include linear discriminant analysis, support vector machines, and neural networks. In addition to these automated approaches, more hypothesis-driven, biologically based signatures will assist in filling the large gap between molecular and phenotypic end points. It is expected that assays of multiple types, probing multiple pathways, will be required to predict in vivo toxicity across a wide range of chemicals; this is the approach taken within ToxCast and ToxRefDB.
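Sensitivity and specificity, the two criteria named above for judging a toxicity signature, can be computed against an in vivo end point as in the following sketch. The function name and the example calls are hypothetical, and the sketch assumes that the evaluated chemicals include at least one positive and one negative for the end point.

```python
def sensitivity_specificity(observed, predicted):
    """Compare predicted calls with observed in vivo calls for one end point.

    Both arguments are equal-length sequences of booleans, one entry per
    chemical. Assumes at least one observed positive and one observed negative.
    """
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for obs, pred in pairs if obs and pred)
    false_neg = sum(1 for obs, pred in pairs if obs and not pred)
    true_neg = sum(1 for obs, pred in pairs if not obs and not pred)
    false_pos = sum(1 for obs, pred in pairs if not obs and pred)
    sensitivity = true_pos / (true_pos + false_neg)  # fraction of positives found
    specificity = true_neg / (true_neg + false_pos)  # fraction of negatives found
    return sensitivity, specificity


# Hypothetical calls for five chemicals on a single end point.
observed = [True, True, True, False, False]
predicted = [True, True, False, False, True]
sens, spec = sensitivity_specificity(observed, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```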
ToxRefDB continues to develop, adding toxicity end points from additional study types, including multigeneration reproductive and prenatal developmental tests, for predictive modeling in the ToxCast research program. Besides expanding toxicity coverage to other study types, ToxRefDB will expand in chemical coverage to include more nonpesticide chemicals. As each of these ToxRefDB data sets passes through U.S. EPA quality and clearance processes, it will be made publicly available through peer-reviewed publications, the ToxRefDB home page, and ACToR. The contents of the entire database will be viewable and searchable in the future through a Web-based [text missing in source]
ToxRefDB offers unparalleled amounts of legacy toxicity information on environmental chemicals captured in a structured format, providing a platform for repeated and updated chemical characterizations. Creating the ability to search and filter across 30 years' worth of toxicity data required extensive data normalization, annotation, and curation and was made possible through the development of a robust standardized vocabulary for the fields and data elements within ToxRefDB. In the present study, we used chronic toxicity data in ToxRefDB to derive toxicity profiles for the ToxCast phase I chemicals, yielding a set of toxicity-based and predictable end points. In future applications of ToxRefDB, researchers, risk assessors, and regulators will use the database for retrospective and modeling projects looking across a large landscape of chemical and toxicity space.