A proposed method for assembly and interpretation of short-term test data.

The genetic toxicology databases for chemicals that have been tested extensively are generally composed of inconsistent responses from a diverse set of assays. Consequently, difficulties arise when the data are evaluated for classifying the agent or for assessing the chemical's hazard potential. Several years ago, the International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) established a committee to construct a process for compiling and interpreting diverse data sets. The Committee has developed a weight-of-evidence approach that combines test data into a series of scores for test type, class, family, and a consensus score defining the relative mutagenic activity of the agent compared with other chemicals in the database. This report describes the method and preliminary results from 113 chemicals.


Introduction
Committee 1 of the International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) was established in 1979 to review the status of short-term tests for mutagenicity and the degree to which these tests are concordant with results from three mammalian in vivo tests (dominant lethal, heritable translocation, and specific locus) measuring germ cell damage (1). The mission of committee 1 was broadened in 1983 to develop, if possible, a method that would integrate and interpret results from heterogeneous data typical of mutagenicity test batteries.
Committee 1 members began with a weight-of-evidence scheme proposed by Brusick (2). This system was based on a method of weighted averages of both positive and negative test results from a battery consisting of both in vitro and submammalian assays. Although the committee retained the weight-of-evidence portion of the approach, the range of assays and the mechanics of data handling for the current method have evolved substantially.
There were three primary objectives that committee 1 set out to accomplish in the design of a data analysis method. The first goal was to develop a method that would extend the use of a database beyond listing tests and results. For mutagenicity there was a need for a process to assemble the test results for a chemical in a manner that would produce a consensus regarding the mutagenic activity of the agent. The second goal was to use the results of the evaluations to rank chemicals and compare that rank order with other properties of the same chemicals such as cancer or germ cell mutation. The third goal was to use the data analysis with a large database to understand mutagenicity tests and their relationships to each other and to chemicals and chemical classes.
*Hazleton Washington, Inc., 9200 Leesburg Pike, Vienna, VA 22182.

Comparison with Other Methods of Data Analysis
Several other investigators have developed or proposed approaches to accomplish many of the objectives stated above. One of the earlier uses of data in this manner was proposed by Squire (3), who suggested a semiquantitative approach that estimated carcinogenic potential using a point system for various characteristics of a chemical. Mutagenicity was the most heavily weighted of all components of his carcinogen prediction scheme.
In the mid-1980s, Waters et al. (4) developed a linear profile of mutagenic activity that illustrated the positive and negative results for all tests conducted on a chemical (Fig. 1). This plot, identified as a Genetic Activity Profile (GAP), has undergone several improvements and is currently available with an extensive database on PC-based software (4). GAPs facilitate direct comparison of test responses for chemicals of similar classes and/or structural relatedness.
Other investigators have attempted to use statistical (5) and structure-activity analyses (6) [...] test batteries for detecting mutagenic carcinogens. Parodi et al. (7) have proposed a method using several parameters to predict both qualitatively and quantitatively the carcinogenic activity of chemicals. The success of this approach was found to be chemical-class dependent. The committee 1 activity to date has been directed toward ranking for mutagenic activity. Future efforts are planned for comparing the ICPEMC mutagenicity rankings to animal carcinogen standards such as those proposed by Gold et al. (8). In an activity related to this end, Nesnow (9) constructed a multifactor ranking scheme for comparing the carcinogenic activity of chemicals. This scheme was produced in collaboration with committee 1 and weighted the factors that influence potency by a process similar to the one used in the mutagenicity ranking approach.
Footnotes to Table 1: [...] (4). Only 85 tests are used; the criterion was that a test had to be used for at least five chemicals in the total database. (b) All strains of Salmonella are included; the highest negative dose or lowest positive dose in any one of the strains involved in one entry is taken.
Each of the methods described has attributes that make it useful for specific purposes, but the methods are all primarily oriented toward carcinogenesis. GAPs are similar to the committee 1 approach in both graphic output and in the fact that they are directed toward mutagenicity per se.

Data Evaluation Methods Developed by ICPEMC
Once the basic structure of the committee 1 approach had been determined, data collection and analysis programs were written in FORTRAN 77 for a Digital VAX 750 computer. The software program was designed to be flexible and amenable to adjustment (fine tuning) as data entered into the database were evaluated. An alternate version of the program is being prepared for IBM-AT compatible personal computers. The ICPEMC approach has been identified as the mutagenic activity profile (MAP) method because of the graphic output format and because the scheme ranks chemicals according to their activity. Details of the data evaluation system and the techniques employed to maximize use of the method are currently in press (10,11).
In summary, the approach uses a weight-of-evidence concept combined with unweighted averaging of modified test results. The qualitative test responses (positive or negative) are modified by two factors: dose and assay replication. Defining doses are selected from the lowest effective dose (positive results) and the highest ineffective dose (negative results). Dose modifiers, which have been corrected for bias introduced by characteristics associated with the test system (11), are then applied to the calculations.
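The selection of a defining dose can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the committee's FORTRAN implementation. The `DosePoint` record, the function name, and the convention that a trial counts as positive when any tested dose is positive are assumptions made for the example; the dose modifiers themselves are described elsewhere (11) and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DosePoint:
    """One tested dose in a trial (hypothetical record layout)."""
    dose: float      # dose in any consistent unit
    positive: bool   # True if an effect was observed at this dose


def defining_dose(points: List[DosePoint]) -> Tuple[int, float]:
    """Return (sign, dose) for one trial.

    Assumption: a trial is positive if any dose is positive. A positive
    trial is defined by its lowest effective dose, a negative trial by
    its highest ineffective dose.
    """
    if not points:
        raise ValueError("a trial needs at least one tested dose")
    effective = [p.dose for p in points if p.positive]
    if effective:
        return (+1, min(effective))
    return (-1, max(p.dose for p in points))


# Illustrative values only, not taken from the database.
positive_trial = [DosePoint(1.0, False), DosePoint(10.0, True), DosePoint(100.0, True)]
negative_trial = [DosePoint(1.0, False), DosePoint(5.0, False)]
print(defining_dose(positive_trial))  # sign +1 with the lowest effective dose
print(defining_dose(negative_trial))  # sign -1 with the highest ineffective dose
```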
Each test system for which data can be entered into the scheme is uniquely identified by a three-letter code (Table 1) proposed by Waters et al. (4). Trials of individual tests are transformed to produce test scores. Scores from individual tests are combined into class scores by simple unweighted averaging. Test classes have phylogenetic and end point traits in common (e.g., gene mutation tests in prokaryotic cells, chromosome aberrations in cultured mammalian cells); a class such as A6 consists of tests that are presumed to detect gene mutation in cultured mammalian cells. Results from the L5178Y mouse lymphoma assay, HGPRT assay in Chinese hamster ovary or V79 cells, or gene mutation tests using human cell types would be combined in the A6 class. In vivo classes were constructed in a similar fashion. For example, class B6 consists of bone marrow metaphase cytogenetic analysis in mice, rats, hamsters, and humans.
Merging data into classes is performed by simple averaging. Class scores are combined into family scores, again by simple averaging. There are two family scores, one for in vitro results and one for in vivo results. Figure 1 summarizes the steps in the process for assembling and merging data into test, class, family, and agent scores. The process determines a score for each trial of a given test and then merges them into a score for the test, a score for the class, a score for the family, and finally, a single agent score (Sa) representing the consensus (weight-of-evidence) for the chemical. The consensus score defines the overall mutagenic activity based on all the test results.
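The merging steps described above can be sketched as successive unweighted means. The Python below is a minimal illustration under stated assumptions: trial scores are taken as already modified for dose and replication, the tuple layout and function name are hypothetical, and the test codes `TST1` to `TST4` are placeholders rather than actual three-letter codes. The class codes A1, A6, and B6 follow the naming used in the text.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (family, class code, test code, modified trial score); layout is assumed
Trial = Tuple[str, str, str, float]


def merge_scores(trials: List[Trial]) -> Tuple[Dict[str, float], float]:
    """Merge trial scores into test, class, family, and agent scores
    by simple unweighted averaging at every level."""
    by_test: Dict[Tuple[str, str, str], List[float]] = {}
    for family, cls, test, score in trials:
        by_test.setdefault((family, cls, test), []).append(score)

    by_class: Dict[Tuple[str, str], List[float]] = {}
    for (family, cls, _test), scores in by_test.items():
        by_class.setdefault((family, cls), []).append(mean(scores))

    by_family: Dict[str, List[float]] = {}
    for (family, _cls), test_scores in by_class.items():
        by_family.setdefault(family, []).append(mean(test_scores))

    family_scores = {fam: mean(cls_scores) for fam, cls_scores in by_family.items()}
    agent = mean(list(family_scores.values()))  # consensus score (Sa)
    return family_scores, agent


# Illustrative scores only, not taken from the database.
example = [
    ("in_vitro", "A1", "TST1", 40.0),
    ("in_vitro", "A1", "TST1", 20.0),
    ("in_vitro", "A1", "TST2", -10.0),
    ("in_vitro", "A6", "TST3", 30.0),
    ("in_vivo", "B6", "TST4", -20.0),
    ("in_vivo", "B6", "TST4", 0.0),
]
families, sa = merge_scores(example)
print(families, sa)
```

In this example the two TST1 trials average to 30, class A1 averages to 10, the in vitro family to 20, the in vivo family to -10, and the agent score to 5. Because each level is an unweighted mean, a test with many replicates carries no more weight in its class than a test with a single trial.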
The results of the evaluation process are expressed in both tabular and graphic formats. The tabular output lists each of the scores identified above, the calculations producing the scores, and reference citations for each of the data entries. The graphic format for ethylene oxide (Fig. 2) is used as an example and can be compared to the GAP graphics in Figure 3. The ICPEMC profiles are presented in diagrams with upper and lower plots. The upper portion of the diagram gives (in the two hemispheres) modified test scores for each trial (with a mean and confidence limits if the replicate number is three or greater), along with the three-letter identification code. Agent scores (Sa) can theoretically range from -100 to +100, with 0 separating active (+) from inactive (-) responses.
At each step of the process, scores are averaged, with negative results down-weighting positive scores. The major determinants for location of the scores on the scale are sign (+ or -), defining dose, and replication of the test. The final merging represents a consensus of all entries. The test codes are arranged so that the tests within a given class (e.g., A1, A2, or B1, B2) are clustered together. The lower portion of the diagram provides the class scores, family scores, and agent score. The name of the chemical, current date, and CAS number for the agent are also provided on the plot. The reasons for including graphic as well as tabular output are a) to provide all data in a convenient, informative manner on a single page for quick reference and b) to permit users to follow the influence of the data reduction steps on the initial test results.
The data analysis and merging program has continued to evolve as more insight about test performance and data analysis has been gained. Consequently, there have been several versions of the agent scores, which have resulted in slight shifts of the chemical ranking. The system is approaching a point where the committee believes that it is working sufficiently well that final settings for the modifiers can be made and the system should be released for general use. Because of the design of the program, additional information gained during use of the system can be used to "educate" the process by fine tuning the modifiers or by weighting some of the variables (10,12).
In developing the process in this manner, certain assumptions were made by the members of the committee: a) there were no established procedures available for using test results to classify chemicals as nonmutagens, but one was needed; b) there was insufficient information available to set weights for different tests; therefore, all tests were assumed to be equally relevant to the process of determining mutagenic activity; c) both in vitro and in vivo data would be required to provide an accurate assessment of the genetic activity of a chemical; d) replication of the agent in a test (up to a point) should provide, on the average, a better estimate of the mutagenic activity for the chemical than a single trial; e) merging test results, especially replicates of a test and tests measuring the same end point in similar types of organisms, would not significantly violate scientific principles because a similar process is performed intuitively by most toxicologists when evaluating multitest results for a chemical.

Source of Data in the Database
The current database used to evaluate the approach and perform the statistical analyses consists of 4490 results for 113 chemicals. The primary data were provided to ICPEMC by the U.S. Environmental Protection Agency and contained results from many of the chemicals in the IARC Supplement 6 (13). The chemicals in the MAP database all have at least three in vitro tests and at least two in vivo tests. The committee set these minimums as requirements to evaluate the ability of the method to handle large heterogeneous data sets and because most of the test batteries in common use generally contained both in vitro and in vivo tests.

Concerns and Limitations of the Approach
The committee realized that developing a data evaluation scheme would involve treating genotoxicity data in ways that are different from treatments typically used to evaluate groups [...] scores was seriously questioned because of the concern that a single, possibly highly relevant, test result would be diluted by larger numbers of negative results. This potential problem was emphasized because of another limitation: input of data does not require prior expert review; thus, a positive result from a well-performed test may be masked by several improperly performed studies with negative results. There was less concern that the converse of this situation might occur.
Another concern expressed by committee members as well as commission members reviewing the approach was the decision to give equal weight to in vitro and in vivo tests. In vivo data are generally viewed as more relevant to hazard identification and are typically given more weight. Many individuals reviewing the process questioned the rationale for merging data by simple averaging of modified scores. This not only raised the potential of diluting unique test responses, as indicated earlier, but was also of concern because there was a general belief that tests measuring different genetic end points (gene mutation, aberrations, sister chromatid exchange, transformation, etc.) measure quite different mechanistic phenomena that cannot be merged by simple averaging. There were other concerns of a lesser nature that were identified and recognized by the committee during its deliberations over the past several years.
The committee members considered all of these concerns and other likely limitations during the construction of the MAP scoring system. Resolution of all questions was not possible, but the output of the scoring system with the existing data suggested in several cases that the potential limitations did not seriously flaw the evaluation scheme.
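The dilution concern raised in this section can be made concrete with a small numerical sketch. The scores below are invented for illustration and do not come from the database; the function name is likewise hypothetical. The only property taken from the method is that scores are merged by an unweighted mean.

```python
from statistics import mean
from typing import List


def merged_score(scores: List[float]) -> float:
    """Unweighted mean of modified scores, as used at each merging step."""
    return mean(scores)


# Hypothetical case: one strongly positive result on its own ...
lone_positive = [60.0]
# ... and the same result merged with five weakly negative results.
with_negatives = [60.0] + [-10.0] * 5

print(merged_score(lone_positive))   # 60.0
print(merged_score(with_negatives))  # about 1.67
```

Under these assumed values, five weak negatives pull a single strong positive from 60 to less than 2, which is the behavior the reviewers questioned. The sketch does not address how often this occurs in practice.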

Results
Even with the limitations encountered, the MAP system produced by ICPEMC appears to accomplish many of the goals initially stated by the committee. Table 2 is a listing of the rank order of the 113 chemicals used in constructing the database. Some additional fine tuning of the system is expected, and before final release there could be some minor changes in the rank order of agents. In this latest version, ethanol, with an agent score of -27.70 (Fig. 4), was the least genetically active agent in the database, and triaziquone (Trenimon), with an agent score of +49.67 (Fig. 5), was the most genetically active. The rank order, with a few exceptions, seems consistent with an intuitive ranking of mutagenic activity or with rankings from other experts or expert systems.
The data sets ranged from 6 studies for mestranol to 275 studies for cyclophosphamide. Among the 113 data sets, 108 (96%) had mixed test results (both positive and negative). From the data available at the time of this report, only C.I. acid red (11 entries), melamine (8 entries), mestranol (6 entries), and polybrominated biphenyls (13 entries) consisted of entirely negative test data. Only chloroethylcyclohexyl-nitrosourea (9 entries) had all positive test results.

Data Interpretation
To fully use the MAP system, a practical application of agent (Sa) scores must be developed. One can define, on a limited basis, the activity of a chemical (e.g., mutagen, clastogen) from the unequivocal, reproducible data from a single test system such as the Ames test, the Drosophila sex-linked recessive lethal assay, or the mouse micronucleus assay; however, such a definition carries little information concerning how generalized the activity might be across other test methods or other species and has no quantitative indication of potency. The ICPEMC scoring system attempts to introduce these two attributes into the mutagenicity definition. Several uses for the agent score have been considered, as discussed below.
The agent score could be viewed as an indication of the level of confidence (probability) that a chemical is a "general" mutagen across test and species boundaries. In other words, how likely is the chemical to produce a positive or negative response in the next assay to which it is subjected? The higher the agent score, the greater the probability that the chemical is a "general" mutagen and represents a human hazard. Agents that show potent but highly test-method-specific responses (i.e., a single test positive) will not generate a high agent score. Consequently, the agent score from a test battery could serve as a quantitative estimate of the genetic hazard of a compound.
The agent score might be used in a qualitative manner to establish potential for germ cell hazard. Among the 113 chemicals in the database, 8 have been reported positive in rodent tests for heritable germ cell effects (14,15). Seven of the 8 (88%) germ cell mutagens showed positive agent scores. The one compound designated a germ cell mutagen that had a negative agent score was isoniazid (Fig. 6). A weak positive effect was reported in the mouse heritable translocation assay (1).
Some consideration has also been given to the use of the agent score as an indicator of carcinogenic potential. Fifteen of the 113 chemicals fall into the IARC group 1 human carcinogens (16). Thirteen of the 15 (87%) have positive agent scores (Table 3). The two human carcinogens with negative agent scores are asbestos (Fig. 7) and benzene (Fig. 8). Although many rodent carcinogens fall among the chemicals with high agent scores, some highly active rodent carcinogens such as chloroform (Fig. 9), amitrole (Fig. 10), and TCDD (Fig. 11) exhibited low agent scores. These agents belong to a heterogeneous group of chemicals whose mechanisms of carcinogenesis are believed to be other than genotoxic (17). A subset of the 113 chemicals with these characteristics is listed in Table 4. Seventeen of the 19 agents in this nongenotoxic category have negative agent scores consistent with their assumed mechanisms and are also not mutagenic in the conventional Ames assay. The committee is currently evaluating the alternative uses of the agent scores.
Figure 7. The ICPEMC mutagenic activity profile for asbestos.
The relative ranking of chemicals in Table 2 coincides reasonably well with an intuitive assessment of their genetic hazard. This is especially true for those with very high or very low agent scores. There appear to be a few anomalies among the chemicals in the database. For example, procarbazine hydrochloride (Fig. 12) has a relatively low agent score of 3.36. This chemical is highly mutagenic in rodent germ cells (18), yet ranks lower than other agents that would presumably pose less of a genetic risk (e.g., acetaldehyde, nickel, and formaldehyde). Benzene, which is quite active as a clastogen in vivo, has an agent score of -7.15. This anomaly appears to result from the fact that a large number of negative studies have been conducted in vitro and these have diluted the limited number of positive results in vivo. This is an example related to some of the concerns expressed earlier. Both procarbazine and benzene appear lower in the agent score rankings than might be presumed generally. Few instances of this situation were found upon an extensive analysis of the database.
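The agreement figures quoted in this section follow directly from the counts given in the text. The short Python check below recomputes them; the function name and dictionary labels are chosen for the example only, and no data beyond the stated counts are used.

```python
def percent(count: int, total: int) -> float:
    """Percentage of agents in a category whose agent score has the expected sign."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# (agents with the expected sign, agents in the category), as stated in the text
categories = {
    "germ cell mutagens with positive scores": (7, 8),
    "human carcinogens with positive scores": (13, 15),
    "nongenotoxic carcinogens with negative scores": (17, 19),
}

for label, (count, total) in categories.items():
    print(f"{label}: {count}/{total} = {percent(count, total):.1f}%")
```

The first two values, 87.5% and about 86.7%, correspond to the 88% and 87% reported above. The third, about 89.5%, is not quoted as a percentage in the text and is computed here from the stated 17 of 19.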

Conclusions
In spite of the early stage of development, it is clear that the ICPEMC committee 1 MAP approach of integrating and processing genetic toxicology data is capable of meeting many of the initial requirements set forth by the committee. The approach is able to cope with redundant, disparate, and missing data in the published literature. From the current database of 113 chemicals, the scoring method in its current configuration was capable of correctly assigning scores to almost all of the known heritable mutagens. Most human carcinogens in the database were assigned positive agent scores, and the rodent carcinogens presumed to induce tumors by nongenotoxic mechanisms were, with few exceptions (17 of 19 agents), assigned negative agent scores by the method.
A crucial element in this exercise was to compare the mutagenic ranking of chemicals with their ranking as rodent carcinogens. To accomplish this, a parallel system for rank-ordering rodent carcinogens was developed by Nesnow (19). Once this new database is filled with sufficient chemicals to make a comparison meaningful, the results will be published.
A comprehensive statistical analysis has been performed with the existing database (11). Several preliminary findings have produced important insight into mutagenicity testing: a) In vitro and in vivo tests appear to respond similarly to a broad range of chemicals. b) Chemicals do not appear to be highly specific for genetic end points (gene mutation, sister chromatid exchange, clastogenicity, cell transformation). Class scores proved to be very congruent with the consensus (Sa) scores for the 113 chemicals. c) Using the 113 chemicals as surrogates for the universe of chemicals, the agent scores fall generally on a continuous, rather than a bimodal, scale, with approximately half the chemicals having positive agent scores and half having negative agent scores.
The study and refinement of the ICPEMC committee 1 MAP method of complex mutagenicity data evaluation will continue. Its adaptation to data assessment will be enhanced by the availability of software modified for use on personal computers. Based on the initial experiences with the approach, it is clear that important insights about genetic tests and test batteries will emerge. Whether this approach will break through the current barriers encountered in using genetic tests to predict carcinogenicity remains to be seen.