Aspects of database construction and interrogation of relevance to the accurate prediction of rodent carcinogenicity and mutagenicity.

Attempts to reconcile qualitative carcinogenicity databases with qualitative mutagenicity databases continue to indicate that there is no useful relationship between mutagenicity/genotoxicity and rodent carcinogenicity. It is suggested that recognition of two classes of carcinogen, genotoxic and nongenotoxic, is the first step in finding meaningful correlations between the above parameters. This then leads to purposeful intervention into the databases, including rejecting low-quality data, abandoning some assays from the database, and clustering certain end points as repetitive rather than independent of each other. Seeking specific correlations within a focused database may yield knowledge from the current wealth of information. The effort required to build databases, particularly quantitative ones, has so far prevented the equally arduous task of their correct interrogation. Preliminary indications are that mutagenicity is closely correlated with genotoxic carcinogenesis and completely independent of nongenotoxic carcinogenesis.


Introduction
Nearly 400 chemicals have been assessed for carcinogenicity by the U.S. National Toxicology Program (NTP). A review of 301 of these chemicals was recently undertaken (1). Certain trends in the database were recognized, trends that may be worth pursuing among the much larger databases available beyond the NTP. A visualization of the database is provided in Figure 1. Here the chemicals are distributed in two dimensions. First, the cancer bioassay results are ordered as follows: A, trans-species carcinogens; B, single-species/multiple-site carcinogens; C, single-species/single-site carcinogens; D, single sex/species/site carcinogens. Class E is agents equivocal for carcinogenicity, and F represents two-species noncarcinogens. The second segregation is into six broad classes based on key aspects of the chemical structure of each agent (see legend to Fig. 1). The assignments of an alert to potential electrophilicity were based on the megastructure recently presented by Tennant and Ashby (2). If a chemical was mutagenic to Salmonella, a filled symbol is used in Figure 1. From Figure 1, it is evident that the database is segregated into two broad groups: alerting and mutagenic carcinogens and structurally benign nonmutagens, most of which are noncarcinogenic. What is particularly relevant is that the vast majority of the structurally alerting two-species carcinogens are mutagenic to Salmonella, whereas the vast majority of the structurally nonalerting noncarcinogens are nonmutagenic to Salmonella. An alternative view of these conclusions is given in Figure 2. In Figure 2 it becomes even clearer that a significant number of carcinogens are devoid of alerts to potential DNA reactivity and are nonmutagenic to Salmonella. Whether or not these chemicals are called nongenotoxic carcinogens is a matter of words; the real issue is how to predict further carcinogens of these types.

*ICI Central Toxicology Laboratory, Alderley Park, Cheshire, UK.
It is at this point that the simple answer presents itself, namely, conducting multiple in vitro genotoxicity assays in the hope that at least one assay will find such agents positive. Whenever this has been done, the successful assay has also found an equal proportion of noncarcinogens positive, thereby calling into question the relevance of all of the positive findings.
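The check described above, whether an assay that finds carcinogens positive also finds an equal proportion of noncarcinogens positive, can be sketched in a few lines. This is only an illustration: the `Result` record, the `min_gap` threshold of 0.2, and the eight toy entries are all invented for the example and are not taken from the NTP database or from any assay discussed at the meeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    chemical: str          # hypothetical label, not a real compound
    carcinogen: bool       # rodent bioassay outcome
    assay_positive: bool   # outcome in the genotoxicity assay under review


def positive_rates(results):
    """Return (fraction of carcinogens positive, fraction of noncarcinogens positive)."""
    carc = [r.assay_positive for r in results if r.carcinogen]
    non = [r.assay_positive for r in results if not r.carcinogen]
    if not carc or not non:
        raise ValueError("need at least one carcinogen and one noncarcinogen")
    return sum(carc) / len(carc), sum(non) / len(non)


def discriminates(results, min_gap=0.2):
    """True if the assay flags carcinogens clearly more often than noncarcinogens.

    min_gap is an arbitrary illustrative threshold, not an accepted standard.
    """
    among_carc, among_non = positive_rates(results)
    return (among_carc - among_non) >= min_gap


# Invented toy data: the assay is positive for 3 of 4 carcinogens,
# but also for 3 of 4 noncarcinogens.
toy = [
    Result("chem-1", True, True),
    Result("chem-2", True, True),
    Result("chem-3", True, False),
    Result("chem-4", True, True),
    Result("chem-5", False, True),
    Result("chem-6", False, True),
    Result("chem-7", False, False),
    Result("chem-8", False, True),
]

among_carc, among_non = positive_rates(toy)
print(f"positive among carcinogens:    {among_carc:.2f}")
print(f"positive among noncarcinogens: {among_non:.2f}")
print("discriminates:", discriminates(toy))
```

With these toy entries the two proportions are identical, so a positive result carries no information about carcinogenicity, which is the situation the paragraph above describes.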
There is currently a growing literature that indicates many of these presumed nongenotoxic carcinogens to be active not by virtue of hidden genotoxic activities, but by virtue of subtle changes they induce in rodent tissue homeostasis. That implies that the prediction of such carcinogens will lie in studying rodent tissue homeostasis and its chemical disturbance, not in conducting multitudinous in vitro genotoxicity assays. If that conclusion is valid, it has major implications for the design and interrogation of the major mutation/cancer databases. To be specific, the Genetic Activity Profiles (GAP) developed by Waters (3) of the U.S. EPA should perhaps contain entries on thyroid function/toxicity effects if they are ever to contribute to the prediction of thyroid-specific carcinogens; amassing genotoxicity data may not be enough. Similar considerations should perhaps also apply to artificial intelligence structure-activity programs such as CASE (4).
Finally, it is important to accept that both mutagenicity and carcinogenicity represent a continuum, starting with potent genotoxic mutagens and carcinogens, through probable nonmutagenic carcinogens, on to nonmutagenic noncarcinogens. It is therefore potentially dangerous to reduce both mutagenicity and carcinogenicity to singular (plus or minus) phenomena. Among databases discussed during this meeting, only that of the International Commission for Protection against Environmental Mutagens and Carcinogens (ICPEMC) (5) has the ability to regard mutagenicity as a continuum, and only that of Gold (6) has the implicit ability to regard carcinogenicity as such (TD50 ranges).

Figure legend (fragment): (open circles) Salmonella nonmutagen. AA, aromatic amino/nitro-type chemicals; Alk, natural electrophiles including reactive halogens; misc, minor groups of structurally alerting chemicals; inert halogen, nonalerting chemicals containing a nonreactive halogen; minor structural concerns, nonalerting chemicals but with minor concerns; no structural alerts, compounds devoid of actual or potentially electrophilic centers. Levels of carcinogenicity (A-F) are described in the text, with class A being multiple-site/trans-species carcinogens and class F being two-species noncarcinogens. M, mouse; R, rat (both sexes in each case).

Using Databases to Enhance Carcinogen Prediction Capabilities
The study of chemically induced cancer in rodents was initially empirical and concerned almost exclusively with probing the chemical basis of the responses produced. Thus, starting with coal tar in 1918, through isolation of pure polycyclic aromatic hydrocarbon carcinogens in 1933, an ever-expanding universe of structurally diverse carcinogens was defined. The electrophilic theory of carcinogenesis proposed by Miller and Miller (7) enabled structure-activity relationships (SAR) to be understood at the molecular level. The definition of vinyl chloride as a carcinogen, in the early 1970s, essentially completed definition of the structural boundaries of electrophilic carcinogenesis. The large majority of this groundwork was done in either universities or cancer research institutes.
The past 20 years have witnessed a dramatic change in the centers where chemical carcinogenesis is studied. Almost without exception, cancer researchers have abandoned all but a few reference carcinogens and have transferred their attention to the cellular and molecular aspects of cancer biology. Coincident with this has been a surge of interest in the chemical aspects of carcinogenicity among environmental scientists. The various sciences involved are still struggling to find the optimum means to fulfill the broad remit accepted in the early 1970s, namely, to be aware of and to control where necessary chemicals likely to induce cancer or mutations in man. Of immediate relevance to the present meeting, this remit appears to be proving so difficult to achieve that reliance is increasingly being placed on correlations and their prospective (predictive) use by artificial intelligence systems.
The stated aim of this meeting was to increase mutual awareness of the many genotoxicity/carcinogenicity databases in existence and to enhance their consolidation and interaction. Those aims were surely achieved. However, recognition of the sheer magnitude of the databases available prompted many discussions of why they were not yielding greater progress in the science of mammalian carcinogen prediction. The fact that such discussions were lively and constructive is a hopeful sign, but they are easily forgotten when returning to the grind of enlarging the databases themselves. With that in mind, some of those discussions are captured herein for wider consideration. All of the topics discussed here are related to the chapters preceding this one, so referencing is minimal. Attribution of specific comments or ideas is limited to instances where they provide a valuable context for the comment.

Information versus Knowledge
The late Malcolm Muggeridge commented on a recent war that the several spokesmen provided a maximum of information that yielded a minimum of knowledge. The prospect that the same phenomenon is happening with our current databases was noted on several occasions. A measure of the knowledge available comes from the correlation coefficient between two data sets that are assumed to be related. Correlation coefficients of 0.4 to 0.7 were typical, and this confirms that no simple relationships exist for all chemicals. At best, therefore, compromises will have to be made if existing data are to be used to predict effects for untested chemicals. It is interesting to wonder why this science remains as stable as it is given the poor overall correlations observed between any of the parameters studied (cancer versus mutation, etc.). This continued viability must be based on a deep feeling that meaningful correlations do exist, but that they are blurred by a range of complicating factors that can eventually be recognized and corrected. A relevant analogy arose during the meeting. Before the meeting, a conservative estimate would be that a large proportion of the population of the earth was praying for a peaceful resolution of the Gulf crisis. It must have been one of the most focused experiments on the power of prayer, yet hostilities started coincident with the meeting and grew as it progressed. It is probable that faith in the power of prayer remains unchanged among the faithful, but this is against the immediate evidence and implies belief in a larger picture. So it is with all attempts to discern useful techniques predictive of carcinogenicity: study after study has failed to yield a simple answer, yet people still acquire new data, build new databases, and attempt to derive new and useful correlations.
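The correlation coefficient used earlier in this section as a measure of available knowledge can be made concrete for qualitative (plus or minus) data. The sketch below assumes the phi coefficient, which is the Pearson correlation for two binary variables; whether this particular statistic was the one reported by contributors to the meeting is not stated, and the two toy result vectors are invented.

```python
import math


def phi_coefficient(xs, ys):
    """Phi coefficient between two equal-length sequences of plus/minus calls.

    Truthy values are read as positive, falsy values as negative.
    Returns NaN when either variable is constant (the coefficient is undefined).
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(xs, ys))
    a = sum(1 for x, y in pairs if x and y)          # both positive
    b = sum(1 for x, y in pairs if x and not y)      # first positive only
    c = sum(1 for x, y in pairs if not x and y)      # second positive only
    d = sum(1 for x, y in pairs if not x and not y)  # both negative
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return float("nan")
    return (a * d - b * c) / denom


# Invented toy calls for eight hypothetical chemicals (1 = positive, 0 = negative).
mutagenic = [1, 1, 1, 0, 0, 0, 1, 0]
carcinogenic = [1, 1, 0, 0, 0, 1, 1, 0]

print(f"phi = {phi_coefficient(mutagenic, carcinogenic):.2f}")
```

A value near 1 would mean the two data sets almost always agree; values in the middle of the range, as in this toy case, leave a large share of chemicals for which one result does not predict the other.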
In some cases the implied belief in eventual success may be a convenient way to keep doing the things we always have done, but a more likely explanation is that experience to date indicates a growing trend to understanding, confused by irritating diversions that can be controlled or eliminated given the will to do so. Some of the blurring factors to be borne in mind when enlarging and comparing databases are listed below.
Quality of Data. The meeting discussed in detail the complex issue of whether a database should list all relevant data or only those that achieve a certain level of scientific acceptability. The U.S. EPA Gene-Tox reports include only scientifically acceptable results, as do the derived Genetic Activity Profiles (GAP) of Waters (3). However, GAPs appearing in IARC reviews are subject to vetting that can vary between successive review groups. Likewise, the CASE structure-activity learning database used by Rosenkranz (4) is usually limited in its carcinogen predictions to the ~250 agents tested by the U.S. NTP. This is justified by the high quality and coherence of these cancer bioassays. The dangers implicit in such a small learning set were emphasized, and they are brought into sharp focus by the carcinogenicity database of Gold (6), which has several thousand entries, albeit of less even quality. A related and endemic concern is that protocols have advanced dramatically over the past decade, so that a large proportion of the earlier data in the more established databases would not be acceptable for entry if considered today.
Perhaps this is the major challenge to any database: to maintain a core of acceptable studies and data sets with which to conduct critical correlations among databases. The view was expressed that all databases should represent a repository of all available data, with subsets of acceptable data being accessed by individual investigators as appropriate to their needs. This view was not generally supported; there was strong support for storing adequate data and rejecting inadequate material.
The Every Data Set Tells Us Something Syndrome. The early failure to discern simple correlations between genotoxicity data and rodent cancer data led to the proliferation of assays. Thus, the GAP format can contain up to ~200 individual assay entries. This poses the question of what possible need there can be to generate such a database on any chemical. The question was also posed as to whether or not data entries in GAP (and the ICPEMC data format) should be limited to those derived from the 10 to 12 major assays in current use. Several answers to this question were implied, none of which is very credible and each of which will inevitably sustain the current blizzard of information: a) It is too early to decide which assays to omit. b) Data from any assay can add subtle refinement to the overall picture of the genotoxicity of a chemical. c) Even if assay data do not correlate with carcinogenicity, they may provide important information on mutagenicity.
The circular nature of these three mutually supporting answers is evident. Several speakers commented that the available evidence indicates that some assays (e.g., SCE or L5178Y in vitro) have no correlation with carcinogenicity, yet they are still used for this purpose. The three answers listed above were used to short-circuit that observation. At some stage the science will have to reject the use of some assays. Our joint inability to do this is currently delaying progress and reflecting badly on the objectivity of genetic toxicologists.
Suggestions Made during the Meeting. Despite the many problems faced, there is an underlying confidence that the science is fundamentally sound. A lot of this confidence comes from the increasing emphasis on mechanistic studies, mainly into the many mechanistic aspects of chemically induced carcinogenicity. But the results of such studies must be used to improve the basis by which correlations are sought, leading to improved methods of predicting carcinogenicity. Without prejudging the issue, the following points were discussed at some point in the meeting and may provide a means of breaking the present logjam of information.
1. It should be assumed that some carcinogens are active via an overt capacity to damage DNA. These are readily detected by available tests such as the Salmonella assay, assisted by a second test such as a cytogenetic assay, either in vitro or in vivo. Artificial intelligence (e.g., CASE) and chemical knowledge are able to predict and rationalize such activities. These carcinogens should be handled as a separate group, as they are currently predictable. Secondary toxicities, such as induced cell division, are potent modulators of the observed carcinogenic activity (e.g., organotropy and potency). Databases (including GAP/ICPEMC profiles) should be derived using no more than 10 standard genotoxicity tests for these carcinogens.
2. Having segregated genotoxic carcinogens within the databases (i.e., those overtly genotoxic and structurally alerting), the question of genotoxic noncarcinogens should be studied using focused databases. These databases should be edited to be as large as possible consistent with high-quality data from established assays.
3. The remaining carcinogens (the so-called nongenotoxic carcinogens) should then be studied, again using focused databases. Any genotoxicity assay that appears to be capable of detecting such agents as positive should be checked for its ability also to find well-defined noncarcinogens negative. It is at this point that most assays fail, as evidenced by data presented at the meeting, all of which were reflective of the data presented by Tennant et al. (8). The reality of some assays having nothing to offer in carcinogen or in vivo mutagen prediction should be jointly and clearly acknowledged.
4. Toxicological data, coupled to information on the tissues subject to apparent nongenotoxic carcinogenesis, should be considered, together with available mechanistic data. Thus, the ability of some (but not all) thiones to cause rodent thyroid tumors should be approached as a broad animal toxicological problem, not as a lottery for esoteric and little-used in vitro genetic tests.
5. Specific consideration should be given to the possible induction of mutagenic effects in the absence of the induction of tumors. However, there are few data to support this possibility at present. Continued belief in this prospect is totally justified, but it should not provide a cover for the continued use of genotoxicity assays that have no other justification for their existence.
6. Those building and maintaining the major databases should cleanse them of low-quality data, update them regularly, break up the databases into speculative subgroups (e.g., genotoxic/nongenotoxic; the bases can easily be remerged), and seek the meaningful correlation that surely exists amid the present sea of information.
7. We should all accept the futility of seeking simple correlations in mutagenesis/carcinogenesis that could apply to all chemicals and all end points.
8. The prospect that a totally new type of structure-activity relationship will underpin nongenotoxic carcinogenesis means that a new approach to the prediction of these effects must be considered.

The will to cooperate evidenced at this meeting provides hope that true progress in protecting the human genome can be made.
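The suggestion in point 6, to break a database into speculative subgroups and look for correlations within each, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Entry` fields, the use of a structural alert as the splitting criterion, simple concordance (fraction of entries where the two calls agree) as the measure, and all twelve toy entries are invented for the example and do not reproduce any database discussed at the meeting.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    name: str               # hypothetical label, not a real compound
    structural_alert: bool  # alert to potential electrophilicity
    mutagenic: bool         # e.g., a Salmonella result
    carcinogenic: bool      # rodent bioassay result


def concordance(entries):
    """Fraction of entries whose mutagenicity and carcinogenicity calls agree."""
    entries = list(entries)
    if not entries:
        raise ValueError("empty group")
    agree = sum(1 for e in entries if e.mutagenic == e.carcinogenic)
    return agree / len(entries)


def concordance_by_alert(entries):
    """Split entries on the structural alert and score each subgroup separately."""
    groups = defaultdict(list)
    for e in entries:
        label = "alerting" if e.structural_alert else "non-alerting"
        groups[label].append(e)
    return {label: concordance(group) for label, group in groups.items()}


# Invented toy database: six alerting and six non-alerting chemicals.
toy = [
    Entry("a1", True, True, True),
    Entry("a2", True, True, True),
    Entry("a3", True, True, True),
    Entry("a4", True, True, True),
    Entry("a5", True, False, False),
    Entry("a6", True, True, False),
    Entry("n1", False, False, True),
    Entry("n2", False, False, True),
    Entry("n3", False, False, True),
    Entry("n4", False, False, False),
    Entry("n5", False, False, False),
    Entry("n6", False, False, False),
]

print(f"pooled concordance: {concordance(toy):.2f}")
for label, value in concordance_by_alert(toy).items():
    print(f"{label:>12}: {value:.2f}")
```

In this toy case the pooled figure is middling, while the split shows good agreement among alerting chemicals and no information from mutagenicity among the non-alerting ones. Because the subgroups are computed from the same list, nothing is lost by splitting, and the pooled view remains available.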