Prediction of the carcinogenicity of a second group of organic chemicals undergoing carcinogenicity testing.

Twenty-four organic compounds currently undergoing testing within cancer bioassays under the aegis of the U.S. National Toxicology Program (NTP) were submitted to the computer automated structure evaluation (CASE) and multiple computer automated structure evaluation (MULTICASE) system for predictions of activity. Individual predictions resulting from the NTP combined rodent, NTP mouse, Carcinogenic Potency Database (CPDB) combined rodent, and CPDB mouse databases were combined using Bayes' theorem to yield an overall probability of rodent carcinogenicity. Based upon an arbitrary probability cut-off of 0.50, nine compounds were predicted to be rodent carcinogens. The predicted carcinogens are chloroprene, 1-chloro-2-propanol, codeine, emodin, furfuryl alcohol, isobutyraldehyde, primaclone, sodium xylenesulfonate, and t-butylhydroquinone.


Introduction
This article presents predictions of rodent carcinogenicity for a group of organic molecules currently undergoing cancer bioassays under the aegis of the U.S. National Toxicology Program (NTP). We apply the computer automated structure evaluation (CASE) and the multiple computer automated structure evaluation (MULTICASE) structure-activity relationship (SAR) expert systems (1,2) using several carcinogenicity SAR models (e.g., NTP carcinogenicity, Carcinogenic Potency Database [CPDB]). The results obtained with the different models are combined into a single prediction by the application of Bayes' theorem.

Methodology
CASE/MULTICASE
Detailed descriptions of the algorithms employed in CASE and MULTICASE have been published previously (1,2). The necessary input to the CASE/MULTICASE system consists of a database comprising the chemical structures of biologically active compounds as well as experimentally measured values of the activity. The experimental values are expressed in terms of CASE and MULTICASE units ranging from 10 (inactive) to 99 (extremely active). In addition to this quantitative scale, compounds can be simply entered as inactive or active. The CASE/MULTICASE system automatically fragments each molecular structure into units of 2 to 10 heavy atoms together with their associated hydrogens. Accordingly, to be predictable by CASE/MULTICASE, a molecule must consist of at least 3 nonhydrogen atoms. The CASE/MULTICASE system accommodates fragments with branching at one position along the linear atomic chain. The resulting fragments are cataloged according to their origins (parent compounds). Fragments arising from active compounds are labeled as activating, while inactive compounds give rise to inactivating fragments.
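The fragmentation step described above can be illustrated with a minimal sketch. It is not the CASE/MULTICASE implementation: it assumes a molecule is given as a heavy-atom graph, and it ignores bond orders, attached hydrogens, and the single branching position that the real system accommodates. The function name `linear_fragments` and the input layout are hypothetical.

```python
def linear_fragments(atoms, bonds, min_len=2, max_len=10):
    """Enumerate linear chains of min_len..max_len heavy atoms.

    atoms: dict mapping atom index -> element symbol (heavy atoms only)
    bonds: iterable of (index, index) pairs
    Returns a set of element-symbol tuples (simplified fragments).
    """
    neighbours = {i: set() for i in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    found = set()

    def extend(path):
        if len(path) >= min_len:
            labels = tuple(atoms[i] for i in path)
            # A chain read forwards or backwards is the same fragment.
            found.add(min(labels, labels[::-1]))
        if len(path) == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only, no atom visited twice
                extend(path + [nxt])

    for start in atoms:
        extend([start])
    return found


# Heavy-atom skeleton of 1-chloro-2-propanol: Cl-C-C(-O)-C
example_atoms = {0: "Cl", 1: "C", 2: "C", 3: "O", 4: "C"}
example_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
example_fragments = linear_fragments(example_atoms, example_bonds)
```

In a learning set, each fragment produced this way would then be tallied against the activity of its parent compounds.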
The CASE/MULTICASE system performs a statistical analysis to identify those molecular fragments that are relevant to the observed activity. A binomial distribution is assumed, and any considerable deviation from a random distribution of a fragment among the active and inactive classes of compounds indicates potential significance to biological activity.
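As a rough illustration of the binomial reasoning, the sketch below computes how likely it would be, by chance alone, to see at least the observed number of active compounds among those carrying a fragment. The one-sided tail test and the function names are assumptions made for illustration; the exact statistic used by CASE/MULTICASE is described in the cited references (1,2).

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fragment_p_value(n_active_with, n_total_with, fraction_active):
    """Chance probability of at least n_active_with actives among the
    n_total_with compounds that contain the fragment, if the fragment
    were unrelated to activity."""
    return binomial_tail(n_active_with, n_total_with, fraction_active)


# Hypothetical numbers: a fragment found in 13 compounds, all active,
# in a database where half of the compounds are active.
p_value = fragment_p_value(13, 13, 0.5)
```

A small value suggests that the fragment's distribution deviates from random and that the fragment may be relevant to activity.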
CASE utilizes the set of statistically significant fragments in predictions of the activity or inactivity of compounds. Predictions of activity/inactivity are expressed as probabilities and are based on the observed fragment distributions encountered within the learning set. Quantitative estimation of the potency (quantitative structure-activity relationship [QSAR]) for all the compounds within a database is achieved by a multivariate linear regression analysis based on the stepwise selection of a subset of molecular fragments. In addition, calculated values of the logarithm of the partition coefficient and its squared value (log P, (log P)^2) are included as potential variables. The coefficient of each of the molecular fragments within the QSAR is a measure of the activating/inactivating contribution made to the activity by the presence of the fragment.
MULTICASE, on the other hand, utilizes the set of statistically significant molecular fragments to find a descriptor (biophore) accounting for the majority of the active class of compounds. In addition to molecular fragments, MULTICASE considers two-dimensional distance descriptors. Compounds containing the primary biophore (fragment or distance) are removed from further consideration, and subsequent biophores are selected that explain the activity of the remaining compounds. This iterative process of selection is continued until either all the active compounds are accounted for or no significant descriptors remain. The presence or absence of relevant biophores determines the prediction of activity/inactivity. Unlike CASE, MULTICASE attempts to derive a QSAR only within each group of compounds sharing a common biophore. These local QSARs identify molecular features that modulate the activity/potency of compounds containing the biophore.
Environmental Health Perspectives * Vol 104, Supplement 5 * October 1996
The modulators are selected from the associated pool of molecular fragments, distance descriptors, calculated electronic indices (lowest unoccupied molecular orbital, highest occupied molecular orbital, etc.), and calculated transport parameters (log P, water solubility, molecular weight). The contribution (positive or negative) of the modulators to the activity is valid only within the context of compounds containing the relevant biophore.
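The iterative biophore selection used by MULTICASE resembles a greedy covering procedure, which can be sketched as follows. This is an illustrative reading of the description, not the published algorithm: the data layout, the tie-breaking rule, and the caller-supplied `is_significant` test are all assumptions.

```python
def select_biophores(actives, inactives, is_significant):
    """Greedy biophore selection.

    actives, inactives: dict mapping compound name -> set of descriptors
    is_significant: callable (n_active, n_inactive) -> bool (hypothetical test)
    Returns (ordered list of biophores, sorted list of unexplained actives).
    """
    remaining = dict(actives)
    biophores = []
    while remaining:
        # Count each descriptor among the actives not yet explained.
        counts = {}
        for descriptors in remaining.values():
            for d in descriptors:
                counts[d] = counts.get(d, 0) + 1
        candidates = []
        for d, n_active in counts.items():
            n_inactive = sum(d in descs for descs in inactives.values())
            if is_significant(n_active, n_inactive):
                candidates.append((-n_active, d))
        if not candidates:
            break  # no significant descriptors remain
        # Most frequent descriptor first; ties broken alphabetically.
        _, best = min(candidates)
        biophores.append(best)
        # Compounds containing the biophore are removed from consideration.
        remaining = {name: descs for name, descs in remaining.items()
                     if best not in descs}
    return biophores, sorted(remaining)
```

With a toy rule such as `lambda n_active, n_inactive: n_active > n_inactive`, the function returns the biophores in the order they were chosen, together with any active compounds that no significant descriptor explains.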
At this point, the analyzed CASE/MULTICASE database can be queried regarding predictions of the activity of chemicals not present in the learning data set or experimentally untested agents. For each compound both CASE and MULTICASE predictions are made according to the respective SAR model. Compounds containing structural fragments not encountered in the learning set are flagged as "warnings" since the contribution of these unknown functionalities to biological activity is uncertain.
A typical example of a CASE/MULTICASE prediction is shown in Figure 1. CASE/MULTICASE predictions are shown for furfuryl alcohol tested against the CPDB mouse carcinogen database. The first part of the output deals with MULTICASE predictions. Furfuryl alcohol contains the biophore O-C=CH-CH= present in 13 carcinogenic compounds within the learning database. The compound also contains an extra biophore (O:-CH-). The overall conclusion from MULTICASE is that furfuryl alcohol has a 97% chance of being a carcinogen with an activity calculated to be 55 units, corresponding to a TD50 value of 0.17 mmol/kg/day. The CASE results are listed in the second half of the output. The overall conclusion from CASE is a probability of 89% with a calculated activity of 54 (0.20 mmol/kg/day).

Carcinogenicity Databases
Carcinogenicity databases used in the present study to predict the activity of chemicals currently undergoing experimental evaluation consist of results obtained from previous NTP evaluations (6) and the data assembled in the CPDB. Data for both the rat and the mouse are available in both compilations. Furthermore, the rat and mouse data have also been combined into a rodent database whereby a chemical carcinogenic to either rat or mouse is classified as a rodent carcinogen. In the present study, only the NTP and CPDB rodent and mouse databases were used.
Because of the nature of the CASE/MULTICASE system, only organic molecules can be analyzed. Furthermore, these molecules must consist of more than two heavy atoms (e.g., formaldehyde would not be suitable for analysis).

CASE/MULTICASE Predictions
Submission of an unknown molecule to the CASE/MULTICASE system yields predictions based on both CASE and MULTICASE SAR models (Figure 1). Specifically, the predictions generate MULTICASE potency units (MCu), MULTICASE probabilities (MC%), CASE potency units (Cu), and CASE probabilities (C%). To maximize the use of the information within these four predictive indices, each was evaluated individually as to its predictivity. This was accomplished by performing n-fold cross-validation studies using a range of threshold values (corresponding to CASE/MULTICASE potency units and probabilities) serving as demarcations between actives and inactives. Indices that consistently resulted in poor sensitivities or specificities were eliminated from consideration. As a result of these studies, we were able to assign optimum threshold values to the acceptable CASE/MULTICASE potency units and probabilities. These values served to classify compounds as active (+) or inactive (-) and were used to determine the respective sensitivities and specificities of each individual index (MCu, MC%, Cu, C%). The individual activity classifications (+ or -) corresponding to the four individual prediction indices can be combined according to Bayes' theorem to provide an overall predicted probability of activity within each SAR model.
For the NTP rodent, mouse, and rat databases, complete leave-one-out validation studies were performed. These involved removing one compound at a time from the complete database and rederiving a CASE/MULTICASE SAR model for the remaining compounds. The compound originally removed was subsequently submitted for a prediction of its activity. This procedure was performed n times, where n represented the total number of compounds within each of the respective databases. The predicted values of Cu, C%, MCu, and MC% for each of the chemicals were tabulated for comparison with the actual experimental activity. Because of the large size of the CPDB set, a complete leave-one-out cross-validation was not attempted. Instead, only 57 randomly selected compounds were removed one at a time from the rodent, mouse, and rat databases.
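The leave-one-out procedure can be sketched generically. The `fit` and `predict` callables below are placeholders for deriving a SAR model and querying it; they are hypothetical and do not correspond to any CASE/MULTICASE interface. The optional `held_out` argument covers the partial variant, in which only a subset of compounds is removed one at a time.

```python
def leave_one_out(database, fit, predict, held_out=None):
    """Leave-one-out validation.

    database: list of (compound, observed_activity) pairs
    fit: callable(training_list) -> model            (hypothetical)
    predict: callable(model, compound) -> prediction (hypothetical)
    held_out: optional iterable of indices to remove one at a time;
              defaults to every compound in the database.
    Returns a list of (compound, observed, predicted) tuples.
    """
    indices = range(len(database)) if held_out is None else held_out
    results = []
    for i in indices:
        compound, observed = database[i]
        # Rederive the model without the compound being predicted.
        training = database[:i] + database[i + 1:]
        model = fit(training)
        results.append((compound, observed, predict(model, compound)))
    return results
```

For a large database, a call such as `leave_one_out(db, fit, predict, held_out=random.sample(range(len(db)), 57))` would mimic removing 57 randomly selected compounds one at a time.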
The optimal threshold values for classifying a compound as active (carcinogenic) or inactive (noncarcinogenic) were based on the Cu, C%, MCu, and MC% obtained from the above validation studies. The selection of optimal threshold values for each of the prediction indices was performed in isolation; e.g., the value chosen for MCu was independent of the values selected for MC%, Cu, and C%. The chemicals for which predictions were made during the cross-validation studies were used as the calibration set for determining predictions of activity/inactivity on the basis of the threshold value being considered. Comparison with the experimentally observed activities yielded the respective concordances, sensitivities, and specificities for each value being considered as a cutoff. The optimal threshold values as well as the corresponding concordances, sensitivities, and specificities are listed in Table 1 for the NTP and CPDB rodent and mouse databases. The NTP and CPDB rat models were found to lack sufficient concordance to be included in the battery of SAR models used for prediction. The NTP rodent and mouse SAR models employ only the MCu, MC%, and C% as predictive indices to arrive at a conclusion of activity.
The CPDB rodent and mouse SAR models, on the other hand, use all four indices. This is not unexpected; for the NTP-derived models, the assigned potencies reflect a spectrum of carcinogenic activities, while for the CPDB models, they reflect TD50 values expressed as mmol/kg/day (12).
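The threshold selection for a single index can be illustrated with a short sketch. The text does not state the criterion that defines an "optimal" threshold, so choosing the candidate with the highest concordance is an assumption here, as are the function names and the example values.

```python
def classification_stats(values, observed_active, threshold):
    """Sensitivity, specificity, and concordance when compounds whose
    predicted value is >= threshold are called active."""
    tp = fp = tn = fn = 0
    for value, active in zip(values, observed_active):
        called_active = value >= threshold
        if called_active and active:
            tp += 1
        elif called_active and not active:
            fp += 1
        elif not called_active and active:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    concordance = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, concordance


def best_threshold(values, observed_active, candidates):
    """Pick the candidate threshold with the highest concordance
    (assumed criterion; the first such candidate wins ties)."""
    return max(candidates,
               key=lambda t: classification_stats(values, observed_active, t)[2])


# Hypothetical cross-validated potency units and experimental outcomes.
cv_units = [20, 35, 40, 60, 70]
cv_active = [False, False, True, True, True]
chosen = best_threshold(cv_units, cv_active, [30, 40, 50])
```

Because each index is treated in isolation, the same routine would be run separately for MCu, MC%, Cu, and C%.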

Application of Bayes' Theorem
To combine all four predictions into a single overall conclusion, we sequentially applied Bayes' theorem (5). Bayes' theorem is based on the fact that the joint probability of two events can be written as the product of the probability of one of the events and the conditional probability of the second event, given the first event.
If we designate the two events as "A" (or "not A") and "+" (or "-"), the form of Bayes' theorem is as follows:

\[
P(A/+) = \frac{P(+/A) \times P(A)}{P(+/A) \times P(A) + P(+/\text{not } A) \times P(\text{not } A)}
\]

The left side of Bayes' theorem is the probability that we are dealing with the state "A," given that we have observed the data "+." The probability P(+/A) represents the probability of the observed data "+," given that we are dealing with "A." Finally, P(A) is our current belief concerning the probability of "A." In the above form, P(A/+) is termed the posterior probability, i.e., the updated probability, given that we have observed a "+." P(A) and P(not A) are termed prior probabilities because they are decided on before any new data are known to us. When we are in the diagnostic or predictive setting, Bayes' theorem takes on the following form:

\[
P(\text{Active}/+) = \frac{\text{Sensitivity} \times P(\text{Active})}{\text{Sensitivity} \times P(\text{Active}) + (1 - \text{Specificity}) \times P(\text{not Active})}
\]

Using this form of Bayes' theorem, estimates of sensitivity and specificity associated with the prediction models and an estimate of the probability of activity of the chemical, we can estimate the new probability that a chemical is active, given that we have obtained a positive prediction when we apply our model. For the purpose of the present analysis, we have assumed a prior probability of 0.50, which reflects the prevalence of carcinogens in the NTP database. Similar expressions are available for P(Active/-), P(not Active/+), and P(not Active/-) (13).
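For reference, the expression for a negative prediction follows from the same reasoning, since a "-" occurs with probability (1 - Sensitivity) for an active compound and with probability Specificity for an inactive one. The worked numbers use a hypothetical sensitivity of 0.60 and specificity of 0.70 with the prior of 0.50; they are illustrative only and are not values from Table 1.

```latex
\[
P(\text{Active}/-) =
  \frac{(1 - \text{Sensitivity}) \times P(\text{Active})}
       {(1 - \text{Sensitivity}) \times P(\text{Active})
        + \text{Specificity} \times P(\text{not Active})}
\]

% Hypothetical example: Sensitivity = 0.60, Specificity = 0.70, P(Active) = 0.50
\[
P(\text{Active}/+) = \frac{0.60 \times 0.50}{0.60 \times 0.50 + 0.30 \times 0.50}
                   = \frac{0.30}{0.45} \approx 0.67
\]
\[
P(\text{Active}/-) = \frac{0.40 \times 0.50}{0.40 \times 0.50 + 0.70 \times 0.50}
                   = \frac{0.20}{0.55} \approx 0.36
\]
```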
Each additional application of Bayes' theorem uses the previously estimated posterior probability of activity as the new prior probability and the relevant estimates of sensitivity and specificity from the new source. By sequentially assimilating the data through Bayes' theorem, we obtain an increasingly updated estimate of the probability of activity for the chemical under consideration. This probability is based on the consideration of all four metrics (probabilities and units) obtained from the CASE/MULTICASE SAR models. Table 2 lists the calculated probabilities for the NTP/CPDB rodent/mouse SAR models that result from combining the activity classifications (+/-) with respect to MCu, MC%, Cu, and C%. The overall probabilities for each possible activity pattern were calculated by Bayes' theorem with the appropriate values of sensitivity and specificity (Table 1).
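The sequential updating can be written as a short loop in which each posterior becomes the next prior. The sketch below is a minimal illustration; the sensitivity and specificity values are hypothetical placeholders rather than entries from Table 1, and chaining the updates in this way implicitly treats the indices as independent pieces of evidence.

```python
def bayes_update(prior, positive, sensitivity, specificity):
    """One application of Bayes' theorem for a + or - classification."""
    if positive:
        p_given_active, p_given_inactive = sensitivity, 1.0 - specificity
    else:
        p_given_active, p_given_inactive = 1.0 - sensitivity, specificity
    numerator = p_given_active * prior
    return numerator / (numerator + p_given_inactive * (1.0 - prior))


def sequential_posterior(calls, prior=0.50):
    """calls: iterable of (positive, sensitivity, specificity) tuples,
    one per predictive index. Each posterior is reused as the next prior."""
    probability = prior
    for positive, sensitivity, specificity in calls:
        probability = bayes_update(probability, positive, sensitivity, specificity)
    return probability


# Hypothetical example: one index calls the compound active, another inactive.
example = sequential_posterior([(True, 0.60, 0.70), (False, 0.60, 0.70)])
```

The result does not depend on the order in which the indices are processed, because each update multiplies the odds of activity by a fixed factor.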
The same approach can be used when more than one SAR model is available. This is especially relevant to the prediction of carcinogenicity, wherein several databases can be used in making assessments of carcinogenic activity. Given the complexity of carcinogenicity, we do not expect a single SAR model to be as effective as the use of a battery of carcinogenic SAR models. Indeed, we expect that using such a battery for an overall prediction will overcome some of the limitations associated with individual SAR models.
In the present study, the carcinogenicity of the target compounds was predicted based on four carcinogenicity databases for which predictive SAR models had been derived: NTP rodent, NTP mouse, CPDB rodent, and CPDB mouse. To make an overall prediction of carcinogenicity, we applied Bayes' theorem using the predictions from the four models and their respective sensitivities and specificities (calculated from the n-fold cross-validation studies). Table 3 lists the calculated probabilities resulting from combining all the possible predictions using the four carcinogenic SAR models. Elevated overall probabilities can be assumed to be indicative of an increased potential for rodent carcinogenicity. In the present analysis we arbitrarily selected a probability cut-off of 0.50 to classify compounds as carcinogens.
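A table of overall probabilities for every combination of model predictions can be generated mechanically, as the sketch below suggests. The per-model sensitivities and specificities are hypothetical placeholders, not the published values, and the sketch assumes the battery-level calculation has the same form as the single-model update with a starting prior of 0.50.

```python
from itertools import product

MODELS = ["NTP rodent", "NTP mouse", "CPDB rodent", "CPDB mouse"]

# Hypothetical (sensitivity, specificity) pairs, one per model in MODELS.
PERFORMANCE = [(0.60, 0.70), (0.60, 0.70), (0.60, 0.70), (0.60, 0.70)]


def overall_probability(pattern, performance, prior=0.50):
    """Combine one +/- call per model by sequential Bayes updates."""
    p = prior
    for call, (sens, spec) in zip(pattern, performance):
        if call == "+":
            given_active, given_inactive = sens, 1.0 - spec
        else:
            given_active, given_inactive = 1.0 - sens, spec
        p = given_active * p / (given_active * p + given_inactive * (1.0 - p))
    return p


def pattern_table(performance, cutoff=0.50):
    """Rows of (pattern, probability, classification) for all 2**n patterns."""
    rows = []
    for pattern in product("+-", repeat=len(performance)):
        p = overall_probability(pattern, performance)
        rows.append(("".join(pattern), p, "+" if p > cutoff else "-"))
    return rows


table = pattern_table(PERFORMANCE)
```

Each row pairs a prediction pattern, in the order given by `MODELS`, with its combined probability and the classification implied by the 0.50 cut-off.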

Results and Discussion
Our previously established CASE/MULTICASE SAR models (14,15) pertaining to NTP rodent, NTP mouse, CPDB rodent, and CPDB mouse data were used to rank the potential for carcinogenic activity of 24 organic compounds currently undergoing testing (Table 4). Organic salts were analyzed as the free bases/acids. The four inorganic compounds (cobalt sulfate heptahydrate, gallium arsenide, molybdenum trioxide, vanadium pentoxide) are not amenable to CASE/MULTICASE analyses.
Predictions were also not made for nitromethane and sodium nitrite because these compounds are too small for CASE/MULTICASE analyses (see above). All possible isomers of sodium xylenesulfonate were analyzed individually since commercial samples of xylene are a mixture of the ortho-, meta-, and para-isomers. Table 4 lists the results of our analysis of the carcinogenic potential exhibited by the chemicals of interest. Predictions of carcinogenic activity within each individual CASE/MULTICASE SAR model used in this study (Table 1) are shown, as well as the overall calculated probability resulting from the sequential application of Bayes' theorem as outlined above. Individual SAR model predictions may differ from one another because the SAR models differ (the fragments identified within the mouse model are not necessarily identical to the rodent fragments). The chemicals and their overall predictions are listed in terms of the increasing probability of exhibiting carcinogenicity. If we arbitrarily assign a probability value of >0.50 to indicate a potential for carcinogenicity (+), nine compounds would be classified as such. Our SAR models, however, are based on oral administration and hence the predictions are based on the same assumption.
Five chemical compounds are predicted as carcinogens in at least two of the CASE/MULTICASE SAR models (1-chloro-2-propanol, emodin, furfuryl alcohol, primaclone, and t-butylhydroquinone). Furfuryl alcohol and t-butylhydroquinone are predicted to be carcinogenic in three and four, respectively, of the SAR models. The remaining compounds (chloroprene, codeine, isobutyraldehyde, and sodium xylenesulfonate) are concluded to be carcinogenic by virtue of being so predicted by the CPDB mouse SAR model.
In the case of sodium xylenesulfonate, only the 1,3-dimethyl-4-sulfonate and 1,4-dimethyl-2-sulfonate isomers yielded positive predictions. Assuming that sodium xylenesulfonate is a mixture, the isomers responsible for carcinogenicity may not reach critical levels to induce carcinogenicity under experimental conditions.
[Table footnote: The order is NTP rodent, NTP mouse, CPDB rodent, and CPDB mouse. Probability > 0.5 is scored as "+."]
Figure 2 contains the chemical structures of the compounds predicted to be rodent carcinogens. In addition, the molecular regions identified as associated with carcinogenicity are indicated. These molecular regions consist of fragments identified by the CASE and/or MULTICASE SAR models of CPDB mouse data. Table 5 lists the putative carcinogens predicted in this study, together with known rodent carcinogens sharing carcinogenic CASE/MULTICASE fragments used within the CPDB mouse SAR model. It is presumed that compounds containing identical fragments in the same molecular environment will exhibit similar modes of action and consequently exhibit similar biological activities.
Our approach entails the rational evaluation of a chemical's potential to exhibit carcinogenic activity on the basis of chemical structure alone. The quality of predictions depends on the learning set used to derive the resulting SAR model as well as on the complexity of the biological phenomenon modeled. Thus, in a previous study using a Salmonella mutagenicity SAR model, the concordance between experimental and predicted CASE/MULTICASE results for 100 chemicals was 76% (16). Obviously, given the multiple mechanisms associated with carcinogenicity, we do not expect the SAR model for carcinogenicity to be as good as that for mutagenicity. However, we anticipate that the use of a battery of carcinogenic SAR models for an overall prediction will overcome some of the limitations associated with individual SAR models. Indeed, we have compared the respective concordances, sensitivities, and specificities of individual SAR model predictions with the carcinogenic SAR model battery predictions for an independent test set of compounds (17). The SAR model battery possessed the highest overall concordance with a significantly enhanced sensitivity and a comparable specificity. Furthermore, the SAR model battery predictions were significantly more accurate than using Salmonella results alone or in combination. Finally, the SAR data used to derive our models are based on the oral route of administration and may not be applicable to the prediction of carcinogenicity by other routes.