Quantitative predictivity of carcinogenicity of the autoradiographic repair test (primary hepatocyte cultures) for a group of 80 chemicals belonging to different chemical classes.

In this work we investigated the correlation between a short-term genotoxicity test (DNA repair in rat liver cells) and carcinogenicity in rodents. The work belongs to a line of thinking that considers it possible to utilize the quantitative component of the information obtained from genotoxicity tests. In a preliminary report on 25 compounds belonging to different chemical classes, a correlation coefficient of 0.36 was found between carcinogenic potency in small rodents and potency in autoradiographic repair. This level of correlation is comparable with the levels found for many other short-term tests: Ames test, alkaline DNA fragmentation in vivo, DNA adducts in vivo, morphological transformation in vitro and SCE induction in vivo. Because only 25 compounds were examined, the assessment was rather uncertain, and subdividing the set into subsets for different chemical classes would have generated groups too small for a meaningful statistical analysis. With a much larger set (80 compounds) we hoped to be able to discriminate different predictivities for different chemical classes. This seems important because the test could be much more suitable for one class than for another. Previous investigations with different short-term tests have shown that these differences can indeed exist and be very large. In this respect it is potentially very encouraging that the test considered here showed a fair correlation with carcinogenic potency for aromatic amines. Many other tests that we have examined so far have shown little or no predictivity for this important class of chemicals.


Introduction
In previous works we have discussed at length the possibility of utilizing the quantitative component of the information obtained from genotoxicity tests in the presence of high levels of statistical "noise" (1-16); a more comprehensive discussion of these problems is given in those works. In the present work, this possibility is taken as a postulate, and we analyze the degree of correlation with carcinogenic potency of the responses obtained in a specific genotoxicity test: autoradiographic repair in primary cultures of liver cells in vitro. The results obtained are compared with those previously found for other short-term tests.
In a preliminary work we investigated the quantitative predictivity of carcinogenicity for this test (1). The average level of predictivity (r = 0.36) was similar to that observed for other tests (2).
*Department of Clinical and Experimental Oncology, University of Genoa/Istituto Nazionale per la Ricerca sul Cancro (IST), Viale Benedetto XV, 10, I-16132 Genova, Italy.
Owing to the relatively small number of compounds examined (25 chemicals), it was impossible to investigate the predictivity of the test for specific classes of chemicals. We have already observed that quite frequently a test can be highly predictive for a specific class of chemicals but not predictive at all for others (3,7,16). It therefore seemed of interest to investigate possible differences in predictivity for different chemical classes on a larger group of chemicals (80 compounds).

Sources of Data and Criteria for Evaluation of Potency
The autoradiographic repair data were taken from the studies of Probst et al. (18-20) and Williams et al. (21-24). The methods for obtaining the hepatocytes were very similar for the two groups. The major differences in the execution of the experiments concerned the length of treatment and repair, the exposure time of the emulsion in the dark, and the specific activity of tritiated thymidine. We were able to take all of these factors into account in the formula used for the evaluation of the potency of the response. The formula used was the following: Unscheduled DNA repair index (UDI), where CF is a correction factor for specific activity and concentration of tritiated thymidine, C is the concentration of the chemical tested in mmoles, T1 is the length of treatment in hours, T2 is the length of repair in hours, and T3 is the exposure time of the emulsion in the dark in days. Both groups used 10 µCi/mL of labeled nucleoside. CF was fixed arbitrarily at 1 when the specific activity was 22 Ci/mmole.
For specific activities between 50 and 70 Ci/mmole, CF was fixed at 0.6, taking into account not only the higher specific activity but also the lower concentration (25).
For both authors, T1 and T2 coincide; in different experiments T1 = T2 = 5 or 18 hr. For the computation of f(T1, T2), when both tritiated thymidine and the chemical under study are present in the medium at the same time, we followed the mathematical treatment described elsewhere (26). Assuming that the half-life of damage is much longer than the time allowed for repair, and setting t = T1 = T2, the amount of repair will be proportional to t² [see eq. (6) of Parodi et al. (26)]. If, on the contrary, the half-life of damage is assumed to be much shorter than the time allowed for repair, then the amount of repair will be proportional to t [see eq. (7) of Parodi et al. (26)].
For t = 5 or 18 hr, t² equals 25 or 324 hr², respectively. Dividing these two numbers by 2.5 gives 10 and 129.6. Because longer repair times move the system from the limit of eq. (6) toward the limit of eq. (7), we approximated the second value to 100. For f(T1, T2) we therefore used the proportional numbers 10 and 100, respectively, in order to simplify most of the computations. It is worth underlining that different considerations could have changed some of the values attributed to f(T1, T2) by a factor of at most two to three. This variation seems negligible with respect to the differences in UDI potencies, which can span a range of three to four log10 units.
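The arithmetic behind the two weights can be checked with a short sketch. It assumes only what the text states (the t² limit, the divisor 2.5, and the adopted values 10 and 100); the function and variable names are hypothetical and introduced for illustration only.

```python
def raw_weight(t_hours: float) -> float:
    """Raw weight in the long-half-life limit, where repair is proportional to t^2.

    The divisor 2.5 is the one quoted in the text.
    """
    return t_hours ** 2 / 2.5


# Rounded values of f(T1, T2) adopted in the text for T1 = T2 = t (hours).
ADOPTED_WEIGHT = {5: 10.0, 18: 100.0}


def rounding_ratio(t_hours: int) -> float:
    """Ratio between the raw t^2 / 2.5 value and the adopted weight."""
    return raw_weight(t_hours) / ADOPTED_WEIGHT[t_hours]


if __name__ == "__main__":
    for t in (5, 18):
        print(t, raw_weight(t), ADOPTED_WEIGHT[t], round(rounding_ratio(t), 3))
```

For the 18-hr experiments the raw and adopted values differ by a factor of about 1.3, which lies inside the two- to threefold margin discussed above.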
Two types of emulsion were used in these studies: NTB and NTB2. Provisionally, no correction was introduced to take this difference into account.
For the evaluation of the carcinogenic potencies, we utilized two sources of data: data elaborated by our group (3) and data elaborated by Peto and co-workers (27,28). This second set of data was normalized according to our formula by transforming their tumor dose 50 into our oncogenic potency index, taking into account correction factors related to food intake and body weight estimations. The formula finally used was the following:

$$\mathrm{OPI} = \frac{-\ln(1 - I)}{D\,t}$$

according to Meselson and Russell (17), where OPI is the oncogenic potency index; I is the incidence of animals with at least one tumor, over controls; D is the dosage in mmole/kg/day, equivalent to the total dosage divided by 730 (730 days, i.e., a 2-year exposure); and t is the average duration of the experiment (time unit = 2 years) (16,17).
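As an illustration of how such an index can be computed, the following minimal sketch assumes that the oncogenic potency index takes the Meselson and Russell form, OPI = -ln(1 - I)/(D t), with I, D and t as defined above. The function names and the numerical example are hypothetical and are not taken from the data set.

```python
import math


def daily_dose(total_dose_mmol_per_kg: float, days: float = 730.0) -> float:
    """Average daily dose (mmole/kg/day) from the total dose over a 2-year exposure."""
    return total_dose_mmol_per_kg / days


def oncogenic_potency_index(incidence: float, dose: float, duration: float) -> float:
    """OPI = -ln(1 - I) / (D * t), assumed form of the index.

    incidence: fraction of animals with at least one tumor, over controls (0 <= I < 1)
    dose:      D, in mmole/kg/day
    duration:  t, average duration of the experiment in units of 2 years
    """
    if not 0.0 <= incidence < 1.0:
        raise ValueError("incidence must lie in [0, 1)")
    if dose <= 0.0 or duration <= 0.0:
        raise ValueError("dose and duration must be positive")
    return -math.log(1.0 - incidence) / (dose * duration)


if __name__ == "__main__":
    # Hypothetical experiment: 730 mmole/kg in total, 50% incidence, full 2-year duration.
    d = daily_dose(730.0)
    print(oncogenic_potency_index(0.5, d, 1.0))
```

With this form, an incidence of 50% at a dose of 1 mmole/kg/day over the full time unit gives an index equal to ln 2.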

Results and Discussion
For the computation of OPI and UDI values within a single experiment, the dosage generating the highest potency was selected.
When different experiments were available for the same compound, we calculated the arithmetic mean of the different results. Correlations were made using log10 of the OPI and UDI values. The potencies obtained for the different compounds are listed in Table 1. The compounds were divided into different subclasses according to their chemical structure. In Table 2 we show the overall correlation values and the values for individual chemical classes. To guard against the possibility that our data are not approximately normally distributed, the different correlations in Table 2 were also analyzed using the nonparametric Spearman's rank correlation coefficient (31). The results obtained were globally very similar, and their statistical significance was also very similar to that of the correlation coefficient r obtained with the parametric method.
As expected, the efficiency of the nonparametric method is slightly lower than the efficiency of the parametric statistics. As a consequence, the levels of significance were slightly reduced using Spearman's rank correlation test.
From the experience of this work and from the experience of previous works in which other short-term tests were correlated with carcinogenicity (1-3,12), it appears that both OPI and test potencies have essentially a lognormal distribution. As a consequence, parametric statistics are acceptable only applied to the logarithms of the potency values.
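The two kinds of correlation discussed here can be sketched as follows. This is a minimal illustration that uses only the standard library, with average ranks for ties. The potency values are invented for the example and are not those of Table 1, and the function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Product-moment correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Invented potencies, for illustration only.
    opi = [0.02, 0.5, 3.0, 40.0, 0.8, 150.0]
    udi = [0.1, 0.05, 20.0, 8.0, 300.0, 60.0]
    log_opi = [math.log10(v) for v in opi]
    log_udi = [math.log10(v) for v in udi]
    print("parametric r on log10 values:", pearson(log_opi, log_udi))
    print("Spearman rank correlation:   ", spearman(opi, udi))
```

Because the logarithm is monotonic, the rank correlation is the same whether it is computed on the raw potencies or on their logarithms; only the parametric coefficient depends on the log transformation.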
The same results are also shown in Figure 1. The data were subdivided into two subsets for both OPI and UDI: compounds above and below the median value.
In the case of a perfect correlation, 50% of the compounds should be in quadrant C and 50% in quadrant B. With a correlation level of about 0.3, only 66% of the compounds are correctly placed in quadrants C + B.
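The median-split classification can be sketched in a few lines. This is an illustration under one stated assumption: a value exactly equal to the median is counted as "below", a convention the text does not specify. The function name is hypothetical.

```python
from statistics import median


def concordant_fraction(x, y):
    """Fraction of compounds falling on the same side of the median for both variables.

    Values equal to the median are treated as 'below' (an assumed convention).
    """
    mx, my = median(x), median(y)
    same_side = sum(1 for a, b in zip(x, y) if (a > mx) == (b > my))
    return same_side / len(x)


if __name__ == "__main__":
    # Invented values, for illustration only.
    carcinogenic_potency = [1, 2, 3, 4]
    repair_potency = [10, 40, 20, 30]
    print(concordant_fraction(carcinogenic_potency, repair_potency))
```

A perfectly ordered data set gives a fraction of 1, a perfectly inverted one gives 0, and unrelated variables give about 0.5 on average.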
This elaboration gives an idea of the correspondence existing between a qualitative and a quantitative approach. It suggests that the quantitative approach is feasible even in the presence of a level of statistical "noise" that is also very relevant from a qualitative point of view.
[Table notes: nonparametric statistical computations according to (31); NS = p > 0.05.]
The first result that emerges from our analysis is a confirmation of our previous observation, which was based on a much smaller number of chemicals (1,29). The overall correlation level seems statistically significant but not especially high (r ≈ 0.3).
This level of correlation is slightly lower than the correlation level found for other short-term tests (2,3,16). For instance, a correlation level of about 0.65 was found for the parameter morphological transformation of hamster embryo cells in vitro (12), and a correlation level of about 0.57 was found for the parameter induction of SCEs in bone marrow cells in vivo (2).
However, the differences mentioned above are not statistically significant and could well be related to sampling variability. When we look at single chemical subclasses (see Table 2), we can observe that the correlation values are spread over a fairly large range (between zero and 0.72). Even so, owing to the very limited size of the samples, not even the difference between aromatic amines and esters and carbamates is statistically significant (p < 0.25) (30).
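One common way to compare two independent correlation coefficients is Fisher's z transformation. The sketch below illustrates that approach only; the source does not state which procedure from (30) was applied. The subclass sizes in the example are hypothetical, since they are not given in the text, and the function name is introduced for illustration.

```python
import math


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int):
    """Two-sided test for the difference between two independent correlations.

    Returns (z, p), where z is the standardized difference of the Fisher-transformed
    coefficients and p is the two-sided normal tail probability.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Extremes of the range quoted in the text (0.72 and zero);
    # the sample sizes of 10 are hypothetical.
    z, p = fisher_z_difference(0.72, 10, 0.0, 10)
    print(z, p)
```

The standard error depends only on the sample sizes, which is why even a large difference between coefficients can fail to reach significance when each subclass contains few compounds.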
The situation that we have observed for aromatic amines in this study and in previous works is illustrated in Table 3. There is a strong possibility that the test considered in this report is the only one predictive for aromatic amines. However, the serious limitation of the small size of the samples remains.
Another important point worth considering in the framework of this approach, which utilizes the quantitative component of the information contained in genotoxicity tests, is the possible advantage in predictivity that can be obtained using a battery of tests. We have already shown that with this type of approach it is possible to obtain a better estimation of carcinogenicity by organizing the data according to a multiple regression analysis (6). In Figure 2, taken from a previous work (6) with the permission of the editors, the gain in predictivity obtained with a battery of two or three tests is illustrated. In the ordinate, the correlation level with carcinogenicity is given. In the abscissa, the internal correlation between pairs of short-term tests is given. Five double curves are presented; each double curve refers to a different level of simple correlation (from 0.35 to 0.55). The heavy lines depict a battery of three tests; the light lines depict a battery of two tests. Important gains in predictivity can be obtained with a battery of tests. What is important is that the different tests employed are complementary and not merely repetitive. In other words, they should have a fairly low level of internal correlation. We have already preliminarily explored the correlation level among pairs of different short-term tests (3,6,8,10,15). In Table 4, we show the relationship between UDI and other tests already studied. Unfortunately, both in this case and in the case of the differences in predictivity observed for different chemical subclasses, potentially important choices of the most suitable battery of tests are rendered impossible by the insufficient number of samples available for analysis.
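The dependence of the gain on the internal correlation can be illustrated with a simplified sketch. It assumes that every test in the battery has the same simple correlation r with carcinogenicity and that every pair of tests has the same internal correlation rho; under these assumptions the multiple correlation is R = sqrt(k r² / (1 + (k - 1) rho)) for k tests. This is a textbook special case and not necessarily the computation used for Figure 2; the function name is hypothetical.

```python
import math


def battery_correlation(r: float, rho: float, k: int) -> float:
    """Multiple correlation R of k equally predictive, equally inter-correlated tests.

    r:   simple correlation of each test with carcinogenicity
    rho: internal correlation between any pair of tests
    k:   number of tests in the battery
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    denominator = 1.0 + (k - 1) * rho
    if denominator <= 0.0:
        raise ValueError("rho is too negative for a battery of this size")
    r_squared = k * r * r / denominator
    if r_squared > 1.0:
        raise ValueError("inconsistent combination of r, rho and k")
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Simple correlation 0.35, the lowest level quoted for the curves of Figure 2.
    for rho in (0.0, 0.2, 0.5, 1.0):
        two = battery_correlation(0.35, rho, 2)
        three = battery_correlation(0.35, rho, 3)
        print(f"rho={rho:.1f}  two tests: {two:.3f}  three tests: {three:.3f}")
```

When rho equals 1 the tests are merely repetitive and R reduces to r, so the battery adds nothing; the gain grows as the internal correlation decreases, which is the point made above about complementary tests.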
As we have observed for other tests, the quantitative approach utilized here can potentially offer elements for much better predictions than the qualitative approach. But most of this promise remains unfulfilled, as a consequence of the fact that too many data are available for different tests and in different experimental conditions and too few data are available for a few basic, sufficiently normalized, tests.
It is important to emphasize that we suggest using the quantitative component of the information obtained from genotoxicity tests where there is a high level of statistical "noise" and a relatively low level of correlation with carcinogenicity. Our idea is that in this way the uncertainty of the prediction is very large but can be measured (for instance, in terms of confidence belts). This is very different from the situation in which an attempt at a quantitative correlation between carcinogenicity and mutagenicity was proposed for the first time (17). Meselson and Russell presented in their report, as potentially representative of a more general situation, a regression line between mutagenicity and carcinogenicity with a correlation level of 0.94! This is totally unrealistic, and the potential usefulness of the quantitative component of the information must be considered and discussed only in the presence of high levels of statistical "noise."