Dealing with the data deluge in high throughput screening

Numerical taxonomy and pattern recognition analysis offer powerful tools that can greatly reduce the information burden of multiple-assay screening programs. These methods can be used to rationally design prescreens, identify assays that have similar chemical response patterns, select reporter assays for chemical response groups, evaluate drug selectivity, and predict a drug's likely mechanism of action. When combined with assays designed to identify lead compounds that have characteristics likely to cause failure at a later and more expensive stage of development, a simple three-stage primary discovery process consisting of a rational prescreen, reporters, and clinical failure assay can reduce the number of required culture wells by more than 20-fold and can eliminate all but 1–2 drugs per 1000 tested as leads for further evaluation and development.


Introduction
The extraordinary volume of data generated by high throughput screening has shifted the bottlenecks in drug discovery from compound acquisition and screening to the management and analysis of data. This presentation explores two questions. How can biological data be used to make the screening process smaller, simpler, faster, and cheaper? And how can biological data be used to better prioritize lead compounds for further development? Numerical taxonomy and pattern recognition offer powerful tools for addressing these questions, and can greatly reduce the information burden of multi-assay screening programs.

Identifying chemical response groups
Hundreds of different drug discovery assays are now available. With cancer, there are more than 300 different human neoplastic diseases, each a potential screening target. Do we need to screen against all of them? Or is it possible that some cancers may be similar enough to one another in their chemical response patterns that a single assay might serve as a reporter for an entire group of cell lines?
To explore this issue, we performed similarity analyses (Pearson's and Kendall's tau) and three types of cluster analysis (city block, Pearson's coefficient, and Kendall's tau taxonomic distance metrics, using both average and median linkage methods for each) of the chemical response patterns for 72 tumour lines in tissue culture [1,2]. Most of the lines were of human origin, but some animal lines were used as well. Nearly 400 bioactive compounds were screened against the cell lines in dose-response mode using a homogeneous propidium iodide assay that measures the fluorescence emission of dye molecules intercalated into double-stranded regions of RNA and DNA [3]. The compounds encompassed a wide range of chemical structures and mechanisms of action.
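The similarity measures named above can be illustrated with a small sketch. The function names and toy values are assumptions for illustration only. The tau shown is the simple tau-a form without tie correction, which may differ from the variant used in the analyses cited in [1,2], and the linkage step of the cluster analysis is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two response patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def city_block(x, y):
    """City block (Manhattan) distance between two response patterns."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical response values for two cell lines against four compounds.
line_a = [10, 40, 80, -20]
line_b = [20, 50, 90, -10]
print(pearson(line_a, line_b), kendall_tau_a(line_a, line_b), city_block(line_a, line_b))
```

A correlation-type measure can be turned into a taxonomic distance (for example, one minus the coefficient) before clustering; that conversion is likewise an assumption here rather than a detail stated in the text.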
As the cell lines employed had markedly different doubling times, conventional methods for quantifying drug effects, such as $I_{50}$ or $T/C$, could not be used because the biological significance of such indices is proportional to growth rate. Instead, we calculated a response function $R_F$ that allows efficacy and potency to be compared for cell lines with very different growth rates [4,5]. Three measurements must be made to calculate $R_F$: a time zero sample $Z$ at the start of an assay, and end of assay control $C$ and test $T$ samples. $R_F$ is scaled so that test growth equal to control growth gives a value of +100, total net growth inhibition has an $R_F$ value of 0, and an $R_F$ value of −100 reflects total culture extinction. $R_F$ values greater than 100 signify a net growth stimulation; an $R_F$ value between 0 and 100 represents net growth inhibition, and a value between 0 and −100 indicates net cell killing.
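A minimal sketch of one response function that reproduces the scale described above is given here. The piecewise form is an assumption chosen to match the stated anchor points (0 for total net growth inhibition, −100 for total culture extinction, values above 100 for stimulation); the published definition is in [4,5] and may differ in detail. The function name is hypothetical.

```python
def response_function(z, c, t):
    """Return an R_F-style value from time-zero (z), control (c) and test (t) signals.

    Assumed form: net growth is scaled by control growth (c - z),
    net cell loss is scaled by the starting signal z.
    """
    if z <= 0:
        raise ValueError("time-zero signal must be positive")
    if c <= z:
        raise ValueError("control culture must show net growth")
    if t >= z:
        # Non-negative net growth: 0 at t == z, 100 at t == c, above 100 if stimulated.
        return 100.0 * (t - z) / (c - z)
    # Net cell killing: approaches -100 as t approaches 0.
    return 100.0 * (t - z) / z


# Hypothetical signals: z = 10 at time zero, control grows to 50.
for t in (70, 50, 30, 10, 5, 0):
    print(t, response_function(10, 50, t))
```

Because growth is normalised to each line's own control, a slow-growing and a fast-growing line that are equally inhibited receive the same value, which is the property the text relies on.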
For each compound, the test concentration was identified that produced maximum differential activity, as indexed by the mean absolute deviation of efficacy values for all cell lines. At this concentration, a selectivity coefficient $S_E = \langle R_F \rangle - R_F$ was calculated for each cell line by subtracting the efficacy value of a cell line, $R_F$, from the median efficacy value $\langle R_F \rangle$ of all cell lines.
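The two steps just described can be sketched as follows. The data layout and function names are assumptions, and the mean absolute deviation is taken about the mean, which the text does not specify.

```python
from statistics import mean, median


def mean_abs_deviation(values):
    """Mean absolute deviation about the mean (assumed centre)."""
    values = list(values)
    m = mean(values)
    return mean(abs(v - m) for v in values)


def selectivity_at_best_concentration(rf_by_conc):
    """rf_by_conc maps concentration -> {cell_line: efficacy value}.

    Returns the concentration with the largest spread of efficacy values
    and the selectivity coefficient (median efficacy minus line efficacy)
    of every cell line at that concentration.
    """
    best = max(rf_by_conc, key=lambda conc: mean_abs_deviation(rf_by_conc[conc].values()))
    rf = rf_by_conc[best]
    med = median(rf.values())
    return best, {line: med - value for line, value in rf.items()}


# Hypothetical efficacy values for one compound at two concentrations.
rf_by_conc = {
    1.0: {"A": 90, "B": 95, "C": 100},
    10.0: {"A": 10, "B": 60, "C": 90},
}
print(selectivity_at_best_concentration(rf_by_conc))
```

A positive coefficient marks a line that is more strongly affected than the median line, and a negative one marks relative resistance.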
Efficacy rather than potency was used for three reasons. First, potency is an interpolation while efficacy is a measured value. Second, potency requires an arbitrary activity criterion, such as total growth inhibition ($R_F = 0$), which is often not achieved with a fixed assay protocol; when the activity criterion is not achieved, potency values must be either deleted from the database or assigned an artificial value such as the highest or lowest concentration tested. Assigned values can be in error by orders of magnitude. Third, it is not uncommon for the dose-response curve of one assay to just barely achieve the activity criterion and the curve for another assay to just barely fail to do so. Even though the two curves are nearly superimposable, very large apparent differences in potency can arise that are artefactual.
Ten different chemical response groups were identified, each with its own unique chemical response pattern. The number of cell lines in each group ranged from two to 16 with a median of 5.5. There were four cell lines that were related to one or another of the ten groups, but not strongly enough to meet the inclusion criterion, and there were seven cell lines whose chemical response patterns were unrelated to any of the ten response groups. The results of the cluster and similarity analyses were essentially identical, suggesting that the taxonomic conclusions reached are robust.
Of the ten chemical response groups, six were considered to be clinically relevant, while four were not. The six clinically relevant groups varied considerably in their sensitivity to the chemical screening library. Maximum group median selectivity coefficients ranged from a low of 62 to a high of 150. Similarly, the most resistant group was selectively sensitive to just 79 compounds, while the most sensitive group selectively responded to 271.

Reporter assays
The question now arose: is it necessary to screen against all of the cell lines within a response group, or might there exist a single cell line that can serve as a reporter for the entire group? For each chemical response group, compounds that were selectively active against the group were rank ordered by median group selectivity coefficient $S_E$ from the highest selectivity to zero selectivity. The cell line was then identified that recognized the greatest number of compounds active at various selectivity levels ranging from 0 to over 100. $S_E$ values from 0 to 40 represent weak selectivity, values from 40 to 80 represent moderate selectivity, and values greater than 80 represent strong selectivity.
Four of the reporter lines had an accuracy of better than 90% at $S_E$ values of 20 or greater, and five exceeded 90% at an $S_E$ value of 40 or greater (figure 1). The worst behaving reporter line displayed an accuracy of more than 80% at $S_E$ values of 40 or greater. These findings indicate that single reporter assays can reflect the behavior of an entire chemical response group with reasonable accuracy, and the use of reporter assays can greatly reduce the screening burden of multi-assay screening programs. The six reporter lines comprise the basis for a selective toxicity screen, which operates in dose-response mode.
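One way the reporter selection could be scored is sketched below. Several details are assumptions not given in the text: a compound counts as group-selective at a threshold when the group median selectivity coefficient meets that threshold, a line "recognizes" it when its own coefficient meets the same threshold, and candidate lines are ranked by their mean accuracy across thresholds. All names and values are hypothetical.

```python
from statistics import median


def reporter_accuracy(se, line, threshold):
    """se maps compound -> {cell_line: selectivity coefficient} for one response group.

    Returns the fraction of group-selective compounds that the given line
    also flags at the same threshold, or None if no compound qualifies.
    """
    selective = [c for c, by_line in se.items() if median(by_line.values()) >= threshold]
    if not selective:
        return None
    hits = sum(1 for c in selective if se[c][line] >= threshold)
    return hits / len(selective)


def best_reporter(se, lines, thresholds):
    """Pick the line with the highest mean accuracy over the thresholds (assumed criterion)."""
    def score(line):
        accs = [reporter_accuracy(se, line, t) for t in thresholds]
        accs = [a for a in accs if a is not None]
        return sum(accs) / len(accs) if accs else 0.0
    return max(lines, key=score)


# Hypothetical selectivity coefficients for three compounds in a three-line group.
se = {
    "cpd1": {"L1": 90, "L2": 85, "L3": 30},
    "cpd2": {"L1": 50, "L2": 35, "L3": 60},
    "cpd3": {"L1": 10, "L2": 25, "L3": 5},
}
print(best_reporter(se, ["L1", "L2", "L3"], thresholds=[20, 40, 80]))
```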

Rationally designed prescreen
The identification of reporter lines allowed us to reduce our number of cancer assays by more than tenfold. However, about 85% of the compounds tested were inactive, with the result that a further reduction in the screening burden could be achieved if a prescreen with fewer than six cell lines could be constructed.
Prescreens can be rationally designed by statistically determining the minimum number of assays required to identify, at a specified level of accuracy, those compounds active against one or more of a larger group of screens.
Compounds were tested at a single high dose, and 50% net growth inhibition was used as an activity criterion for the prescreen analysis. The entire panel of cancer cell lines was examined to determine the single cell line that correctly identified the greatest number of compounds that were active against five or more cell lines (noise level of the system). That cell line was placed in the prescreen. The remaining cell lines were then examined to find the next cell line that correctly identified the greatest number of active compounds not already identified by the first cell line. That cell line was also placed in the prescreen. This process was repeated iteratively until a prescreen was developed that could predict activity within the entire panel of lines with an accuracy of greater than 95%.
Surprisingly, this criterion was achieved with just two cell lines. The two-cell line prescreen identified the activity of more than 800 compounds with an accuracy of 95.5%. There were 0.7% false positive and 3.8% false negative identifications.
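The iterative selection described above is a greedy covering procedure, and a minimal sketch is shown here. The data layout, the function name and the tie-breaking rule (alphabetical) are assumptions; accuracy is taken as the fraction of all compounds whose panel-wide status is correctly predicted by activity in at least one prescreen line.

```python
def design_prescreen(activity, min_lines_active=5, target_accuracy=0.95):
    """activity maps compound -> set of cell lines in which it met the activity criterion.

    A compound is treated as truly active when it is active in at least
    min_lines_active lines (the noise level). Lines are added greedily,
    each time choosing the line that detects the most not-yet-detected actives.
    """
    if not activity:
        raise ValueError("no compounds supplied")
    all_lines = sorted(set().union(*activity.values()))
    actives = {c for c, lines in activity.items() if len(lines) >= min_lines_active}

    def accuracy(prescreen):
        # A prediction is correct when "active in any prescreen line" matches the truth.
        correct = sum(
            any(line in lines for line in prescreen) == (c in actives)
            for c, lines in activity.items()
        )
        return correct / len(activity)

    prescreen, covered = [], set()
    while accuracy(prescreen) < target_accuracy:
        candidates = [line for line in all_lines if line not in prescreen]
        if not candidates:
            break
        uncovered = actives - covered

        def gain(line):
            return sum(1 for c in uncovered if line in activity[c])

        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining line detects any further active compound
        prescreen.append(best)
        covered |= {c for c in uncovered if best in activity[c]}
    return prescreen, accuracy(prescreen)


# Hypothetical toy panel; the noise level is lowered to 2 lines to keep it small.
activity = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"B", "C"},
    "c4": {"C", "D"},
    "c5": set(),
    "c6": {"D"},
}
print(design_prescreen(activity, min_lines_active=2))
```

Greedy selection does not guarantee the smallest possible prescreen, but it mirrors the one-line-at-a-time procedure in the text and stops as soon as the accuracy target is met.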

Clinical failure assays
A common failing of many discovery screens is their inability to identify lead compounds that have characteristics likely to cause failure at a later and more expensive stage of development. Simple in vitro assays predictive of likely clinical failure can often be developed and included as part of the primary drug discovery process. Such clinical failure assays can quickly eliminate all but a few competing leads from further development. With cell-based screening, a clinical failure assay combined with a rational prescreen and response group reporters can eliminate all but 1–2 drugs per 1000 tested as leads for further evaluation and development.
One of the major reasons for the clinical failure of anticancer drugs is the survival of residual tumour burden. Most traditional anticancer drugs act more as growth inhibitors than as target-eradicating cytotoxins. They may well kill a portion of a tumour cell population, but leave surviving cells that can grow back to life-threatening proportions.
To identify drugs likely to permit the survival of residual tumour burden, we developed a long term recovery (LTR) assay that was incorporated as the third leg of our primary discovery process [3]. In the LTR assay, cultures are incubated with test compounds for 48 hours in T25 flasks at half-, just-, and supra-maximal concentrations. The drugs are then removed and the cultures washed and fed with a fresh drug-free non-bicarbonated growth medium that uses beta-glycerophosphate as a buffer. This medium, e.g. Gibco's Nonbicarbonated Growth Medium, is pH stable under atmospheric conditions, does not require a CO2-enriched environment, and contains phenol red as a colorimetric visual pH indicator [6]. The flasks are capped tightly and placed in a 37 °C incubator, where they are incubated for 60 days or until cellular regrowth is obvious.
Figure 1. Accuracy of reporter lines.
There are two end points in the LTR assay: metabolic and proliferative. Metabolic recovery is monitored by visual inspection three times a week. Where metabolically surviving cells remain, their secretion of organic acids gradually changes the colour of the growth medium's pH indicator dye from red to reddish-orange, to orange and finally to yellow. When a colour change has become obvious, cultures are inspected microscopically to confirm that the metabolism is the result of cellular regrowth and not microbial contamination. The extent of the pH change can be quantitated by measuring the phenol red optical density at 560 nm. Proliferative recovery is quantitated by sulphorhodamine B protein optical density or the propidium iodide fluorescence from double-stranded RNA and DNA dye intercalation [3]. Control flasks are collected at the time of drug addition and at the end of the 48 hour incubation period. Test samples are collected at these same times and after recovery has occurred. $R_F$ values are calculated as described above.
In the LTR assay, 85% of high priority lead compounds emerging from the selective toxicity screen failed to completely eradicate tumour cells in culture, allowing the cells to regrow. With most of the drugs that failed the LTR assay, regrowth was obvious within 3–5 days and sometimes within 1–2 days.

Three-stage primary screening process
The primary screening process that Andes has adopted consists of three stages: (1) an initial prescreen of two cell lines with single high-dose testing; (2) a selective toxicity screening panel in which test samples are screened in dose-response mode against six reporter lines; and (3) a long term recovery assay. For every 1000 compounds that enter the screening process, about 150 are active in the prescreen, 11 exhibit moderate or strong selectivity in the selective toxicity screens, and 1.7 show no regrowth in the LTR assay. The overall percentage of compounds that progress to further development is about 0.17% (figure 2).
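The attrition figures quoted above can be checked with a few lines of arithmetic. The counts are those given in the text; the stage labels and function names are illustrative.

```python
# Compounds remaining after each stage, per 1000 entering (figures from the text).
funnel = [
    ("entered screening", 1000.0),
    ("active in prescreen", 150.0),
    ("moderate or strong selectivity", 11.0),
    ("no regrowth in LTR assay", 1.7),
]


def stage_pass_rates(funnel):
    """Fraction of compounds surviving each stage relative to the previous one."""
    return [
        (later_name, later / earlier)
        for (_, earlier), (later_name, later) in zip(funnel, funnel[1:])
    ]


def overall_yield(funnel):
    """Fraction of entering compounds that survive every stage."""
    return funnel[-1][1] / funnel[0][1]


for name, rate in stage_pass_rates(funnel):
    print(f"{name}: {rate:.1%}")
print(f"overall: {overall_yield(funnel):.2%}")
```

The last stage passes roughly 15% of the compounds that reach it, which agrees with the 85% failure rate reported for the LTR assay, and the product of the three stage rates gives the stated overall figure of about 0.17%.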

Multi-assay fingerprints
Multi-assay screening data can be visually represented as bar graph fingerprints projecting to either the right or left of a central reference value depending on whether an assay is more or less sensitive to a test compound than is the reference value. With mean graphs [7], log potencies are first determined for each assay, and the mean log potency identified. The mean log potency is then subtracted from each individual value to produce an index of differential sensitivity to the test compound. Negative values of this difference indicate that an assay is more sensitive than the mean, while positive values reflect resistance.
Similarly, an efficacy fingerprint can be constructed by first finding the median efficacy for a set of $R_F$ values, then subtracting individual $R_F$ values from this median. Again, individual differences from the median are plotted as bars projecting to the right for assays that are more sensitive and to the left for assays that are more resistant than the median.
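Both fingerprint constructions reduce to a subtraction from a central value, as the following sketch shows. The function names, the text rendering and the sample values are assumptions for illustration; the rendering applies the efficacy convention (positive differences drawn to the right).

```python
from statistics import mean, median


def mean_graph(log_potency):
    """Mean graph deltas: individual log potency minus the mean log potency.

    Negative values mark assays more sensitive than the mean.
    """
    m = mean(log_potency.values())
    return {assay: value - m for assay, value in log_potency.items()}


def efficacy_fingerprint(rf):
    """Efficacy fingerprint: median efficacy minus individual efficacy.

    Positive values mark assays more sensitive than the median.
    """
    med = median(rf.values())
    return {assay: med - value for assay, value in rf.items()}


def render(fingerprint, scale=10, width=10):
    """Draw an efficacy fingerprint as text bars around a central axis."""
    rows = []
    for assay, delta in fingerprint.items():
        n = min(width, round(abs(delta) / scale))
        left = ("#" * n).rjust(width) if delta < 0 else " " * width
        right = "#" * n if delta > 0 else ""
        rows.append(f"{assay:>8} {left}|{right}")
    return "\n".join(rows)


# Hypothetical efficacy values for three assays.
print(render(efficacy_fingerprint({"A": 20, "B": 60, "C": 90})))
```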
Multi-assay fingerprints, first used widely by the National Cancer Institute, have proven valuable in a number of ways. They provide a simple method for presenting complicated data in a manner that is easily understood and which visually highlights patterns of differential sensitivity (figure 3).

Comparing fingerprints
Fingerprints can be compared both visually and numerically. The mean absolute difference (MAD) of two fingerprints provides a useful quantitative index of their similarity or difference. Two compounds are compared by summing the absolute difference of either their $R_F$ or log potency values for each assay, then dividing this sum by the number of assays. The more similar two compounds are, the smaller their MAD will be. When the fingerprint of a test compound is compared with the fingerprints of a large number of previously tested drugs, a great deal of useful information can be obtained. The best database matches often suggest the likely mechanism of action of a new test compound. Similarly, structural analogues are often very close matches as well, and can suggest likely chemical structures of active components in crude natural product extracts. Even when close matches are not analogues, they sometimes possess similar molecular surface properties or three-dimensional conformations. The insight that these similarities provide can be valuable in rational drug design. Finally, fingerprint comparisons are extremely useful in studies of structure-activity relationships.
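The MAD comparison and the database search it supports can be sketched as follows. The function names, the drug names and the values are hypothetical, and restricting the comparison to assays present in both fingerprints is an assumption.

```python
def mean_abs_difference(fp_a, fp_b):
    """Mean absolute difference of two fingerprints over their shared assays."""
    shared = sorted(set(fp_a) & set(fp_b))
    if not shared:
        raise ValueError("fingerprints share no assays")
    return sum(abs(fp_a[k] - fp_b[k]) for k in shared) / len(shared)


def rank_matches(test_fp, database):
    """Return (name, MAD) pairs for every database entry, closest match first."""
    scored = ((name, mean_abs_difference(test_fp, fp)) for name, fp in database.items())
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical fingerprints keyed by assay name.
test_fp = {"A": 40, "B": 0, "C": -30}
database = {
    "drug_x": {"A": 35, "B": 5, "C": -25},
    "drug_y": {"A": -40, "B": 10, "C": 60},
}
print(rank_matches(test_fp, database))
```

The entries at the top of the ranked list are the candidates whose mechanism of action or structure would be examined first.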

Association coefficients
Drug response patterns can also be qualitatively compared using association coefficients such as the simple matching coefficient $S_M$ and Jaccard's similarity coefficient $S_J$. These association coefficients compare two drugs by determining, on an assay-by-assay basis, whether the drugs are matched ($a = 11$ or $d = 00$) or mismatched ($b = 01$ or $c = 10$) in their effects, where 1 represents activity and 0 inactivity. $S_M$ is well suited to the situation in which active and inactive outcomes are roughly equal in frequency. It is the ratio of matches to matches plus mismatches: $S_M = (a + d)/(a + b + c + d)$. $S_J$ ignores the doubly inactive matches (00), and is well suited to the situation in which active compounds are infrequent by comparison with inactive: $S_J = a/(a + b + c)$. A variety of other association coefficients are in common use as well [1,2]. Association coefficients can be used for the same purposes as fingerprint comparisons, or can be used in conjunction with fingerprints. For the two compounds shown in figure 4, $S_J$ was 0.903 for selectivity values ($S_E$) greater than 0, and 0.714 for selectivity values greater than 50.
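A short sketch of the two coefficients follows, using the standard definitions of simple matching (matches over matches plus mismatches) and Jaccard similarity (doubly active over all pairs that are not doubly inactive). The function names, the thresholding helper and the sample data are assumptions for illustration.

```python
def binarise(selectivity, threshold):
    """Convert selectivity values to 1 (active) or 0 (inactive) at a threshold."""
    return [1 if value > threshold else 0 for value in selectivity]


def contingency(x, y):
    """Count assay pairs: a = 11, b = 01, c = 10, d = 00."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if not p and q)
    c = sum(1 for p, q in zip(x, y) if p and not q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def simple_matching(x, y):
    """Matches (11 and 00) divided by all assay pairs."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """Doubly active pairs divided by pairs with at least one active; None if none exist."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else None


# Hypothetical activity calls for two drugs across six assays.
drug_1 = [1, 1, 0, 0, 1, 0]
drug_2 = [1, 0, 0, 0, 1, 1]
print(simple_matching(drug_1, drug_2), jaccard(drug_1, drug_2))
```

Raising the selectivity threshold before binarising changes which assays count as active, which is why a single pair of compounds can yield different coefficient values at different thresholds.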
Association coefficients have a particularly useful property: they can be used to compare multistate characteristics defined by logic operators such as AND, OR, and NOT. Thus an element for comparison could be: active in assays 1–10 NOT active in assays 11–15 AND water soluble at 1 mM AND a hydrophobic interior AND (sensitive to antimicrotubulars OR antifols). Such complex activity criteria permit very sophisticated comparisons to be performed.
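A composite criterion of this kind can be expressed as a boolean predicate whose result then serves as a single 1 or 0 element in an association coefficient comparison. The sketch below encodes the example from the text; the record layout and field names are hypothetical, and reading "active in assays 1–10" as active in all ten and "NOT active in assays 11–15" as active in none of them is an assumption.

```python
def composite_state(compound):
    """Evaluate the example multistate criterion for one compound record."""
    active = compound["active"]  # maps assay number -> bool
    return (
        all(active[i] for i in range(1, 11))           # active in assays 1-10
        and not any(active[i] for i in range(11, 16))  # NOT active in assays 11-15
        and compound["water_soluble_1mM"]
        and compound["hydrophobic_interior"]
        and (compound["sensitive_antimicrotubulars"] or compound["sensitive_antifols"])
    )


# Hypothetical compound record.
compound = {
    "active": {i: i <= 10 for i in range(1, 16)},
    "water_soluble_1mM": True,
    "hydrophobic_interior": True,
    "sensitive_antimicrotubulars": False,
    "sensitive_antifols": True,
}
print(int(composite_state(compound)))
```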