An isogenic cell line panel for sequence-based screening of targeted anticancer drugs

Summary We describe the creation of an isogenic cell line panel representing common cancer pathways, with features optimized for high-throughput screening. More than 1,800 cell lines were generated from three normal human cell lines using CRISPR technologies. Surprisingly, most of these lines did not show complete gene inactivation despite integration of the sgRNA at the desired genomic site. A subset of the lines harbored biallelic disruptions of the targeted tumor suppressor gene, yielding a final panel of 100 well-characterized lines covering 19 frequently lost cancer pathways. This panel included genetic markers optimized for sequence-based ratiometric drug screening assays. To illustrate the potential utility of this panel, we developed a high-throughput screen that identified the Wee1 inhibitor MK-1775 as a selective growth inhibitor of cells with inactivation of TP53. These cell lines and this screening approach should prove useful for researchers studying a variety of cellular and biochemical phenomena.


INTRODUCTION
Recent advances in chemical synthesis techniques and robotics have led to an expansion in the availability of small molecule libraries (Clark et al., 2009; Gerry and Schreiber, 2018). With the availability of curated libraries containing more than a million compounds, screening emphasis has shifted to identifying good targets and robust screens to efficiently exploit these libraries (Hatzis et al., 2014; Mullard, 2019). High-throughput screening (HTS) assays can broadly be divided into biochemical and cell-based assays. Biochemical assays enjoy the advantages of low cost, facile scaling, specificity of measured outcome, and the ability to incorporate rigorous controls (Inglese et al., 2007). However, not all pathways, cellular functions, or phenotypes can be adequately captured in biochemical assays. In contrast, cell-based assays have the advantage of directly identifying compounds that produce the desired biological effect via known or unknown mechanisms (Pech et al., 2019).
The unprecedented progress in defining the cancer genome gave rise to hope for the development of new targeted cancer therapeutics. This hope was largely driven by the early success of targeted therapies that inhibited the function of oncogenic driver mutations (Druker et al., 2001). However, while the typical adult solid tumor harbors three or more driver gene mutations, most of these mutations affect tumor suppressor genes, with many tumors lacking even a single oncogene mutation (Vogelstein et al., 2013). Even when effective therapies for targeting oncogenes are found, resistance to monotherapy is almost guaranteed in patients with major tumor burden (Diaz et al., 2012; Engelman and Settleman, 2008; Garraway and Lander, 2013). The optimal strategy to overcome this resistance is to treat patients with combinations of drugs targeting different cancer growth mechanisms (Diaz et al., 2012). But as noted above, more than one oncogene mutation is unusual in most common cancer types.
Effective strategies for targeting loss of functions associated with tumor suppressor gene (TSG) mutations would substantially increase the number of therapeutically addressable pathways. Unfortunately, to date, only one FDA-approved therapy specifically exploits a TSG loss of function mutation (Lord and Ashworth, 2017; Zhao and DePinho, 2017). This therapy, as well as other approaches for targeting the loss of function associated with TSG mutations, is based on the concept of synthetic lethality or essentiality (Hartwell et al., 1997; Kaelin, 2005). This concept was originally described in yeast, and a key aspect of assigning specificity to synthetic lethality is the availability of isogenic cells differing only in a single genetic alteration (Torrance et al., 2001; Wang et al., 2017). CRISPR-based technologies allow the creation of such lines in human cells.
In response to these issues, we created a human isogenic cell line panel targeting 19 critical genes inactivated in cancer. Each of these lines was engineered using CRISPR-based methods to disrupt a single tumor suppressor gene, and each contained a unique genetic barcode to permit multiplex screening. For each cell line, multiple orthogonal assays were used to validate successful gene disruption. Moreover, the panel was constructed from three distinct normal cell lines to ensure the generality of observed effects. Finally, a sequence-based ratiometric assay was designed from this panel that incorporates numerous internal controls to maximize the reliability and sensitivity of the screening process.

CRISPR-Cas9 creation of the isogenic cell line panel targeting critical tumor suppressor gene pathways
We first sought to create a resource for screening compounds active in critical cancer pathways. We focused on 22 pathways that were collectively altered in greater than two-thirds of the cancers as assessed in multiple large scale sequencing efforts (Tables 1 and S1). In total, 22 TSGs and 3 control genes with known small molecule sensitivities (Table S2) were chosen for targeting by CRISPR-mediated knockouts (Table 1). Typically, 6 gRNAs were employed for each target gene (range 6-12) with three chosen from published studies and three designed de novo, targeting either known mutation sites identified from the COSMIC database or early exons within the gene (Sanjana et al., 2014; Smurnyy et al., 2014) (Tables 1 and S3). A total of 162 gRNAs were individually introduced into lentivirus constructs for gene targeting. Three distinct non-cancerous epithelial cell lines, RPE1 (retinal), MCF10A (breast), and RPTec (renal), were targeted. All three lines have a predominantly normal karyotype and the only known genetic alteration among the three lines was a homozygous deletion of the CDKN2A gene in MCF10A (Jonsson et al., 2007). After transduction with lentiviral CRISPR-Cas9 and puromycin selection, over 1,800 individual CRISPR-targeted single cells were picked and expanded for subsequent characterization.

Genetic characterization of candidate knockout lines
A massively parallel sequencing approach was used to assess targeting and ensure that essentially all of the cells within any chosen cell line had the expected genotype. For this purpose, a SafeSeqS approach was implemented which utilizes unique molecular barcodes to reduce errors from PCR or sequencing (Kinde et al., 2011). For each gRNA, two distinct sets of primer pairs were designed to cover the targeted region.
In total, 324 PCR primer pairs were designed and used to amplify the 162 gRNA genomic target regions (Table S4). This analysis confirmed successful gene disruption in only 302 of the greater than 1,800 lines tested. Though one might have expected a higher fraction of successfully targeted lines based on the previous successes of functional screens (Ling et al., 2020), our criteria for gene disruption were particularly stringent: both alleles had to contain out-of-frame insertions or deletions that could not be readily "rescued" by skipping an exon during splicing. Moreover, we used high-depth sequencing and required that the fraction of reads containing an intact targeting site was <1%. In the 302 lines chosen on the basis of the sequencing results, the deletions ranged from 1 bp to 38 bp and the insertions ranged from 1 bp to 43 bp (Figure S1). Over 31% of cell lines harbored a single base pair insertion or deletion, and an additional 9% of the lines harbored a 2 bp insertion or deletion (Figure S1). The targeting success rate varied across genes and cellular backgrounds. Overall, we successfully identified cell lines with biallelic gene inactivation in 22 of the 25 targeted genes in one or more cellular backgrounds, covering 50 of the theoretically possible 75 gene-cell line combinations. Each of these biallelic mutations was predicted to cause significant disruption of gene function (Figure 1A, Table S5; Douville et al., 2013; Masica et al., 2017).
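The sequencing criteria above can be illustrated with a minimal sketch. It assumes each read at the target site has already been reduced to a signed indel size (0 for an intact site); the function name, the input format, and the 5% minimum allele fraction are hypothetical choices made for illustration and are not taken from the study. The exon-skipping check described above is not modeled here.

```python
from collections import Counter


def is_biallelic_knockout(indel_sizes, max_intact_fraction=0.01,
                          min_allele_fraction=0.05):
    """Return True if the reads are consistent with a biallelic frameshift.

    indel_sizes: one signed integer per read at the target site
                 (0 = intact site, >0 = insertion length, <0 = deletion length).
    max_intact_fraction: the <1% intact-read requirement from the text.
    min_allele_fraction: ASSUMPTION, used only to ignore rare noisy alleles.
    """
    if not indel_sizes:
        return False
    counts = Counter(indel_sizes)
    total = len(indel_sizes)

    # Requirement 1: fewer than 1% of reads carry an intact targeting site.
    if counts.get(0, 0) / total >= max_intact_fraction:
        return False

    # Requirement 2: every appreciable edited allele must be out of frame,
    # i.e. its indel length is not a multiple of three.
    major_alleles = [size for size, n in counts.items()
                     if size != 0 and n / total >= min_allele_fraction]
    return bool(major_alleles) and all(size % 3 != 0 for size in major_alleles)


# Hypothetical clone: a 1 bp deletion on one allele, a 2 bp insertion on the other.
reads = [-1] * 500 + [2] * 495 + [0] * 5
print(is_biallelic_knockout(reads))
```

A clone carrying a 3 bp deletion on one allele would be rejected by this sketch because the reading frame is preserved, which mirrors the stringency described in the text.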

Orthogonal validation of knockout lines
We next sought to orthogonally validate the disruptions in these 302 lines. We established a hierarchical validation strategy in which we first sought to establish loss of protein by Western blot analysis, followed by immunohistochemistry (IHC) and finally loss of wild-type transcript by transcriptome analysis. Western blot assays were performed on 95 cell lines and protein loss was confirmed in 71 of them (Figure 1B and Table S6). For 102 of the cell lines, we performed IHC assays and confirmed protein loss in 65 lines (Figure 1C and Table S6). Finally, to validate 4 genes lacking Western or IHC assays and to begin to characterize the transcriptomes of additional selected lines, we constructed RNA-Seq libraries from 97 isogenic cell lines and sequenced them to an average depth of 2.2 × 10^7 reads per cell line (Table S6). To be validated by RNA-Seq, at least 15 reads (average = 87.5, N = 19) covering the mutated or flanking exons were required with no evidence of wild-type sequence or in-frame exon skipping. In total, we were able to validate loss of normal gene product at the protein or RNA level in 152 cell lines, representing 20 of the 22 targeted genes (Figure 1D and Table S5). One hundred of these 152 lines were subsequently assembled into the "Cancer Pathway Knockout Panel" to minimize overlap while maximizing diversity (Tables 1 and S7).
As noted above, several genes with known chemical sensitivities were also targeted to provide controls for assay development (Tables S2, S8 and examples in Figure S2). In addition, we exploited the known differential sensitivity of cells without genetic inactivation of TP53 to the small molecule MDM2 inhibitor Nutlin-3a (Vassilev et al., 2004). Nutlin-3a causes cell senescence or death in cell lines with functional TP53 by increasing the amount of available p53 protein. As expected, cell lines with wild-type TP53 were 5-10 times more sensitive to Nutlin-3a than their TP53 null counterparts (Figure S3).

Development of a multiplexed ratiometric cell growth assay
To demonstrate the potential utility of our engineered TSG knockout panel, we developed a screening platform that permits co-culture of multiple cell lines in a single well. Each of the cell lines in a single well thereby provides multiple internal controls for drugs that are generally toxic, rather than specifically toxic to a cell line.

High-throughput screen of FDA-approved and clinical trial compounds
Using the multiplex assay described above, we evaluated a library of 2,658 FDA-approved small molecule compounds for their ability to inhibit the growth of cell lines with specific pathway defects (Table S9). For this screen, we used 81 cell lines derived from two different parental cell lines and representing 19 targeted pathways (see Table S5 for cell lines used in the screen). Each of the 81 cell lines was exposed to 1 µM or 10 µM of compound for 72 h (Figure 2A). As negative controls, each 96-well plate included wells without drug or vehicle and wells with only vehicle (DMSO). The positive controls included in each 96-well plate were one well treated with Nutlin-3a (an MDM2 inhibitor that suppresses the growth of cells with functional p53) and one well treated with staurosporine (a non-specific, cytotoxic control). The negative and positive controls performed as expected, with DMSO having no effect on growth (Figure S5A) and staurosporine producing a marked reduction in growth (Figure S5B). The other positive control (Nutlin-3a) documented a pronounced difference in the growth of cell lines depending on their TP53 status, as expected (Figure S5C).
In total, 430,596 compound-cell line interactions were scored in this assay. Compounds of interest were identified by requiring both a statistically significant (i.e., Z-score of −1.5 or less) and a biologically significant (i.e., greater than 50% inhibition) effect, as described in the STAR Methods section. Furthermore, these criteria had to be satisfied by independent cell lines with the same pathway disrupted in two parental cell backgrounds (Figure 2B). After applying these stringent filters to the 2,658 FDA-approved small molecule compounds, 1 hit emerged: TP53 loss sensitized cells to the effects of MK-1775 (Figures 2B and 2C, Table S10).
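The filters described above can be written as a short sketch. The interaction count is consistent with 2,658 compounds, 81 cell lines, and two doses. The function names and the input layout are hypothetical, and the "more than half of the lines in each background" rule is an assumption borrowed from the majority requirement stated in the Methods; this is an illustration of the logic, not the authors' analysis code.

```python
# Consistency check on the number of scored interactions reported in the text.
N_COMPOUNDS, N_CELL_LINES, N_DOSES = 2658, 81, 2
assert N_COMPOUNDS * N_CELL_LINES * N_DOSES == 430_596


def is_compound_of_interest(z_score, inhibition,
                            z_cutoff=-1.5, min_inhibition=0.50):
    """Statistical (Z <= -1.5) AND biological (> 50% inhibition) significance."""
    return z_score <= z_cutoff and inhibition > min_inhibition


def is_hit(lines_by_background):
    """lines_by_background maps a parental background to a list of
    (z_score, inhibition) pairs, one per independent knockout line
    of the same pathway (hypothetical input layout).

    ASSUMPTION: a background supports the hit when more than half of its
    lines meet both criteria; two supporting backgrounds are required.
    """
    if len(lines_by_background) < 2:
        return False
    for lines in lines_by_background.values():
        flagged = sum(is_compound_of_interest(z, inh) for z, inh in lines)
        if flagged * 2 <= len(lines):
            return False
    return True


# Hypothetical values for one compound against knockout lines of one pathway.
example = {
    "RPE1": [(-2.1, 0.70), (-1.8, 0.60), (-0.4, 0.10)],
    "MCF10A": [(-1.6, 0.55), (-2.5, 0.80)],
}
print(is_hit(example))
```

Requiring both criteria in two backgrounds trades sensitivity for specificity, which is in keeping with the single hit reported for the screen.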
To model the power of the screen, we examined the distribution of counts across the entire screen, consisting of 43 RPE1 and 38 MCF10A cell lines, and performed an in silico analysis in which we reduced the reads of each cell line by 50% or 90% in a given well (Table S11). This analysis indicated that a 90% reduction in reads would yield a Z-score of −1.5 in 94% of the wells for RPE1 and 79% of the wells for MCF10A. Likewise, a 50% reduction in reads would yield a Z-score of −1.5 in 75% of the wells for RPE1 and 38% of the wells for MCF10A. These results were consistent with, and slightly better than, pilot experiments used to determine optimal conditions for this screen (data not shown). Based on these results, we felt the 50% cutoff was reasonable to ensure biological significance while still preserving reasonable statistical power. Supporting this criterion, the average observed decrease in expression of all wells meeting statistical significance (Z-score < −1.5) was 45% (29%-61% interquartile range).

TP53 deficiency sensitizes to MK-1775 (AZD1775)
MK-1775 demonstrated selective growth inhibition of TP53-deficient cell lines from both RPE1 and MCF10A backgrounds within the primary screen when used in the low micromolar range (Figure 2C). This result was confirmed in an orthogonal fashion using co-cultures of GFP- (TP53 mutant) and RFP- (TP53 wild type) labeled isogenic knockout cell lines (Figure 2D). MK-1775 is an inhibitor of Wee1, a kinase that controls the G2/M transition (Hirai et al., 2009). Previous studies have indicated that MK-1775 can selectively inhibit the growth of TP53-deficient cells in human cancers in vivo and in vitro in combination with radiation or chemotherapy, and MK-1775 is currently in clinical trials for TP53-deficient tumors in combination with chemotherapy or radiation (Center, 2016; Ku et al., 2017; Leijen et al., 2016; Osman et al., 2015; Rajeshkumar et al., 2011). Thus, our result does not represent a new drug discovery but rather represents an unbiased proof of principle for the new assay.

DISCUSSION
The results presented above document two aspects of a novel resource for drug screening. First, we describe a panel of highly characterized isogenic cell lines containing single gene knockouts in critical cancer pathways. Second, we describe a multiplex, sequence-based assay that can be used for drug screening.
One important characteristic of our panel is the extensive validation undertaken for candidate knockout cell lines. Each of them had out-of-frame insertions or deletions which could not be "exon-skipped" without giving rise to a downstream out-of-frame event. Moreover, all cell lines show a lack of functional RNA or protein products. In total, we derived a panel of 100 well-annotated isogenic cell lines that were validated in this way. Not all of the cell lines have to be included in a drug screen, particularly an initial one. But the redundancy inherent in the cell lines described here allows rapid confirmation of the activity of a drug identified in an initial screen. The variety of pathways and cellular backgrounds represented in these lines should provide an ideal resource for phenotypic high-throughput screening for a wide range of disease targets.
iScience 25, 104437, June 17, 2022

Limitations of the study
This study was performed in non-cancerous cell lines from tissues of origin that may not best reflect the biology of the targeted cancer pathways. Not all targeted genes were represented by two or more knockouts in every background, which may make results for these genes less reproducible than originally desired.

STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
The results shown in Table S1 [...]

[...] Personal Genome Diagnostics. SZ has a research agreement with BioMed Valley Discoveries, Inc. CB is a consultant to Depuy-Synthes and Bionaut Labs. The companies named above, as well as other companies, have licensed previously described technologies related to the work described in this paper from Johns Hopkins University. BV, KWK, NP, CB, and SS are inventors on some of these technologies. Licenses to these technologies are or will be associated with equity or royalty payments to the inventors as well as to Johns Hopkins University. Patent applications on the work described in this paper may be filed by Johns Hopkins University. The terms of all these arrangements are being managed by Johns Hopkins University in accordance with its conflict of interest policies.

Ling, X., Xie, B., Gao, X., Chang, L., Zheng, W., Chen, H., Huang, Y., Tan, L., Li, M., and Liu, T. [...]

[...] Table S5
Oligonucleotides
CRISPR gRNA sequences; This paper; Table S3
CRISPR target sequencing primers; This paper; [...]

OPEN ACCESS
The cell lines characterized in this study are detailed in Table S5. A subset of the lines was validated and banked for distribution, including 100 lines targeting the 19 critical cancer pathways (Table S7) and 8 lines in which control non-cancer pathways were targeted (Table S8). Of the non-tumor suppressor genes targeted, MTAP is of particular interest and is represented by multiple lines because it is frequently co-deleted with CDKN2A, making it a passenger mutation that may be targetable for therapeutic benefit in human cancers (Mavrakis et al., 2016).

CRISPR-Cas9
Integrated CRISPR-Cas9 gRNAs were designed using Chop-Chop based upon common mutation sites identified in COSMIC (Montague et al., 2014). Each gene was targeted with 6-12 gRNAs (Table S1). gRNAs were ordered from IDT Technologies (Iowa, USA) with the addition of ligation sequences: caccgNNNNNNNNNNNNNNNNNNNN and aaacNNNNNNNNNNNNNNNNNNNNc. gRNAs were ligated into the LentiCRISPR v2 plasmid (Addgene, Massachusetts, USA, Cat #52961) using a previously published protocol (Sanjana et al., 2014). The CRISPR/Cas9 plasmid was virally transduced into cells using Lenti-X Packaging Single Shots (VSV-G) according to the manufacturer's instructions (Clontech, California, USA, Cat #631275). See Table S3 for a list of all gRNAs utilized. For any target gene that appears in the gRNA list but has no corresponding cell line clone, no clone could be obtained. This was due to a variety of reasons, including but not limited to loss of the gene being lethal.
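As an illustration of the ligation sequences given above, the following sketch builds the two cloning oligonucleotides for a 20-nt guide. It assumes, following the commonly used lentiCRISPR cloning scheme, that the N stretch of the second oligo is the reverse complement of the guide so that the two oligos anneal; the function names and the example guide are hypothetical and do not correspond to any gRNA in Table S3.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def cloning_oligos(guide):
    """Return (top, bottom) oligos with the ligation sequences from the text.

    top    = cacc + g + guide
    bottom = aaac + reverse complement of the guide + c
    ASSUMPTION: the bottom-strand N20 is the reverse complement of the guide.
    Lower case marks the added ligation bases, as in the text.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A, C, G, or T")
    top = "caccg" + guide
    bottom = "aaac" + reverse_complement(guide) + "c"
    return top, bottom


# Hypothetical guide sequence used only to show the output format.
top, bottom = cloning_oligos("AAAAAAAAAACCCCCCCCCC")
print(top)
print(bottom)
```

After annealing, the cacc and aaac ends remain single-stranded and provide the overhangs used for ligation, while the added g/c pair sits immediately upstream of the guide.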

Mutation detection and analysis
DNA was extracted from cells using Quick Extract (Lucigen, Wisconsin, USA, Cat #QE09050) and amplified using the primer pairs listed in Table S4, which were designed to amplify 66-80 base pair segments containing the predicted cut site for each of our gRNAs listed in Table S3. Primer sets were designed for the SafeSeqS application, and amplicons were sequenced on an Illumina MiSeq and analyzed as previously described (Cohen et al., 2018).

[...] backgrounds and 19 critical cancer pathways at two doses: 1 µM and 10 µM. Cells were plated in the morning and treated with the compound libraries in the evening of day 1. Plates were harvested on day 4 and molecular barcodes identifying each cell line were quantified by high-throughput sequencing.

Western Ab information
Sequencing was performed using a barcoded forward primer comprising an Illumina primer sequence, a 14-base degenerate sequence (N14), a plate barcode, and a LentiV2 sequence (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNBARCODESTGTGGAAAGGACGAAACACC). The reverse primer comprises an Illumina primer sequence, a well barcode, a spacer, and a LentiV2 sequence (CAAGCAGAAGACGGCATACGAGATBARCODESNNCGGACTAGCCTTATTTTAACTTGC). This amplicon requires 96 reverse primers and 1 forward primer to amplify and uniquely identify each well in a 96-well plate (Figure S4). In total, 25 plate barcodes and 192 well barcodes were designed and verified. These primers were used to amplify cell pools after DNA extraction using Quick Extract (Lucigen, Wisconsin, USA, Cat #QE09050). Amplified reads were sequenced on either an Illumina MiSeq or an Illumina HiSeq 2500.
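The primer layout described above can be sketched as follows. The constant segments are copied from the sequences in the text; the barcode sequences, their 4-nt and 8-nt lengths, and the function names are placeholders chosen for illustration, since the actual barcodes are not listed here.

```python
from itertools import product

# Constant segments taken from the primer sequences given in the text.
ILLUMINA_FWD = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
LENTI_FWD = "TGTGGAAAGGACGAAACACC"
ILLUMINA_REV = "CAAGCAGAAGACGGCATACGAGAT"
LENTI_REV = "CGGACTAGCCTTATTTTAACTTGC"


def forward_primer(plate_barcode):
    """Illumina sequence + N14 + plate barcode + LentiV2 sequence."""
    return ILLUMINA_FWD + "N" * 14 + plate_barcode + LENTI_FWD


def reverse_primer(well_barcode):
    """Illumina sequence + well barcode + NN spacer + LentiV2 sequence."""
    return ILLUMINA_REV + well_barcode + "NN" + LENTI_REV


def plate_primer_set(plate_barcode, well_barcodes):
    """One forward primer and 96 reverse primers for a 96-well plate."""
    if len(set(well_barcodes)) != 96:
        raise ValueError("expected 96 distinct well barcodes")
    return forward_primer(plate_barcode), [reverse_primer(b) for b in well_barcodes]


# HYPOTHETICAL barcodes: the first 96 four-base words, for illustration only.
well_barcodes = ["".join(p) for p in product("ACGT", repeat=4)][:96]
fwd, revs = plate_primer_set("ACGTACGT", well_barcodes)
print(fwd)
print(len(revs))
```

Combining one plate-specific forward primer with 96 well-specific reverse primers lets every well of every plate be identified from the read pair alone, which is what allows pooled sequencing of a whole screen.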
Screen controls were scored by evaluating the ratio of unique identifier (UID) reads matching a single cell line in each staurosporine-treated well to that cell line's UID reads in the untreated DMSO wells of the same screen plate. This was performed for each knockout line within each screen plate. To assess cell line representation and performance in each treated well, we calculated a z-score for each cell line's fraction of reads within a single well, comparing its abundance in the drug-treated well with that in the 95 other wells of the same plate. We assumed the null hypothesis that any given compound would have no specific interaction with the gene of interest. Wells in which a compound yielded fewer than twice as many UIDs as the non-specific small molecule control (staurosporine) within the same plate were classified as non-specific cell killing, and compounds from these wells were not considered in our analysis.
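A minimal sketch of the ratiometric scoring described above follows. It assumes UID counts per cell line are available for each well as a dictionary; the data layout, function names, and example counts are hypothetical, and the two-fold staurosporine rule is implemented under one reading of the text (fewer than twice the staurosporine well's total UIDs).

```python
from statistics import mean, stdev


def read_fraction(well_counts, line):
    """Fraction of a well's UIDs that belong to one cell line."""
    return well_counts[line] / sum(well_counts.values())


def z_score_vs_other_wells(plate, well, line):
    """Z-score of a line's read fraction in `well` relative to all other
    wells on the same plate. `plate` maps well ID -> {cell line: UID count}."""
    others = [read_fraction(counts, line)
              for w, counts in plate.items() if w != well]
    return (read_fraction(plate[well], line) - mean(others)) / stdev(others)


def is_nonspecific_killing(well_total_uids, staurosporine_total_uids, fold=2.0):
    """ASSUMED reading of the filter: a well with fewer than `fold` times
    the UIDs of the staurosporine well is treated as general toxicity."""
    return well_total_uids < fold * staurosporine_total_uids


# Hypothetical six-well plate with three co-cultured lines.
plate = {
    "A1": {"TP53_KO": 100, "WT": 100, "X_KO": 100},
    "A2": {"TP53_KO": 110, "WT": 100, "X_KO": 100},
    "A3": {"TP53_KO": 90, "WT": 100, "X_KO": 100},
    "A4": {"TP53_KO": 105, "WT": 100, "X_KO": 100},
    "A5": {"TP53_KO": 95, "WT": 100, "X_KO": 100},
    "A6": {"TP53_KO": 20, "WT": 100, "X_KO": 100},  # selective depletion
}
print(z_score_vs_other_wells(plate, "A6", "TP53_KO"))
```

Because each line is scored as a fraction of the well's reads, a compound that kills all lines equally leaves the fractions unchanged, which is why the separate staurosporine-based filter on total UIDs is needed.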
To determine the z-score threshold, we looked at the representation of each cell line in the DMSO-treated control wells and down-sampled the sequencing of these wells to 0 in increments of 10%, calculating the z-score at each increment to determine the power to detect each cell line. This was determined for each plate in our screen, and thresholds were determined from the third quartile, i.e., the lowest 25% of plates based on cell line representation within the plate. Based on this in silico calculated third quartile z-score, we classified cell lines as either well powered or low powered. A maximum z-score threshold was then set for each cell line to identify compounds of interest: −1.5 for well-powered lines and −1.0 for low-powered lines. All compound-cell line z-scores below these thresholds were considered compounds of interest. For a compound to be considered a hit, the majority of cell lines in two cell line backgrounds would need to identify it as a compound of interest. Applying these criteria to DMSO wells showed that 0.11% of control wells met the hit criteria.
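The down-sampling analysis above can be sketched as follows. This is a simplified illustration: it scales one cell line's count deterministically rather than resampling reads, it scores a single DMSO well against a set of reference DMSO wells, and the function names and counts are hypothetical. The rule for assigning a line to the well-powered or low-powered class is not reproduced because it is not fully specified in the text.

```python
from statistics import mean, stdev


def read_fraction(well_counts, line):
    """Fraction of a well's UIDs that belong to one cell line."""
    return well_counts[line] / sum(well_counts.values())


def downsampling_curve(test_well, reference_wells, line):
    """Z-score of `line` in `test_well` after removing 0%, 10%, ..., 100%
    of its reads, relative to the reference (DMSO) wells."""
    reference = [read_fraction(w, line) for w in reference_wells]
    mu, sd = mean(reference), stdev(reference)
    curve = {}
    for percent_removed in range(0, 101, 10):
        reduced = dict(test_well)
        # Deterministic scaling stands in for read-level down-sampling.
        reduced[line] = test_well[line] * (100 - percent_removed) / 100
        curve[percent_removed] = (read_fraction(reduced, line) - mu) / sd
    return curve


def z_cutoff(well_powered):
    """Cutoffs from the text: -1.5 for well-powered lines, -1.0 otherwise."""
    return -1.5 if well_powered else -1.0


# Hypothetical DMSO wells for a three-line co-culture.
dmso = [
    {"TP53_KO": 100, "WT": 100, "X_KO": 100},
    {"TP53_KO": 110, "WT": 100, "X_KO": 100},
    {"TP53_KO": 90, "WT": 100, "X_KO": 100},
    {"TP53_KO": 105, "WT": 100, "X_KO": 100},
    {"TP53_KO": 95, "WT": 100, "X_KO": 100},
]
curve = downsampling_curve({"TP53_KO": 100, "WT": 100, "X_KO": 100}, dmso, "TP53_KO")
for percent_removed, z in curve.items():
    print(percent_removed, round(z, 2))
```

In this sketch, the smallest reduction at which the curve crosses a cutoff indicates how large a growth defect a given line can reveal on that plate; lines with noisy or low representation cross the cutoff only at large reductions.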