CRISPR-Cas9 for selective targeting of somatic mutations in pancreatic cancers

Abstract Somatic mutations are desirable targets for selective elimination of cancer, yet most are found within noncoding regions. We have adapted the CRISPR-Cas9 gene editing tool as a novel, cancer-specific killing strategy by targeting the subset of somatic mutations that create protospacer adjacent motifs (PAMs), which have evolutionally allowed bacterial cells to distinguish between self and non-self DNA for Cas9-induced double strand breaks. Whole genome sequencing (WGS) of paired tumor minus normal (T-N) samples from three pancreatic cancer patients (Panc480, Panc504, and Panc1002) showed an average of 417 somatic PAMs per tumor produced from single base substitutions. Further analyses of 591 paired T-N samples from The International Cancer Genome Consortium found medians of ∼455 somatic PAMs per tumor in pancreatic, ∼2800 in lung, and ∼3200 in esophageal cancer cohorts. Finally, we demonstrated 69–99% selective cell death of three targeted pancreatic cancer cell lines using 4–9 sgRNAs designed using the somatic PAM discovery approach. We also showed no off-target activity from these tumor-specific sgRNAs in either the patient's normal cells or an irrelevant cancer using WGS. This study demonstrates the potential of CRISPR-Cas9 as a novel and selective anti-cancer strategy, and supports the genetic targeting of adult cancers.


Introduction
Somatic mutations accumulate as we age and are clonally propagated from the cancer initiating cell to all neoplastic daughter cells ( 1 ).These somatic mutations genetically define the malignant cell population as unique and can be exploited as therapeutic targets.Indeed, most targeted therapies focus on mutations within coding regions, as drugs and vaccines have been designed to target the mutated proteins or produce synthetic lethality.The targeted mutations are commonly driver mutations or mutations with known roles in carcinogenesis, inevitably limiting the number of targetable mutations available ( 2 ).Meanwhile, most mutations in cancers are found within noncoding regions which make up 98% of the human genome ( 3 ).Most of these tumor-specific 'passenger mutations' are traditionally thought to be inconsequential to cell fitness or haven't had their functions elucidated, and therefore historically lack therapeutic value ( 4 ).This prompted us to develop a strategy to turn this vast number of mutations into targets of therapeutic importance, independent of their individual genetic functions.
CRISPR-Cas9 is a programmable endonuclease or 'molecular scissor' (5)(6)(7).It can be designed to induce double strand breaks (DSBs) at desired locations in the human genome, and in theory, can be highly selective.CRISPR-Cas9 has been celebrated as a revolutionary gene editing tool for both research and therapeutic purposes, with the recent regulatory approvals of Casgevy by both US FDA and UK MHRA to treat sickle cell disease ex vivo as the most prominent example of its disease-altering potential ( 8 , 9 ).Currently, CRISPR -Cas9 gene editing therapies approved or under clinical trials work by either disrupting the expression of an existing gene ( 8 , 10 , 11 ), correcting mutation (NCT04774536), or for engineering CAR-T cells ( 12 ).The utility of CRISPR-Cas9 as a direct cell killing tool has yet to be widely explored.
Our approach is inspired by the bacterial adaptive immune system, in which protospacer adjacent motifs (PAMs) have been evolutionarily selected to differentiate between host and viral DNA, that otherwise contain the exact same sequence, to selectively eliminate the pathogen while leaving the bacterial host intact ( 28 ).The 3-nucleotide 5 -NGG-3 PAM sequence, recognized by S. pyogenes Cas9, serves as a binding signal, where Cas9 neither binds nor cuts the target in its absence ( 29 ,30 ), significantly decreasing the risk of off-targeting.In an analogous way, CRISPR-Cas9 holds the potential to distinguish between cancer cells containing novel PAMs from somatic mutations and the patient's normal cells (lacking the PAM sequence) for selective cancer killing.
We therefore determined the type of somatic mutations in cancers that would create the largest number of novel CRISPR-Cas9 target sites and developed a bioinformatics pipeline to identify single base substitutions (SBSs) that formed novel, cancer -specific P AMs.We identified hundreds to thousands of somatic PAMs in different solid tumor types, and these somatic PAMs could serve as cancer-specific targets to selectively kill the malignant cell population.

Ethical approval
Human biospecimen collection and use were approved by The Johns Hopkins Medicine Institutional Review Boards (JHM-IRB2, NA_00045283) for whole genome sequencing of previously de-identified patient samples.
PAMfinder (perl) was written to process variant call files (VCFs) based on their genome builds (hg38) to identify somatic variants that produced novel PAMs.Tumor (arrayT) and normal (arrayN) were specified based on column number, read depth was set at 18X ( 34 ), and variant allele frequency (VAF) cutoff was modified based on the tumor purity (30% cutoff for 100% tumor purity).30% cutoff was used to filter out mutations that could arise from in vitro cultures or subclonal populations.The 5 and 3 genomic sequences flanking the somatic variants were obtained from the FASTA of individual chromosomes to inspect whether novel Cs were adjacent to an existing C or novel Gs were adjacent to an existing G.The output contained information about the somatic variant, the potential sgRNA sequence along with the novel PAM, and specified whether the novel PAM was located on the plus or minus strand of the genome.
Somatic mutations with VAF > 95% were chosen to put through CRISPOR ( 35 ).Mutations that produced sgRNAs with > 50 specificity score in CRISPOR were subsequently validated by PCR and Sanger sequencing ( Primers Table S1 ).

Somatic PAM discovery on ICGC samples
VCFs containing raw SNV calls generated via the GATK Mu-tect2 variant calling workflow were downloaded from the ICGC-ARGO Data Portal ( 36 ).These VCFs were sourced from four projects: APGI-AU (Australian Pancreatic Cancer Genome Initiative; N = 44), LUCA-KR (Personalised Genomic Characterisation of Korean Lung Cancers; N = 29), PAC A-C A (Pancreatic Cancer Harmonized 'Omics' analysis for Personalized Treatment; N = 130), and OCCAMS-GB (Oesophageal Cancer Clinical and Molecular Stratification; N = 388).To correct for tumor purity, we used the VCF for each sample, comprising variants for tumor and normal tissue, as input to the R package FACETS ( 37 ) to calculate tumor purity.Then, we used both the VCF and the tumor purity data as input to PAMfinder to identify base substitutions that produced novel PAMs.The VAF cutoff for somatic PAM calling was % tumor purity × 30% (e.g. if the tumor purity was 50%, the VAF cutoff for somatic PAM calling was 15%).Finally, % novel PAM was calculated by dividing the number of novel PAM by the total number of base substitutions.

Computing mutational profiles and COSMIC signature contribution
MutationalPatterns R package was used to calculate SBS mutational profiles for each VCF file and to compute the contribution of each of 60 COSMIC signatures to these profiles ( 38 ,39 ).The 12 displayed COSMIC signatures represented the 12 most frequently appearing amongst the top five signatures across all samples, when signatures were ordered from the greatest contribution to the least contribution within each sample.

Multitarget sgRNA design
Chromosome range was entered into CRISPOR ( 35 ) 2kb at a time starting at chr1:0-2000 and ending at chr1:100248000-100250000 based on hg38.sgRNAs that had different number of perfect target sites were selected from the pool of sgRNA options generated by CRISPOR based on the following criteria: (i) none of the perfect target sites and potential off-target sites target exons; (ii) Doench'16 efficiency score > 50% and (iii) the number of off-targets that have no mismatches in the 12 bp adjacent to the PAM (seed region) is < 10 ( 25 ).Sequences of non-targeting control sgRNAs were obtained from Doench et al. (NT) and Chiou et al. (NT2) ( 25 ,40 ).Positive control sgRNAs were designed by inserting LINE-1 and Alu element sequences to CRISPOR.

Clonogenicity and cell viability assay
Cas9-expressing PC cells were transduced with lentivirus at MOI10.The cells were split into 96-well plates in 1:1000 dilution for clonogenic survival.When cells in non-targeting controls reached full confluence (1-2 months), colony counting was performed using phase microscopy and alamarBlue Cell Viability Reagent (ThermoFisher) was added according to manufacturer's instruction.BMG POLARstar Optima microplate reader was used for fluorescence reading to assess cell viability.Excitation was set at 544 nm and emission at 590 nm, with a gain of 1000 and required value of 90%.

Co-culture assays
Parental cells and / or cells that expressed either mApple or mNeon-Green fluorophores were co-cultured at different ratios under antibiotic selection.Puromycin selection was for 7 days at 1 ug / ml and hygromycin selection was for 14 days at 200 ug / ml.Proportions of mApple-expressing cells (Ex / Em: 561 / 620 nm) and / or mNeon-Green-expressing cells (Ex / Em: 488 / 530 nm) post-transduction of sgRNAs were measured and analyzed at different time points using Attune NxT Flow Cytometer (ThermoFisher) and FCS Express 7 (De Novo Software).

Mouse-human NGS assay
The RC3H2 gene was selected as the mouse and human orthologs as they differ by a 3 bp indel followed by 3 SBSs ( 41 ).Primers for unbiased PCR amplification of the locus in mouse and human DNA were previously developed by Lin e. al ., des-ignated as primer pair 45 ( 41 ).For this assay, a 101bp amplicon in the RC3H2 gene was amplified with primers containing Illumina adaptor sequences ( Primers Table S2 ).Amplicons were subjected to NGS (see Supplementary Methods ), and FASTQ files were aligned to the hg19 genome using bwa 0.7.17 ( 42 ) and visualized in IGV.Human and mouse reads were quantified since the 3bp-shorter mouse sequence maps as a deletion in the human genome.For assay validation, mouse DNA and human DNA were mixed at varying ratios and assayed.

γH2A.X staining and imaging
Two PC cell lines, TS0111 and Panc10.05,were seeded at density of 2 × 10 5 in each well of a six well plate with coverslips on the bottom.Cells were then transduced with pool of lentivirus containing either non targeting sgRNA (NT) or 4, 7 or 9 target TS0111-specific sgRNAs at MOI 100.Two days after transduction, cells were fixed with 4% paraformaldehyde for 15 mins.Cells were subsequently blocked and permeabilized with 5% BSA / 0.5% Triton X-100 in PBS for 30 mins at room temperature followed by an overnight incubation with anti-phospho-Histone H2A.X (Ser139) antibody (# 05-636, Sigma Aldrich) at 1:1000 dilution.Cells were then washed thrice with 1 × PBS for 5 min and subsequently incubated with secondary antibody conjugated with Alexa Fluor 594 (# A-11032, ThermoFisher Scientific) for 1 h at room temperature.Cells were again washed thrice with 1 × PBS and then counterstained with DAPI containing mounting medium (# H-1800, VectorLabs).Stained cells were imaged using fluorescence microscope (# EVOS M7000, TFS) and γH2A.X foci were quantified using Fiji software ( 43 ).

Functional testing of multiplex sgRNA expression plasmids
The targeted cell line and non-targeted cell line(s) were transduced at MOI 10 with lentivirus expressing the multiplex vectors.14-21 days after transduction, cells were harvested and genomic DNA extracted using QIAmp UCP DNA Micro kit (QIAGEN).The targeted loci were PCR amplified with NGS adaptors and sent for amplicon sequencing ( Primers Table S1 'Panc480 mutation validation' & 3, see Supplementary Methods ).The sequencing data was analyzed for the percent of edited reads by CRISPResso2 ( 44 ).

WGS analyses of potential off-target sites
Two replicates of Panc480 Cas9-expressing cell pellets and one replicate of Panc1002 Cas9-expressing cell pellet from the Panc480-MT7 functional testing assay were used for this analysis.Cells were harvested 14 days after transduction (T14) or the day of transduction (T0).DNA was whole genome sequenced and FASTQ files were aligned to hg19 using bwa v0.7.7 ( 31 ) to create BAM files.Picard-tools1.119 ( RRID: SCR _ 006525 ) was used to add read groups as well as to remove duplicate reads.GATK v3.6.0 ( RRID:SCR _ 001876 , ( 32 )) base call recalibration steps were used to create a final alignment file.
sgRNA sequences were put through Cas-OFFinder ( 45 ) to identify potential off-target sites including ones with noncanonical NAG PAM and ones with 1-4 mismatches.sgRNA sequences were also uploaded to IDT CRISPR-Cas9 gRNA checker ( 46 ) to obtain a second list of potential off-target sites.
Then, we examined each site on IGV for Cas9-induced mutation signatures.
MuTect2 v3.6.0 ( 32 ) was used to call somatic variants between the T14-T0 pairs.The default parameters and SnpEff (v4.1) ( 33 ) were used to annotate the passed variant calls.From the list of variants generated, we performed homology analysis with an R script that performed the following steps: (i) read in an Excel file containing one mutation per row; (ii) obtain the forward and reverse strand sequences from the hg19 genome between the start -50 bp and stop +50 bp positions of the locus; (iii) align each locus's forward and reverse sequences to the target sgRNA with no gaps using the Smith-Waterman algorithm; (iv) determine the number of mismatches between the sgRNA and the nearest matching piece of DNA within each junctions; (v) output the original information along with new columns displaying the mismatches between each junction and the sgRNA into a new Excel file.EMBOSS Needle was used to illustrate the alignments between the sgRNA sequences and the potential target sequences with the lowest number of mismatches ( 47 ).

Development of PAM discovery approach
We tested two approaches with the potential leading to highly selective target cell killing with minimal off-target risk.As pancreatic cancer (PC) is one of the most lethal cancers with a dismal five-year survival rate of only 12.8% ( 48 ), we chose PC as the model to test our hypothesis.We previously generated primary PC cell lines and their corresponding normal cell lines from three PC patients (Panc480, Panc504 and Panc1002; Supplementary Table S1 ) ( 49 ) and performed whole genome sequencing (WGS).We then performed tumor-normal (T-N) subtraction to identify somatic mutations.All three PC samples harbored deleterious mutations in KRAS , CDKN2A , SMAD4 and TP53 , which are the most common driver mutations in PCs ( Supplementary Table S1 ).
We first considered structural variants (SVs) as they could juxtapose a new target DNA sequence next to an existing NGG PAM ( Supplementary Figure S1 A, B).This could theoretically decrease the risk of off-target effects, as the resulting breakpoint is significantly different from the original sequence ( Supplementary Figure S1 C).We discovered an average of 35 SVs per cell line by comparing tumor to normal, and validated 84.9% of them by PCR amplification across the breakpoint and Sanger sequencing ( Supplementary Table S2 , Supplementary Figure S1 C).We found an average of 22 novel SVs juxtaposed next to an existing PAM per cell line ( Supplementary Table S2 ).Using our sgRNA selection criteria (see Supplementary Methods ), we obtained an average of 17 sgRNAs per cell line with minimal off-targeting risk ( Supplementary Table S2 ).
Next, we attempted to discover novel PAMs created from SBSs (Figure 1 A, B).A somatic NGG PAM can arise through a SBS that creates a novel G from A / T / C, and this novel G is adjacent to an existing G immediately upstream or downstream of the new G (Figure 1 A-B).The same concept applies to the complementary strand (5 -CCN-3 ).Our PC samples harbored mutational signatures that produced novel Cs and Gs (Figure 1 C), with the most common signatures being SBS1, 5 and 40 ( Supplementary Figure S2 ).These were all clock-like signatures ( 38 , 50 , 51 ), suggesting that aging itself gave rise to novel PAMs.We then developed a program, PAMfinder, to discover the subset of somatic base substitutions that produced novel PAMs in a given tumor sample.
We identified an average of 4548 somatic SBSs per sample, of which 9.2% created somatic PAMs (mean = 417; Figure 1 D, Supplementary Table S3 ).A VAF cutoff of 30% was used to exclude mutations that might be subclonal or have arisen through in vitro culture of these cell lines.We selected novel PAMs with VAFs > 95% (mean = 63, 15%) for initial functional testing of sgRNAs as targeting them should produce the highest toxicity.Of these, we were able to design an average of 33 sgRNAs, one for each mutation, that had minimal risk for off-target activity (see Methods, Figure 1 D, Supplementary Table S3 ).We confirmed all qualifying mutations, except two, through Sanger sequencing ( Supplementary Table S3 ).A similar approach using whole exome sequencing (WES) data failed to yield sufficient targets (mean = 1; Supplementary Figure S4 ).This was because the majority of cancer-specific PAMs were located in noncoding regions, with 64.4% of somatic PAMs located in intergenic regions, 28.1% in introns, 0.5% in exons, and the remaining 7.0% in other regions such as noncoding RNAs (Figure 1 E).Thus, we concluded that the WGS-based PAM discovery approach using SBSs was more productive than the SV and WES approaches, and provided hundreds of novel PAMs per cancer as potential CRISPR-Cas9 target sites.

High prevalence of novel PAMs in different tumor types
To determine whether this large number of PAMs also existed in other tumor types, we obtained T-N VCFs from the ICGC Data Portal, including pancreatic (APGI-AU and PAC A-C A), lung (LUCA-KR), and esophageal cancer (OCCAMS-GB) samples ( 52 ).We performed tumor purity correction and analyzed the data using PAMfinder (Figure 2 A, Supplementary Table S5 ).Overall, we found that the number of base substitutions and number of somatic PAMs from the two PC cohorts, APGI-AU ( N = 44) and PAC A-C A ( N = 130), were comparable to findings from our discovery PC lines, in which a median of 478.5 and 430.5 somatic PAMs per tumor were identified, respectively (Figure 2 B, C, Supplementary Tables S3  and S5 ).Regarding the 29 lung cancer samples and 388 esophageal cancer samples, the number of PAMs identified was > 5-fold higher than that of PCs, with medians of 2790 and 3235.5 somatic PAMs per tumor, respectively (Figure 2 C, Supplementary Table S5 ).Since the number of base substitutions were also higher in lung cancers (median = 30 553) and esophageal cancers (median = 20 106) compared to PCs (median = 5890.5 and 5354.5),these results suggested that tissue-specific factors, such as environmental mutagens, contributed to the varying number of mutations present (Figure 2 B, Supplementary Table S5 ).
While the proportions of base substitutions that gave rise to somatic PAMs (% novel PAM) were similar among PCs (median = 8.9% and 8.4%) and lung cancers (median = 8.5%),   esophageal cancers had significantly higher % novel PAM at 16.1% (interquartile range = 12.3-20.5%;P < 0.0001; Figure 2 D, Supplementary Table S5 ).To investigate the potential mechanism contributing to the higher % novel PAM, we performed mutational signature analysis on all samples.We found that the two sets of PC samples showed similar top ranked mutational signatures that were consistent with our discovery PC cell lines (SBS1 and SBS40; Figure 2 E, Supplementary Figure S2 ).The top mutational signatures for lung cancers, SBS4 and SBS92, were associated with tobacco smoking, in which SBS92 had a predominant T > C mutation signature (Figure 2 E) ( 50 , 53 , 54 ).Notably, the top ranked mutational signature of esophageal cancers, SBS17b, distinguished itself from the other tumor types (Figure 2 E), consistent with published studies with these samples ( 55 ,56 ).It was characterized primarily by a T > G transversion with an unknown etiology, but previous studies had associated it with fluorouracil (5FU) treatment and possibly damage by reactive oxygen species ( 55 ,57 ).Based on our analyses of different large tumor cohorts, we concluded that somatic base substitutions yielded hundreds, if not thousands, of novel PAMs in each tumor, and these findings were tissue, and potentially, treatment-dependent.This supported the potential generalizability of this application.

Selective and specific cell killing with CRISPR-Cas9
To estimate the number of sgRNAs required to generate significant cytotoxicity in PC cells, we designed sgRNAs with increasing number of target sites in the human genome (designated as multitarget sgRNAs), transduced them into two PC cell lines (Panc10.05 and TS0111), and performed clonogenicity assays ( Supplementary Table S6 ).All perfect target sites and potential off-target sites of the multitarget sgR-NAs were located in the noncoding regions of the human genome to prevent gene essentiality-linked cytotoxicity from confounding our interpretations.We found that clonogenic growth inhibition increased with the number of sgRNA target sites (Figure 3 A, Supplementary Figure S3 ).The 3-target sgRNA, 52F(3), exhibited > 70% growth inhibition in both cell lines, suggesting that a few DSBs could produce significant cytotoxicity.The 12-and 14-target sgRNAs, 230F (12) and 164R( 14), displayed > 90% growth inhibition, comparable to the positive control sgRNAs targeting LINE-1 and Alu elements in the human genome.
To demonstrate proof-of-concept selective toxicity induced by CRISPR-Cas9, we generated Cas9-expressing mouse (NIH3T3 and Panc02) and human PC (TS0111 and Panc10.05)cell lines with confirmed Cas9 activity ( Supplementary Figure S4 A-B), established mouse-human cell line co-cultures, and transduced them with a multitarget sgRNA that targets 12 sites in the human genome (230F( 12)) but none in the mouse genome ( Supplementary Table S7 ).Using both flow cytometry and a mouse-human next generation sequencing (NGS) assay ( Supplementary Figure S4 C-D), we saw > 95% reduction of the human PC cells in different cocultures (Figure 3 B, Supplementary Figure S4 E-F).Humanspecific cell killing was dependent on both functional Cas9 and the human-specific sgRNA ( Supplementary Figure S4 E-G), showing that CRISPR-Cas9 is capable of selectively eliminating cancer cells.
To determine the specificity of DSBs induced by CRISPR-Cas9 in conjunction with cell line-specific sgRNAs targeting different numbers of mutations, we quantified γH2A.X foci after transduction of sgRNA-expressing vectors.Using PAMfinder, we identified a list of TS0111-specific sgRNAs that had > 95% VAF in the cell line and analyzed them to identify sgRNAs that had minimal potential off-target activity (see Materials and methods).We selected 9 from the sgRNA list and cloned each of them individually into a sgRNA expression vector ( Supplementary Table S8 ).We transduced different number of sgRNAs (4, 7 and 9) into TS0111 cells (target cell line) and Panc10.05 cells (negative control) as lentivirus pool.We demonstrated increased γH2A.X foci with increased sgRNA targets in the target cell line (TS0111), and 7 or 9 but not 4 TS0111-specific sgRNAs displayed a significant increase in the number of γH2A.X foci compared to TS0111 cells transfected with non-targeting sgRNAs (4 sgRNAs: P = 0.079 ; 7 sgRNAs: P = 0.008 ; 9 sgRNAs: P < 0.0001; Figure 3 C, D).As expected, transduction of TS0111-specific sgRNAs into Panc10.05cells serving as a negative control did not induce a significant increase in γH2A.X foci (4 sgRNAs: P = 0.693; 7 sgRNAs: P = 0.752; 9 sgRNAs: P = 0.238; Figure 3 E, Supplementary Figure S5 ), thus demonstrating the specificity of our system.Transduction of AGGn (sgRNA targeting AGG repetitive element) into both cell lines resulted in large number of γH2A.X foci that were too abundant to quantify (Figure 3 C, Supplementary Figure S5 ).Finally, we tested the hypothesis that we could selectively target a patient's cancer by treating with sgRNAs designed from our somatic PAM discovery approach.Using PAMfinder, we identified a list of Panc10.05-specificsgRNAs that had > 95% VAF in the cell line and analyzed them to identify sgRNAs that had high cutting efficiency and minimal potential off-target activity (see Methods).We then cloned four of them into a multiplex sgRNA expression vector that expressed four sgRNAs simultaneously (Panc10.05quad) and, in parallel, cloned four non-targeting sgRNAs into a second multiplex expression vector (NT quad) as negative control ( Supplementary Figure S6 A, Supplementary Table S9 ).The target sites of these Panc10.05-specificsgRNAs were located in either introns or intergenic regions, and deep sequencing showed that these sgRNAs induced mutations in Panc10.05 and not in the non-target cell line ( Supplementary Figure S6 B, Supplementary Table S9 ).We then transduced the quads into co-cultures of Panc10.05 and TS0111, and observed an average of 86% selective reduction of Panc10.05 cells relative to negative control 21 days after transduction (Figure 4 A,  B).We also performed a clonogenic survival assay and observed a 58% growth inhibition with the Panc10.05quad ( Supplementary Figure S6 C).Additionally, we cloned each of the sgRNA sequences into a sgRNA expression vector and transduced them as a pool of 4 sgRNAs into the same co-culture setting, and observed an average of 74% selective reduction of Panc10.05 cells 21 days after transduction ( Supplementary Figure S6 D, E, Supplementary Table S9 ).Pool of positive control sgRNAs that was non-selective ( Supplementary Table S9 ) generated > 90% growth inhibition in both cell lines (data not shown).We demonstrated that selective reduction could be achieved with two different experimental designs delivering the same set of cell line-specific sgRNAs, indicating that the selective elimination was indeed caused by the Panc10.05-specificsgRNAs.
To examine Cas9 selectivity in a second PC cell line, TS0111, we chose 27 sgRNA sequences that had > 95% VAF in the cell line and cloned each of them individually into a sgRNA expression vector ( Supplementary Table S8 ).We then transduced three separate pools of sgRNAs, 9 sgR-NAs per pool ( Supplementary Table S8 ), into Panc10.05-TS0111co-cultures.We found that > 83% selective reduction of TS0111 was achieved by day 14, and > 88% by day 21 (Figure 4 C, D, Supplementary Figure S7 A).A positive control sgRNA (ALU_112a) that was non-selective (Figure 3 A) generated > 99% growth inhibition in both cell lines (data not shown).
Furthermore, we cloned 12 sgRNA sequences that had > 95% VAF in the Panc480 cell line into a sgRNA expression vector individually, and transduced three separate pools of sgRNAs, 4 sgRNAs per pool, into TS0111-Panc480 cocultures ( Supplementary Table S10 ).We found that > 68% selective reduction of Panc480 was achieved by day 21 (Fig- ure 4 E, F, Supplementary Figure S7 B), and the pool of positive control sgRNAs generated > 90% growth inhibition in both cell lines (data not shown).
Altogether, our results demonstrated that 4-9 sgRNAs, designed using our PAM discovery approach, were selectively toxic against targeted cells and produced significant cell death.

Absence of off-target activity by patient-specific sgRNAs
We selected 7 of the 13 targets that we identified in Panc480 using our PAM discovery approach, confirmed functional targeting of individual sgRNAs, and cloned the corresponding sgRNAs into a multiplex sgRNA expression vector that expressed 7 sgRNAs simultaneously (designated as Panc480-MT7; Figure 5 A; Supplementary Table S11 ).Cells were harvested for deep sequencing at the targeted loci 14 days after transduction of Panc480-MT7.We detected cutting activity of all 7 sgRNAs in Panc480 Cas9-expressing cells, but not in its controls (Panc480 parental and Panc1002 Cas9expressing cells) and corresponding normal (lymphoblasts) cells from the patient (Panc480-N; Figure 5 B).To investigate potential off-target activity, we performed WGS on DNA extracted from Panc480 and Panc1002 Cas9-expressing cells 14 days post transduction (T14).Using two different programs, Cas-OFFinder ( 45 ) and IDT gRNA checker ( 46 ), we generated a list of potential sgRNA off-target sites of Panc480-MT7 and examined them on the WGS data.We found no evidence of Cas9 activity at these off-target sites (data not shown).
As an additional assessment of potential off-target activity, we used a somatic variant caller, MuTect2 ( 58 ), to identify novel indels present at 14 days after transduction and performed sequence alignment and homology comparison with the sgRNA sequences ( 59 ).We found that the indels novel to T14 did not exhibit significant homology to any of the 7 sgRNAs in Panc480-MT7 (Figure 5 C, D; Table 1 ).The lowest possible mismatches were 4bp from the chr8:29032916 sgRNA and 5bp from the chr3:59525282 sgRNA (Table 1 ), each with one occurrence only, and were unlikely to be targeted by the sgRNA as they were located at trinucleotide and dinucleotide repeat regions, respectively (data not shown).Novel indels at non-repetitive regions, with two examples included (Figure 5 C, D), were also unlikely to be targeted by Panc480-MT7 due to either absence of PAM, the distance between the potential sgRNA binding site and the indel, and / or the high number of mismatches, which have not been reported for Cas9 activity ( 24 , 26 , 60 ).These indels, present at low VAF, likely represent sequencing errors at repetitive regions, background heterogeneity in a bulk cell population, or ongoing genomic instability.Nonetheless, our results showed that these sgRNAs were highly specific to the intended targets in the targeted cell line and had no obvious off-target activity.

Discussion
We present an efficient, cancer-specific PAM discovery approach that selectively kills cancer cells.We discovered that PCs, which generally have low mutational burden, contained > 400 somatic PAMs as candidates for CRISPR-Cas9 targeting, significantly expanding the repertoire of targetable mutations in a given solid tumor.Since single-base mutations increase as a function of age ( 1 ,52 ), one could hypothesize that adult solid tumors, in general, should contain hundreds of novel PAMs for subsequent selection of sgRNAs and targeting.This was supported by our analyses on ICGC data which discovered hundreds to thousands of novel PAMs per tumor.
Our mutational signature analyses also revealed aging signatures in most tumors and additional tissue-dependent factors, likely environmental, increasing the number of somatic PAMs.While it is conceivable that pediatric tumors might not contain as many somatic PAMs as adult cancers, we found that significant toxicity could be achieved with < 10 sgRNAs, providing evidence that only a few sgRNAs would be needed to achieve selective killing.Strategies to enhance the inhibitory effect would be essential to broaden the applicability of this approach, especially in pediatric patients.For example, incorporating a DSB repair inhibitor increased the toxicity of the sgRNAs treated ( Supplementary Figure S8 ).Combining our approach with existing therapies, such as immunotherapy and / or chemotherapy, might also be considered for synergistic effect.
As our strategy exploits the vast number of novel PAMs located in noncoding regions, it requires WGS analyses of both tumor and normal genomes.Most published data only includes exome sequencing, and obtaining corresponding normal for each tumor sample is still not common.WGS analyses are more computationally demanding than for WES, but the exponential decrease in sequencing costs ( 61 ) and enhancement of computing power have made these issues less concerning ( 62 ).As very few sgRNAs were needed to produce significant toxicity, one could potentially discover sufficient targetable PAMs from standard clinical workflow involving solid tumor panel sequencing.Molecular testing of pancreatic ductal adenocarcinoma (PDAC) can be challenging due to their low tumor cellularity commonly < 30% since stromal cells predominate ( 63 ).The ability to detect a given mutation in a low cellularity tumor increases as a function of: (i) the number of genomic DNA equivalents added to the reaction, functionally defined, (ii) the depth of coverage achieved during sequencing and (iii) the desired sensitivity ( 64 ).To circumvent low tumor cellularity, we produced patient-specific cell lines, either directly from the resected sample or by first generating organoids ( 49 , 65 , 66 ).Generating cell lines from primary tumor samples and corresponding normal tissues for sgRNA screening can be challenging; however, this process has become much more routine with the advent of organoids ( 67 ).We have generated many cell lines from tumors and normal tissues of PCs, providing us with unique opportunities to study them extensively in vitro ( 49 ,68-70 ).The somatic mutations detected are confirmed in the primary tumor at high depth of coverage prior to targeting them with sgRNAs.
Our results indicate a high specificity of the CRISPR-Cas9 system for targeting selected sequences in the genome.Importantly, we observed a continuous increase in the number of DSBs with increasing number of targeting sgRNAs in the target cell line but not in the negative control cell line (Figure 3 D, E), demonstrating the efficacy and specificity of our system.Although the number of foci detected appeared to be lower than the number of sgRNAs transduced, this could be explained by the fact that the sgRNAs were transduced as a lentivirus pool and there was no selectable marker to ensure that all cells received all 4, 7 or 9 sgRNAs in a pooled transduction setting.
Delivery of CRISPR-Cas therapeutics continues to be an active work in progress, with in vivo efficacy already demonstrated in two different clinical settings and more delivery strategies under rapid development ( 10 , 11 , 71 , 72 ).Depending on the target cell type and location, various delivery methods (e.g.mRNA, RNP) and modalities (e.g.microinjection, AAVs, non-viral vectors) have to be explored for different cancer types.Our main focus for future experiments are to test our patient-specific sgRNAs in vivo using xenograft mod- a WGS analyses were performed for T14s.For each indel detected by Mutect2 ( 58), the original sequence on the reference genome was compared to the sgRNA sequence to determine the homology between both using the Smith-Waterman algorithm (see Methods).The lowest number of sequence mismatches was shown.On-target mutation was excluded in this analysis as results had been shown in Figure 4 B using a deep sequencing assay.
b Only one mutation existed with the stated number of mismatches.
els.We are currently collaborating with nanoparticle experts to identify nanoparticle formulations that have high transfection efficiency for PCs in vivo , followed by testing different delivery methods (e.g.Cas9 mRNA + sgRNA, Cas9 protein + sgRNA) of our system to determine the optimal mode of delivery.Nanoparticles have been shown to improve the efficacy of anticancer agents in pancreatic cancers in both preclinical studies and clinical trials ( 73 ,74 ).To improve the targeting specificity of nanoparticles, we would have to consider conjugating the nanoparticles with antibodies or aptamers that target the biomarkers of pancreatic cancers in order to increase the transfection efficiency in pancreatic tumors and observe treatment response ( 75 ,76 ).As pancreatic tumors tend to have high stroma content, combination strategies with therapies targeted against the tumor microenvironment could also be employed to increase targeting efficiency of the nanoparticles ( 73 ).We also have to demonstrate safety of our system in vivo by studying the potential immunogenicity , toxicity , and off-targeting of our system in healthy cells, which could be done through intravenous injection of the nanoparticles.
Off-target activity of CRISPR-Cas9 has been a general concern (77)(78)(79)(80), with tolerance of up to five mismatches reported in the literature ( 24 ).We employed rigorous approaches to address this concern by performing WGS to examine for offtarget activity of the sgRNAs under long term expression (14 days).We inspected potential off-target sites generated by two different sgRNA programs, complemented with pairwise alignments on mutations identified by a somatic variant caller against the sgRNA sequences used.Although we didn't detect any off-target activity, we recognized that the level of detection using WGS would not be as sensitive as targeted deep sequencing.As activity at one-mismatch sites was more commonly reported in the literature compared to 2-5 mismatches ( 25 ,26 ), we intentionally selected cell line-specific sgRNAs that had no 1-mismatch sites in the human genome for our co-culture assays.Meanwhile, strategies to mitigate off-target effects continue to evolve, with various approaches (such as improved sgRNA design, enhanced nuclease fidelity, and limiting exposure time of targeted cells to CRISPR-Cas9) have been proposed ( 25 ,80-82 ).
Barriers exist before the approach we report could be implemented clinically, in addition to those discussed above.We will need to further increase the cytotoxicity of the cancer-specific sgRNAs above the approximately 90% reported herein, while maintaining the lack of activity in the patient's normal cells.Regarding the initial testing in human patients, we plan to develop a preclinical assay that includes the generation of patient-derived tumor and normal cell lines, sequencing both tumor and normal samples to identify tumor-specific sgRNAs, and screening the sgRNAs in both cell lines to ensure toxicity in tumor cells and lack of activity in patient-matched normal cells.For PC patients, one option could be to begin sample sequencing and cell line generation from the Whipple procedure, in which tumors and normal tissues or blood could be biopsied, and the preclinical assay could be implemented while the patient is undergoing post-operation recovery.Identifying the optimal clinical setting for the initial clinical trial would be essential for the successful development of this approach.
PAM-finding approaches have been published in a few studies ( 83 ,84 ).However, our approach is cancer-and, more importantly, patient-specific.This strategy presents a unique opportunity as a new precision medicine-based therapeutic tool that possesses the specificity of a targeted therapy, but without the restriction of a targetable protein and the drug development to target it ( 34 ).As cancer is a clonal disease, the distinct set of mutations found in the cancer initiating cell should be present in all primary tumor and metastatic sites, thus making genetic targeting of somatic PAMs a viable option for patients with multiple metastases ( 85 ).Since mutations are a universal feature of cancer, we envision the possible applicability of our approach to a broad range of cancers.

Figure 1 .
Figure 1.Somatic PAM disco v ery yielded hundreds of no v el PAMs in pancreatic cancers (PCs).( A ) A somatic NGG protospacer adjacent motif (PAM) can arise through a single base substitution (SBS) that creates a no v el G from A / T / C (indicated as X), and this no v el G is adjacent to an existing G immediately downstream (PAM 1) or upstream (PAM 2) of the no v el G. Examples of T > G are shown.( B ) Two somatic PAMs in the Panc480 tumor were both absent in their corresponding normal tissues.( C ) Mutational signatures of two PCs, Panc480 and Panc504.The proportion of SBSs creating cancer-specific Gs and Cs that could potentially form novel PAMs were highlighted in red boxes.Y-axis is the percentage of SBS. ( D ) Workflow of somatic PAM disco v ery.( E ) Proportions of somatic PAMs disco v ered in Panc480 (left), Panc504 (middle) and Panc1002 (right) that were located in different regions of the human genome.Others included non-coding RNAs, untranslated regions, and 1kb regions upstream / downstream of transcription start / end sites.For Panc480, no no v el PAMs were found in exons.

Figure 2 .
Figure 2. Hundreds to thousands of somatic PAMs were found in different adult solid tumor types.( A ) Workflow of PAM discovery in 591 tumor samples using T-N subtracted VCFs from the ICGC Data Portal ( 36 ).All analy ses w ere corrected based on the tumor purity of individual sample.Samples from four cohorts were included: APGI-AU (Pancreas (AU); N = 44), PACA-CA (Pancreas (CA); N = 130), LUCA-KR (Lung (KR); N = 29), and OCCAMS-GB (Esophagus (GB); N = 388).(B, C) Truncated violin plots present the total number of ( B ) base substitutions and ( C ) no v el PAMs in each cohort.( D ) Truncated violin plot presents the percentage of base substitutions that created somatic PAMs.K olmogoro v-Smirno v tests; ns indicates non-significant, **** indicates P < 0.0 0 01.( E ) Mutational spectra analysis in each cohort.

Figure 4 .
Figure 4. Selective cell killing with low number of sgRNAs designed from our somatic PAM discovery approach in three PC cell lines.(A-B) Co-cultures of Panc10.05(labeled with mNeonGreen) and TS01 1 1 cell mixtures were transduced with 4-sgRNA expression vectors that included either all non-targeting sgRNAs (NT quad) or Panc10.05-specificsgRNAs (Panc10.05quad), and flow cytometry was performed to quantify mNeonGreen-positive cells.( A ) Flow cytometry analyses of one replicate on day 21 post transduction.Left panel: cells treated with NT quad; right panel: Panc10.05quad.( B ) Percentage reduction of Panc10.05relative to NT quad on day 1 and day 21 post transduction.N = 3; mean ± SEM. (C, D) Co-cultures of TS01 1 1 (labeled with mApple) and Panc10.05(labeled with mNeonGreen) were treated with three different pools of TS01 1 1-specific sgRNAs (9 sgRNAs per pool) or the equivalent doses of non-targeting sgRNA controls (NT), and flow cytometry was performed to quantify cells that were positive for either mNeonGreen or mApple.( C ) Flow cytometry analyses of one replicate on day 14 post transduction.Left panel: cells treated with NT; right panel: TS01 1 1 Pool 1. ( D ) Percentage reductions of TS01 1 1 relative to NT on days 1, 14 and 21 post transduction.N = 3; mean ± SEM. (E, F) Co-cultures of Panc480 (labeled with mApple) and TS01 1 1 (labeled with mNeonGreen) were treated with three different pools of Panc480-specific sgRNAs (4 sgRNAs per pool) or a pool of 4 non-targeting sgRNA controls (NT), and flow cytometry was performed to quantify cells that were positive for either mNeonGreen or mApple.( E ) Flow cytometry analyses of one replicate on day 21 post transductions.Left panel: cells treated with NT; right panel: Panc480 Pool 2. ( F ) Percentage reductions of Panc480 relative to NT on days 1 and 21 post transduction.N = 3; mean ± SEM.

Figure 5 .
Figure 5. Absence of off-target activity of Panc480-specific sgRNAs.( A ) Illustration of Panc480-MT7 vector.The vector was designed to express 7 sgRNAs (red line indicates 20bp spacer) targeting Panc480 simultaneously.Diagram was generated by SnapGene.( B ) Mutation frequency at 7 Panc480-specific target sites in Panc480 parental, Panc480 Cas9-expressing, Panc480 patient's Cas9-expressing lymphoblasts (normal cell line, Onc3286, indicated as Panc480-N), and Panc1002 Cas9-expressing (negative control) cell lines after treatment with NT (-) or Panc480-MT7 (+) multiplex sgRNA vector.( C ) Pairwise sequence alignments among Panc480-MT7 c hr8:2903291 6 sgRNA sequence (DNA bases are shown), the region surrounding chr8:7530860 in the hg19 human reference genome that has the lo w est number of potential mismatches (6bp mismatches), and the actual sequence in Panc1002 T14.Red dash indicates deletion.( D ) Pairwise sequence alignments among chrX:3982448 sgRNA sequence (DNA bases are shown), the region surrounding c hr1 5:2467181 5 in hg19 that has the lo w est number of potential mismatches (7 bp mismatches), and the actual sequence in Panc1002 T14.Red dash indicates deletion.Note also the absence of PAM immediately downstream of the potential target sequence (blue circles).

Table 1 .
L o w est number of mismatches of potential off-target sites in P anc480-T14 and P anc1002-T14 treated with P anc480-MT7 multiplex sgRNA e xpression v ector