Selection of highly efficient sgRNAs for CRISPR/Cas9-based plant genome editing

The CRISPR/Cas9-sgRNA system has been developed to mediate genome editing and has become a powerful tool for biological research. Employing the CRISPR/Cas9-sgRNA system for genome editing and manipulation has accelerated research and expanded researchers’ ability to generate genetic models. However, a method for evaluating the efficiency of sgRNAs is lacking in plants. Based on the nucleotide compositions and secondary structures of sgRNAs that have been experimentally validated in plants, we instituted criteria for designing efficient sgRNAs. To facilitate the assembly of multiple sgRNA cassettes, we also developed a new strategy to rapidly construct a CRISPR/Cas9-sgRNA system for multiplex editing in plants. In theory, up to ten single guide RNA (sgRNA) cassettes can be simultaneously assembled into the final binary vectors. As a proof of concept, 21 sgRNAs complying with the criteria were designed and the corresponding Cas9/sgRNA expression vectors were constructed. Sequencing analysis of transgenic rice plants showed that 82% of the desired target sites were edited by deletion, insertion, substitution, or inversion, indicating high editing efficiency. This work provides a convenient approach for selecting efficient sgRNAs for targeted editing.

A major concern with the CRISPR/Cas9-sgRNA system is the editing efficiency and specificity of sgRNAs. Recently, two groups developed web tools for designing highly specific sgRNAs in plants 16,17 , which provide a number of highly specific guide sequences for a given gene. However, it remains unknown whether a given candidate guide sequence is efficient or inefficient in plants.
Functional studies of duplicated genes often require the construction of multiple mutants, which is extremely time-consuming, particularly for closely linked genes. Because multiple sgRNAs can be expressed simultaneously, multiple genomic loci can be edited at once. Recent reports showed that up to 8 target sites can be edited simultaneously in rice 4,9 . In the present study, criteria for efficient sgRNAs were instituted based on an assessment of the nucleotide composition and secondary structure of sgRNAs. We also devised a new strategy for constructing a CRISPR/Cas9-sgRNA system that edits multiple targets. Experimental tests of 21 sgRNAs in rice plants demonstrated high editing efficiency at the expected target sites.

Results
Design of efficient sgRNAs. The CRISPR/Cas9-sgRNA system has been adapted to facilitate genome editing in eukaryotic cells. Although this system can be programmed to cleave virtually any sequence preceding a 5′-NGG-3′ PAM sequence, it does not succeed at every predicted target site 9,18 . The major concerns with the CRISPR/Cas9-sgRNA system are target specificity and efficiency. For a given gene, tens or hundreds of NGG-containing sites can be candidate editing loci. Although several online tools have been developed for selecting highly specific sgRNAs 16,17 , tools for evaluating sgRNA editing efficiency in plants are still lacking. Unlike library screening in animal cell lines, genome editing in plants takes a long time to yield transgenic products. Thus, criteria that distinguish efficient from inefficient sgRNAs are of great utility for avoiding the generation of non-edited transgenic plants caused by inefficient sgRNAs.
To institute criteria for designing efficient sgRNAs, we collected and analyzed sgRNAs that have been validated in plants (Supplemental Table 1). Several groups analyzed the nucleotide composition of sgRNAs used in animals and identified a few nucleotide preferences 18,19,20 . However, these nucleotide preferences were not found in plant sgRNAs. We saw no statistically significant difference in nucleotide composition at any of the 20 positions, that is, no nucleotide preference was observed (Fig. 1A), implying a difference between animal and plant sgRNAs. G/C content is thought to be a key factor affecting sgRNA editing efficiency 9,18,19,21 . Our analysis revealed that 97% of sgRNAs have a G/C content between 30% and 80%.
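The two composition measures discussed above, per-position nucleotide frequency and G/C content, can be sketched in a few lines of Python. This is only an illustration of the calculation, not the analysis pipeline used in this study; the function names and the three example guides are hypothetical and are not taken from Supplemental Table 1.

```python
from collections import Counter


def gc_content(guide: str) -> float:
    """Return the G/C percentage of a guide sequence."""
    guide = guide.upper()
    return 100.0 * sum(base in "GC" for base in guide) / len(guide)


def position_frequencies(guides):
    """For each position, return the fraction of guides carrying each base."""
    length = len(guides[0])
    if any(len(g) != length for g in guides):
        raise ValueError("all guides must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(g[i].upper() for g in guides)
        freqs.append({b: counts.get(b, 0) / len(guides) for b in "ACGT"})
    return freqs


# Made-up 20-nt guides, for illustration only (not from the study).
example_guides = [
    "GACGTTCAGGATCCTAGCAA",
    "ATGCCGTAGCTAGGCTTACG",
    "TTGACCGGATACGTCAGGTC",
]

for g in example_guides:
    print(g, f"G/C = {gc_content(g):.0f}%")
print(position_frequencies(example_guides)[0])
```

With a real collection of validated guides, the per-position frequencies could then be compared against a uniform or genome-wide background with a statistical test of the reader's choice.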
It is well known that an sgRNA functions through the interaction of its secondary structure with the Cas9 protein in vivo 22 . Ma et al. 9 suggested that the secondary structure of an sgRNA may interfere with its editing efficiency. It is therefore necessary to establish a link between the secondary structure and the editing efficiency of sgRNAs. The sgRNA contains crRNA- and tracrRNA-derived sequences connected by an artificial tetraloop. The crRNA sequence consists of a guide region (20 nt, also referred to as the spacer) and a repeat region (12 nt), whereas the tracrRNA sequence consists of an anti-repeat region (14 nt) and three tracrRNA stem loops (Fig. 1B). The repeat and anti-repeat region (stem loop RAR) triggers precursor CRISPR RNA (pre-crRNA) processing by the enzyme RNase III and subsequently activates crRNA-guided DNA cleavage by Cas9 14 . Analysis of the crystal structure of Cas9-sgRNA-DNA revealed that stem loop 1 is crucial for the formation of the functional Cas9-sgRNA-DNA complex, whereas stem loops 2 and 3 promote stable complex formation and hence improve in vivo activity 22 . Since the guide sequence is the variable part of an sgRNA, the final secondary structure should vary with the guide sequence. Assessment of sgRNA secondary structures showed that all sgRNAs validated in plants have an intact stem loop RAR, as well as intact stem loops 2 and 3, implying that these 3 stem loop structures are crucial for genome editing. In contrast, 82% of sgRNAs lose their stem loop 1, implying that stem loop 1 is not related to editing efficiency. It is noteworthy that 83% of guide sequences contain at least one base that pairs with other nucleotide(s) of the sgRNA (Supplemental Table 1).
Because a stable complex between the guide sequence and the other bases can affect the base pairing of the guide sequence with its target DNA, we further analyzed the number of paired bases, finding that 98% of guide sequences have no more than 12 bases pairing with the other bases of the sgRNA and 99% of guide sequences have no more than 7 consecutive base pairs (CBPs). In addition, internal base pairs (IBPs) in the guide sequence may also interfere with target recognition. Statistical analysis indicates that 35% of guide sequences contain at least one IBP and that the highest IBP number is no more than 6.
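The thresholds reported above (G/C content of 30% to 80%, no more than 12 guide bases paired with the rest of the sgRNA, no more than 7 CBPs, and no more than 6 IBPs) can be applied programmatically once a secondary structure is available. The following Python sketch assumes that the structure is supplied as a dot-bracket string for the whole sgRNA, with the 20-nt guide at the 5′ end, for example from an RNA folding program. The function names are hypothetical, the exact counting rules (pairs between guide and non-guide bases for the total and the CBPs, pairs within the guide for the IBPs) are our reading of the text rather than a published definition, and the check for intact stem loops RAR, 2, and 3 is not included.

```python
def pair_table(structure: str) -> dict:
    """Map each paired index to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def guide_pairing_stats(structure: str, guide_len: int = 20) -> dict:
    """Count guide bases paired outside the guide (TBP), the longest
    consecutive run of such bases (CBP), and pairs inside the guide (IBP)."""
    pairs = pair_table(structure)
    tbp = cbp = run = ibp = 0
    for i in range(guide_len):
        partner = pairs.get(i)
        if partner is not None and partner >= guide_len:
            tbp += 1
            run += 1
            cbp = max(cbp, run)
        else:
            run = 0
        if partner is not None and i < partner < guide_len:
            ibp += 1  # count each internal pair once, at its 5' base
    return {"TBP": tbp, "CBP": cbp, "IBP": ibp}


def passes_criteria(guide: str, structure: str, guide_len: int = 20) -> bool:
    """Apply the composition and pairing thresholds reported in the text."""
    gc = 100.0 * sum(b in "GC" for b in guide.upper()) / len(guide)
    s = guide_pairing_stats(structure, guide_len)
    return (30.0 <= gc <= 80.0 and s["TBP"] <= 12
            and s["CBP"] <= 7 and s["IBP"] <= 6)


# Toy 30-nt structure: guide bases 0-2 pair with the last three bases.
toy = "(((" + "." * 24 + ")))"
print(guide_pairing_stats(toy))
print(passes_criteria("GACGTTCAGGATCCTAGCAA", toy))
```

Treating the thresholds as hard cut-offs is a simplification; a guide that narrowly exceeds one of them may be worth inspecting by hand rather than discarding outright.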
T-DNA is commonly used for the delivery of DNA. We developed a restriction enzyme-based system to assemble Cas9 and multiple sgRNAs into a T-DNA. This system consists of two modules for the assembly of multiple sgRNA cassettes and the Cas9 gene. The first module is an intermediate vector (pSAK2) (Supplemental Fig. 1), which contains a multiple cloning site consisting of eleven common restriction enzyme sites and hence can, in theory, carry up to ten sgRNA cassettes. The pSAK2 vector is derived from pBluescript II KSII (+) and retains the lacZ marker gene, which can be used to select positive clones carrying the sgRNA expression cassettes. The second module contains a set of four binary vectors (Supplemental Fig. 2) adapted from those of Ma et al. 9 , each of which carries one Cas9p gene driven by the cauliflower mosaic virus 35S promoter (P 35S ) or the Zea mays ubiquitin promoter (P ubi ).
The first step is to construct sgRNA expression cassettes containing the guide sequences. The guide sequences can be easily integrated into sgRNA expression cassettes by overlapping PCR with guide sequence-containing chimeric primers (Fig. 2A). In addition, ten pairs of public primers, which anneal to the 5′ end of the promoters and the 3′ end of the sgRNA and contain specific restriction enzyme sites, are available (Supplemental Table 2). The public primers should be selected according to the desired order of the sgRNA expression cassettes. After two rounds of PCR, an intact sgRNA expression cassette with two different restriction enzyme sites at its 5′ and 3′ ends is produced (Fig. 2A). The second step is to clone the different sgRNA expression cassettes into the intermediate vector pSAK2 (Fig. 2B). The third step is to digest pSAK2-sgRNA(s) with Spe I/Asc I and then ligate the sgRNA(s) into the final binary vectors (Fig. 2C; Supplemental Fig. 2).
Highly efficient genome editing. Having instituted the criteria for selecting efficient sgRNAs in plants, we next tested them experimentally in rice plants. To reduce potential off-target effects, guide sequences with high specificity for the desired target sites were selected from the CRISPR-PLANT Database 16 . Then, 21 guide sequences (Guide1 to Guide21) that agree with the criteria were used for the test (Supplemental Table 3). To further confirm whether the criteria are reliable, we also designed two guide sequences (Guide22 and Guide23) that violate the criteria as potentially inefficient sgRNAs. sgRNA22 contains up to 7 IBPs in the predicted secondary structure, and the G/C content of sgRNA23 is less than 30% (Supplemental Table 3; Supplemental Fig. 3). Guide1 to Guide23 are predicted to target 11 coding genes (Supplemental Table 3) and were assembled into sgRNA1 to sgRNA23, respectively. The sgRNA cassettes were then cloned into the binary vector pMH-SA (Supplemental Fig. 2) by the aforementioned method. In total, 12 expression constructs with one or multiple sgRNA cassettes were generated (Table 1) and used for rice transformation.
To detect Cas9-sgRNA-mediated genome editing, T 0 transgenic plants were used for sequencing analysis of the desired target sites. PCR products covering the predicted target sites were sequenced directly. As expected, sgRNA1 to sgRNA21 caused genome editing at the predicted target sites. Of 371 sequenced plants (excluding those carrying sgRNA22 and sgRNA23), 305 plants (82.2%) had mutations, which can be classified as homozygous (28.1%), heterozygous (14.2%), biallelic (56.7%), and chimeric (1.0%) mutations (Fig. 3A). In contrast to the 21 sgRNAs (sgRNA1 to sgRNA21) with a high editing rate, no target mutation was detected in any of the T 0 transgenic plants containing sgRNA22 or sgRNA23 (20% G/C content), which do not comply with the criteria. In addition, sgRNA24, reported by Ma et al. 9 , is a confirmed inefficient sgRNA that contains 14 CBPs and does not agree with the criteria (Supplemental Table 3). Together, these results suggest that the criteria are reliable for selecting efficient sgRNAs.
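The percentages above follow directly from the reported counts. The short Python check below reproduces the editing rate from the 305 mutated plants among 371 sequenced plants and sums the homozygous and biallelic classes, which gives the 84.8% loss-of-function figure cited in the Discussion. The variable names are ours.

```python
sequenced = 371   # sequenced T0 plants (sgRNA1 to sgRNA21)
mutated = 305     # plants carrying mutations at the target sites

editing_rate = round(100 * mutated / sequenced, 1)

# Mutation classes as percentages of edited plants (Fig. 3A).
genotype_pct = {
    "homozygous": 28.1,
    "heterozygous": 14.2,
    "biallelic": 56.7,
    "chimeric": 1.0,
}

# Homozygous and biallelic plants carry no wild-type allele.
loss_of_function = round(
    genotype_pct["homozygous"] + genotype_pct["biallelic"], 1
)
total = round(sum(genotype_pct.values()), 1)

print(editing_rate, loss_of_function, total)
```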
We found that two out of 23 sgRNA15/16 transgenic plants contained an inversion of the DNA fragment between their two target sites (Fig. 3C). Of the 9 two-sgRNA constructs, only one caused fragment inversion, suggesting that inversion is a low-frequency event.
To investigate whether the number of simultaneously expressed sgRNAs affects the editing efficiency, the editing frequency was compared between transgenic plants with two sgRNAs (sgRNA1/2) and those with four sgRNAs (sgRNA1/2/3/4), as well as among plants with sgRNA9/10, sgRNA21, and sgRNA9/10/21. As shown in Fig. 3D, no significant difference was observed. These results suggest that the number of sgRNAs has no significant effect on the editing efficiency.

Discussion
CRISPR/Cas9-sgRNA-mediated precise genome editing is being widely applied in diverse plant species. In the present study, we instituted criteria for efficient sgRNAs based on the nucleotide composition of guide sequences and the secondary structure of sgRNAs, and introduced a new strategy for constructing CRISPR/Cas9-sgRNA cassettes. Using rice as an example, we demonstrated that this cloning strategy for the assembly of multiple sgRNAs is rapid and functional, and that the criteria can be used to select efficient sgRNAs from among highly specific sgRNAs.

Target specificity is a major concern for the CRISPR/Cas9-sgRNA system. High-frequency off-target mutagenesis induced by CRISPR/Cas9-sgRNA was found in human cells 27 . Several studies have investigated off-target effects caused by the CRISPR/Cas9-sgRNA system in animals based on the alignment of guide sequences to the genome 28 , dCas9 ChIP-seq 29 , and GUIDE-seq 30 . Off-target effects were also detected in plant cells 3 . Genome-wide specificity analysis of candidate target sites can help avoid or reduce off-target effects 16,17 . Another major concern for the CRISPR/Cas9-sgRNA system is the editing efficiency of sgRNAs. It has been established that a PAM containing the NGG consensus is necessary for Cas9-DNA binding and cleavage in a CRISPR/Cas9-sgRNA system 31 . Recent studies showed that sgRNA editing efficiency depends on the nucleotide composition of guide sequences in animals 19,20 . By analyzing sgRNAs that have been experimentally validated in plants, we found that most of their guide sequences have a G/C content between 30% and 80%, whereas no nucleotide preference was found. A recent study revealed that guide sequences with higher G/C content have higher editing efficiency 9 . Based on the secondary structures of sgRNAs, criteria for efficient sgRNAs were instituted. Experimental tests revealed that all 21 sgRNAs agreeing with the criteria were efficient for target site editing. Many factors can affect the editing efficiency of sgRNAs. In addition to nucleotide composition and secondary structure, the expression levels of sgRNAs may also be involved 9 . Thus, to ensure efficient editing of a given gene, we recommend selecting more than one sgRNA to target multiple sites in that gene.
Very recently, the Golden Gate cloning method was used to assemble multiple sgRNA cassettes 9,32,33 . It employs the type IIS restriction enzyme Bsa I to design and generate distinct, non-palindromic sticky ends. This method is efficient for linking multiple sgRNA cassettes in a designed order in a single reaction because the sticky ends generated by Bsa I prevent self-ligation and ligation of non-compatible ends. The efficiency of assembling two or three sgRNA cassettes was high, whereas the assembly of 6 or more sgRNA cassettes was far less efficient and often failed 33 . The Cas9 gene and Zea mays ubiquitin promoter sequences contain most of the commonly used restriction enzyme sites, which limits the utility of common restriction enzymes in the assembly of multiple sgRNA cassettes. Here, we developed two modules to construct a CRISPR/Cas9-sgRNA system containing multiple sgRNA cassettes. Both modules employ common restriction enzymes for molecular cloning. In theory, our method can assemble up to 10 sgRNA cassettes by using the 11 different restriction enzyme sites provided in pSAK2. Our two modules comprise an efficient, inexpensive, time-saving, user-friendly, and extensible toolkit for the construction of a CRISPR/Cas9-sgRNA system targeting multiple genomic sites.
The efficient editing of target genes in transgenic plants can provide researchers with desired mutants, which will accelerate the dissection of gene function. We confirmed that the sgRNAs complying with the criteria were efficient for target gene editing. Both this work and that of Ma et al. 9 confirmed that at least two identical snoRNA promoters can be used simultaneously to drive different sgRNAs. Our work also revealed that the number of sgRNA cassettes has no significant effect on the editing efficiency of sgRNAs. It is noteworthy that up to 84.8% of edited T 0 transgenic plants contained loss-of-function gene mutations (i.e., biallelic or homozygous mutations) and can be used directly for functional analysis. In our two-sgRNA expression plants, both target sites can be edited simultaneously, which also caused deletion or inversion of the DNA fragment between the two target sites. The editing of both target sites will facilitate gene correction via homologous recombination by providing a mutant plant with a wild-type DNA fragment donor. Altogether, our toolbox of sgRNA design criteria and multiplex CRISPR/Cas9-sgRNA assembly provides researchers with a new approach to efficiently edit one or multiple target sites and perform genetic improvement.

Methods
Design of the CRISPR/Cas9-sgRNA-related vectors. The pSAK2 vector (Supplemental Fig. 1) was derived from pBluescript II KSII by introducing an Asc I restriction enzyme recognition site into the multiple cloning site. The 11 restriction enzyme sites between Spe I and Asc I can be used for the assembly of multiple sgRNAs. Four binary vectors, pYLCRISPR/Cas9P 35S -H, pYLCRISPR/Cas9P 35S -N, pYLCRISPR/Cas9P ubi -H, and pYLCRISPR/Cas9P ubi -B 9 , were modified by removal of the ccdB gene and introduction of Spe I and Asc I sites to generate pDH-AS, pDK-SA, pMH-SA, and pMB-AS (Supplemental Fig. 2), respectively.
Assembly of multiple sgRNAs. Vectors (pYLsgRNA-OsU3, pYLsgRNA-OsU6a, pYLsgRNA-OsU6b, and pYLsgRNA-OsU6c for rice and pYLsgRNA-AtU3b, pYLsgRNA-AtU3d, pYLsgRNA-AtU6-1, and pYLsgRNA-AtU6-29 for Arabidopsis) 9 carrying four rice snoRNA promoters and four Arabidopsis snoRNA promoters were used as templates in the overlapping PCR. After two rounds of PCR with specific primers and public primers, as described in Fig. 2A, sgRNA cassette products with specific restriction sites at both ends were produced and then cloned into the pSAK2 vector. One or multiple sgRNA cassettes between the Asc I and Spe I sites of pSAK2 were cloned into the binary vector by digestion and ligation. When the OsU3 promoter, which contains an Spe I site, is chosen, the Xba I site (Xba I is an isocaudomer of Spe I) can be used instead to avoid digestion of the OsU3 promoter.
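The choice between Spe I and Xba I described above amounts to scanning the assembled cassettes for internal recognition sites before digestion. The Python sketch below illustrates that check. The recognition sequences are the standard ones for Spe I (ACTAGT), Xba I (TCTAGA), and Asc I (GGCGCGCC), all of which are palindromic, so scanning one strand is sufficient; the function names and the example sequences are hypothetical, and the input is assumed to be the insert sequence without the flanking cloning sites.

```python
# Standard recognition sequences; Spe I and Xba I both leave a 5'-CTAG overhang.
SITES = {"SpeI": "ACTAGT", "XbaI": "TCTAGA", "AscI": "GGCGCGCC"}


def has_site(seq: str, enzyme: str) -> bool:
    """Return True if the sequence contains the enzyme's recognition site."""
    return SITES[enzyme] in seq.upper()


def choose_five_prime_enzyme(insert: str) -> str:
    """Pick the enzyme used with Asc I to release the sgRNA cassettes.

    Spe I is preferred; Xba I is used if the insert has an internal
    Spe I site (as with the OsU3 promoter).
    """
    if has_site(insert, "AscI"):
        raise ValueError("insert contains an internal Asc I site")
    if not has_site(insert, "SpeI"):
        return "SpeI"
    if not has_site(insert, "XbaI"):
        return "XbaI"
    raise ValueError("insert contains both Spe I and Xba I sites")


# Made-up insert fragments, for illustration only.
print(choose_five_prime_enzyme("AAAACCCGGGTTTT"))        # no internal sites
print(choose_five_prime_enzyme("AAAACTAGTCCCGGGTTTT"))   # internal Spe I site
```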
Plant Transformation. Oryza sativa ssp. japonica was used for transformation. The CRISPR/Cas9 constructs were introduced into Agrobacterium tumefaciens strain EHA105 by electroporation. Transformation of rice was performed as described previously 35 .
Mutation Detection. Genomic DNA was extracted from leaves of T 0 transgenic rice plants and used as the template for PCR with primers flanking the predicted target sites. The PCR products were sequenced directly with specific primers. When sequencing results were of low quality, PCR products were cloned into a plasmid vector and 4-6 single clones were sequenced. Samples with heterozygous and biallelic mutations were decoded using the Degenerate Sequence Decoding (DSD) method 24 .
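Once the alleles at a target site have been resolved, each plant can be assigned to one of the mutation classes reported in the Results. The Python sketch below shows one way to do this. The class definitions used here (homozygous: a single mutant allele; heterozygous: one wild-type and one mutant allele; biallelic: two different mutant alleles; chimeric: more than two alleles) are the conventional ones and are an assumption on our part, since they are not spelled out in the text; the function name and allele labels are hypothetical.

```python
def classify_genotype(alleles, wild_type):
    """Classify a plant from the allele sequences detected at one target site."""
    distinct = set(alleles)
    if not distinct:
        raise ValueError("no alleles supplied")
    if distinct == {wild_type}:
        return "wild type"
    if len(distinct) > 2:
        return "chimeric"       # more than two alleles in a diploid plant
    if len(distinct) == 1:
        return "homozygous"     # the same mutation on both copies
    if wild_type in distinct:
        return "heterozygous"   # one wild-type and one mutant allele
    return "biallelic"          # two different mutant alleles


# Allele labels stand in for decoded sequences.
print(classify_genotype(["del1", "del1"], "wt"))
print(classify_genotype(["wt", "ins1"], "wt"))
print(classify_genotype(["del1", "ins1"], "wt"))
print(classify_genotype(["wt", "del1", "ins1"], "wt"))
```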