Editing of the urease gene by CRISPR-Cas in the diatom Thalassiosira pseudonana

Background: CRISPR-Cas is a recent and powerful addition to the molecular toolbox which allows programmable genome editing. It has been used to modify genes in a wide variety of organisms, but only two algae to date. Here we present a methodology to edit the genome of Thalassiosira pseudonana, a model centric diatom with both ecological significance and high biotechnological potential, using CRISPR-Cas. Results: A single construct was assembled using Golden Gate cloning. Two sgRNAs were used to introduce a precise 37 nt deletion early in the coding region of the urease gene. A high percentage of bi-allelic mutations (up to 61.5%) was observed in clones with the CRISPR-Cas construct. Growth of bi-allelic mutants in urea led to a significant reduction in growth rate and cell size compared to growth in nitrate. Conclusions: CRISPR-Cas can precisely and efficiently edit the genome of T. pseudonana. The use of Golden Gate cloning to assemble CRISPR-Cas constructs gives additional flexibility to the CRISPR-Cas method and facilitates modifications to target alternative genes or species. Electronic supplementary material: The online version of this article (doi:10.1186/s13007-016-0148-0) contains supplementary material, which is available to authorized users.


Diatoms are ecologically important microalgae with high biotechnological potential. Since their appearance about 240 million years ago [1], they have spread and diversified to occupy a wide range of niches across both marine and freshwater habitats. Diatom genomes have been shaped by secondary endosymbiosis and horizontal gene transfer, resulting in genes derived from heterotrophic hosts, autotrophic endosymbionts and bacteria [2,3]. They play a key role in carbon cycling [4], the food chain and oil deposition, and account for about 20% of the world's annual primary production [5, 6]. However, they are perhaps best known for their intricate silica frustules, which give diatoms a range of ecological advantages and play a key role in carbon sequestration and silica deposition. Several aspects of diatom physiology, including the silica frustule, lipid storage and photosynthesis, are being applied to biotechnology. Areas of high interest include nanotechnology [7], drug delivery [8], biofuels [9], solar capture [10] and bioactive compounds [11].

Given the ecological importance of diatoms and their applications for biotechnology, it is essential that the necessary tools are available to study and manipulate them at a molecular level. This includes the ability to replace, tag, edit and impair genes. A recent addition to the genetic toolbox, CRISPR-Cas, allows double strand breaks (DSBs) to be introduced at specific target sequences. This adapted mechanism, used by bacteria and archaea in nature as a defence system against viruses, facilitates knock-out by the introduction of mutations through repair by error-prone non-homologous end joining (NHEJ) or homologous recombination (HR). This requires both a Cas9 to cut the DNA and a sgRNA to guide it to a specific sequence. Further information on the history and application of CRISPR-Cas can be found in several excellent reviews [12][13][14].
Zinc-finger nucleases (ZFNs), meganucleases and transcription activator-like effector nucleases (TALENs) have also been used to induce double strand breaks. TALENs and CRISPR-Cas both bring flexibility and specificity to gene editing; however, CRISPR-Cas is also cheap, efficient and easily adapted to different sequences by simply changing the 20nt guide sequence in the sgRNA.

So far, within the diverse group of algae, the diploid, pennate diatom Phaeodactylum tricornutum [15] and the haploid, green alga Chlamydomonas reinhardtii [16] have been subject to gene editing by CRISPR-Cas. NHEJ and HR have been used to repair DSBs following CRISPR-Cas or TALENs in P. tricornutum, introducing mutations into a nuclear coded chloroplast signal recognition particle [ performed on the genome against the central conserved region of the U6 sequence. Two potential guanine (G) start sites were found downstream of a TATA box in the promoter. To identify the start site of the U6 snRNA and empirically determine the end of the promoter, 5' RACE was carried out as follows: 400ml of culture was grown to exponential phase (1 × 10⁶ cells ml⁻¹) and harvested. Small RNAs were extracted and enriched using a miRNeasy kit (Qiagen  Figure 1 for an overview of the Golden Gate assembly procedure and the final construct.

sgRNA design for the urease gene knockout

Two sgRNAs were designed to cut 37nt apart early in the coding region of the urease gene (JGI ID 30193) to induce a deletion and frame-shift. Several programmes, explained below, were used to collect data and make an informed decision on sgRNA choice. Excel was used to combine, process and compare data.
Selecting CRISPR-Cas targets and estimating on-target score

Twenty bp targets with an NGG PAM were identified and scored for on-target efficiency using the Broad Institute sgRNA design programme (www.broadinstitute.org/rnai/public/analysis-tools/sgrna-design), which utilises the Doench et al. [29] on-target scoring algorithm calculated from >1800 empirically tested sgRNAs.

Determining cut positions and cross referencing to restriction recognition sites

All restriction sites and their positions within the urease gene were identified using the Emboss restriction tool (http://emboss.bioinformatics.nl/). As the Broad Institute sgRNA design programme does not give the location of CRISPR-Cas targets within a gene, this was determined using Primer map (http://www.bioinformatics.org/sms2/primer_map.html [36]). The cut site position (3nt upstream of the start of the PAM sequence) was calculated for each sgRNA depending on sense or anti-sense strand placement. All predicted CRISPR-Cas cut sites were cross-referenced to restriction recognition sites.
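The target search and cut-site calculation described above were done with web tools and Excel; the following is only a minimal Python sketch of the same logic, not the authors' workflow. The function names, the 0-based coordinate convention and the example sequence are all assumptions made for illustration. It scans both strands for a 20nt target followed by an NGG PAM and reports the predicted cut site (3nt upstream of the PAM) in sense-strand coordinates.

```python
# Sketch only: names, coordinate convention and example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(gene, target_len=20):
    """List every target of length target_len followed by an NGG PAM, on both strands.

    Coordinates are 0-based on the sense strand. 'cut' is the index of the
    first sense-strand base to the right of the predicted break, which lies
    3 nt upstream of the PAM on the strand carrying the target.
    """
    gene = gene.upper()
    n = len(gene)
    hits = []
    for strand, seq in (("+", gene), ("-", reverse_complement(gene))):
        for i in range(n - target_len - 2):
            pam = seq[i + target_len:i + target_len + 3]
            if pam[1:] != "GG":
                continue
            target = seq[i:i + target_len]
            cut = i + target_len - 3  # break position on the scanned strand
            if strand == "+":
                start, sense_seq = i, target
            else:
                # Map anti-sense coordinates back onto the sense strand.
                start = n - i - target_len
                cut = n - cut
                sense_seq = reverse_complement(target)
            hits.append({"strand": strand, "start": start, "cut": cut,
                         "target": target, "sense_seq": sense_seq, "pam": pam})
    return hits


# Made-up sequence containing a single sense-strand target.
example = "AAAAA" + "GACGTACGTACGTACGTACG" + "TGG" + "AAAAA"
for hit in find_targets(example):
    print(hit)
```

The `sense_seq` field plays the role of the 'sense strand sequence' spreadsheet column, so that anti-sense targets can be searched for directly in the original gene sequence. On-target scoring and restriction-site lookup are not reproduced here.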

Reverse complement of antisense strand CRISPR-Cas targets
The reverse complement (RC) was found for each CRISPR-Cas target using the programme: http://www.bioinformatics.org/sms2/rev_comp.html [36]. In the final spreadsheet (Supplementary Figure 1), if a target was located on the anti-sense strand, the RC was shown for the 'sense strand sequence' column. This allows the sgRNA to be easily searched within the original gene sequence.

Determine position of CRISPR-Cas cut sites in relation to coding region

An array was made with start and end positions for each exon/intron. Cut site positions were compared to exon/intron ranges and the relevant exon/intron returned if the data overlapped.

The final spreadsheet gives data on CRISPR-Cas target sequences and their sense sequence (if located on the antisense strand), location of target (relative to the sense strand), predicted CRISPR-Cas cut site, first nucleotide of the target, PAM sequence, location (i.e. exon, intron), strand, sgRNA score and restriction recognition sites overlapping the cut site. The table (Supplementary Figure 1) was sorted to prioritise sgRNAs by starting base (prioritising guanine), sgRNA score, position within the gene and interaction with restriction recognition sites.

Predicting off-targets

The full 20nt target sequences and their 3' 12nt seed sequences were subjected to a nucleotide BLAST search against the T. pseudonana genome. Resulting homologous sequences were checked for presence of an adjacent NGG PAM sequence at the 3' end. The 8nt sequence outside of the seed sequence was manually checked for complementarity to the target sequence. In order for a site to be considered a potential off-target, the seed sequence had to match, a PAM had to be present at the 3' end of the sequence and a maximum of three mismatches between the target and sequences from the BLAST search were allowed outside of the seed sequence.
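The three off-target criteria above were applied manually to BLAST hits; as a minimal sketch (not the authors' procedure), they can be written as a single predicate. The function name and its inputs are hypothetical: it assumes each BLAST hit has already been extended to a full 20nt candidate in the same 5' to 3' orientation as the target, together with the 3nt immediately downstream of it.

```python
def is_potential_off_target(target, candidate, pam, seed_len=12, max_mismatches=3):
    """Apply the three criteria from the text to one candidate site.

    target, candidate: 20 nt sequences in the same 5'->3' orientation.
    pam: the 3 nt immediately 3' of the candidate.
    Returns True only if the PAM is NGG, the 3' seed matches exactly and
    the remaining 5' bases differ at no more than max_mismatches positions.
    """
    target, candidate, pam = target.upper(), candidate.upper(), pam.upper()
    if len(target) != len(candidate):
        raise ValueError("target and candidate must be the same length")
    if len(pam) != 3 or pam[1:] != "GG":
        return False  # no NGG PAM at the 3' end
    if target[-seed_len:] != candidate[-seed_len:]:
        return False  # seed sequence must match
    distal_mismatches = sum(a != b for a, b in
                            zip(target[:-seed_len], candidate[:-seed_len]))
    return distal_mismatches <= max_mismatches


# Made-up sequences: three mismatches outside the seed, intact seed, NGG PAM.
guide = "GACGTACGTACGTACGTACG"
print(is_potential_off_target(guide, "TTGGTACGTACGTACGTACG", "AGG"))
```

Keeping the seed length and mismatch limit as parameters makes it easy to test stricter or looser definitions of an off-target than the one used here.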
Off-targets were also checked using the EuPaGDT program [ Cas construct, pTpfcp/nat (positive control) and water (negative control). Five × 10⁷ cells in exponential phase were used per shot with a rupture disc of 1350psi and a 7cm flight distance. Following transformation, cells were rinsed into 25ml of media and left to recover for 24 hours under standard growth conditions. Cells were counted using a Coulter counter (Beckman) and 2.5 × 10⁷ cells from each transformation were spread onto five ½ salinity Aquil 0.8% agar plates (5 × 10⁶ cells/plate) with 100µg ml⁻¹ nourseothricin. Plates were incubated under standard conditions for two weeks. The remaining sample was diluted to 1 × 10⁶ cells ml⁻¹ in media and supplemented with nourseothricin to a final concentration of 100µg ml⁻¹ for liquid selection. Liquid selection cultures were maintained under standard growth conditions with 100µg ml⁻¹ nourseothricin. Colonies were picked and transferred to 20µl of media. Ten µl from each colony was transferred to 1ml of selective media for further growth. The remaining sample was used in screening.

To isolate sub-clones from colonies which screened positive for mutations, 100µl of cells at exponential phase were streaked onto ½ salinity Aquil 0.8% agar plates with 100µg ml⁻¹ nourseothricin.

Screening clones and cultures

Ten µl from each colony or culture from liquid selection was spun down and the supernatant removed. Cells were re-suspended in 20µl of lysis buffer (10% Triton X-100, 20mM Tris-HCl pH8, 10mM EDTA), kept on ice for 15 minutes then incubated at 95°C for 10 minutes. One µl of lysate was used in Taq PCR to amplify the CRISPR-Cas targeted fragment of the urease gene. Clones were also screened for Cas9 and NAT. For PCR primers, see Table 1 (ref. numbers 21-26). PCR products were run on an agarose gel to check for the lower MW band associated with a double-cut deletion in the urease gene and for the presence of Cas9 and NAT. Urease PCR products were also digested with BsaI and HpaII to determine if the restriction recognition sites, which overlap the cut sites, had been mutated. PCR products were sent for sequencing to confirm mutations.

Growth experiments
Knockout and wild-type (WT) cultures were nitrate depleted by growing cells in nitrate-free media until cell division stopped and quantum yield of photosynthesis (Fv/Fm measured on the Phyto-PAM-ED) dropped below 0.2. Cultures were then transferred in triplicate at a final concentration of 2.5 × 10⁴ cells ml⁻¹ into 25ml of media with either 1mM sodium nitrate or 0.5mM urea. Cell count and mean cell size were measured once a day using a Coulter counter. Fv/Fm measurements were also taken daily. Growth rates were calculated using $\mu = (\ln N_2 - \ln N_1)/(T_2 - T_1)$, where $T_1$ and $T_2$ are time points within exponential growth and $\ln N$ is the natural log of the cell count (cells ml⁻¹) at the corresponding time point. Analysis of variance with Tukey's pairwise comparison was used to compare both growth rates and cell size at the end of exponential phase between samples.
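As a worked illustration of the growth rate formula, the sketch below computes µ from two cell counts taken within exponential phase. The function names and the example counts are hypothetical and are not data from this study; the conversion to divisions per day (µ divided by ln 2) is the standard relationship and is included only as an assumption about how rates might be reported.

```python
import math


def specific_growth_rate(n1, n2, t1, t2):
    """Specific growth rate: (ln N2 - ln N1) / (T2 - T1).

    n1, n2: cell counts (cells per ml) at times t1 and t2 (days),
    both taken within exponential growth.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (math.log(n2) - math.log(n1)) / (t2 - t1)


def divisions_per_day(mu):
    """Convert a specific growth rate (per day) to divisions per day."""
    return mu / math.log(2)


# Illustrative numbers only: a culture going from 2.5e4 to 1e5 cells per ml in 2 days.
mu = specific_growth_rate(2.5e4, 1.0e5, 0, 2)
print(round(mu, 3), round(divisions_per_day(mu), 3))
```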

Results and discussion
sgRNA design

The two CRISPR-Cas targets with the highest on-target scores (0.5 and 0.79), containing a predicted cut site over a restriction site and occurring early in the coding region, were chosen. sgRNAs were designed to cut 37nt apart at positions 138 and 175 within the urease gene. Both targets started with a G for polymerase III transcription (Figure 2). No off-target sites were predicted for sgRNAs designed for either of the two CRISPR-Cas target sequences.

Constructing the CRISPR-Cas plasmid using the Golden Gate cloning method

A single CRISPR-Cas construct was made using Golden Gate cloning (Figure 1) The long term effects from off-target mutations introduced through CRISPR-Cas are currently unknown; therefore it may be advantageous for future work to remove CRISPR-Cas constructs from mutants. Adding a yeast CEN6-ARSH4-HIS3 sequence to plasmids allows autonomous replication in diatoms and expression of genes without random integration into the genome [20]. Furthermore, removing selection leads to plasmids being discarded. By expressing CRISPR-Cas genes and selective markers on a removable episome, mutations could be introduced without integration of the plasmid. CRISPR-Cas constructs could then be expelled by removing selection. As well as considerations for long term off-target effects, this could also be advantageous for studies and applications which are sensitive to the presence of transgenes. confirmed by sequencing (Figure 2). The fourth colony (M1) showed a single band associated with the WT urease; however, sequencing showed two products: a WT urease and a mutant urease with a 4nt deletion at the first sgRNA cut site. A mixture of PCR products may be due to a mono-allelic mutation, in which one allele is WT and the other displays a mutation.
It can also be due to colony mosaicism, where a colony contains a mixture of cells with WT and mutant alleles due to mutations occurring after transformed cells have started to divide. Both mono-allelic mutants and mosaic colonies have been observed in P. tricornutum [15,18].

To determine if the colonies were mosaic or mono-allelic, cells from mutant clones producing mixed PCR products were spread onto selective plates to isolate single sub-clones. Thirty-four sub-clones from each clone were screened by PCR (a few examples are presented in Figure 2). Two clones (M2 and M3) were mosaic, with a mixture of sub-clones showing either a single band corresponding to the expected deletion (61.5% and 25%, respectively), two bands associated with the WT and expected deletion (25.5% and 28.1%, respectively) or a single band corresponding to the WT urease fragment (13% and 46.9%, respectively). For each of the two clones, PCR amplicons from three putative bi-allelic sub-clones were sequenced (Figure 2). Four out of six (M2_9, M2_10, M3_10 and M3_11) showed the expected 37nt 'clean' deletion without any additional mutations. Precise deletions such as this, using two sgRNAs, have previously been generated with high efficiency [37, 43], and allow a large degree of control over the mutation. Two of the sub-clones (M3_9 and M2_12) showed one allele with the expected 37nt deletion and the other with an additional deletion at the second sgRNA cut site. In addition, M2_12 showed a C→G SNP within the sgRNA1 target site. Sub-clones derived from the M1 clone showed WT and 4nt deletion PCR amplicons as seen in the original clone, suggesting that this clone may have a mono-allelic mutation.

Using CRISPR-Cas with one sgRNA can introduce a variety of indels into a locus of interest via the error-prone NHEJ DNA repair mechanism [15].
Cas9 preferentially cuts DNA three nucleotides upstream of the PAM sequence in the seed region [44] and the NHEJ mechanism either repairs a double strand break perfectly or indels are introduced. If cut sites are not cleaved at the same time when using two sgRNAs, mutations at each site, rather than removal of the fragment in between target sites, may occur [37]. In this study, however, we report a high occurrence of bi-allelic mutants with precise deletions between the CRISPR-Cas cut sites, suggesting that the Cas9/sgRNA complex is cutting efficiently and DNA ends tend to be repaired perfectly. This allows control over the introduced mutations and gives the chance to avoid introducing in-frame indels.

Restriction digest (results not shown) and sequencing (Figure 2)  identifying bi-allelic mutants especially given the limited sgRNA/restriction site interactions available for this gene.

As well as clones from plate selection, one culture from liquid selection (LM1; population of cells transferred to liquid selective media after transformation) showed a single band associated with the bi-allelic 37nt deletion following PCR. This was confirmed by sequencing (Figure 2). PCR screening following growth of LM1 in urea showed only the lower MW band product (results not shown), giving further evidence for a bi-allelic mutation from a population of cells. As small volumes of cells are transferred to fresh media when passaging, this may have isolated bi-allelic mutants.

Growth experiments with mutants
Urease catalyses the breakdown of urea to ammonia, allowing it to be used as a source of nitrogen [45]. Sub-clones from different cell-lines with 37 or 38nt deletions were tested for knock-out of the urease gene by looking for a lack of growth when supplemented with urea as the sole nitrogen source.

Cells were nitrogen starved and then transferred to media with either nitrate or urea. Cell counts, cell size and Fv/Fm were measured daily for 7 days. Negative controls to account for any background nitrate in the media were also run, in which no nitrate or urea was added for WT cultures.

Four putative bi-allelic mutants (LM1, M4, M2_10 and M3_9) were tested along with WT and the mono-allelic M1_10 over two growth curve experiments. Both LM1 from liquid selection (p=0.0029) and the sub-clone M3_9 (p=0.0000001) showed a significant decrease in growth rate in urea compared to nitrate (Figure 3) as well as a significant 13-18% decrease in cell size (Figure 4; p=0.0029 and p=0, respectively). The latter was also apparent with light microscopy (results not shown). Mutants in urea could be easily discerned even without cell counts, as cultures appeared much paler in colour. M4 did not show a difference in growth rate but did show a significant decrease in cell size (p=0.038). The mono-allelic mutant M1_10 displayed higher growth in urea and similar growth to the WT control (Figure 3). This correlates with results from Weyman et al. [17], which showed that despite a reduced protein concentration, a mono-allelic urease knock-out was able to grow in urea. M2_10, which screened as a bi-allelic mutant prior to growth experiments, showed a smaller but still significant decrease in growth rate (p=0.0014) (Figure 3) and cell size (p=0.0039) (Figure 4).
PCR screening of the urease gene following growth in nitrate and urea showed the expected bi-allelic mutation for LM1, M3_9 and M4; however, M2_10 also showed a faint WT band in nitrate and a strong WT band in urea (Figure 5). This suggests that M2_10 was mosaic, with cells containing a functional urease out-competing those with a mutant urease. Given that only a faint WT band was present after growth in nitrate, this suggests that the majority of the cells from the sub-clone contained the mutant urease, initially accounting for the majority of growth and resulting in a lower but still significant decrease in growth rate.

Knock-out of the urease gene in the diatom P. tricornutum prevents growth in urea [17]. Urease mutants in this study still grew in urea but with a lower growth rate and reduced cell-size, characteristics which are associated with nitrogen limitation in diatoms [46, 47] rather than nitrogen starvation. Mutant cell-lines in urea grew to the same density as the same cell-lines in nitrate, but at a lower rate (Figure 3). As nitrogen is an essential nutrient for growth, this suggests that mutant cells in urea still have access to nitrogen, but lower growth rates and cell-size indicate that nitrogen may not be as readily available compared to cells grown with nitrate. Controls in nitrogen-free media showed very little growth, which suggests that growth of mutants in urea was not due to residual nitrate in the culture. It is unlikely that random integration of the CRISPR-Cas plasmid is responsible for reduced growth rate in mutants as all four individual mutant cell-lines display increased growth rates when grown in nitrate. Therefore it seems likely that impaired growth of urease mutants in urea is due to a reduction in function of the urease gene.
There are a few possible reasons why a mutation in the urease gene appears to lead to nitrogen limitation rather than nitrogen starvation as seen in P. tricornutum. Cells may be able to access nitrogen from another source, separate to the breakdown of urea via urease. Some algae have an alternative pathway for breakdown of urea but this has only been found in Chlorophyceae codons after the deletion in the gamma sub-unit, leading to major disruption of the gamma sub-unit, nonsense down-stream and short products of 24 or 44 amino acid residues (Figure 6). Since all mono-clonal bi-allelic mutants tested for growth in urea had either two alleles with a 37nt deletion or both a 37 and 38nt deletion, it was predicted that the urease gene would no longer be functional. However, several mechanisms exist in eukaryotes which can allow translation of the protein from start codons later in the coding region. These include leaky initiation, re-initiation of ribosomes and internal ribosome entry sites (IRES) [51]. IRES have been shown to become active in yeast following amino acid starvation [51]. If an in-frame translation can occur after the deletion at an IRES or via a mechanism such as re-initiation, then the active site located in the alpha sub-unit could still be present. The first in-frame ATG after the deletion would start translation of the protein just before the beta sub-unit, leading to an N-terminal truncated protein without the gamma sub-unit but with both the beta and alpha sub-units (Figure 6). Earlier start codons are predicted to result in nonsense and early stop codons.

The 5' end of the urease coding region was targeted to induce a frame shift and disrupt the protein early on; however, it may be better to target the active site or entirely remove the gene. Precise deletions larger than a gene using CRISPR-Cas and two sgRNAs have been previously demonstrated [43].
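The frame-shift reasoning used throughout this section can be checked with simple arithmetic: a deletion shifts the reading frame whenever its length is not a multiple of three. The short sketch below applies this to the deletion sizes reported here (37nt between the cut positions 138 and 175, the 38nt variant and the 4nt deletion in M1). The function names are hypothetical, and the check says nothing about downstream re-initiation or IRES activity.

```python
def deletion_length(cut1, cut2):
    """Length of the fragment removed between two cut positions."""
    return abs(cut2 - cut1)


def shifts_reading_frame(indel_len):
    """True if an indel of this length is not a multiple of three."""
    return indel_len % 3 != 0


# Deletion sizes taken from the text: 37 nt (cuts at 138 and 175), 38 nt and 4 nt.
for size in (deletion_length(138, 175), 38, 4):
    print(size, shifts_reading_frame(size))
```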

CRISPR-Cas can precisely and efficiently edit the genome of the diatom Thalassiosira pseudonana. Twelve percent of initial colonies, and 100% of those which screened positive for Cas9, showed evidence of a mutation in the urease gene, with many sub-clones showing precise bi-allelic 37nt deletions from two sgRNA DSBs. Screening for the deletion by PCR allowed efficient identification of bi-allelic mutants and Golden Gate cloning allowed easy assembly of a plasmid for CRISPR-Cas. This included adapting the system for T. pseudonana by including endogenous promoters and two specific sgRNAs. Due to the flexible modular nature of the cloning system, this can be easily adapted for other genes in T. pseudonana. A variety of available online tools were used to design two sgRNAs that would target the early coding region of the urease gene. A reduced growth rate and cell-size phenotype was seen in mutant cell-lines grown in urea compared to nitrate, suggesting that function of the urease may have been impaired rather than removed or an alternative source of nitrogen was available.

[Figure legend fragments: growth rate (division day⁻¹) was measured in exponential phase and rates compared using analysis of variance with Tukey's pairwise comparisons; gel lanes include M3_9 in nitrate (6) and urea (7); axis label: mean cell size (µm); Supplementary Figure 1]