Genome Editing Tools for Functional Analysis of HNF1A and IL6ST Genes

Genome editing tools, such as TALEN (transcription activator-like effector nuclease) or CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats with CRISPR-associated protein 9 nuclease) systems, enable functional studies by targeted gene knockout. They introduce double-stranded breaks (DSBs) into DNA in a sequence-specific manner. These breaks are repaired by the error-prone non-homologous end joining (NHEJ) pathway, which is likely to inactivate the gene when its coding sequence is targeted. Vectors expressing TALEN- and Cas9-based constructs targeting the human IL6ST and HNF1A genes were assembled. Their ability to introduce DSBs in transfected cultured cells was then tested using a luciferase reporter assay. The Cas9-based construct targeting the IL6ST gene was active, whereas the two TALEN-based constructs did not introduce DSBs above background level. Both the TALEN and the CRISPR-Cas9 constructs targeting the HNF1A gene were active, with the TALEN showing higher activity in a dose-dependent manner. The constructed genome editing tools can be used for functional analysis of the putative roles of HNF1A and IL6ST in IgG glycosylation, as suggested by previous genome-wide association studies.


INTRODUCTION
Gene knockout using site-specific nucleases has opened new possibilities for studying gene function. Chimeric nucleases are composed of programmable sequence-specific DNA-binding modules and DNA cleavage domains. They can be used to introduce single-stranded breaks (nicks) or double-stranded breaks (DSBs) at the targeted site, thus stimulating error-prone non-homologous end joining (NHEJ) or homology-directed repair (HDR). [1] The breaks can therefore lead either to gene knockout or to homology-based insertion at specific loci, both in cultured cells and in whole organisms. [2,3] These approaches have been termed genome editing or genome engineering.
A successful genome editing tool must perform two essential functions: targeting to a specific DNA sequence and cleavage of one or both strands of the DNA molecule. Several targeting systems are available, differing in complexity, specificity, modularity, ease of assembly and sensitivity to DNA methylation. Homing nucleases (extensively reviewed by Stoddard) [4] were among the first naturally occurring enzymes used to modify the genome. They recognize, bind and cleave specific DNA sequences 20 to 30 bp long, which are unlikely to occur in a genome by chance alone. In some cases, they have been successfully engineered to target novel sequences by altering their DNA-contacting residues. [4] Zinc finger nucleases (ZFNs) were a further step towards flexible, modular targeting. They consist of zinc finger protein motifs assembled to bind the desired sequence, fused to the FokI nuclease for DNA cleavage. [5] Additional specificity is conferred by the requirement for dimerization of two FokI subunits. These subunits must be brought together by targeting two appropriately spaced sequences in opposite orientations. [5] Transcription activator-like effectors (TALEs) are proteins naturally occurring in the plant pathogen Xanthomonas, where they bind DNA via a domain of tandem repeats.
These repeats [6] each recognize a single nucleotide, with specificity determined by the amino acids at two key positions (the repeat-variable di-residue, RVD). The modular nature of the DNA-binding repeats enables efficient assembly of targeting domains fused with the FokI nuclease. A pair of TALE nucleases (TALENs) [7] introduces double-stranded breaks at the targeted site in a manner similar to ZFNs. TALENs, however, offer the advantages of highly predictable modular assembly and the possibility of targeting almost any DNA sequence. Plasmid kits containing RVD-encoding sequences and appropriate TALEN backbones [7,8] have made TALEN technology widely available. Compatible backbones make it possible to exploit the additional specificity conferred by engineered heterodimeric FokI domains. [9]
Another prokaryotic system repurposed for genome engineering is CRISPR-Cas9. It is based on the endonuclease Cas9, guided by a short RNA molecule that defines its binding specificity. [10] Co-expression of the Cas9 protein with a single guide RNA (sgRNA) carrying a 20-nt guide sequence enables targeting of any DNA sequence immediately followed by the protospacer adjacent motif (PAM) NGG. This motif occurs on average every 8-12 bp in the human genome. [11] Cas9 specificity is therefore easy to program, and several sgRNAs can be co-expressed for multiplexed targeting. Each sgRNA can be quickly and conveniently cloned into a dual-expression vector (Cas9 and sgRNA) by oligo annealing.
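The following minimal Python sketch makes the PAM requirement concrete. It scans both strands of a sequence for 20-nt protospacers immediately followed by an NGG PAM. For each hit, it reports the expected SpCas9 cut position, 3 bp upstream of the PAM. It is an illustration only: the function names are hypothetical, and it ignores off-target scoring and sgRNA efficiency, which CRISPR design tools also take into account. For uniformly random sequence, a GG dinucleotide starts at a given position with probability 1/16 on each strand. Counting both strands therefore gives a PAM roughly every 8 bp, consistent with the lower end of the range quoted above.

```python
import re

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_spcas9_sites(seq: str, guide_len: int = 20):
    """List (strand, protospacer, pam, cut) for every protospacer followed by NGG.

    `cut` is a 0-based position on the + strand: SpCas9 is expected to cut
    between bases cut-1 and cut, 3 bp upstream (5') of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            cut = m.start() + guide_len - 3  # cut position in this strand's coordinates
            if strand == "-":
                cut = n - cut  # map the break back onto + strand coordinates
            sites.append((strand, m.group(1), m.group(2), cut))
    return sites

# Hypothetical example: a single site on the + strand, cut between bases 16 and 17.
print(find_spcas9_sites("A" * 20 + "TGG"))
```

A real design tool would run such a scan on the genomic target region and then rank the candidates by predicted specificity.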
Gene knockout by TALEN- or Cas9-based systems depends on the introduction of DSBs into the reading frame of the targeted gene. [12] These DSBs are repaired by the error-prone NHEJ mechanism, which introduces short insertions or deletions. These can cause frameshift mutations and thereby abolish gene function. To generate biallelic knockouts successfully, it is useful to first select the most efficient TALEN or CRISPR-Cas9 constructs by testing their ability to introduce DSBs at the targeted region in cells. An elegant assay based on a luciferase reporter plasmid has been developed by Porro and coworkers. [13] The reporter plasmid drives expression of the firefly luciferase gene, which is interrupted by a multiple cloning site (MCS) flanked by two 548 bp repeated regions of luciferase cDNA. The sequence targeted by the genome editing tools is cloned into the MCS. Upon co-transfection with the nuclease constructs, homologous recombination between the flanking repeats reconstitutes luciferase activity in proportion to the frequency of DSBs induced in the cloned target region.
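The logic of the reporter can be illustrated with a toy string model. In this minimal sketch, a break between the two direct repeats is assumed to be repaired by joining the repeats. This deletes one repeat copy together with the intervening target and restores a contiguous luciferase sequence. A reporter without a break stays split. The function name and the short sequences are hypothetical; in the real reporter each repeat is 548 bp, and the sketch does not model repair efficiency.

```python
from typing import Optional

def repair_reporter(reporter: str, repeat: str, cut: Optional[int]) -> str:
    """Toy model of repeat-mediated repair of a split-luciferase reporter.

    If a DSB at 0-based position `cut` lies between the two direct copies of
    `repeat`, the copies are joined: one copy and the intervening target are
    deleted. Otherwise the reporter is returned unchanged (luciferase stays split).
    """
    first = reporter.find(repeat)
    second = reporter.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("reporter must contain two direct copies of the repeat")
    if cut is None or not (first + len(repeat) <= cut <= second):
        return reporter
    return reporter[: first + len(repeat)] + reporter[second + len(repeat):]

# Hypothetical short sequences standing in for luciferase halves, repeat and target.
luc5, rep, target, luc3 = "ATGGCC", "GATTACAGAT", "CCCTTTGGG", "AAATAG"
reporter = luc5 + rep + target + rep + luc3
assert repair_reporter(reporter, rep, cut=20) == luc5 + rep + luc3  # intact gene restored
assert repair_reporter(reporter, rep, cut=None) == reporter         # no DSB, no signal
```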
Post-translational modification of eukaryotic proteins by glycosylation strongly influences their structure and function. Protein glycosylation is involved in many key cellular and physiological processes. [14] It can also provide a cell with a mechanism for rapid adaptation by integrating environmental information. [15] Large-scale genome-wide association studies (GWAS) [16,17] have recently identified many loci putatively associated with protein glycosylation. Two of them are of particular interest for our studies of the epigenetic regulation of protein glycosylation in complex diseases. These are the HNF1A gene, encoding hepatocyte nuclear factor 1 alpha, and the IL6ST gene, encoding the interleukin 6 signal transducer shared by many cytokines. HNF1A has been shown to be a master regulator of plasma protein fucosylation. [16] We recently showed that epigenetic silencing of HNF1A by methylation at several CpG sites within its promoter strongly correlates with highly branched plasma glycans. We also proposed that such silencing could be a mechanism leading to a subtype of diabetes (HNF1A-MODY). [18] IL6ST, on the other hand, has been associated both with immunoglobulin G (IgG) glycosylation and with inflammatory and autoimmune conditions such as inflammatory bowel disease (IBD). [17] We are currently studying IBD in terms of epigenetics and glycosylation. [19]
The aim of this work was to assemble several genome editing tools for knockout of the human HNF1A and IL6ST genes. The most efficient TALEN or CRISPR-Cas9 construct for each gene was then to be selected using the luciferase assay. The most successful tools will be used to generate cell lines with biallelic knockouts. These cell lines will support functional analyses of the roles of HNF1A and IL6ST in the regulation of plasma protein and IgG glycosylation, respectively.

EXPERIMENTAL
TALEN Vector Construction
A single TALEN target site in the HNF1A gene was selected using the Mojo Hand TALEN design tool (http://www.talendesign.org/classic/). [20] Two TALEN target sites located in different exons of the IL6ST gene were selected from a collection of TALENs previously designed to target every human protein-coding gene. [21] The plasmid kit used to generate TALENs was a gift from Daniel Voytas and Adam Bogdanove (Addgene kit # 1000000024). [7] NN RVDs were used as guanine-binding modules. TALE repeats for the HNF1A gene were assembled according to the original protocol. [7] The IL6ST TALE repeat arrays were assembled using a modified protocol involving intermediary 6-module pFUS array vectors. [8] These modified pFUS vectors reduce the number of module plasmids required and improve the success rate of Golden Gate assembly. The additional plasmid kit used for building TALENs and TALE-TFs was a gift from Takashi Yamamoto (Addgene kit # 1000000030). The assembled TALE repeats were cloned into the heterodimeric FokI destination vectors pCAG-T7-TALEN(Sangamo)-FokI-ELD-Destination (Addgene plasmid # 40132) and pCAG-T7-TALEN(Sangamo)-FokI-KKR-Destination (Addgene plasmid # 40131), both gifts from Pawel Pelczar. [9] Final constructs were confirmed by bidirectional Sanger sequencing of the repeat arrays using primers TAL_F1 and TAL_R2. Primer sequences are listed in Table 1.
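The following minimal sketch illustrates how a TALE binding site maps onto repeat modules. It translates a binding site into an RVD array using the common code NI→A, HD→C and NG→T. As in this work, NN is used for G; NN can also recognize A. The sketch assumes, as in Figure 1, that the site is written 5'→3' starting with the obligatory T at position 0. That T is contacted by the TALEN N-terminal region rather than by a repeat. The grouping into blocks of six mirrors the intermediary 6-module pFUS arrays mentioned above, but how the final repeats are split in practice depends on the kit protocol. Function names are hypothetical.

```python
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}  # NN used for G, as in the text

def tale_rvds(binding_site: str) -> list:
    """Translate a 5'->3' TALE binding site (starting with T0) into an RVD array."""
    site = binding_site.upper()
    if not site.startswith("T"):
        raise ValueError("binding site should start with the 5' T (position 0)")
    # T0 is recognized by the TALEN N-terminal domain, so repeats start at position 1.
    return [RVD_FOR_BASE[base] for base in site[1:]]

def group_modules(rvds: list, size: int = 6) -> list:
    """Split an RVD array into consecutive blocks (e.g. 6-module intermediate arrays)."""
    return [rvds[i:i + size] for i in range(0, len(rvds), size)]

print(tale_rvds("TACGT"))  # hypothetical site -> ['NI', 'HD', 'NN', 'NG']
```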

CRISPR-Cas9 Vector Construction
The SpCas9 target sites in the HNF1A and IL6ST genes were selected from the list of suitable targets predicted by the CRISPR Design Tool (http://crispr.mit.edu/). [22] Pairs of oligonucleotides encoding the sgRNA guide sequence were annealed and cloned between the BbsI restriction sites of the pX330-U6-Chimeric_BB-CBh-hSpCas9 expression vector using the established protocol. [12] The plasmid pX330-U6-Chimeric_BB-CBh-hSpCas9 was a gift from Feng Zhang (Addgene plasmid # 42230). The cloned sgRNA sequences were verified by sequencing with the hU6_Seq_F primer. Table 1 lists the complementary oligonucleotides used for sgRNA cloning: IL6ST_pX330_S and IL6ST_pX330_A for the IL6ST sgRNA, and HNF1A_pX330_S and HNF1A_pX330_A for the HNF1A sgRNA. The sequencing primer is also listed in Table 1.
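The oligo-annealing step can be sketched as follows. This minimal Python function follows the convention commonly used with pX330 (see the protocol cited above [12]). The sense oligo is CACC followed by the guide, and the antisense oligo is AAAC followed by the reverse complement of the guide. An extra G is prepended when the guide does not start with G, so that transcription from the U6 promoter starts efficiently. This does not imply that the oligonucleotides in Table 1 were designed exactly this way. The function name is hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def px330_oligos(guide: str) -> tuple:
    """Return (sense, antisense) oligos, 5'->3', for cloning a 20-nt guide into BbsI-cut pX330."""
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T guide without the PAM")
    # U6 transcription starts preferentially at G, so add one if needed.
    insert = guide if guide.startswith("G") else "G" + guide
    sense = "CACC" + insert               # 5' overhang compatible with one BbsI-cut end
    antisense = "AAAC" + revcomp(insert)  # 5' overhang compatible with the other end
    return sense, antisense

sense, antisense = px330_oligos("A" * 20)  # hypothetical guide
print(sense)
print(antisense)
```

After annealing, the two oligos form a duplex whose 4-nt 5' overhangs ligate directly into the BbsI-digested vector. This avoids PCR and allows many guides to be cloned in parallel.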

Generation of pGL3-IL6ST and pGL3-HNF1A Luciferase Reporter Vectors
pGL3-Linker vectors containing the target fragments were generated essentially as described previously. [13] A target sequence containing three binding sites for IL6ST-targeting nucleases was required. To generate it, a 548 bp fragment containing parts of exons 3, 7 and 12 of the IL6ST cDNA, flanked by BamHI restriction sites, was synthesized (Blue Heron Biotech, WA, USA). The fragment was cloned into the BamHI site of pGL3-Linker. Each targeted IL6ST sequence (52 bp for TALENs and 20 bp for CRISPR-Cas9) was flanked by about 50 bp of its genomic context on either side. The target sites were therefore spaced about 100 bp apart on the reporter plasmid. As all HNF1A-targeting nucleases were directed to exon 1, a 552 bp fragment of exon 1 was amplified by PCR with Herculase II Fusion proofreading DNA polymerase (Agilent Technologies, Santa Clara, CA, USA). This fragment was cloned into the EcoRV site of the pGL3-Linker plasmid. All constructs were verified by Sanger sequencing. Primers used for PCR of the HNF1A target (HNF1A_For and HNF1A_Rev) and for sequencing from the pGL3-Linker plasmid (RVprimer3) are listed in Table 1.
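The design of the synthetic IL6ST target fragment can be expressed as a small helper. Each target is padded with genomic context, the cassettes are concatenated, and the result is flanked by the cloning site. Before accepting the fragment, the helper checks that the site does not also occur inside the insert, where it would fragment the insert during digestion. This is a hypothetical sketch, and the sequences in the example are placeholders, not the actual IL6ST fragment. Because the BamHI (GGATCC) and EcoRV (GATATC) recognition sites are palindromic, scanning one strand is sufficient.

```python
BAMHI = "GGATCC"  # palindromic recognition site

def build_insert(cassettes: list, site: str = BAMHI) -> str:
    """Concatenate target cassettes (target plus flanking context) and add the
    cloning site at both ends, rejecting inserts with extra internal sites."""
    fragment = site + "".join(c.upper() for c in cassettes) + site
    # Only the two flanking sites may be present; any other occurrence,
    # including one created at a cassette junction, would cut the insert.
    if fragment.count(site) != 2:
        raise ValueError(f"unexpected internal {site} site in insert")
    return fragment

print(build_insert(["aaaa", "CCCC"]))  # placeholder cassettes
```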

Transient Cell Transfection and Dual Luciferase Assay
HEK293 cells were plated in 24-well plates in complete DMEM (supplemented with 10% FBS, 4 mM L-glutamine, 100 U/mL penicillin and 100 µg/mL streptomycin) 24 h before transfection. Cells at 50-70% confluence were co-transfected in duplicate using Lipofectamine 3000 (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. Each well received 40 ng of Renilla luciferase plasmid (phRG-TK, Clontech, Mountain View, CA, USA) for normalization of transfection efficiency. Each well also received 400 ng of pGL3-Linker (negative control) or pGL3-IL6ST/pGL3-HNF1A plasmid, together with increasing amounts (100, 200 and 400 ng) of nuclease-expressing (TALEN or Cas9) plasmids. Five hours after transfection, cells were washed with 1× PBS and fresh medium was added. Cells were harvested 72 h after transfection, and nuclease activity was determined by the dual luciferase assay (Promega, Fitchburg, WI, USA). Briefly, cells were lysed with 100 µL of 1× passive lysis buffer (PLB), incubated for 30 min at room temperature and centrifuged for 1 min at maximum speed. Firefly luciferase activity was determined by adding 30 µL of luciferase assay reagent II (LAR II) to 30 µL of the cell lysate. Renilla luciferase activity was subsequently determined by adding 30 µL of Stop & Glo substrate. Luciferase activity was measured on a Fluoroskan Ascent luminometer (Thermo Fisher Scientific).
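The normalization used for Figure 2 can be written out explicitly. In this minimal sketch, each well's firefly signal is divided by its Renilla signal. Each ratio is then divided by the mean ratio of the reporter-only control wells, so that background equals 1. Finally, the mean and sample standard deviation of the replicate wells are reported. The text does not state whether the standard deviation was computed before or after normalization, so that order is an assumption. The readings below are invented for illustration.

```python
from statistics import mean, stdev

def normalized_activity(firefly, renilla, ctrl_firefly, ctrl_renilla):
    """Return (mean, SD) of firefly/Renilla ratios for replicate wells, each expressed
    relative to the mean ratio of reporter-only control wells (background = 1)."""
    background = mean(f / r for f, r in zip(ctrl_firefly, ctrl_renilla))
    values = [(f / r) / background for f, r in zip(firefly, renilla)]
    return mean(values), stdev(values)  # stdev needs at least two replicates

# Invented luminometer readings for duplicate wells.
m, sd = normalized_activity(firefly=[400, 600], renilla=[50, 50],
                            ctrl_firefly=[100, 100], ctrl_renilla=[50, 50])
print(f"{m:.2f} ± {sd:.2f}")
```

With only two transfections per condition, the standard deviation is a rough indication of spread rather than a precise estimate.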

Assembly of TALEN and CRISPR-Cas9 Constructs for Knockout of the HNF1A and IL6ST Genes
TALEN constructs were successfully assembled with the correct RVDs cloned into the destination vector backbone, as confirmed by sequencing. TALEN pairs were constructed as heterodimers (ELD and KKR variants of the FokI domain) to decrease non-specific activity. [9] CRISPR plasmids contained the correct guide sequence inserted at the appropriate site, as verified by sequencing from the U6 promoter. The exact sites within the HNF1A and IL6ST genes targeted by the constructs are depicted in Figure 1. To prevent spurious binding and non-specific activity, all target sites were selected to be unique within the human genome and sufficiently different from the next closest match. Constructs were targeted to the 5' part of the coding sequence to disrupt the gene product. This minimizes the chance of producing a truncated protein with residual activity.
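The uniqueness criterion can be illustrated with a brute-force scan. The following sketch counts, on both strands of a given sequence, how many windows match a candidate site with 0, 1, 2 or more mismatches. A unique site has exactly one perfect match, and the smallest nonzero mismatch count gives the distance to the next closest match. This is only a toy example. Genome-scale searches rely on indexed aligners, and design tools typically weight mismatches by their position. A site that equals its own reverse complement would also be counted twice. Function names and sequences are hypothetical.

```python
from collections import Counter

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def mismatch_profile(site: str, seq: str) -> Counter:
    """Count windows of `seq` (both strands) by their number of mismatches to `site`."""
    site, seq = site.upper(), seq.upper()
    k = len(site)
    counts = Counter()
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - k + 1):
            counts[sum(a != b for a, b in zip(site, strand[i:i + k]))] += 1
    return counts

profile = mismatch_profile("AAAC", "AAACTTTTAAAG")  # hypothetical sequences
unique = profile[0] == 1
next_closest = min(mm for mm in profile if mm > 0)
print(unique, next_closest)
```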

Activity Testing and Selection of Constructs Using Luciferase Assay
Constructs targeting specific regions in the HNF1A and IL6ST genes were tested for their ability to introduce double-strand breaks (DSBs) in reporter plasmids when co-transfected into HEK293 cells. We did not expect interference from the endogenous genomic targets. Both the reporter vector and the expressed nucleases (Cas9 or TALEN proteins) were in large excess relative to genomic targets, as we observed in experiments with similar constructs using immunofluorescence and western blots. [23] In Figure 2, activity is presented as normalized luminescence. Luminescence measurements were within a range similar to that reported in the study describing the luciferase assay. [13] That study also reported good correlation between the luciferase assay and in situ targeting. The TALEN-based constructs did not induce DSBs above background level at the IL6ST target sites. In contrast, the Cas9-based construct showed dose-dependent activity. Activity was higher when a lower amount of plasmid (100 ng) was transfected and decreased almost to background level when the maximum amount (400 ng) of Cas9 plasmid DNA was used. This indicates the need for careful titration of the amount of construct used for transfection, since too much plasmid might inhibit DSB generation. Both constructs targeting the HNF1A gene showed activity well above background (Figure 2). The TALEN pair showed higher activity when larger amounts of plasmid were transfected, whereas the CRISPR-Cas9 construct did not show a dose-dependent response.
Given that the standard deviation increased with the amount of Cas9-based plasmid used for transfection, the lowest tested amount (100 ng) can be recommended for further experiments.

CONCLUSION
We have successfully constructed Cas9-based and TALEN-based genome editing tools targeting the IL6ST and HNF1A genes. Target regions were selected at the beginning of the coding region of each gene in order to achieve complete functional knockout once both alleles are successfully disrupted.
Based on the luciferase assay, we identified the Cas9-based construct as the only tested construct that induced DSBs in the IL6ST target above background. Since its activity decreased with increasing dose, we recommend the lowest dose tested (100 ng) for HEK293 cells in our experimental setup. Both constructs targeting the HNF1A gene induced DSBs above background level. The TALEN pair showed stronger activity, especially when cells were transfected with larger amounts of plasmid.
We have demonstrated that, for each of the targeted genes IL6ST and HNF1A, at least one of our genome editing tools can induce DSBs at the target sequence in a reporter assay. These tools will be essential for our prospective functional studies of the roles of these genes in plasma protein and IgG glycosylation. They will also support studies of the possible roles of these genes in disease pathology via deregulated protein glycosylation pathways.

Figure 1. Sequences within the HNF1A (A) and IL6ST (B) genes targeted by the genome editing tools. Boxes represent gene schematics with coordinates in the hg38 human genome assembly and the relative positions of the exons. Arrowheads along the gene model point in the direction of the coding strand of the gene (HNF1A is on the [+] and IL6ST on the [-] strand of the genome assembly). Exon models (gray and blue horizontal bars labeled with the exon number) are drawn to the same scale for both genes, and highlighted regions correspond to the targeted sequences. Exon models are always drawn in the direction of the coding strand. The expanded view of each targeted site shows its exact sequence. For CRISPR constructs (the right-hand HNF1A exon 1 site and the IL6ST exon 12 site), the 20-nt target sequence recognized by the sgRNA is shown in capital letters. The PAM sequence NGG is shown on the coding strand. Red triangles indicate the cleavage site. The other expanded views show TALEN pairs bound to their recognition sequences, which are indicated by capital letters. The two TALENs of a pair bind to opposite strands. Each TALEN binding sequence is preceded at its 5' end by a T nucleotide (shown in red), which is necessary for proper interaction with the N-terminal part of the TALEN. RVDs are color-coded and aligned to the target sequence. The C-terminally fused FokI nuclease subunits ELD and KKR form an active heterodimer in the region between the two facing TALEN binding sites. The heterodimer cleaves the intervening sequence close to its center.

Figure 2. Activity of constructs targeting the IL6ST and HNF1A genes measured by the luciferase assay. Luminescence was expressed as the ratio of firefly to Renilla luciferase activity to control for cell number, viability and transfection efficiency. Values were normalized to the background of the reporter plasmid without a co-transfected nuclease-expressing plasmid, which was defined as 1 (dashed horizontal line). Error bars represent the standard deviation of two independent transfections.

Table 1. Sequences of oligonucleotides used for cloning, PCR or sequencing.