Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles created by CRISPR-Cas9

CRISPR-Cas9 is efficient enough to knock out both alleles directly by introducing out-of-frame mutations. We succeeded in making biallelic on-target frameshift mutations of the endogenous Gli3 gene; however, the GLI3 protein was expressed in all six of the established cell lines carrying homozygous out-of-frame mutations. We developed a dual-tagged expression vector and proved that illegitimate translation (ITL) was the cause of the unexpected Gli3 expression. Thus, gene expression must be examined even if designed on-target out-of-frame sequences are introduced by genome editing. In addition, it is highly recommended to pre-examine the occurrence of ITL in vitro prior to the design and construction of any genome-editing vectors. In vitro assay systems such as the dual-tagged ITL assay system developed in this study should aid the identification and elucidation of ITL-based human diseases and gene expression.


Supplementary Discussion
Supplementary Discussion 1: Enhanced ITL expressions in 3xFlag short Gli3 vectors
All of the short Gli3 expression vectors (WT, del97G, and insGafter97G; Figure S9A) expressed two common ITL peptides corresponding to markers d and f (Figures S9C and D). These two common signals did not appear in the marker a lane (Figures S9C and D). The marker a expression vector is identical to the WT vector, except for the 5′ end of the 3xFlag tag sequence. The two additional common signals are thus likely due to an artificial cis-effect of 3xFlag.
Notably, the two additional common bands found with the short Gli3 expression vectors (Figures S9C and D) were never observed with the same dual-tagged expression vectors carrying the full 4749-bp Gli3 ORF sequence (Figures S6C and S8C). Thus, part or all of the sequence(s) eliminated from the original full 4749-bp ORF is also necessary to suppress the expression of the additional common ITL peptides. On the other hand, the short Gli3 expression vector without 3xFlag (the GLI3 size markers in Figure S9B) did not express the additional common ITL peptides. Both 3xFlag and elimination of part (or all) of the 4749-bp ORF are therefore necessary for the two additional common ITL expressions. Replacing 3xFlag with another tag may accordingly be a possible future improvement to eliminate such additional ITL products.
Conversely, 3xFlag with the 1110-bp ORF may be useful to experimentally study translation initiation in vitro by enhancing the very minor ITL expressions. The 3xFlag likely has an artificial cis-effect on both native and ITL translations. Molecular studies on a series of deleted fragments of 4749-bp ORF with 3xFlag may reveal the precise ITL mechanisms by enhancing the ITL phenomenon in vitro.
Supplementary Discussion 2: ITL initiation from 3xFlag sequence
The largest and strongest unique signal in lane insGafter97G (Figures S9C and D) was similar to or slightly larger than that of marker a, although the ATG codon of Gli3 was eliminated in the insGafter97G vector (Figure S9A). Instead, as shown in Figure S9E, four out-of-frame ATG sequences in 3xFlag became in-frame with the 1110-bp ORF in this vector. Thus, -11ATG or -5ATG in 3xFlag seemed to be the initiation site of the ITL of the strong unique peptide in the insGafter97G vector (see the 3xFlag +1 frame in Figure S9E). No ATG codons exist in the WT and +2 frames of 3xFlag, except the original -66ATG at the 5′ end. Thus, the largest ITL peptide in the insGafter97G vector was likely initiated from an ATG codon inside the 3xFlag tag.
If we use some other available tags at the 5′ end of the expression vector, such unexpected enhanced ITL effects of 3xFlag may be eliminated. For instance, myc or Strep-tag II does not have any ATG codons in any frame other than the original 5′ end ATG. Indeed, the WT and del97G vectors in Figure S9 did not initiate any ITL products within the 3xFlag sequence at all, since no in-frame ATG codons in 3xFlag existed for these two vectors.
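The frame-by-frame check for internal ATG codons described above can be sketched in a few lines of Python. This is only an illustration: the function name `atg_positions_by_frame` and the toy sequence are hypothetical, the sequence is not the 3xFlag, myc, or Strep-tag II sequence, and frames are numbered here relative to the first nucleotide of the supplied sequence rather than with the -66/-11/-5 coordinates used in Figure S9E.

```python
def atg_positions_by_frame(seq):
    """Group the 0-based start positions of every ATG by reading frame.

    Frame is defined as (position % 3), i.e. relative to the first
    nucleotide of `seq` (an assumption made for this sketch).
    """
    seq = seq.upper()
    frames = {0: [], 1: [], 2: []}
    pos = seq.find("ATG")
    while pos != -1:
        frames[pos % 3].append(pos)
        # Continue one nucleotide further so overlapping hits are not missed.
        pos = seq.find("ATG", pos + 1)
    return frames


# Toy tag sequence (hypothetical, for illustration only).
toy_tag = "ATGGCCAATGGCATGCCC"
print(atg_positions_by_frame(toy_tag))
```

A candidate tag would be considered free of this kind of ITL risk, in the sense used above, if the only ATG reported in any frame is the intended start codon at position 0.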
Supplementary Discussion 3: Leaky scanning as a model for Gli3 ITL
In the high-resolution size analysis in Figure S9, we found that the unique signal in the del97G lane, which corresponded to marker b, is seemingly expressed from +66ATG (Figure S2). On the other hand, two unique ITL signals were identified in the insGafter97G lane: one was a fainter band corresponding to the +83ATG product (Figure S3) and the other was a larger and stronger signal similar to marker a (Figures S9C and D). Thus, all three of the unique ITL bands found with the del97G and insGafter97G vectors (Figure S9A) were initiated upstream of the newly created stop codons (Figures S2, S3, and S9E). Only leaky scanning can initiate ITL upstream of the stop codon, whereas translation reinitiation should occur downstream of it. In conclusion, the ITL peptides detected in vivo (lane 2B2 in Figure 2B) are seemingly expressed by leaky scanning.
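The positional argument above (a start codon upstream of the premature stop points to leaky scanning, whereas reinitiation is expected downstream of it) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names, the toy sequence (which is not the Gli3 sequence), and the simplification that a candidate start is any ATG in the frame restored by the 1-bp indel with no stop codon before the end of the supplied fragment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq, start):
    """Return the 0-based index of the first stop codon read from `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def candidate_itl_starts(mutant, shift):
    """List ATGs that could restore the native frame in a frameshifted ORF.

    mutant: mutant ORF fragment beginning at the original start codon.
    shift:  -1 for a 1-bp deletion, +1 for a 1-bp insertion.
    Returns (premature_stop_index, [(atg_index, location), ...]), where
    location is relative to the premature stop in the original frame.
    """
    seq = mutant.upper()
    rescue_frame = shift % 3          # frame of the native codons after the indel
    premature_stop = first_stop(seq, 0)
    starts = []
    for i in range(rescue_frame, len(seq) - 2, 3):
        # Keep only ATGs whose reading frame stays open to the fragment end.
        if seq[i:i + 3] == "ATG" and first_stop(seq, i) is None:
            upstream = premature_stop is not None and i < premature_stop
            starts.append((i, "upstream" if upstream else "downstream"))
    return premature_stop, starts


# Toy mutant fragment carrying a hypothetical 1-bp deletion (illustration only).
toy_mutant = "ATGCCATGGCCGCACTAACCATGGCCAAGC"
print(candidate_itl_starts(toy_mutant, -1))
```

In this toy case the ATG reported as "upstream" would be compatible only with leaky scanning under the reasoning above, while the "downstream" ATG would not distinguish between the two mechanisms by position alone.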

Supplementary Materials and Methods
Determination of cDNA sequences of the mutant cell line 2B2
The total RNA of the 2B2 cell line was isolated with ISOGEN (Nippon Gene). The isolated RNA was reverse-transcribed with SuperScript III reverse transcriptase (Invitrogen) and oligo(dT)20 primers, in accordance with the manufacturer's protocol. The following primers in exon 1 and exon 5 of mouse Gli3 were used for RT-PCR: 5′-CAGGTCTGTGGATTTGGGAC-3′ (exon1-F) and 5′-GATCCTAATGAAGGGCAAGTC-3′ (exon5-R). For direct sequencing, PCR products were sequenced with either the exon1-F or the exon5-R primer (Figure S1A) using an ABI 3130 Genetic Analyzer (Life Technologies). For colony PCR, the same PCR products were cloned into the pGEM-T Easy vector (Promega), which was then used to transform competent DH5α E. coli cells. Next, each cloned cDNA was amplified with the T7 and SP6 primers and sequenced with the exon1-F and exon5-R primers. DNA sequences of a total of 19 colony PCR products were determined (Figure S1B).

Supplementary Figure Legends
Figure S2.
Putative translation termination and start sites in the del97G allele. The partial nucleotide sequence of mouse Gli3 ORF (NM_008130) from the start codon (+1) to +300 is shown as WT GLI3. The putative premature N-peptide and ITL-GLI3 are shown as del97G-N and del97G-C, respectively. Additional residue shifts caused by the one-base-pair deletion of 97G are shown in blue. Possible ATG sequences acting as an ITL initiation codon for ITL-GLI3 are shown in red. The target region against exon2 is underlined, and the deleted nucleotide of 97G is boxed in red.

Figure S3.
Putative translation termination and start sites in the insGafter97G allele. The partial nucleotide sequence of mouse Gli3 ORF (NM_008130) from the start codon (+1) to +300 is shown as WT GLI3. The putative premature N-peptide and ITL-GLI3 are shown as insGafter97G-N and insGafter97G-C, respectively. Additional residue shifts caused by the one-base-pair insertion of G after 97G are shown in blue. Possible ATG sequences acting as an ITL initiation codon of ITL-GLI3 are shown in red. The target region against exon2 is underlined, and the inserted nucleotide of G is boxed in red.

Putative translation termination and start sites in the del229A allele. The partial nucleotide sequence of mouse Gli3 ORF (NM_008130) from the start codon (+1) to +420 is shown as WT GLI3. The putative premature N-peptide and ITL-GLI3 are shown as del229A-N and del229A-C, respectively. Additional residue shifts caused by the one-base-pair deletion of 229A are shown in blue. Possible ATG sequences acting as an ITL initiation codon of ITL-GLI3 are shown in red. The target region against exon3 is underlined, and the deleted nucleotide of 229A is boxed in red.

There are four ATG codons (shown in red) in the +1 frame and no ATG codons in the +2 frame. The expected peptide sequences in frames 0, +1, and +2 are shown in black, blue, and gray residues, respectively. An arrow indicates the first six nucleotides of the Gli3 ORF, which starts from +4 because the Gli3 ATG codon was eliminated.

[Figure S1, panels A and B. Colony PCR table, No. of colonies: 0, 8, and 11; the only row label recoverable is "WT allele". Sequence ruler starting at +94.]