Improved split fluorescent proteins for endogenous protein labeling

Self-complementing split fluorescent proteins (FPs) have been widely used for protein labeling, visualization of subcellular protein localization, and detection of cell–cell contact. To expand this toolset, we have developed a screening strategy for the direct engineering of self-complementing split FPs. Via this strategy, we have generated a yellow–green split-mNeonGreen2 1–10/11 that improves the ratio of complemented signal to the background of FP 1–10 -expressing cells compared to the commonly used split GFP 1–10/11, as well as a 10-fold brighter red-colored split-sfCherry2 1–10/11. Based on split sfCherry2, we have engineered a photoactivatable variant that enables single-molecule localization-based super-resolution microscopy. We have demonstrated dual-color endogenous protein tagging with sfCherry2 11 and GFP 11, revealing that endoplasmic reticulum translocon complex Sec61B has reduced abundance in certain peripheral tubules. These new split FPs not only offer multiple colors for imaging interaction networks of endogenous proteins, but also hold the potential to provide orthogonal handles for biochemical isolation of native protein complexes.

Self-complementing split fluorescent proteins (FPs) are split FP constructs in which the two fragments can associate by themselves to form a fully functional FP without the assistance of other protein-protein interactions. By fusing one fragment to a target protein and detecting its association with the other fragment, these constructs have demonstrated powerful applications in the visualization of subcellular protein localization [1][2][3] , quantification of protein aggregation 4 , detection of cytosolic peptide delivery 5,6 , identification of cell contacts and synapses 7,8 , as well as scaffolding protein assembly 3,9,10 . Recently, they have also enabled the generation of large-scale human cell line libraries with fluorescently tagged endogenous proteins through CRISPR/Cas9-based gene editing 11 .
So far, the most commonly used self-complementing split FP has been GFP 1-10D7/11M3 OPT (which we refer to as GFP 1-10/11 ), engineered from super-folder GFP (sfGFP) 12 . With the splitting point between the tenth and eleventh β-strands, the resulting GFP 11 fragment is a short 16-amino acid (a.a.) peptide. The corresponding GFP 1-10 fragment remains almost non-fluorescent until complementation, making GFP 1-10/11 well suited for protein labeling by fusing GFP 11 to the target protein and over-expressing GFP 1-10 in the corresponding subcellular compartments. However, a second, orthogonal split FP system with comparable complementation performance is still lacking for multicolor imaging and multiplexed scaffolding of protein assembly. Previously, a sfCherry 1-10/11 system 3 was derived from super-folder Cherry, an mCherry variant optimized for folding efficiency 13 . However, its overall fluorescent brightness is substantially weaker than that of an intact sfCherry fusion, potentially due to its limited complementation efficiency 3 . Although two-color imaging with sfCherry 1-10/11 and GFP 1-10/11 has been done using tandem sfCherry 11 to amplify the sfCherry signal for over-expressed targets, it is still too dim to detect most endogenous proteins.

Results
Engineering split FPs with the spacer-insertion strategy. Inspired by assays previously used to optimize a protease reporter 9 , we devised a general strategy for the engineering of self-complementing split FPs. Specifically, we inserted a 32 a.a. spacer (DVGGGGSEGGGSGGPGSGGEGSAGGGSAGGGS) between the tenth and eleventh β-strands of a fluorescent protein (Fig. 1a). This long spacer hinders the folding of the FP, which results in a fluorescence level much lower than that of its full length counterpart without the spacer. To improve the fluorescence, we then subjected the spacer-inserted FP to multiple rounds of directed evolution in Escherichia coli. In each round, the coding sequence was randomly mutagenized or shuffled. Then, the brightest 1 or 2 colonies from each plate were selected for the next round.
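The spacer-insertion construct described above can be sketched as a simple sequence concatenation. This is only an illustration: the spacer is the one quoted in the text, but the two fragment sequences are placeholder strings (not real FP sequences), and the function name `insert_spacer` is hypothetical.

```python
# Spacer sequence as quoted in the text (32 a.a.).
SPACER = "DVGGGGSEGGGSGGPGSGGEGSAGGGSAGGGS"


def insert_spacer(strands_1_10: str, strand_11: str, spacer: str = SPACER) -> str:
    """Return the single-chain screening construct: FP(1-10) + spacer + FP(11)."""
    return strands_1_10 + spacer + strand_11


# Placeholder fragments (NOT real FP sequences); only their lengths matter here.
demo_1_10 = "M" + "A" * 212  # stands in for a 213-residue 1-10 fragment
demo_11 = "T" * 16           # stands in for a 16-residue eleventh strand

construct = insert_spacer(demo_1_10, demo_11)
print(len(SPACER), len(construct))
```

With the placeholder lengths used here, the spacer occupies residues 214 to 245 of the single-chain construct.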
We first aimed to produce a green-colored split FP that has improved brightness compared to GFP. A recent quantitative assessment of FPs 14 reported that the brightness of mNeonGreen (mNG) 15 , a yellow-green fluorescent protein derived from Branchiostoma lanceolatum, is more than 2 times higher than that of sfGFP. mNG also demonstrates good photostability, acid tolerance and monomeric quality. Guided by the crystal structure of the closely related lanGFP (PDB: 4HVF), we chose to split between the tenth and eleventh β-strands of mNG and removed the additional GFP-like C terminus (GMDELYK), resulting in a 213-amino acid fragment, which we called mNG 1-10 , and a 16-amino acid fragment, mNG 11 . Unlike the highly optimized 12 split GFP 1-10/11 system, whose fluorescence signal is only slightly reduced with the spacer insertion (Fig. 1b), inserting the spacer between mNG 1-10 and mNG 11 drastically reduced its fluorescence signal (Fig. 1c). Using our spacer-assisted screening system, after three rounds of random mutagenesis, we identified five substitutions in the 1-10 fragment (K128M, S142T, R150M, G172V, and K213M) and one substitution in the eleventh strand (V15M) (Fig. 1c). We named this improved mNG, mNG2. In E. coli colonies grown on LB-agar plates, spacer-inserted mNG2 demonstrated a 10-fold improvement in brightness after directed evolution, reaching 60% of the brightness of full length mNG (Fig. 1c).
To improve the complementation efficiency of split sfCherry, we subjected the spacer-inserted sfCherry to three rounds of random mutagenesis and one round of DNA shuffling. We identified a new variant, named sfCherry2, which contains two mutations on the 1-10 fragment (E118Q and T128I) and one on the eleventh strand (G12A) (Fig. 1d). In E. coli colonies, spacer-inserted sfCherry2 is ~9 times as bright as the spacer-inserted original sfCherry (Fig. 1d). We have also used this strategy to split FusionRed, a red fluorescent protein with minimal cell toxicity and dimerization tendencies 16 . Unfortunately, we have not been able to obtain a brightly fluorescent, spacer-inserted variant even after four rounds of random mutagenesis.
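As a bookkeeping aid, the mNG2 substitutions listed above can be parsed and checked against the stated fragment lengths. The sketch below assumes that positions are numbered within each fragment (so V15M refers to residue 15 of the 16-residue eleventh strand); that numbering convention is an assumption, and the helper names and the placeholder sequence are hypothetical.

```python
import re

# Fragment lengths and substitutions as stated in the text.
FRAGMENT_LENGTHS = {"mNG2_1-10": 213, "mNG2_11": 16}
SUBSTITUTIONS = {
    "mNG2_1-10": ["K128M", "S142T", "R150M", "G172V", "K213M"],
    "mNG2_11": ["V15M"],
}


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'K128M' into (wild-type residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, codes: list[str]) -> str:
    """Apply substitutions (1-based positions), checking the wild-type residue first."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, mutant = parse_substitution(code)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: expected {wild_type} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)


# Placeholder 1-10 fragment (NOT the real mNG sequence): alanines everywhere except
# the wild-type residues at the substituted positions.
placeholder = ["A"] * FRAGMENT_LENGTHS["mNG2_1-10"]
for code in SUBSTITUTIONS["mNG2_1-10"]:
    wild_type, position, _ = parse_substitution(code)
    placeholder[position - 1] = wild_type

mutated = apply_substitutions("".join(placeholder), SUBSTITUTIONS["mNG2_1-10"])
print(mutated[127], mutated[141], mutated[149], mutated[171], mutated[212])
```

Under this numbering, K213M falls on the last residue of the 213-residue 1-10 fragment, which is consistent with the split site between K213 and T214 given in the Methods.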
Protein labeling by mNG2 1-10/11 in mammalian cells. To test protein labeling using mNG2 11 , we expressed two proteins, histone H2B (H2B) or clathrin light chain A (CLTA), fused to mNG2 11 in HeLa cells. With the co-expression of mNG2 1-10 , we could correctly image the localization of these proteins, similar to that using GFP 11 (Fig. 2a). Interestingly, when GFP 1-10 is expressed by itself, we observed a weak but non-negligible fluorescence background even without the expression of the GFP 11 fragment. Comparing HEK 293T cell lines stably expressing GFP 1-10 from the strong SFFV promoter and the weak PGK promoter, we found that the background is positively correlated with the GFP 1-10 expression level (Fig. 2b). This background might be attributed to either weak fluorescence from GFP 1-10 or elevated cell autofluorescence caused by GFP 1-10 expression. In contrast, even when using the SFFV promoter, the signal from mNG2 1-10 -expressing cells is indistinguishable from wild-type cell autofluorescence (Fig. 2b).
This low background fluorescence from mNG2 1-10 is important for the labeling of endogenous proteins using FP 11 knock-in. Previously, utilizing the small GFP 11 tag, we developed a scalable scheme to fluorescently tag endogenous proteins in human cell lines by gene editing using electroporation of Cas9/sgRNA ribonucleoprotein (RNP) and a single-stranded donor DNA, followed by enrichment of integrated cells by fluorescence activated cell sorting (FACS) 11 . Similar to GFP 11 , mNG2 11 is also a 16 a.a. peptide, allowing its DNA and the homology arms to fit in a 200 nucleotide (nt) donor DNA that can be directly obtained from commercial synthesis, which is a key contributor to the efficiency and cost-effectiveness of our method. We compared genetic knock-in using GFP 11 or mNG2 11 for three targets: Lamin A/C (LMNA, inner nuclear membrane), RAB11A and CLTA. In all three cases, we observed similar or stronger fluorescence from mNG2 11 knock-in cells than from GFP 11 knock-in cells. The background from non-integrated cells is substantially lower for mNG2 1-10 , despite the use of the low expression PGK GFP 1-10 and high expression SFFV mNG2 1-10 . This better separation between FP 11 -integrated and non-integrated cells is especially advantageous for labeling low-abundance proteins, for example, SPTLC1 (Serine Palmitoyltransferase Long Chain Base Subunit 1) (Fig. 2d).
Finally, to compare the brightness of mNG2 11 with that of full length mNG2 on proteins, we constructed plasmids encoding full length mNG2 or mNG2 11 -CLTA fused to mIFP 17 through a self-cleaving P2A site, so that expression level differences can be normalized by the mIFP signal level. We cotransfected HEK 293T cells with either of the two plasmids and a plasmid expressing mNG2 1-10 . With the transfected 11:1-10 DNA ratio increased from 1:1 to 1:3 and 1:5, the normalized whole cell fluorescence measured by flow cytometry from mNG2 11 is approximately 50-60% of that from full length mNG2 (Fig. 2e). This relative brightness likely indicates the overall effect of complementation efficiency, chromophore maturation and potential perturbation to the chromophore environment by the protein split. For reference, we performed the same measurement on GFP 11 and observed a brightness close to that of the full length sfGFP (Supplementary Fig. 1).
Protein labeling using sfCherry2 1-10/11 in mammalian cells. To quantify the improvement of sfCherry2 1-10/11 over sfCherry 1-10/11 for mammalian protein labeling and compare their performances with that of full length sfCherry and sfCherry2, we performed a similar measurement as in the previous section. We used TagBFP instead of mIFP for expression level normalization because sfCherry fluorescence bleeds into the mIFP detection channel (Fig. 3a). We found that sfCherry2 1-10/11 is ~10 times as bright as sfCherry 1-10/11 in mammalian cells. We have also observed that full length sfCherry2 is ~50% brighter than sfCherry 13 , suggesting a better overall folding efficiency. The normalized fluorescence signal of sfCherry2 1-10/11 reached ~30% of full length sfCherry and ~20% of full length sfCherry2.
To test sfCherry2 11 as a fluorescent tag for live-cell imaging, we constructed mammalian expression vectors encoding target proteins tagged with sfCherry2 11 at either the N or C terminus and co-expressed each with cytoplasmic sfCherry2 1-10 in HeLa cells (Fig. 3b-i). For the diverse array of target proteins tested, including nuclear proteins histone H2B and heterochromatin protein 1 (HP1), cytoskeletal proteins β-actin, keratin and vimentin, focal adhesion protein zyxin, CLTA, and mitochondrial outer membrane protein TOMM20, we observed their correct localization. No fluorescent signal was detected with the sfCherry2 1-10 fragment alone. We also demonstrated sfCherry2 11 labeling of endogenous proteins by knocking it into the ER translocon complex protein Sec61B in HEK 293T cells stably expressing both GFP 1-10 and sfCherry2 1-10 (293T double1-10 ). After sorting for red-fluorescence-positive cells by FACS, confocal imaging (Fig. 3j) confirmed the ER specificity of the sfCherry2 11 signal, which is substantially weaker than in the overexpression cases (Fig. 3b-i). We noticed that many cells also display additional punctate structures, which we have verified to be lysosomes (see the last result section). This artifact is common for mCherry-derived FPs when expressed for a prolonged period of time, especially when targeting proteins in secretory pathways 18 . It likely results from the fact that mCherry has a β-barrel structure resisting lysosomal proteolysis and a low pKa, such that it remains fluorescent in the acidic lysosome lumen 19 . We have not observed lysosome labeling in any of the non-ER targets that we have imaged in this work with sfCherry2 11 labeling.
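The relative brightness figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the approximate ratios given in the text (~10-fold, ~50% brighter, ~30%); the variable names are illustrative, and the derived values are rough consistency checks rather than measurements.

```python
# Approximate ratios quoted in the text, all relative to full length sfCherry = 1.0.
FULL_SFCHERRY = 1.0
FULL_SFCHERRY2 = 1.5 * FULL_SFCHERRY    # "~50% brighter than sfCherry"
SPLIT_SFCHERRY2 = 0.30 * FULL_SFCHERRY  # "~30% of full length sfCherry"
FOLD_IMPROVEMENT = 10.0                 # split sfCherry2 vs. split sfCherry

# Derived values (consistency checks, not reported measurements).
split2_vs_full2 = SPLIT_SFCHERRY2 / FULL_SFCHERRY2
split1_vs_full1 = SPLIT_SFCHERRY2 / FOLD_IMPROVEMENT / FULL_SFCHERRY

print(f"split sfCherry2 vs. full length sfCherry2: {split2_vs_full2:.0%}")
print(f"split sfCherry vs. full length sfCherry:   {split1_vs_full1:.0%}")
```

The first derived value reproduces the ~20% figure stated in the text. The second implies that the original split sfCherry reaches only about 3% of full length sfCherry under these approximate ratios.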
Super-resolution microscopy using PAsfCherry2 1-10/11 . With their capability to change their fluorescence properties upon light (usually ultraviolet) irradiation, photoactivatable (PA) FPs 20 enable tracking of protein trafficking and imaging of labeled proteins using single-molecule switching-based super-resolution microscopy 21 (more commonly known as stochastic optical reconstruction microscopy, STORM, or photoactivated localization microscopy, PALM). Previously, mCherry has been engineered into a PA FP (PAmCherry1) after introducing 10 a.a. substitutions 22 . Because these substitutions are located on the 1-10 fragment, we merged them with the sequence of sfCherry2 1-10 (designated as PAsfCherry2 1-10 ). When imaging HEK 293T cells co-expressing sfCherry2 11 -H2B and PAsfCherry2 1-10 , we observed little initial fluorescence in red channel and a large fluorescence increase after 405 nm light irradiation (Fig. 4a), confirming the photoactivation property of complemented sfCherry2 11 and PAsfCherry2 1-10 .
Combining PAsfCherry2 1-10 with our FP 11 tag knock-in method, we can easily perform STORM imaging of endogenous proteins tagged by sfCherry2 11 , which could avoid potential overexpression artifacts. For demonstration, we imaged endogenous CLTA in HEK 293T cells. Because PAsfCherry2 is non-fluorescent before activation, we knocked in a tandem GFP 11 -sfCherry2 11 tag in HEK 293T cells stably expressing GFP 1-10 , so that integrated cells can be isolated using the GFP fluorescence signal. We then transfected these cells with PAsfCherry2 1-10 plasmid for STORM imaging. Compared to conventional wide-field images, clathrin-coated pits can be clearly seen as sub-diffraction-limit objects in the STORM images (Fig. 4b). We were able to collect on average 260 photons in each photoactivation event (Fig. 4c). We note that the labeling of most pits is incomplete, potentially because we labeled only one of the two clathrin light chain genes that have nearly identical functionalities 23 . In addition, non-homozygous knock-in and inefficient complementation between the two fragments can further contribute to the incomplete labeling. Selecting for homozygous knock-in cells and/or the use of tandem sfCherry2 11 tags 3 may solve the latter two problems.
Dual-color endogenous protein tagging in human cells. The orthogonal sfCherry2 11 now enables two-color imaging of endogenous proteins in order to visualize their differential spatial distribution and interactions. We tested double knock-in of GFP 11 and sfCherry2 11 either sequentially or simultaneously in HEK 293T cells stably expressing both GFP 1-10 and sfCherry2 1-10 (293T double1-10 ). For sequential knock-in, we first performed electroporation of Cas9 RNP and single-strand donor DNA for GFP 11 , sorted GFP positive cells by FACS, and then knocked in sfCherry2 11 , followed by a second round of sorting (Fig. 5a). We targeted Sec61B using GFP 11 and then three proteins with distinctive subcellular localizations using sfCherry2 11 : LMNA, ARL6IP1 (tubular ER), and Sec61B (to verify co-localization). When imaging the double knock-in cells by confocal microscopy, we observed co-labeling of the nuclear envelope by Sec61B-GFP 11 and LMNA-sfCherry2 11 , almost complete colocalization of Sec61B-GFP 11 and Sec61B-sfCherry2 11 , and the exclusion of ARL6IP1-sfCherry2 11 from the nuclear envelope (Fig. 5b).
For simultaneous knock-in, we mixed Cas9 RNPs and donor DNAs for both targets in one electroporation reaction and used FACS to enrich cells positive for both green and red signal (Fig. 5c). We chose to tag ARL6IP1 and Sec61B with GFP 11 and sfCherry2 11 , respectively. We obtained 0.4% double-positive cells, consistent with the product of the 28% and 1.2% efficiencies for GFP 11 and sfCherry2 11 single knock-in into the ARL6IP1 and Sec61B genes, respectively (Supplementary Fig. 2). The lower efficiency for sfCherry2 11 knock-in could be attributed to its overall lower fluorescence signal level compared to GFP 11 , making it more difficult to distinguish from the background by FACS. Confocal microscopy of sorted cells showed similar
Dual-color images reveal peripheral ER with reduced Sec61B. The endoplasmic reticulum (ER) is a large organelle that spreads throughout the cytoplasm as a continuous membrane network of tubules and sheets with a single lumen 24 . The Sec61 complex, which is composed of alpha, beta, and gamma subunits, is the central component of the protein translocation apparatus of the ER membrane 25 . Researchers have traditionally used Sec61B as an ER marker for imaging because it is thought to be distributed ubiquitously throughout the ER membrane, including the nuclear envelope, sheet-like cisternae, and a polygonal array of tubules 26,27 . However, our dual-color images of endogenous Sec61B and ARL6IP1 in HEK 293T cells using GFP 11 and sfCherry2 11 showed that certain peripheral ER tubules marked by ARL6IP1 contain very weak to non-detectable Sec61B signal (Fig. 6a). This large reduction of Sec61B signal in certain ER tubules is clearly visible in cross-section, where the Sec61B signal is at the background level despite Sec61B carrying the brighter GFP 11 label compared with the sfCherry2 11 label on ARL6IP1 (Fig. 6b). We have also ruled out the presence of sfCherry2-containing lysosomes in these areas by staining lysosomes with LysoTracker (Supplementary Fig. 3). By visual inspection of z maximum projections of 9 confocal images containing a total of 108 cells, we identified 29 cells that contain such peripheral ER tubules lacking strong Sec61B presence. Furthermore, we confirmed this differential distribution of ER membrane proteins by swapping the tags on the two proteins (Fig. 6c).
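Two of the counts reported in this part of the Results are simple ratios that can be reproduced directly: the expected double knock-in yield from the two single knock-in efficiencies, and the fraction of inspected cells showing Sec61B-poor peripheral tubules. The sketch below assumes the two knock-in events are independent, which is what multiplying the efficiencies implies; variable names are illustrative.

```python
# Single knock-in efficiencies quoted in the text.
GFP11_ARL6IP1_EFFICIENCY = 0.28
SFCHERRY2_11_SEC61B_EFFICIENCY = 0.012
OBSERVED_DOUBLE_POSITIVE = 0.004

# Expected double-positive fraction, assuming independent integration events.
expected_double_positive = GFP11_ARL6IP1_EFFICIENCY * SFCHERRY2_11_SEC61B_EFFICIENCY

# Cells with Sec61B-poor peripheral ER tubules, from visual inspection.
CELLS_WITH_TUBULES = 29
CELLS_INSPECTED = 108
fraction_with_tubules = CELLS_WITH_TUBULES / CELLS_INSPECTED

print(f"expected double-positive: {expected_double_positive:.2%}")
print(f"observed double-positive: {OBSERVED_DOUBLE_POSITIVE:.2%}")
print(f"cells with Sec61B-poor tubules: {fraction_with_tubules:.1%}")
```

The product of the single knock-in efficiencies comes to roughly 0.34%, close to the observed 0.4%, and about 27% of the inspected cells showed the Sec61B-poor tubules.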

Discussion
In summary, we have devised a simple platform for the engineering of self-complementing split fluorescent proteins. Using this platform, we have developed a bright yellow-green-colored mNG2 1-10/11 system and substantially increased the performance of the red-colored sfCherry2 1-10/11 . These split constructs have allowed us to obtain two-color or super-resolution images of endogenous proteins and have revealed ER tubules with greatly reduced abundance of the translocon component Sec61B. Our platform can be easily extended to the engineering of other self-complementing split FPs with distinct colors (e.g., mTurquoise2, mTagBFP2) 28,29 , good photoactivation performance (e.g., mMaple3, PATagRFP) 30, 31 , or new functionalities (e.g., pH sensitivity) 32 . We note that for non-self-complementing split FPs, which are used to detect protein-protein interactions in bimolecular fluorescence complementation (BiFC) assays 33 , a different engineering platform is needed to ensure minimum affinity between the two FP fragments by themselves. On the other hand, although our optimized spacer-inserted sfCherry2 has already reached the brightness level of sfCherry without spacer, split sfCherry2 1-10/11 is still much dimmer than full length sfCherry. Further improvement could be possible using a longer or more rigid spacer, which adds more spatial hindrance to complementation, or a spacer containing the self-cleaving P2A site, which better mimics the actual split system.
With its higher ratio of complemented signal to the background of FP 1-10 -expressing cells compared to GFP 1-10/11 , our mNG2 1-10/11 system will be advantageous for tagging low-abundance endogenous proteins 3 . More importantly, it provides an orthogonal handle for scaffolding protein oligomerization 3 and biochemical isolation of native protein complexes 11 . For the sfCherry2 1-10/11 system, the fact that we have only observed lysosome puncta when labeling ER proteins suggests that this problem could be potentially resolved by increasing its pKa with rational designs 32 . Moreover, our engineering platform can be easily adapted to generating other red split FPs based on novel bright FPs such as TagRFP-T 34 , mRuby3 35 , or mScarlet 36 .
Given the wide use of Sec61B as a marker to label the entire ER, it is surprising that Sec61B has a substantially reduced abundance in certain peripheral tubular structures labeled by ARL6IP1. It is possible that these tubules, often containing a closed end pointing toward the edge of the cell, serve distinct functions from other ER tubules. Further analysis of their protein compositions and contacts with plasma membrane, cytoskeleton and other organelles may help clarify their functional roles. This observation also suggests that the base of these ER tubules may contain a diffusion barrier that hinders the crossing of Sec61B. Because over-expressing exogenous ER shaping proteins can drastically alter ER morphology 37 , this finding highlights the opportunity of visualizing endogenous proteins to study native interaction networks.
Fig. 6 Reduced abundance of Sec61B in certain peripheral ER tubules. a Dual-color image of a Sec61B-GFP 11 and ARL6IP1-sfCherry2 11 knock-in HEK 293T cell. Arrows indicate ER tubules that are marked by ARL6IP1 but contain much reduced Sec61B signal. b A cross-section in a, with red arrows indicating peripheral ER tubules with Sec61B signal at the background level, green arrows indicating ER tubules positive for both Sec61B and ARL6IP1, and the black arrow indicating a lysosome marked by LysoTracker Blue. c Dual-color image of a Sec61B-sfCherry2 11 and ARL6IP1-GFP 11 knock-in HEK 293T cell. Scale bars are 5 µm.

Methods
Mutagenesis and screening. The amino acid sequence of mNG was obtained from the published literature 15 . Because the crystal structure of mNG has not been determined, we used the lanGFP (PDB: 4HVF) structure as a guide and placed the split between K213 and T214, in the middle of the loop between the tenth and eleventh β-strands. We also removed the additional seven-residue GFP-like C terminus (GMDELYK) to minimize the size of the mNG 11 tag. The amino acid sequence of sfCherry and the split site were taken from our previously published work 3 . The spacer-inserted split mNG and split sfCherry were subjected to random mutagenesis using a GeneMorph II Random Mutagenesis Kit (Agilent). The cDNA library pool was transformed into E. coli BL21 (DE3) electrocompetent cells (Lucigen) by electroporation using the Gene Pulser Xcell Electroporation System (BioRad). The expression library was plated on a nitrocellulose membrane (Whatman, 0.45 µm pore size) placed on an LB-agar plate with 30 mg/ml kanamycin. After overnight growth at 37°C, the nitrocellulose membrane was transferred to a new LB-agar plate containing 1 mM isopropyl-β-D-thiogalactoside (IPTG) and 30 mg/ml kanamycin and cultured for 3-6 h at 37°C to induce protein expression. Clone screening was performed by imaging the second LB-agar plate using a BioSpectrum Imaging System (UVP). The brightest candidates in each library were pooled (typically 20-30 selected from ~10,000 colonies) and used as templates for the next round of evolution. For DNA shuffling, we used the method described by Yu et al. 38 . Specifically, we digested plasmids containing spacer-inserted sfCherry 1-10/11 variants with BamHI and XhoI. Fragments of ~800 bp were purified from 1% agarose gels using the Zymoclean Gel DNA Recovery Kit (Zymo Research). The DNA concentrations were measured and the fragments were mixed in equal amounts for a total of ~2 µg.
The mixture was then digested with 0.5 unit DNase I (New England Biolabs) for 13 min, and the reaction was terminated by heating at 95°C for 10 min. The DNase I digests were run on a 2% agarose gel, and fragments of 50-100 bp were excised and purified. Ten microliters of purified fragments were added to 10 µl of Phusion High-Fidelity PCR Master Mix and reassembled with a PCR program of 30 cycles, with each cycle consisting of 95°C for 60 s, 50°C for 60 s, and 72°C for 30 s. After gene reassembly, 1 µl of this reaction was amplified by PCR. The shuffled library was expressed and screened as described above. After the directed evolution was saturated, the brightest clone was selected and the DNA sequences of the constructs were confirmed by sequencing (Quintara Biosciences).
Analytical flow cytometry was carried out on an LSR II instrument (BD Biosciences) and cell sorting on a FACSAria II (BD Biosciences) in the Laboratory for Cell Analysis at UCSF. Flow cytometry data analysis (gating and normalization) was done using the FlowJo software (FlowJo LLC) and the results were plotted in GraphPad Prism.
Fluorescence microscopy. We transfected human HeLa cells grown on an 8-well glass bottom chamber (Thermo Fisher Scientific) using FuGene HD (Promega). To achieve better cell attachment, the 8-well chamber was coated with fibronectin (Sigma-Aldrich) for 1 h before seeding cells. A total plasmid amount of 120 ng per well, with an FP 11 to FP 1-10 ratio of 1:2, was used to achieve optimal labeling. Thirty-six to forty-eight hours after transfection, live cells were imaged and then fixed with 4% paraformaldehyde. For lysosome staining, LysoTracker Blue DND-22 (Thermo Fisher Scientific) was added directly to the culture medium (50 nM final concentration) and incubated for 30 min before imaging.
Most live-cell images were acquired on an inverted Nikon Ti-E microscope (UCSF Nikon Imaging Center) equipped with a Yokogawa CSU-W1 confocal scanner unit, a PlanApo VC 100x/1.4NA oil immersion objective, a stage incubator, an Andor Zyla 4.2 sCMOS or an Andor iXon Ultra DU888 EMCCD camera and MicroManager software. PAsfCherry2 photoactivation in H2B labeling and the split mNG2 vs. split GFP comparison in H2B labeling were imaged on a Nikon Ti-E inverted wide-field fluorescence microscope equipped with an LED light source (Excelitas X-Cite XLED1), a 100X NA 1.40 PlanApo oil immersion objective, a motorized stage (ASI) and an sCMOS camera (Hamamatsu Flash 4.0). Microscopy images were subjected to background subtraction using a rolling ball radius of 100 pixels in ImageJ Fiji 39 software. Analysis of conventional fluorescence microscopy images was performed in ImageJ.
STORM image acquisition and analysis. Super-resolution images were collected using a TIRF-STORM microscope, home-built from a Nikon Eclipse Ti-E inverted microscope. A 405 nm activation laser (OBIS 405, Coherent), a 488 nm imaging laser (OBIS 488, Coherent) and a 561 nm imaging laser (Sapphire 561, Coherent) were aligned, expanded, and focused at the back focal plane of the UPlanSApo 1.4NA 100X oil immersion objective (Olympus). Images were recorded with an electron multiplying CCD camera (iXon + DU897E-C20-BV, Andor), and processed with home-written software. The OBIS lasers were controlled directly by the computer, whereas the Sapphire 561 nm laser was shuttered using an acousto-optic modulator (Crystal Technology). A quadband dichroic mirror (ZT405/488/561/640rpc, Chroma) and a band-pass filter (ET525/50 m, Chroma for 488 nm and ET595/50 m, Chroma for 561 nm) separated the fluorescence emission from the excitation light. Because PAsfCherry2 is initially in a dark state, knock-in cells were carefully located through GFP, using low green intensity (11 μW) and a wide-field setting (5 Hz), so as to minimize potential bleaching of PAsfCherry2. Cells were then subjected to STORM imaging, and the appearance of fluorophores in the red channel confirmed that the cells were indeed expressing both fluorescent proteins. Maximum laser power used during STORM, measured before the objective, was 16 μW for 405 nm and 25 mW for 561 nm. By increasing the power of the 405 nm laser (from zero to 16 μW) during imaging, PAsfCherry2 was photoconverted from a dark to a red state and could be observed as single fluorophores. These images were recorded at a frame rate of 30 Hz, with an EMCCD camera gain of 60. During image acquisition, the axial drift of the microscope stage was stabilized by a home-built focus stabilization system utilizing the reflection of an IR laser off the sample. Frames were collected until the sample was bleached.
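The plasmid amounts implied by the transfection conditions above follow from splitting the 120 ng total at a 1:2 ratio. A minimal sketch, with a hypothetical helper name:

```python
def split_by_ratio(total: float, parts: tuple[float, ...]) -> list[float]:
    """Divide a total amount according to the given ratio parts."""
    total_parts = sum(parts)
    return [total * part / total_parts for part in parts]


# 120 ng total plasmid per well at an FP 11 : FP 1-10 ratio of 1:2 (from the text).
fp11_ng, fp1_10_ng = split_by_ratio(120.0, (1, 2))
print(fp11_ng, fp1_10_ng)
```

This corresponds to 40 ng of the FP 11 fusion plasmid and 80 ng of the FP 1-10 plasmid per well.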
Analysis of the STORM images was performed on the Insight3 software 40 . Cells were imaged in PBS buffer.
Preparation and electroporation of Cas9/sgRNA RNP. All synthetic nucleic acid reagents were purchased from Integrated DNA Technologies (IDT). sgRNAs and Cas9/sgRNA RNP complexes were prepared using the following procedure 11 . sgRNAs were obtained by in vitro transcription of DNA templates containing a T7 promoter (TAATACGACTCACTATAG), the gene-specific 20-nt sgRNA sequence and a common sgRNA scaffold region. DNA templates were generated by overlapping PCR using a set of 4 primers: 3 primers common to all reactions (forward primer T25: 5′-TAA TAC GAC TCA CTA TAG-3′; reverse primer BS7: 5′-AAA AAA AGC ACC GAC TCG GTG C-3′ and reverse primer ML611: 5′-AAA AAA AGC ACC GAC TCG GTG CCA CTT TTT CAA GTT GAT AAC GGA CTA GCC TTA TTT AAA CTT GCT ATG CTG TTT CCA GCA TAG CTC TTA AAC-3′) and one gene-specific primer (forward primer 5′-TAA TAC GAC TCA CTA TAG NNN NNN NNN NNN NNN NNN NNG TTT AAG AGC TAT GCT GGA A-3′). For each template, a 100-μL PCR was performed with iProof High-Fidelity Master Mix (BioRad) reagents with the addition of 1 μM T25, 1 μM BS7, 20 nM ML611, and 20 nM gene-specific primer. The PCR product was purified and eluted in 12 μL of RNAse-free DNA buffer (2 mM Tris pH 8.0 in DEPC-treated water). Next, a 100-μL in vitro transcription reaction was performed with 300 ng DNA template and 1000 U of T7 RNA polymerase in buffer containing 40 mM Tris pH 7.9, 20 mM MgCl 2 , 5 mM DTT, 2 mM spermidine and 2 mM of each NTP (New England BioLabs). Following a 4 h incubation at 37°C, the sgRNA product was purified and eluted in 15 μL of RNAse-free RNA buffer (10 mM Tris pH 7.0 in DEPC-treated water). The sgRNA was quality-checked by running 5 pg of the product on a 10% polyacrylamide gel containing 7 M urea (Novex TBE-Urea gels, Thermo Fisher Scientific).
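The primer design above can be sketched in code to show how the four primers tile the transcription template. The primer sequences are copied from the text. The assembled template (T7 promoter, 20-nt guide, then the reverse complement of ML611) is inferred from the primer overlaps rather than stated explicitly, and the example guide sequence and function names are hypothetical.

```python
def clean(seq: str) -> str:
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return seq.replace(" ", "").upper()


# Primer sequences as given in the text (5' to 3').
T25 = clean("TAA TAC GAC TCA CTA TAG")  # T7 promoter forward primer
BS7 = clean("AAA AAA AGC ACC GAC TCG GTG C")
ML611 = clean(
    "AAA AAA AGC ACC GAC TCG GTG CCA CTT TTT CAA GTT GAT AAC GGA CTA GCC "
    "TTA TTT AAA CTT GCT ATG CTG TTT CCA GCA TAG CTC TTA AAC"
)
# Constant 3' tail of the gene-specific forward primer.
SCAFFOLD_OVERLAP = clean("G TTT AAG AGC TAT GCT GGA A")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gene_specific_primer(guide: str) -> str:
    """T7 promoter + 20-nt guide + overlap with the sgRNA scaffold."""
    guide = clean(guide)
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be exactly 20 nt of A/C/G/T")
    return T25 + guide + SCAFFOLD_OVERLAP


def full_template(guide: str) -> str:
    """Inferred PCR product: T7 promoter + guide + scaffold (reverse complement of ML611)."""
    return T25 + clean(guide) + reverse_complement(ML611)


EXAMPLE_GUIDE = "GACCTTAGCATCGGATCCAA"  # hypothetical guide, for illustration only

primer = gene_specific_primer(EXAMPLE_GUIDE)
template = full_template(EXAMPLE_GUIDE)
print(len(primer), len(template))
```

In this layout the gene-specific primer overlaps the scaffold by 20 nt, and BS7 matches the 5′ end of ML611, so T25 and BS7 act as the outer amplification primers. That is consistent with their higher concentration in the reaction (1 μM versus 20 nM).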
For the knock-in of mNG2 11 , sfCherry2 11 , or GFP 11 , 200-nt homology-directed recombination (HDR) templates were ordered in single-stranded DNA (ssDNA) form as ultramer oligos (IDT). For knock-in of GFP 11 -sfCherry2 11 in tandem, HDR template was ordered in double-stranded (dsDNA) form as gBlock fragments (IDT) and processed to ssDNA as described below. For the complete set of DNA sequence used for sgRNA in vitro transcription or HDR templates, see Supplementary Tables 2 and 3.
Cas9 protein (pMJ915 construct, containing two nuclear localization sequences) was expressed in E. coli and purified by the University of California Berkeley Macrolab 41 . 293T mNG2 1-10 stable cells or 293T double1-10 cells were treated with 200 ng/mL nocodazole (Sigma) for ~15 h before electroporation to increase HDR efficiency 42 . Cas9/sgRNA RNP complexes were assembled with 100 pmol Cas9 protein and 130 pmol sgRNA just prior to electroporation and combined with HDR template in a final volume of 10 μL. Electroporation was carried out in an Amaxa 96-well shuttle Nucleofector device (Lonza) using SF-cell line reagents (Lonza). Nocodazole-treated cells were resuspended to 10^4 cells/μL in SF solution immediately prior to electroporation. For each sample, 20 μL of cells was added to the 10 μL RNP/template mixture. Cells were immediately electroporated using the CM130 program and transferred to a 24-well plate with pre-warmed medium. Electroporated cells were cultured for 5-10 days prior to FACS selection of integrated cells.
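The per-sample quantities implied by the electroporation conditions above can be worked out as follows. This is a small arithmetic sketch using only the amounts stated in the text; the variable names are illustrative, and the final volume is assumed to be simply the sum of the RNP/template mix and the cell suspension.

```python
# Amounts stated in the text.
CAS9_PMOL = 100.0
SGRNA_PMOL = 130.0
RNP_MIX_UL = 10.0
CELL_DENSITY_PER_UL = 1e4
CELL_SUSPENSION_UL = 20.0

cells_per_sample = CELL_DENSITY_PER_UL * CELL_SUSPENSION_UL
sgrna_to_cas9_ratio = SGRNA_PMOL / CAS9_PMOL
final_volume_ul = RNP_MIX_UL + CELL_SUSPENSION_UL  # assumed: volumes simply add
cas9_final_um = CAS9_PMOL / final_volume_ul        # pmol per uL equals umol per L

print(f"cells per electroporation: {cells_per_sample:.0f}")
print(f"sgRNA:Cas9 molar ratio: {sgrna_to_cas9_ratio:.1f}")
print(f"Cas9 in the {final_volume_ul:.0f} uL reaction: {cas9_final_um:.2f} uM")
```

Each electroporation therefore uses 2 × 10^5 cells with a 1.3-fold molar excess of sgRNA over Cas9.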
Preparation of sfCherry2 11 -GFP 11 -CLTA ssDNA Template. sfCherry2 11 -GFP 11 -CLTA ssDNA template was prepared from a commercial dsDNA fragment (gBlock, IDT) containing the template sequence preceded by a T7 promoter 11 . The dsDNA fragment was amplified by PCR using Kapa HiFi reagents (Kapa Biosystems) and purified using SPRI beads (AMPure XP resin, Beckman Coulter) at a 1:1 DNA:resin volume ratio (following manufacturer's instructions) and eluted in 25 μL RNAse-free water. Next, RNA was produced by in vitro transcription using T7 HiScribe reagents (New England BioLabs). Following a 4 h reaction at 37°C, the mixture was treated with 4U TURBO DNAse (Thermo Fisher Scientific) and incubated for another 15 min at 37°C. The RNA product was then purified using SPRI beads and eluted in 60 μL RNAse-free water. DNA:RNA hybrid was then synthesized by reverse transcription using Maxima H RT reagents (Thermo Fisher Scientific). Finally, ssDNA was made by hydrolyzing the RNA strand through the addition of 24 μL NaOH solution (0.5 M NaOH + 0.25 M EDTA, in water) and incubation at 95°C for 10 min. The final ssDNA product was purified using SPRI beads at a 1:1.2 DNA:resin volume ratio and eluted in 15 μL water.
Data availability. The data that support the findings of this study are available from the corresponding author upon reasonable request. All relevant DNA sequences are listed in the Supplementary Information.