Stable mammalian producer cell lines for structural biology

The mammalian cell lines HEK293 and CHO have become important expression hosts in structural biology. Generating stable mammalian cell lines remains essential for studying the function and structure of recombinant proteins, despite the emergence of highly efficient transient transfection protocols. Production with stable cell lines can be scaled up easily and high volumetric product yield can be achieved. Protein structure reports of the past two years that used stable cell lines were surveyed for this review. Well-established techniques and novel approaches for generating stable cell lines and stable cell pools are presented, including cell sorting, site-specific recombination, transposons, the Lentivirus system and phage integrases. Host cell line optimization by endoglycosidase overexpression and sequence-specific genome engineering is highlighted.


Introduction
Generating pure, soluble and homogeneous protein is a major step in the overall process of protein structure determination. The choice of the expression system has a great influence on the quality and quantity of the produced recombinant protein. The Human Embryonic Kidney cell line HEK293 and the Chinese Hamster Ovary cell line CHO are excellent host cells for robust secretion of mammalian proteins with appropriate posttranslational modifications [1]. These cell lines are used for production of secreted mammalian and viral proteins and soluble ectodomains of transmembrane proteins, but also for complete membrane proteins (Table 1). Cytosolic proteins and complexes can be produced with stable mammalian cell lines, but the yield is usually low compared with other expression systems. The long-awaited crystal structure of the cytosolic mTORC complex was obtained upon overexpression of its subunits in a stable HEK293 cell line [2].
For recombinant protein overexpression, an expression vector for the protein of interest is transferred into the cell's nucleus by transfection. In a transient transfection experiment, the protein of interest is harvested a few days later. Alternatively, a stable cell line is generated from transfected cells that have integrated the vector into their genome. Stable cell lines overexpress the target protein uniformly and indefinitely. Protein production with a stable cell line is therefore reproducible and can be scaled up easily. Recent protein structure reports using HEK293 or CHO cell lines were surveyed for this review and it was found that transient transfection and stable cell lines were used with around the same frequency. New and improved technologies for generating stable cell lines are expected to increase their use in the future.
Establishing stable cell lines requires substantial time and effort in comparison to transient transfection processes. An expression vector with the gene of interest has to be inserted into the host cell genome. Using standard methods, the efficiency of genome integration is low. Moreover, only very few cells will integrate the vector into a highly transcribed region and will produce sufficient amounts of recombinant protein. Even then, transgene expression is often silenced upon long term cell culture. Isolating and characterizing a large number of clones is therefore required, which can take several months of laboratory work. Fortunately, stable cell line technology is improving rapidly on the levels of host cell line, integration process and selection of high-producer cells.
Crystal structures of proteins produced by stable cell lines reported during the past two years are listed in Table 1. The table provides an overview of the host cell lines and experimental techniques currently used in structural biology. The most common approach involves transfection with vectors carrying a selectable marker for random chromosomal integration, followed by isolation and screening of single cell clones. This process was reviewed in three recent publications [3,4,5] and protocols for establishing stable cell lines for structural biology with antibiotic selection markers have been published [6,7]. The performance of the antibiotics hygromycin B, neomycin, puromycin and Zeocin as selection markers for stable cell line development was recently compared [8].
Table 1. The Protein Data Bank (PDB) was searched for mammalian protein structures, excluding antibodies, that were released later than July 2012. Selected proteins that had been produced by stable cell lines are listed.
Most secreted mammalian proteins are glycosylated, which can interfere with crystallization [9]. Glycosylation sites that are not required for folding or secretion are therefore removed by mutagenesis [9]. The processing of N-linked glycans from the high-mannose type to the larger, complex type requires the enzyme N-acetylglucosaminyl-transferase I (GnTI, MGAT1) (Figure 1). The GnTI-deficient host cell lines HEK293S GnTI⁻ and CHO Lec3.2.8.1 produce glycoproteins with high-mannose type glycans. These glycans can be readily trimmed further to a single GlcNAc sugar unit by endoglycosidase treatment [10,11]. The HEK293S GnTI⁻ cell line [10] (ATCC CRL-3022) is currently the most popular cell line in structural biology (Table 1). It is used for transient and stable expression. High-mannose type glycosylation is also obtained by cultivation with kifunensine, an inhibitor of α-mannosidase I [12]. Kifunensine was applied for crystal structure determination of the glycoproteins O-fucosyltransferase [13], folate receptor α [14] and RNaseT2 [15] (Table 1).

Intracellular endoglycosidase overexpression
Recently, the 'GlycoDelete' HEK293 cell line was developed that carries a heterologous endoglycosidase in its Golgi apparatus [16]. This resulted in robust secretion of deglycosylated glycoproteins (Figure 1). The GlycoDelete study demonstrates that protein deglycosylation is possible in the Golgi apparatus, where glycoproteins traveling the secretory pathway have already passed the quality control in the endoplasmic reticulum. Cells that secrete glycoproteins with minimal glycosylation would enable crystallization without further in vitro deglycosylation.

Sequence-specific genome engineering
The cell lines CHO Lec3.2.8.1 and HEK293S GnTI⁻ were created by chemical mutagenesis, followed by selection for glycosylation deficiency. Chemical mutagenesis lacks specificity, leading to random mutations throughout the genome. Sequence-specific genome engineering represents an elegant alternative that greatly reduces unwanted mutations. Meganucleases with long recognition sequences cleave genomic DNA at rare sites and can be used to introduce gene-inactivating mutations more specifically [17]. Nucleases linked to programmable, sequence-specific DNA-binding modules, such as zinc finger nucleases (ZFN), TALE nucleases and the CRISPR/Cas9 nuclease, allow for modification of arbitrary genetic loci with excellent specificity [18]. Both alleles of a gene can be inactivated with high frequency with these nucleases. Especially with CRISPR/Cas9, mammalian genome engineering has become simple, reliable and cheap [19].

Glutamine synthetase knockout cells
The glutamine synthetase (GS) gene is used widely as a selection marker for stable CHO cells. Cells overexpressing GS can be selected by inhibiting the endogenous glutamine synthetase with the inhibitor methionine sulfoximine (MSX). The GS marker and MSX-selection of stable CHO cell lines were used to produce protein for structure determination of the ICAM-5 ectodomain [20], a TCR:MHC:antigen heterotetramer [21], aminopeptidase N [22], the insulin receptor ectodomain [23] and acetylcholinesterase [24]. A novel CHO host cell was created by knocking out the GS gene and the GnTI gene with specific zinc finger nucleases [25]. With GS-deficient CHO host cells, MSX selection of cells stably transfected with a GS vector is much more efficient than with normal cells. The selection of low-producing cell clones was largely prevented by using a GS-deficient host cell line [26]. Moreover, among MSX-selected, stably transfected cell clones, a six-fold higher proportion of top producing clones (>2 g/L) was found when a GS-deficient host cell line was used, in comparison to the original host cell line.
New cell lines developed by sequence-specific genome engineering such as GnTI and GS double knockouts are potentially useful for structural biology. Genome engineering could also be used for removing genes of unwanted host cell proteins that are co-purified with the target protein.

Methods of stable cell line generation
GFP and cell sorting
Transgene insertion into a host cell chromosome upon transfection is a rare event and, to make things worse, most of the integrated transgenes will be inactivated by epigenetic mechanisms. Cells that received an active genome-integrated transgene therefore have to be selected by a marker. Antibiotic resistance is commonly used, but can be an unreliable reporter for high level, uniform transgene expression [27]. Green fluorescent protein (GFP) is a useful alternative [28]. Cells that stably integrate a GFP expression vector are directly identified by intracellular fluorescence and can be isolated by preparative FACS cell sorting. Cell sorting can isolate a small number of high-producer clones from millions of cells. By repeating the process at different time points, cells that express GFP stably over time can be isolated. By this method, clonal cell lines with constantly high GFP expression over several months can be obtained without applying selective pressure [27].
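The idea of sorting repeatedly at different time points can be illustrated with a small sketch. It keeps only those clones whose GFP signal stays above a cutoff at every measurement, which is one simple way to express "stable over time". The clone names, fluorescence values and the threshold are invented for illustration and are not taken from the cited studies.

```python
# Hypothetical GFP fluorescence per clone (arbitrary units) at three sort time points.
# All names, values and the threshold below are illustrative assumptions.
measurements = {
    "clone_A": [950, 910, 930],
    "clone_B": [980, 400, 120],  # bright at first, then silenced
    "clone_C": [150, 140, 160],  # never bright
    "clone_D": [870, 860, 905],
}


def stable_high_expressers(data, threshold):
    """Return clones whose fluorescence is at or above threshold at every time point."""
    return sorted(
        name
        for name, values in data.items()
        if values and min(values) >= threshold
    )


selected = stable_high_expressers(measurements, threshold=800)
print(selected)
```

In this toy data set, clone_B would pass a single early sort but is rejected once the later time points are considered, which mirrors the silencing problem described above.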
GFP expression can be coupled to expression of the gene of interest by constructing internal ribosome entry site (IRES)-based bicistronic vectors comprising the gene of interest and a GFP gene. IRES-based GFP co-expression in combination with cell sorting and antibiotic selection was used to generate a stable HEK293S GnTI⁻ cell line for production of a soluble integrin αXβ2 heterodimer for structure determination [29]. In an alternative approach, the GFP gene can be excised by site-specific recombination upon clone isolation, thereby bringing a gene of interest, located downstream, under the control of the promoter driving transgene expression (Figure 2) [27,30].
Cell sorting may require optimization to maintain cell viability. In our own experience, the viability of CHO Lec3.2.8.1 cells grown in suspension can be low upon cell sorting. Shear stress was reported to cause low cell viability upon sorting of insect cell lines. Addition of Pluronic F-68 improved the survival of sorted insect cells [31] and may also have a positive effect on mammalian cells.

Secreted GFP fusion proteins
A secreted GFP marker was used for studying the structure of a sialyltransferase [32]. The sialyltransferase was fused to a codon-optimized, folding-enhanced GFP version called 'superfolder' [33]. The fusion protein was secreted by a stable HEK293S GnTI⁻ cell line with high yield (75 mg/L). The GFP tag allowed for direct protein quantification by fluorescence spectroscopy during cell line development and protein purification. It was removed by proteolysis from the sialyltransferase before crystallization.

Recombinase-mediated cassette exchange
The productivity of a stable cell line depends on the genetic locus of transgene integration. Instead of random integration, it would be desirable to target the transgene to a specific locus that allows for strong and stable transgene transcription. This can be achieved by recombinase-mediated cassette exchange (RMCE) using site-specific recombinases [34,35]. RMCE requires a 'master' cell line carrying a single copy of a reporter gene at a suitable genetic locus. By RMCE, the reporter gene is exchanged against the gene of interest (GOI). For RMCE with the site-specific Flp recombinase, the reporter gene is flanked by two distinct Flp recognition target (FRT) sites that have been engineered so that they cannot recombine with each other (Figures 3 and 4). RMCE is initiated by introducing the Flp recombinase and a vector with the GOI, flanked by FRT sites, into the master cell line.
Recombination of FRT sites leads to exchange of the reporter gene and the GOI, placing the GOI at a highly active, stable genetic locus (Figure 4). The same master cell line can be used for generating producer cell lines for many different target proteins [34,35].
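The exchange logic can be sketched as a toy model. A cassette is represented by its two flanking sites and a payload, and the payload is swapped only when the flanking sites are distinct from each other and match the donor's sites pairwise. The site labels FRT_a and FRT_b are placeholders, not the names of real FRT variants, and the model ignores side reactions such as excision or inversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    left_site: str   # placeholder label for the upstream FRT variant
    payload: str     # reporter gene or gene of interest
    right_site: str  # placeholder label for the downstream FRT variant


def rmce(locus: Cassette, donor: Cassette) -> Cassette:
    """Swap the payload if the flanking sites are heterospecific and match pairwise.

    Toy model only: an incompatible donor leaves the locus unchanged.
    """
    heterospecific = locus.left_site != locus.right_site
    sites_match = (
        locus.left_site == donor.left_site
        and locus.right_site == donor.right_site
    )
    if heterospecific and sites_match:
        return Cassette(locus.left_site, donor.payload, locus.right_site)
    return locus


master = Cassette("FRT_a", "GFP reporter", "FRT_b")
donor = Cassette("FRT_a", "GOI", "FRT_b")
producer = rmce(master, donor)
print(producer.payload)
```

Because the locus and its flanking sites are unchanged after the exchange, the same master definition can be reused with any number of donor cassettes, which corresponds to reusing one master cell line for many target proteins.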
The stable CHO Lec3.2.8.1 cell line SWI3a-26 is a master cell line for RMCE that was generated by random integration of an FRT-flanked GFP reporter gene [36]. It contains a single copy of the GFP reporter transgene at a genetic locus that is protected from silencing. The integrated GFP cassette contains a 'selection trap' that allows for selection of recombinant cells upon RMCE [37]. The selection trap is an inactive selection marker, lacking a promoter and a start codon (Figure 4). It is complemented upon RMCE and allows for antibiotic selection of recombinant cells. Using SWI3a-26, production cell lines were established by RMCE for different mammalian glycoproteins [36,38], including the ectodomain of the lysosomal membrane protein DC-LAMP. As a result, the DC-LAMP domain structure was solved by X-ray crystallography [39].
In our experience, RMCE takes about 7 weeks from the day of transfection to cryopreservation of clonal production cell lines [36]. In comparison to random integration, the process is faster and the effort of screening large numbers of clones is avoided. The multi-host expression vector pFlpBtM allows for protein production in E. coli, transiently transfected mammalian cells and baculovirus-infected insect cells and for construction of stable cell lines by RMCE with a single vector [38].
RMCE-derived production cell lines contain only a single copy of the transgene. Nevertheless, high-yield antibody production of up to 2 g/L in shake flask culture was achieved with an RMCE system that uses Cre-lox recombination [40]. With this system, about 5 weeks were required from transfection to completion of stable pool production cultures.
Two independent genetic loci can be targeted with two different transgenes by RMCE with a single transfection. This has been achieved by designing new synthetic FRT site variants [41].

Transfection efficiency and stable pools
Transfection of HEK293 and CHO cells leads to integration of transgene DNA at random chromosomal loci, but the frequency of these integration events is very low with commonly used vectors. Highly efficient systems for chromosomal integration of transgenes accelerate cell line generation and allow for protein production with stable pools, thereby eliminating time-consuming cloning steps. For protein production with a stable pool, the bulk of stably transfected cells are selected and used directly for protein production. The structure of lysosomal integral membrane protein 2 (LIMP-2) was solved using a stable pool obtained by transfection with the ΦC31 integrase system (described below) and antibiotic selection [42]. Efficient chromosomal integration is also achieved with Lentivirus particles and transposons (Figure 3).

Lentivirus
Lentiviral transduction of mammalian cells is very efficient and highly productive cells are generated at a high frequency. The usefulness of the Lentivirus system was demonstrated by establishing stable cell lines and stable pools for production of antibodies and blood coagulation factor VIII [43,44,45]. The Lentivirus efficiently transports the transgene cDNA into the nucleus, where it is integrated into the host cell genome by the viral integrase (Figure 3). A stable cell line for production of an IgG receptor subunit, which resulted in crystal structure determination, was established with a recombinant Lentivirus [46]. Stable cell line generation by Lentivirus transduction and by non-viral plasmid transfection was compared [45]. Gene delivery into nearly 100% of CHO cells grown in serum-free suspension culture was obtained by Lentivirus. GFP overexpression was up to five times higher in comparison to plasmid transfection. Potential drawbacks of the Lentivirus system are safety concerns, the error-prone replication of the viral RNA genome by reverse transcription and the extra step of virus particle preparation.

Phage ΦC31 integrase
The integration system of the Streptomyces phage ΦC31 represents a non-viral alternative for active transgene integration. The ΦC31 integrase performs recombination between the attP site of the phage genome and the attB site in the host bacterial chromosome. In mammalian cells, it mediates integration of plasmids bearing an attB site into chromosomal sequences that have sequence similarity with attP, termed pseudo attP sites [47]. Stable mammalian cell lines are generated by co-transfection with a ΦC31 integrase expression vector and an expression vector for the gene of interest that has an attB site (Figure 3). This system was used for protein production and structure determination of LIMP-2 [42], tumor antigen 5T4 [48] and acetylcholinesterase [49].

Transposons
High rates of chromosomal integration have also been achieved with transposon vectors. The terminal inverted repeats of the 'piggyBac' transposon are recognized by the transposon's transposase, which leads to integration of the flanked sequence into a chromosomal TTAA site (Figure 3). In stable CHO cell development, the piggyBac system strongly increased the frequency of stable integration and led to up to fourfold higher protein yield from pools of transfected cells [50]. Similarly positive results were obtained with the 'Sleeping Beauty' transposon system and HEK293 cells [51]. A stable cell line overexpressing the four subunits of a γ-secretase complex was established in one step with the piggyBac system [52]. A newly designed vector set for doxycycline-inducible overexpression utilizes multiple-copy integration by the piggyBac transposase and was used with HEK293S GnTI⁻ cells for high-level secretion of 14 proteins with stable pools [53].
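Since piggyBac integrates at TTAA tetranucleotides, candidate target sites in a sequence can be listed with a simple scan. The sketch below is only meant to make the site requirement concrete; the example sequence is made up, and real integration site preference depends on more than the presence of TTAA. Because TTAA is its own reverse complement, scanning one strand is sufficient.

```python
def find_ttaa_sites(sequence):
    """Return 0-based start positions of every TTAA tetranucleotide, overlaps included."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TTAA"]


# Made-up example sequence, not taken from any genome or vector.
example = "GATTAACCGTTAATTAAGC"
sites = find_ttaa_sites(example)
print(sites)
```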

Conclusions
Protein production with stable cell lines for structural biology relies strongly on glycosylation-deficient host cells. A novel HEK293 cell line, called 'GlycoDelete', was equipped with an intracellular endoglycosidase for secretion of deglycosylated glycoproteins. Host cells can be improved by sequence-specific genome engineering, allowing for efficient and highly specific knock out of multiple genes. A CHO cell was reported that lacks both GnTI and glutamine synthetase (GS) activity, which allows for highly efficient selection of stable cell lines with GS markers.
A survey of recent reports of protein crystal structures indicated that transient transfection of mammalian cells and stable mammalian cell lines were used with around the same frequency. In comparison to transient transfection protocols, the generation of stable cell lines has several bottlenecks. The frequency of stable genome integration upon transfection with plasmid vectors is low and, moreover, most integrated transgenes will be silenced. Top producers are usually rare in the pool of stably transfected cells and their identification requires isolating and characterizing a large number of clones.
Techniques for improved genome integration of transgenes and for improved selection of high-producing stable cell lines have been developed. Using fluorescent proteins as selection markers allows for isolating high-producing stable cells among millions of transfected cells by cell sorting. Novel glutamine synthetase knockout cells generated by sequence-specific genome engineering allow for a more efficient selection of high-producer cells with the glutamine synthetase selection marker. The problem of transgene silencing has been addressed by the recombinase-mediated cassette exchange (RMCE) technique. Here, site-specific recombination is used for targeting the gene of interest to specific genetic loci that are protected from silencing and that allow for high-level gene expression.
Plasmid DNA introduced into the nucleus of host cells is integrated into the genome by host cell factors at random sites with a low frequency. Much more efficient genome integration is achieved when additional, heterologous integration factors are introduced. This is accomplished with Lentiviruses, piggyBac and Sleeping Beauty transposons and the phage ΦC31 integrase. Transfection of host cells with these highly efficient systems results in a pool of cells with a high proportion of stable high-producers. Such a stable pool can be used directly for protein production, thereby avoiding time-consuming clone isolation and characterization steps.