Microbial expression systems for membrane proteins

Despite many high-profile successes, recombinant membrane protein production remains a technical challenge; many fewer membrane protein structures have been published than those of soluble proteins. However, progress is being made because empirical methods have been developed to produce the required quantity and quality of these challenging targets. This review focuses on the microbial expression systems that are a key source of recombinant prokaryotic and eukaryotic membrane proteins for structural studies. We provide an overview of the host strains, tags and promoters that, in our experience, are most likely to yield protein suitable for structural and functional characterization. We also catalogue the detergents used for solubilization and crystallization studies of these proteins. Here, we emphasize a combination of practical methods, not necessarily high-throughput, that can be implemented in any laboratory equipped for recombinant DNA technology and microbial cell culture.

Confocal microscopy visualization of recombinant hA2AR/hA2AR-Ura3p in transformed S. cerevisiae using Alexa Fluor 488 antibodies was performed to assess whether hA2AR/hA2AR-Ura3p was localized in the membrane or had been internalized to the vacuole. Immunoblots (50 µg total membrane protein loaded per well, as determined by BCA assay, and probed with Clontech anti-His6 antibody) were quantified using ImageJ. This allowed comparison of recombinant protein yield from A2H1, A2U1 and A2SU1 (BY4741 or spt3Δ transformed with pYX222-hA2AR-URA3 and grown under conditions of nutrient selection) with that of the A2 control (BY4741 transformed with pYX222-hA2AR) or spt3Δ:hA2AR (spt3Δ transformed with pYX222-hA2AR). For hβ2AR, B2H3, B2U1 and B2U5 were compared with the B2 control.
The functional yield of hA2AR/hA2AR-Ura3p or β2AR/β2AR-Ura3p in yeast cell membranes, or following solubilization with 2.5% DDM, 0.5% CHS, was determined by single-point saturation binding using the antagonists [3H]ZM241385 or [3H]CGP 12177, respectively, and 100 µg total membrane protein per experiment, as described in [73]. A one-way ANOVA with Holm-Sidak's multiple comparison test gave p = 0.001 (***) for A2H1 (without DDM treatment) versus A2H1 (solubilized with DDM). Additionally, β2AR/β2AR-Ura3p was solubilized using 2.5% (w/v) styrene maleic acid (SMA) polymer with a 2:1 ratio of styrene to maleic acid. All data are derived from at least 3 independent biological replicates, with error ± SEM in parentheses, where applicable.


Recombinant membrane protein production in microbes
Few membrane proteins are naturally abundant in their native membranes; to characterize them biophysically and biochemically, their genes must be recombined with more efficient promoters and regulators of expression [1]. Unsurprisingly, the few naturally-abundant membrane proteins (including mammalian and bacterial rhodopsins, aquaporins and complexes involved in respiration and photosynthesis) were amongst the first to have their crystallographic structures solved: the first high-resolution structure of a membrane protein was that of the photosynthetic reaction centre from Blastochloris viridis, published in 1985 [2]. In 1998, the first recombinant membrane protein structures were published: those of the prokaryotic proteins MscL [3] and KcsA [4], both produced in Escherichia coli. The first structures of recombinant mammalian membrane proteins were solved in 2005 using protein produced in yeast cells: the rabbit Ca2+-ATPase SERCA1a was produced in Saccharomyces cerevisiae [5] and the rat voltage-dependent potassium ion channel Kv1.2 was produced in Pichia pastoris [6]. These early results established microbes as efficient and effective host systems for synthesizing membrane proteins.
While baculovirus-infected insect cells and mammalian cell lines have been used very successfully for both prokaryotic and eukaryotic membrane protein production [1], microbes have remained a consistently popular choice because they are quick, easy and cheap to culture and can produce high-quality protein suitable for subsequent study. In November 2017, Stephen White's database (blanco.biomol.uci.edu/mpstruc/) recorded that almost a third (31%) of all membrane protein coordinate files deposited in the Protein Data Bank (PDB; www.rcsb.org/pdb/home/home.do) were derived from recombinant proteins; notably, 71% of the unique structures derived from recombinant proteins came from microbial sources. Overall, 64% of these recombinant unique structures were produced in E. coli, 4% in P. pastoris and 3% in S. cerevisiae.
This review focuses on current approaches to selecting expression plasmids (especially with respect to their purification tags and promoters), microbial strains and culture conditions to enable the detergent-based purification of functional membrane proteins for biophysical characterization and crystallization trials (for subsequent structural studies by X-ray crystallography, as well as by the newly-invigorated technique of electron microscopy [7]). We have experience of automated methods using robots that, once commissioned and optimized, can dramatically increase the number of constructs and hosts explored and reduce the time required to reach success. However, this review is intended for laboratories without access to such facilities, so the approaches discussed here should be widely applicable.

Table notes: 4 Detergent used for structure determination (LCP, lipid cubic phase; lsNMR, solution-phase NMR; ssNMR, solid-state NMR). 5 No tag. 6 Inclusion bodies solubilized in 6 M guanidine hydrochloride. 7 Not specified in PDB or corresponding publication. 8 E. coli membrane protein produced using its native promoter. 9 No induction. 10 Mixed detergent. 11 Not found; article was not accessible. 12 Other; the tag was inserted within the protein.
2. An overview of microbial expression hosts, tags and promoters
2.1. An overview of microbial host usage
The expression systems used in generating high-resolution structures of recombinant membrane proteins have been documented by Stephen White in his analysis of the PDB (blanco.biomol.uci.edu/mpstruc/). Biophysical studies of membrane proteins (especially NMR and crystallographic techniques) require large quantities (0.1-10 mM) of homogeneous, correctly-folded, purified protein; focusing on data extracted from the PDB has therefore allowed us to identify systems capable of producing the required quantity and quality of these challenging targets. This review updates our previous study [8] of E. coli expression systems and extends that work to include S. cerevisiae and P. pastoris. Together, these three host systems account for the vast majority of recombinant membrane proteins produced in microbes, although Lactococcus lactis (see PDB entry 4US3), Pseudomonas fluorescens (5KUD) and Schizosaccharomyces pombe (2PNO) have also been used successfully as microbial cell factories in a minority of cases.
In November 2017, of the 729 unique membrane protein structures (uMPS) derived from recombinant proteins and deposited in the PDB, 521 were produced in microbial host cells. E. coli was clearly the cell factory of choice (producing 468 uMPS, Tables 1 and 2), followed by the yeast hosts, P. pastoris (31 uMPS, Table 3) and S. cerevisiae (22 uMPS, Table 4). Table 5 summarizes the use of microbial expression systems and the origin of the target uMPS. With a growing number of uMPS being deposited in the PDB, heterologous membrane protein production is becoming dominant over the production of homologous targets. Higher eukaryotic uMPS, in particular, have more recently been obtained using all three microbial systems (Table 5).
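The percentages quoted in the Introduction can be reconciled with these counts. The short Python sketch below assumes (our reading, not stated explicitly in the text) that each host's share is calculated against all 729 recombinant uMPS; it also checks that the three hosts together account for the 521 microbially-produced uMPS.

```python
# Counts of unique membrane protein structures (uMPS) quoted in the text
# (PDB/mpstruc, November 2017).
RECOMBINANT_UMPS = 729
HOST_COUNTS = {"E. coli": 468, "P. pastoris": 31, "S. cerevisiae": 22}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100 * part / whole


microbial = sum(HOST_COUNTS.values())  # 468 + 31 + 22
print(f"microbial hosts: {microbial} uMPS ({percent(microbial, RECOMBINANT_UMPS):.0f}%)")
for host, n in HOST_COUNTS.items():
    print(f"{host}: {n} uMPS ({percent(n, RECOMBINANT_UMPS):.0f}%)")
```

Run as written, this prints 521 uMPS (71%) for microbial hosts and 64%, 4% and 3% for E. coli, P. pastoris and S. cerevisiae, matching the figures given earlier.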
Yeast expression systems have been used almost exclusively in the production of large, eukaryotic membrane proteins; in the case of P. pastoris, the targets were mainly of mammalian and plant origin (Table 3). Yeast hosts have mainly produced α-helical membrane proteins, while E. coli has also been used to produce β-barrel proteins, probably because many such proteins are found natively in the E. coli outer membrane (Fig. 1). Fig. 2 shows that above 500-600 amino acids (∼50-60 kDa), the number of uMPS decreases dramatically, suggesting that E. coli cannot efficiently produce large proteins; this may be because ribosomes drop off very long mRNAs leading to incomplete synthesis products. In contrast, yeast expression systems can cope with larger proteins up to 1400 residues in length (∼150 kDa) (Fig. 2). When we interrogated the data for eukaryotic (mammalian, plant, fish, anemone and worm) membrane proteins produced in E. coli (Table 6), we identified 47 uMPS. Of these 47 uMPS, 7 were monotopic membrane proteins and 23 were small peptides or proteins containing only one transmembrane domain. Fifteen uMPS were produced as inclusion bodies and subsequently refolded, 17 were purified in mild detergent and, of those, 3 were membrane proteins with more than 4 transmembrane α-helices that had been crystallized in the presence of detergent (PDB codes: 2Q7M; 4BUO; 4O6Y) and 1 was studied by electron microscopy (3DWW).
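The length-to-mass conversions used above (∼50-60 kDa for 500-600 residues, ∼150 kDa for 1400 residues) correspond to an average residue mass of roughly 100-110 Da. A minimal Python sketch follows; the residue mass table is standard average-mass data supplied by us rather than taken from this review, and the function names are hypothetical. It estimates mass either from a full sequence or, as a rule of thumb, from length alone.

```python
# Average (not monoisotopic) residue masses in daltons, i.e. amino acid mass
# minus one water; standard reference values rounded to 0.01 Da.
RESIDUE_MASS_DA = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_DA = 18.02  # one water for the free N- and C-termini


def sequence_mass_da(seq: str) -> float:
    """Average mass of an unmodified polypeptide from its one-letter sequence."""
    return sum(RESIDUE_MASS_DA[aa] for aa in seq.upper()) + WATER_DA


def approx_mass_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Rule-of-thumb mass when only the length is known (~110 Da per residue)."""
    return n_residues * mean_residue_da / 1000


for length in (500, 600, 1400):
    print(f"{length} residues ~ {approx_mass_kda(length):.0f} kDa")
```

With 110 Da per residue the estimates are slightly higher (55, 66 and 154 kDa) than the rounded values in the text, which illustrates how sensitive such conversions are to the assumed mean residue mass.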

An overview of tag usage in microbial expression systems
Construct design is an integral part of defining an appropriate expression system, with key considerations being the size and predicted secondary structure of the target protein as well as the planned purification strategy. SMART (protein domain identification; http://smart.embl-heidelberg.de/help/smart_about.shtml) or Jpred (secondary structure prediction; http://www.compbio.dundee.ac.uk/jpred/) approaches can be used to describe the protein architecture and may help in deciding where to place any tags.
Addition of a polyhistidine tag is the most popular strategy for large-scale purification of recombinant membrane proteins on nickel-affinity columns. This is especially true for E. coli, where other affinity purification tags have had very little impact: 392 of the 447 tagged proteins produced in E. coli contain a polyhistidine tag, 18 contain a GFP tag and 17 were fused to maltose binding protein (MBP). Other tags such as Strep and Flag account for no more than 13 uMPS (Fig. 3A, Tables 1 and 2). While the overall numbers are lower for uMPS from yeast-derived proteins, GFP is emerging as a useful tag for tracking the purification of proteins from yeast membranes (Fig. 3B, Tables 3 and 4). GFP can be particularly useful for monitoring production yields or the oligomeric state of a membrane protein-GFP fusion by fluorescence-detection size-exclusion chromatography (FSEC) [9]. Other tags (e.g. Strep and Flag) are also more frequently used in yeast expression systems (Fig. 3B).
Irrespective of the host used, polyhistidine tags are placed approximately equally often at the amino- or carboxyl-terminus of the target protein (Fig. 4A). For constructs with amino-terminal tags, protein synthesis is usually initiated using a sequence of at least three amino acids preceding the tag. For example, in plasmid pRSET (Invitrogen) the sequence is MRGSHis6, while the protein used to solve structure 4V3G contained the amino-terminal tag ANVRLQHis7LE (Table 2). Fig. 4B shows that 35% of uMPS produced in microbial host cells contained polyhistidine tags with more than 6 histidines. An interesting example is the insertion of a tandem array of 6 histidines separated by a glycine (see 5DO7, Table 3).
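A tag with "more than 6 histidines" can be counted either by its longest consecutive run or by its total histidine content, and a tandem array falls into different categories under the two readings; we do not know which definition underlies Fig. 4B. The Python sketch below, with hypothetical helper names, writes the tags named above in one-letter code. The His6-Gly-His6 layout used for the tandem example is our assumption, not the deposited 5DO7 sequence.

```python
import re


def longest_his_run(tag: str) -> int:
    """Length of the longest stretch of consecutive histidines (H)."""
    return max((len(run) for run in re.findall(r"H+", tag.upper())), default=0)


def total_his(tag: str) -> int:
    """Total number of histidines, whether consecutive or not."""
    return tag.upper().count("H")


# One-letter versions of the tags named in the text.
PRSET_TAG = "MRGS" + "H" * 6          # MRGSHis6 (pRSET)
TAG_4V3G = "ANVRLQ" + "H" * 7 + "LE"  # ANVRLQHis7LE (4V3G)
# Hypothetical tandem layout (His6-Gly-His6), for illustration only.
TANDEM_TAG = "H" * 6 + "G" + "H" * 6

for name, tag in [("pRSET", PRSET_TAG), ("4V3G", TAG_4V3G), ("tandem", TANDEM_TAG)]:
    print(f"{name}: longest run {longest_his_run(tag)}, total His {total_his(tag)}")
```

The tandem example has a longest run of 6 but 12 histidines in total, so it would be excluded from the ">6" class under the first reading and included under the second.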
Fluorescent tags are an increasingly popular choice for examination of protein quality [10]. In bacteria, dual ribosome-binding-site (RBS) expression vectors such as pET-Duet (Novagen) enable the cloning of a gene encoding a reporter fluorescent protein downstream of the target gene. This allows the cell population to be assessed by flow cytometry for stability and toxicity of the expression construct and for establishing optimal induction conditions. Double RBS vectors from the pET-Duet series have been used to produce nine multi-subunit membrane proteins (see Table 2 for 4HZU, 4HUQ, 4HG6, 4NRE, 4N4R, 5AWW, 4YMS and 3DL8, and Table 1 for 4C48), five of which were produced in E. coli host strain C43(DE3).
There is no general rule regarding cleavage sequences, but TEV protease, which is easy to produce in-house, is widely used for membrane protein purification (see 4C00, 3WVF, 4X5M and 4JA3 for examples) because it remains active in the presence of the most commonly used detergents [11]. Thrombin is also widely used (see 2VQI, 2ABM and 3B5D for examples).
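To check a construct for protease sites before cloning, the canonical recognition sequences can be scanned directly: TEV protease cuts ENLYFQ|(G/S) and thrombin cuts LVPR|GS, leaving a short "scar" on the downstream fragment. This Python sketch matches exact canonical sites only (it ignores tolerated variants and site accessibility in detergent micelles), and the construct sequence is hypothetical.

```python
import re

# Canonical recognition sequences; the cut falls at the end of each match.
#   TEV protease: ENLYFQ | G or S
#   thrombin:     LVPR   | GS
CLEAVAGE_SITES = {
    "TEV": re.compile(r"ENLYFQ(?=[GS])"),
    "thrombin": re.compile(r"LVPR(?=GS)"),
}


def cleave(seq: str, protease: str) -> list[str]:
    """Split a protein sequence at every canonical site for `protease`."""
    pieces, start = [], 0
    for match in CLEAVAGE_SITES[protease].finditer(seq):
        pieces.append(seq[start:match.end()])
        start = match.end()
    pieces.append(seq[start:])
    return pieces


# Hypothetical construct: His8 tag, TEV site, then the start of a target protein.
construct = "MSHHHHHHHHENLYFQ" + "GAMKTLLV"
tag_part, target_part = cleave(construct, "TEV")
print(tag_part)     # MSHHHHHHHHENLYFQ
print(target_part)  # GAMKTLLV (the Gly scar stays on the target)
```

A sequence with no canonical site is returned unchanged as a single piece, which flags constructs where the intended cleavage site has been mistyped.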

Promoter usage for E. coli expression
We analyzed how the 468 membrane proteins in Table 5 had been produced. Some uMPS were produced in more than one expression system and therefore Tables 7 and 8 list a total of 477 combinations of promoter and E. coli host strain. As we previously observed (in our 2015 analysis of 213 uMPS [8]), the T7 RNA polymerase (T7RNAP)-based expression system is the most widely used, followed by the ara, T5 and tet promoter-based expression systems (Fig. 5). The data in Tables 1 and 2 are presented in chronological order, meaning that the later entries reflect the most recent trends in promoter, strain and vector choice.

Rosetta™ 2(DE3) (Table 7). The bacterial host BL21(DE3) (including mutant derivatives) is the most frequently used (111 uMPS), followed by the two mutant hosts C43(DE3) and C41(DE3) (54 and 30 uMPS, respectively). BL21(DE3) carrying plasmids that express either lysozyme or rare tRNAs (28 and 19 uMPS) follows in fourth and fifth position, respectively. Rosetta™ 2(DE3) was used for a total of 13 uMPS. Bacterial hosts other than those mentioned above have had only a marginal impact on the field (1 to 8 uMPS each). Production of homologous membrane proteins in E. coli shows a similar pattern (Table 8), although native E. coli promoters were used more frequently (Fig. 5). As expected, rare tRNA plasmids were not typically used for the production of homologous membrane proteins. The two most used bacterial hosts were BL21(DE3) and C43(DE3), yielding 41 and 26 uMPS, respectively (Table 8). Table 9 lists the genotypes of the bacterial hosts identified in this analysis. Expression systems that are not T7RNAP-based do not require λDE3-containing hosts. Despite this, in the case of the ara expression system, C43(DE3) is noticeably used more than any other strain: 16 out of 47 non-E. coli uMPS and 5 out of 12 E. coli uMPS were produced in C43(DE3). Whether the lacI super-repressor mutation or another mutation found in this host [12] is advantageous for the regulation of the arabinose promoter remains to be demonstrated. The T7RNAP-based expression system in combination with C41(DE3) or C43(DE3) has mostly been used to produce α-helical membrane proteins, while BL21(DE3) and its derivatives have also been used to produce β-barrel membrane proteins. The situation is reversed for the arabinose expression system, in which C43(DE3) hosts produced mainly β-barrels.
In the T7RNAP-based expression system, C41(DE3) and C43(DE3) hosts were more frequently used for uMPS containing more than 7 transmembrane domains, while BL21(DE3) was preferentially used for smaller proteins, typically with 1-2 transmembrane domains (Tables 1 and 2).
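The host trends just described, together with the host/plasmid pairings recommended in the paragraph that follows, can be condensed into a small lookup. This Python sketch is our own summary rather than a published tool: the names `HOST_ADVICE`, `pdb_host_trend` and `advise` are hypothetical, the dictionary is not exhaustive, and the transmembrane-domain cut-offs simply restate the trends in Tables 1 and 2.

```python
from typing import Optional

# Hypothetical lookup summarising the T7RNAP-system pairings recommended in the text.
HIGH_COPY = {
    "vector": "high copy number (e.g. pMB1 origin; pMW7, pRSET, pGEM)",
    "lacI/lacO": "avoid",
}
HOST_ADVICE = {
    "C41(DE3)": HIGH_COPY,
    "C43(DE3)": HIGH_COPY,
    "BL21(DE3)": {
        "vector": "medium copy number pET series",
        "lacI/lacO": "preferred (less T7RNAP before induction)",
        "option": "companion plasmid pLysS (T7 lysozyme inhibits T7RNAP)",
    },
    "BL21AI": {"vector": "T7 expression vector",
               "option": "T7RNAP gene under the arabinose promoter; titrate with arabinose"},
    "Lemo21(DE3)": {"vector": "T7 expression vector",
                    "option": "lysozyme gene under the rhamnose promoter; titrate with rhamnose"},
}


def pdb_host_trend(n_tm_domains: int) -> str:
    """Host most often seen in Tables 1 and 2 for a given number of TM domains."""
    if n_tm_domains > 7:
        return "C41(DE3)/C43(DE3)"
    if n_tm_domains <= 2:
        return "BL21(DE3)"
    return "no clear trend"


def advise(host: str, n_tm_domains: Optional[int] = None) -> dict:
    """Return a copy of the encoded advice for `host`, plus the PDB trend if requested."""
    if host not in HOST_ADVICE:
        raise KeyError(f"no recommendation encoded for {host!r}")
    advice = dict(HOST_ADVICE[host])
    if n_tm_domains is not None:
        advice["PDB trend"] = pdb_host_trend(n_tm_domains)
    return advice


print(advise("C43(DE3)", n_tm_domains=12))
```

Returning a copy keeps callers from accidentally editing the shared entries used by both C41(DE3) and C43(DE3).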
It is clear that selecting the optimal combination of promoter, tag and bacterial host is key to achieving recombinant membrane protein yields suitable for biophysical studies. As guidance, in our experience the following applies to the T7RNAP-based expression system with the C41(DE3) and C43(DE3) bacterial strains, which were originally derived using high copy number plasmids (200-600 copies/cell, such as those containing the pMB1 origin of replication). A non-exhaustive list of suitable plasmids includes pMW7 and its derivatives (pHis and pRun) [13,14], pGEM (Promega), pRSET and pDEST (Invitrogen), pIVEX (5prime) and pPR-IBA (IBA). Importantly, the chosen plasmid should not contain lacI or lacO sequences, because further attenuation of the T7 promoter is often not needed in these expression hosts (see also the comments on stability testing in Section 3). For BL21(DE3) derivatives, medium copy number vectors (pET series) and those containing lacI and lacO sequences (e.g. pET 3, 9, 14, 17, 20 or 23 from Novagen) are more suitable because they reduce the amount of T7RNAP before induction. Use of the companion plasmid pLysS inhibits T7RNAP after induction. The BL21AI host, which contains the T7RNAP gene under the control of the arabinose promoter, or the Lemo21 host [15], which contains a companion plasmid expressing the lysozyme gene under the control of the rhamnose promoter, may also be useful for titrating the amount or activity of T7RNAP.

Promoter usage for yeast expression
Table 10 lists the yeast promoters and strains that are integral components of yeast expression systems; Table 11 lists the corresponding genotypes. Typically, episomal plasmids are used for expression in S. cerevisiae, while the expression cassette is integrated into the genome of P. pastoris. This situation probably reflects the repetition of early successes with these combinations. Since the P. pastoris system depends upon very strong promoters, only a few copies of the gene (as present in stably-integrated strains) are required to obtain sufficient levels of mRNA, although it is apparent that copy number and protein yield are not linearly correlated. The negative impacts of secretory stress and clonal instability are areas on which investigators have focussed attention in the search for productive recombinant P. pastoris strains [16]. A recent report has streamlined the 'time-to-strain pipeline' in P. pastoris [17]. In contrast, in S. cerevisiae the promoter may be 10- to 100-fold weaker, so the use of high copy number episomal plasmids is advantageous; episomal plasmids are available for P. pastoris [18] but are not yet widely used in structural biology projects. The strong S. cerevisiae promoter PGAL1 is induced with galactose, while PAOX1 (a very strong P. pastoris promoter) is induced with methanol [19]. The rationale for choosing a strong promoter is that transcription should not be rate limiting. However, high mRNA synthesis rates may be countered by high rates of mRNA degradation [20]. Evidence from bacterial expression systems suggests that lowering promoter efficiency via mutation can improve functional yields of membrane proteins for some, but not all, targets [21].
It has been proposed that the ideal inducible system would completely uncouple cell growth from recombinant protein synthesis, which requires the host cell to remain metabolically capable of transcription and translation in a growth-arrested state. In this scenario, all metabolic fluxes would be diverted to the production of recombinant protein [22]. While this approach is yet to be demonstrated for membrane protein production in yeast cells, soluble chloramphenicol acetyltransferase was produced to more than 40% of total cell protein in E. coli [23], suggesting that this may be a strategy worth exploring in yeast. As in bacteria, yeast growth rates often (but not always) decline dramatically upon induction, partly achieving this state.
3. Bacterial expression systems for membrane protein production: PT7-based expression protocols
3.1. Optimization of culture growth conditions for improved membrane protein production
We have previously examined the importance of optimizing growth conditions for improved membrane protein production in bacterial host cells [24,25] and have published an analysis of the T7RNAP-based expression system [8]. Here we combine these insights with our updated analysis of the PDB. For simplicity, we refer only to the T7RNAP-based expression system in this section, but the principles underlying most of our advice can be applied more widely to other microbial expression hosts.
An important but simple test that should be performed before culturing recombinant strains is to assess whether the selected plasmid/bacterial host combination is stable over time in the medium to be used for large-scale production. We suggest assessing individual cultures from five independent colonies. After overnight growth in the presence of a suitable antibiotic, 10⁻⁶, 10⁻⁷ and 10⁻⁸ dilutions should be plated on 2×TY agar with and without antibiotic. If the same number of colonies is obtained in the absence and presence of antibiotic, the plasmid is stable and it is appropriate to proceed to growing large-scale cultures. However, if the number of colonies is higher in the absence of antibiotic, the expression plasmid is unstable (even before expression of the target gene is induced), and it is not advisable to prepare a large-scale culture. In this scenario, it would be prudent to change to a better regulated host strain such as C41(DE3), C43(DE3), Lemo21(DE3) or BL21(DE3) pLysS. Alternatively, some investigators do not plate cells after heat shock but use the whole transformation medium as a preculture [26]. By doing this, they take advantage of the significant colony-to-colony variability in target gene expression level, in the hope of achieving a reasonable recombinant protein yield.
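The stability test compares colony counts from the same dilution plated with and without antibiotic. The Python sketch below assumes a 100 µl plating volume and uses a hypothetical 0.8 retention tolerance to absorb counting noise; neither value comes from the text, and the plate counts are invented for illustration.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ml: float = 0.1) -> float:
    """Colony-forming units per ml of undiluted culture."""
    return colonies / (dilution * plated_ml)


def plasmid_retention(with_antibiotic: int, without_antibiotic: int) -> float:
    """Fraction of viable cells still carrying the plasmid (same dilution and volume)."""
    if without_antibiotic == 0:
        raise ValueError("no colonies on the antibiotic-free plate; choose a lower dilution")
    return with_antibiotic / without_antibiotic


def is_stable(with_antibiotic: int, without_antibiotic: int, min_retention: float = 0.8) -> bool:
    # min_retention is a hypothetical tolerance for counting noise, not a value from the text.
    return plasmid_retention(with_antibiotic, without_antibiotic) >= min_retention


# Hypothetical counts at the 10^-7 dilution for five independent colonies:
# (plate with antibiotic, plate without antibiotic)
counts = {1: (152, 160), 2: (148, 151), 3: (40, 155), 4: (139, 150), 5: (161, 158)}
for colony, (with_ab, without_ab) in counts.items():
    verdict = "stable" if is_stable(with_ab, without_ab) else "unstable"
    print(f"colony {colony}: {cfu_per_ml(without_ab, 1e-7):.2e} CFU/ml, {verdict}")
```

In this invented example, colony 3 would be flagged: only about a quarter of its viable cells retain the plasmid, which is the situation in which the text advises against large-scale culture.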
In general, however, it is preferable to start from freshly-transformed bacterial cells. A typical approach is to inoculate 5 ml 2×TY medium with an isolated colony and incubate overnight. The next morning, this culture should be used to inoculate 500 ml of 2×TY medium in a 2.5 L flask. The culture should reach an optical density (A600) of 0.6 in fewer than 5 h; if it does not, basal expression of the target gene is impairing cell growth, which usually affects the stability of the expression plasmid. Instead of the typical induction protocol (0.7 mM IPTG at A600 = 0.6), two options are also worth trying. The first is not to add IPTG and instead to let the culture grow overnight at 30°C or 37°C. This protocol works well for high copy number plasmids that are not regulated (i.e. they lack the T7lac promoter and/or multicopy lacI or lysozyme gene expression); two membrane protein structures were obtained from cultures grown in this way without induction [27,28]. The second is to add IPTG at the beginning of the stationary phase (A600 = 1), either in trace amounts (10 μM), following the improved protocol of Alfasi and colleagues [29], or at a high concentration (0.7 mM) (Table 12). However, adding IPTG in the stationary phase is not recommended when using C41(DE3) or C43(DE3), as it results in decreased expression levels of the target gene.
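The "A600 = 0.6 in fewer than 5 h" criterion implies a maximum doubling time for a given starting density. The Python sketch below assumes simple exponential growth with no lag phase and an overnight preculture at A600 ∼4 (our assumption), so that a 5 ml into 500 ml dilution starts the flask near A600 0.04; the A600 readings are hypothetical.

```python
import math


def doubling_time_min(a600_first: float, a600_second: float, minutes_apart: float) -> float:
    """Doubling time from two A600 readings taken during exponential growth."""
    return minutes_apart * math.log(2) / math.log(a600_second / a600_first)


def hours_to_reach(a600_start: float, a600_target: float, td_min: float) -> float:
    """Time needed to grow from a600_start to a600_target at a constant doubling time."""
    return td_min * math.log2(a600_target / a600_start) / 60


def reaches_target_in_time(a600_start: float, td_min: float,
                           a600_target: float = 0.6, limit_h: float = 5.0) -> bool:
    """True if the culture should reach a600_target within limit_h hours."""
    return hours_to_reach(a600_start, a600_target, td_min) < limit_h


# Assumption: overnight preculture at A600 ~4, 5 ml diluted into 500 ml.
start = 4.0 * 5 / 505
td = doubling_time_min(0.10, 0.20, 30)   # hypothetical readings 30 min apart
print(f"doubling time {td:.0f} min, A600 0.6 after {hours_to_reach(start, 0.6, td):.1f} h")
print("within 5 h:", reaches_target_in_time(start, td))
```

Under these assumptions a culture needs about four doublings, so a doubling time much beyond ∼75 min would miss the 5 h window and point to growth impaired by basal expression.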

Selection of mutant T7RNAP-based expression strains for toxic genes
C41(DE3) and C43(DE3) were originally selected as part of a strategy to produce a membrane protein target that was toxic to BL21(DE3) host cells [30]. The protocol summarized here allows the selection of a bacterial strain to produce any given toxic target membrane protein. Having a reporter gene such as GFP makes the experiment faster but is not essential; C41(DE3) and C43(DE3) were selected without the use of a fluorescent reporter.
The expression plasmid containing the gene of interest should be transformed into BL21(DE3) by the calcium chloride method, using 1-10 ng of plasmid. After incubation of the 1 ml transformation culture for 1 h at 37°C, 100 μl aliquots are spread onto 2×TY plates with antibiotic and onto 2×TY plates with antibiotic supplemented with either 0.4 mM or 0.7 mM IPTG (this range avoids the non-specific toxicity of IPTG above 0.7 mM). If the vector expressing the target membrane protein does not prevent cell growth on IPTG-containing plates, mutant strains cannot be selected. If there are hundreds of colonies in the absence of IPTG but very few in its presence, mutants may appear at high frequency.
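The plates from this transformation give two useful numbers: the transformation efficiency and the ratio of colonies with and without IPTG. In the Python sketch below, the 5% cut-off standing in for "very few" colonies is a hypothetical value of ours, and the plate counts are invented; the 100 µl plated from a 1 ml recovery culture follows the text.

```python
def transformation_efficiency(colonies: int, plasmid_ng: float,
                              plated_ul: float = 100, recovery_ul: float = 1000) -> float:
    """Transformants per microgram of plasmid, from colonies on one plate."""
    fraction_plated = plated_ul / recovery_ul
    return colonies / ((plasmid_ng / 1000) * fraction_plated)


def iptg_survival(colonies_with_iptg: int, colonies_without_iptg: int) -> float:
    """Ratio of colonies on IPTG plates to colonies on IPTG-free plates."""
    return colonies_with_iptg / colonies_without_iptg


def selection_outlook(colonies_with_iptg: int, colonies_without_iptg: int,
                      toxic_below: float = 0.05) -> str:
    # toxic_below is a hypothetical cut-off for "very few" colonies on IPTG plates.
    if iptg_survival(colonies_with_iptg, colonies_without_iptg) >= toxic_below:
        return "growth not prevented by induction: mutant selection not possible"
    return "induction is toxic: mutants may be selectable"


# Hypothetical plate counts from 5 ng of plasmid.
print(f"{transformation_efficiency(250, 5):.1e} transformants/ug")
print(selection_outlook(colonies_with_iptg=3, colonies_without_iptg=250))
```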
Typically, five selection experiments can be performed in one day: five 250 ml flasks containing 50 ml 2×TY medium with antibiotic are each inoculated with one bacterial colony. Once the culture has reached A600 = 0.4-0.6, IPTG is added at 0.7 mM final concentration to induce gene expression. One to two hours after induction, 1 ml of culture is harvested and serial 10⁻¹ to 10⁻⁴ dilutions are plated onto the IPTG- and antibiotic-containing plates. The frequency of appearance of mutant hosts varies from 10⁻⁴ to 10⁻⁶ [30]. After overnight incubation at 37°C, colonies of different sizes are counted. Large colonies have usually lost the ability to express the target gene, in contrast to small colonies, which arise at a frequency of 1-20%.
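Combining the reported mutant frequencies with the 10⁻¹ to 10⁻⁴ dilution series shows which plates are likely to be informative. The Python sketch below rests on assumptions that are not in the text: ∼4 × 10⁸ viable cells/ml at induction, 100 µl plated per dilution, and the conventional 30-300 colony countable range.

```python
# Assumption (not from the text): ~4e8 viable cells/ml at induction (A600 ~0.5);
# the true value depends on strain, instrument and post-induction killing.
CELLS_PER_ML = 4e8
PLATED_ML = 0.1                       # hypothetical plating volume
DILUTIONS = (1e-1, 1e-2, 1e-3, 1e-4)  # serial dilutions from the text
COUNTABLE = (30, 300)                 # conventional countable range per plate


def expected_colonies(frequency: float, dilution: float,
                      cells_per_ml: float = CELLS_PER_ML, plated_ml: float = PLATED_ML) -> float:
    """Expected mutant colonies on one IPTG plate."""
    return cells_per_ml * dilution * plated_ml * frequency


def countable_dilutions(frequency: float) -> list[float]:
    """Dilutions expected to give a countable number of mutant colonies."""
    low, high = COUNTABLE
    return [d for d in DILUTIONS if low <= expected_colonies(frequency, d) <= high]


for freq in (1e-4, 1e-6):   # range of mutant frequencies reported in the text
    row = ", ".join(f"{d:.0e}: {expected_colonies(freq, d):.1f}" for d in DILUTIONS)
    print(f"frequency {freq:.0e} -> {row}; countable at {countable_dilutions(freq)}")
```

Under these assumptions, only the 10⁻² plate is countable at a frequency of 10⁻⁴, while at 10⁻⁶ even the 10⁻¹ plate is expected to carry only a handful of colonies.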

required to cure C41(DE3) from the pOGCP expression plasmid [30]). If the mutation is in the expression vector, transformation of the isolated plasmid into BL21(DE3) cells should give colonies on IPTG-containing plates; if there are no colonies, the mutation is carried by the host strain of the isolated colony.

Expression of non-toxic or moderately-toxic target genes
Expression of genes encoding non-toxic or moderately-toxic membrane proteins cloned in T7 expression plasmids still allows colony formation on IPTG-containing plates. The more toxic the protein, the smaller the colonies on these plates. We have observed that antibiotic is not required in large-scale cultures, provided that it has been added to the preculture [24]. The induction protocol must be adjusted according to the size of the colonies on IPTG plates (Table 12). If the size reduction compared with plates lacking IPTG is marginal (<10%), this may suggest that the production yield of the target membrane protein is very low. To maximize the chance of obtaining high yields, 0.7 mM IPTG should then be added at the early exponential phase (A600 ≤ 0.4). If colony size is decreased by 10% or more, IPTG should be added at A600 = 0.6 at the two concentrations that are most frequently used [8]: 0.4 mM and 0.7 mM (Table 12). Autoinduction has been used with the T7 and arabinose expression systems ([31] and Table 2 in the cases of 4HYJ, 4KJS and 3FID). In E. coli, glucose causes catabolite repression and is consumed before other carbon sources. Autoinduction media take advantage of this: they contain glucose to allow the bacterial cells to grow to high densities, and when the glucose has been exhausted, cells switch on the operons involved in catabolism of the other carbon sources present. Autoinduction media contain a defined amount of lactose, which (after conversion to allolactose) binds the LacI repressor and thereby induces expression of T7RNAP. Commercial autoinduction media are not cheap, but are a useful option when leaky expression is toxic and prevents cell growth before IPTG addition. Another option to circumvent toxicity is to decrease the temperature of the culture 30 min before IPTG addition.
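The colony-size rule in this paragraph can be written as a small decision function. The Python sketch below is a condensation of the text, not a reproduction of Table 12. It assumes that colony "size" is measured as diameter, and it routes the no-colony case to the mutant-selection procedure of the previous section; the function names are hypothetical.

```python
from typing import Optional


def size_reduction_pct(diameter_without_iptg_mm: float, diameter_with_iptg_mm: float) -> float:
    """Percentage reduction in colony diameter on IPTG plates."""
    return 100 * (diameter_without_iptg_mm - diameter_with_iptg_mm) / diameter_without_iptg_mm


def induction_plan(reduction_pct: Optional[float]) -> dict:
    """IPTG conditions suggested by colony size (None = no colonies on IPTG plates)."""
    if reduction_pct is None:
        return {"strategy": "toxic target: select a mutant host first"}
    if reduction_pct < 10:
        # Marginal effect: yield is probably low, so induce early and strongly.
        return {"iptg_mM": [0.7], "induce_at_A600": "<= 0.4"}
    # Clear effect: test both commonly used IPTG concentrations at A600 = 0.6.
    return {"iptg_mM": [0.4, 0.7], "induce_at_A600": "0.6"}


print(induction_plan(size_reduction_pct(2.0, 1.9)))   # ~5% smaller colonies
print(induction_plan(size_reduction_pct(2.0, 1.2)))   # 40% smaller colonies
```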
In a previous study [8], we demonstrated that in approximately 50% of studies using T7 expression systems, lowering the temperature (i) prevented the formation of inclusion bodies, (ii) improved the solubility of the recombinant membrane protein, (iii) reduced toxicity or (iv) prevented overgrowth of the culture by cells that had lost the expression plasmid [30].

Collecting proliferated membranes or inclusion bodies from E. coli hosts
Formation of inclusion bodies containing a recombinant membrane protein (IBMP) occurs frequently in bacteria, especially for non-E. coli targets. Inclusion body formation is usually not toxic to the cell, the recombinant protein can accumulate to very high levels and, in some cases, the protein is in an 'amyloid' form that entraps functional protein [4]. Bacterial inclusion bodies have been shown to penetrate mammalian cells spontaneously and can be targeted to specific receptors, opening the way to delivering functional drugs. Because IBMP are abundant and Ni2+-affinity chromatography can be performed under denaturing conditions, IBMP can be purified in large quantities. One application is their use, as an alternative to peptides, for raising specific antibodies against eukaryotic proteins [5,6]. As mentioned in Section 2.1, large-scale refolding of inclusion bodies has been attempted in the field of structural biology and some progress has been made, especially for NMR analysis. For instance, several G protein-coupled receptors (GPCRs) have been produced in a functional form (after refolding of E. coli-produced inclusion bodies in amphipols [7,8]) and successfully studied by high-resolution NMR [9]. However, refolding of IBMP is challenging because some surfactants maintain misfolded membrane proteins in solution, as exemplified by the mitochondrial uncoupling protein structure (2LCK, Table 2), which is not physiologically relevant [10]. Consequently, although producing IBMP for structural studies could be considered, we have focused on targeting heterologous membrane proteins to bacterial membranes, ideally to proliferating membranes. Intracellular membrane formation in E. coli has been observed upon overproduction of several classes of proteins: 1. integral membrane proteins, including the whole ATP synthase [32] or AtpF, its membrane-bound subunit b [33], the chemotaxis receptor Tsr [34], the sn-glycerol-3-phosphate acyltransferase [35] and the fumarate reductase [36]; 2. monotopic membrane proteins, including the glycosyltransferase MurG [37], the monoglycosyldiacylglycerol synthase (MGS) from Acholeplasma laidlawii [38,39] and the N-methyltransferase PmtA from Agrobacterium tumefaciens [40]; and 3. amphipathic protein oligomers made of caveolin [41,42] or derived from elastin-like peptide repeats (ELP) [43].
AtpF is a good example of an E. coli membrane protein that can be produced either as inclusion bodies in C41(DE3) or in a folded state within internal proliferating membranes in C43(DE3). Accumulation of atpF mRNA is similar in both expression hosts 3 h after induction, but the time course of expression is delayed by 30 min in C43(DE3) [3]. Optimized expression conditions were 16 h of induction with 0.7 mM IPTG at 25°C. Under these conditions, cell viability was restored and overproduction of AtpF did not trigger toxicity.
Despite this example, inclusion body formation is frequent and difficult to avoid with eukaryotic membrane protein targets. When producing a membrane protein in bacteria, it is therefore important to check for the presence of inclusion bodies and to prepare cellular or internal bacterial membranes carefully. Inclusion bodies can be isolated in two centrifugation steps: 600g for 10 min to pellet unbroken cells and cell debris, followed by 10,000g for 15 min at 4°C to pellet inclusion bodies from the supernatant. Bacterial membranes remain in the 10,000g supernatant and can be pelleted by high-speed centrifugation, usually at 100,000g for 1 h. To collect bacterial membranes in the absence of inclusion bodies, disrupt the bacteria (from at least 1 L of culture) by passing the suspension twice through a French press or cell disruptor. If a recombinant membrane protein triggers internal membrane proliferation, as AtpF does [11], those intracellular membranes can be collected immediately by low-speed centrifugation at 2500g for 10 min (P1 pellet). This pellet contains internal membranes but also unbroken cells and debris that need to be washed away. The supernatant (S1) contains inner and outer membranes, which are collected by centrifugation of S1 at 100,000g for 1 h at 4°C. Proliferated membranes within P1 are then washed, and unbroken cells are removed by centrifugation at 2500g for 10 min at 4°C. The supernatant (S2) contains the washed internal membranes, which are collected by centrifugation at 100,000g for 1 h. The next step is to separate membrane vesicles according to their density on a sucrose gradient; continuous gradients are used where high purity is required.
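The protocol specifies relative centrifugal force, whereas many centrifuges are set in rpm. The standard relation is RCF = 1.118 × 10⁻⁵ × r(cm) × rpm². This Python sketch applies it to the internal-membrane fractionation scheme described above. The rotor radii are hypothetical placeholders, so substitute the radius quoted for your own rotor.

```python
import math

# Relative centrifugal force: RCF = 1.118e-5 * r_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed needed to reach rcf_g at radius_cm from the axis."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


# Internal-membrane fractionation from the text: (step, g, minutes, what to keep, radius).
# The rotor radii are hypothetical; use the value for your own rotor.
STEPS = [
    ("lysate -> P1/S1", 2_500, 10, "P1: internal membranes + unbroken cells; S1: inner + outer membranes", 10.0),
    ("S1 -> pellet", 100_000, 60, "inner and outer membranes", 8.0),
    ("washed P1 -> S2", 2_500, 10, "S2: washed internal membranes (pellet: unbroken cells)", 10.0),
    ("S2 -> pellet", 100_000, 60, "internal membranes", 8.0),
]

for step, g, minutes, keep, radius in STEPS:
    print(f"{step}: {g:,} g ({rpm_for_rcf(g, radius):,.0f} rpm at r = {radius} cm), "
          f"{minutes} min; keep {keep}")
```

At an assumed 8 cm radius, 100,000g corresponds to roughly 33,000 rpm, which is why these steps require an ultracentrifuge rather than a benchtop rotor.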

Ptac-based protocols: The use of plasmid pTTQ18
As shown in Table 5, E. coli has been engineered and optimized for use as an expression host to produce proteins from both prokaryotic and eukaryotic organisms. This section is concerned exclusively with the overexpression of genes encoding prokaryotic membrane proteins, for which E. coli is usually an ideal expression host. The E. coli strain illustrated here, BL21(DE3), was selected because it lacks both the lon and ompT proteases and because of previous successes achieved with it for high-level expression of membrane transport proteins [44-51]. Overexpression of all target genes is initially examined and verified by culturing E. coli BL21(DE3) host cells harbouring the plasmid pTTQ18 [45-47] containing the gene of interest in 50 ml LB medium and inducing with 0.5 mM IPTG at mid-log phase (A680 ~ 0.4-0.6). The cells are harvested 3 h after induction of the tac promoter, and total membranes are prepared from spheroplasts by the water lysis method (Fig. 7). The total membrane proteins are separated by SDS-PAGE and stained with Coomassie Brilliant Blue and/or analysed by Western blotting with an anti-His antibody. If a protein is found to overexpress well, the bacterial culture is scaled up [44]. This is often performed in 30 or 100 L fermenters [52], and inner membranes containing the protein of interest are prepared from the cells using sucrose density gradients (Fig. 7).
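For the 50 ml screening cultures, the volume of IPTG stock needed is a simple dilution. The sketch below assumes a 1 M IPTG stock (an assumption; the text does not specify the stock concentration) and solves stock × v = final × (culture + v), so it also accounts for the small volume added.

```python
def inducer_volume_ul(final_mm: float, culture_ml: float, stock_mm: float = 1000.0) -> float:
    """Volume (µl) of inducer stock to add so the culture reaches final_mm.

    Solves stock * v = final * (culture + v) for v. The 1 M (1000 mM)
    default stock is an assumption, not a value from the protocol.
    """
    if final_mm >= stock_mm:
        raise ValueError("final concentration must be below the stock concentration")
    volume_ml = final_mm * culture_ml / (stock_mm - final_mm)
    return volume_ml * 1000.0


# Screening conditions from the text: 0.5 mM IPTG in a 50 ml culture.
print(f"add {inducer_volume_ul(0.5, 50):.1f} µl of 1 M IPTG")
```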
Note that whole cell lysates can be used for this screening step, but there is a danger of missing successful expression: the protein is located only in the membrane fraction, which comprises less than 10% of total cell protein, so it is strongly diluted in a lysate, potentially leading to false-negative results.
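The false-negative risk is a dilution effect, which a one-line calculation makes concrete. The 16% figure is the YwtG expression level reported later in this section and the 10% membrane share is the upper bound quoted above; both are used purely for illustration.

```python
def share_of_total_cell_protein(share_of_membrane: float,
                                membrane_share_of_cell: float) -> float:
    """Fraction of total cell protein contributed by a membrane-localized target."""
    return share_of_membrane * membrane_share_of_cell


# Illustrative values: target at 16% of membrane protein, membranes <= 10% of cell protein.
print(f"at most {share_of_total_cell_protein(0.16, 0.10):.1%} of total cell protein")
```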
In our extensive experience of using plasmid pTTQ18 for the heterologous production of bacterial membrane proteins in E. coli BL21(DE3), inclusion bodies did not form. Rather, the recombinant protein appeared in the membrane fraction of the disrupted host cells, where it was functionally active in all cases tested.

General choices and considerations for cloning into plasmid pTTQ18
Our preferred cloning strategy is based on the traditional restriction enzyme method, in which both the vector and the amplified DNA fragment are digested with the relevant restriction enzymes to enable DNA ligation. This method is not high-throughput but is reliable. The pUC-based plasmid pTTQ18 [45] is a high-copy-number vector that has been used successfully for the overexpression of diverse membrane transport proteins of the Major Facilitator Superfamily (MFS) [46,47,50,53-55], the 5-Helix Inverted Repeat Transporter superfamily ('5-HIRT', commonly known as the 'LeuT' superfamily) [54,55], two-component system (TCS) membrane regulatory proteins [56,57], Proteobacterial Acinetobacter Chlorhexidine Efflux (PACE) family efflux proteins [58] and soluble proteins (e.g. [59]). The efficacy of pTTQ18 as an expression vector for different classes of membrane proteins has been tested and compared with other types of plasmid construct [46,47,54,60]. The desirability of placing a tag (usually (His)n) at either the carboxyl-terminus or the amino-terminus of the cloned gene has also been discussed [61]. In plasmid pTTQ18, a polylinker/lacZα region lies downstream of a hybrid trp-lac (tac) promoter. The tac promoter consists of the -35 region of the trp promoter fused with the lacUV5 -10 region of the lac promoter (Fig. 8). Basal expression from the tac promoter is minimized by binding of the LacI repressor, encoded by the lacIq gene, to the lac operator downstream of the promoter. Also downstream of the tac promoter is the multicloning site, which permits the use of either EcoRI or NdeI restriction sites at the 5ˊ end of the amplified gene, and PstI or HindIII sites at the 3ˊ end, for ligation. The pTTQ18 plasmid also contains the bla gene encoding β-lactamase, which confers resistance to ampicillin or carbenicillin (Fig. 8).
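Before choosing the 5ˊ (EcoRI or NdeI) and 3ˊ (PstI or HindIII) enzymes, the gene can be screened for internal recognition sites. A minimal sketch using the standard recognition sequences follows; the function names are hypothetical, and a real design would also check the rest of the construct.

```python
from typing import Dict, List, Optional, Tuple

# Recognition sequences of the enzymes named above. All four are palindromic,
# so scanning one strand also finds sites on the complementary strand.
SITES = {
    "EcoRI": "GAATTC",
    "NdeI": "CATATG",
    "PstI": "CTGCAG",
    "HindIII": "AAGCTT",
}


def internal_sites(seq: str) -> Dict[str, List[int]]:
    """1-based start positions of every recognition site found in `seq`."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def usable_pair(seq: str) -> Tuple[Optional[str], Optional[str]]:
    """First enzyme with no internal site at each end: EcoRI/NdeI (5'), PstI/HindIII (3')."""
    hits = internal_sites(seq)
    five_prime = next((e for e in ("EcoRI", "NdeI") if not hits[e]), None)
    three_prime = next((e for e in ("PstI", "HindIII") if not hits[e]), None)
    return five_prime, three_prime


print(usable_pair("ATGCGAATTCAAACTGCAGTAA"))  # hypothetical toy sequence -> ('NdeI', 'HindIII')
```

PstI is tried before HindIII here because, in the modified pTTQ18, the RGSHis6 tag sits between the PstI and HindIII sites; if only HindIII is usable, the tag has to be added at the primer level.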
Table 7
Promoter and E. coli strain combinations used for the production of recombinant non-E. coli (heterologous) membrane proteins. 1 Cell-free expression of genes encoding membrane proteins using E. coli lysates; 2 The number of expression hosts is higher than the number of uMPS because some recombinant proteins were produced in several expression hosts.

The affinity tag of choice in this strategy is the RGSHis6 motif, which is present on a modified pTTQ18 between the PstI and HindIII restriction sites, so that the tag is incorporated at the carboxyl-terminus of the protein (Fig. 8). The location of the carboxyl-terminus of the protein is important, since previous experience has shown that, if the carboxyl-terminus is periplasmic, the use of the hexahistidine tag will be unsuccessful (Saidijam, M., Baldwin, S.A., personal communications). A possible cause is the inability of the hydrophilic histidine tag to traverse the hydrophobic membrane domain. It is therefore necessary before cloning to assess the predicted topology of the protein and to investigate whether its carboxyl-terminus is expected to be cytoplasmic or periplasmic, using topology prediction programmes such as TMHMM (www.cbs.dtu.dk/services/TMHMM/). The location of the RGSHis6 tag on the plasmid also dictates the use of the PstI restriction site to enable correct fusion of the protein with the tag. However, if PstI cannot be used because of an internal PstI site within the gene of interest, the RGSHis6 tag can instead be added at the primer level and the HindIII restriction site used for restriction/ligation cloning.
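The topology check can be automated once a prediction is available. This sketch assumes a TMHMM-style topology string (for example `i10-32o47-69i`, where `i` is inside/cytoplasmic, `o` is outside/periplasmic and each number pair is a predicted helix); the helper names are hypothetical and no TMHMM API is called.

```python
import re

# Helper names are hypothetical; the input format mimics the 'Topology=' field
# of TMHMM short output (i = inside/cytoplasm, o = outside/periplasm).


def c_terminus_side(topology: str) -> str:
    """'cytoplasmic' or 'periplasmic' for the C-terminus of a topology string."""
    if topology.startswith("Topology="):
        topology = topology[len("Topology="):]
    sides = re.findall(r"[io]", topology)
    if not sides:
        raise ValueError(f"no i/o labels in {topology!r}")
    return "cytoplasmic" if sides[-1] == "i" else "periplasmic"


def n_helices(topology: str) -> int:
    """Number of predicted transmembrane helices (one 'start-end' range each)."""
    return len(re.findall(r"\d+-\d+", topology))


def c_terminal_his_tag_advisable(topology: str) -> bool:
    # Following the experience described above: a C-terminal His tag is only
    # expected to work when the C-terminus is cytoplasmic.
    return c_terminus_side(topology) == "cytoplasmic"


print(c_terminus_side("Topology=i10-32o47-69i"), n_helices("i10-32o47-69i"))
```

For a 12-helix transporter with both termini in the cytoplasm, such as YwtG below, the string would start and end with `i`, so a C-terminal tag would be acceptable.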

Cloning of genes encoding membrane transport proteins
The PCR primers designed for use in many of our studies introduced EcoRI or NdeI and PstI or HindIII restriction sites at the 5ˊ and 3ˊ ends of the gene, respectively (Fig. 8). The reaction itself was conducted using melting/annealing/extension temperatures that varied depending on the melting temperature of the primers or the GC content of the DNA to be amplified. Following successful amplification, the resulting DNA fragment is purified and digested with the relevant restriction enzymes. We always designed sticky-ended ligation of the DNA fragment with pTTQ18/RGSHis6, using DNA ligase. The freshly ligated DNA is used to transform E. coli XL1-Blue cells for propagation of the plasmid, and the resulting carbenicillin-resistant colonies are selected and screened by PCR. Size estimation of the amplified gene by agarose gel electrophoresis can be used to confirm the presence of the insert. The integrity of the cloned gene is more reliably established by DNA sequencing, which confirms that the gene has been cloned without mutation and is inserted into pTTQ18 in the correct orientation. The pTTQ18 plasmid containing the sequenced gene is then used to transform E. coli BL21(DE3) cells for expression studies. An important alternative strategy is to synthesize the gene de novo, incorporating the appropriate restriction sites for cloning and modifying the codon usage of heterologous genes to better match that of E. coli.

Table 9
Genotypes of E. coli strains used to produce recombinant membrane proteins for structural determination.

The graph shows the promoters used for the heterologous (black) and homologous (grey) production of membrane proteins in E. coli (see Tables 1-3). NS: not specified; Native: the native promoter of the gene encoding the target membrane protein.
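The annealing temperatures mentioned in this protocol are usually derived from the primer melting temperature (Tm). As a rough sketch, two standard composition-only approximations are shown: the Wallace rule for short primers and the GC-based formula Tm = 64.9 + 41(G + C − 16.4)/N for longer ones. Neither accounts for salt or nearest-neighbour effects, and the 5 °C offset is a common rule of thumb, not the protocol used here. Only the gene-specific part of each primer (excluding added restriction sites and clamp bases) is relevant for the first cycles.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer: str) -> float:
    """Rough Tm (°C) from base composition only.

    Wallace rule 2*(A+T) + 4*(G+C) below 14 nt, otherwise
    64.9 + 41*(G+C - 16.4)/N. A starting point for annealing trials only.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    n = len(p)
    if n < 14:
        return 2 * at + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / n


def starting_annealing_temp(fwd: str, rev: str, offset: float = 5.0) -> float:
    # Rule of thumb (an assumption, not from the text): anneal about 5 °C
    # below the lower of the two primer Tm values.
    return min(basic_tm(fwd), basic_tm(rev)) - offset


# Hypothetical gene-specific primer segments, for illustration only.
print(f"{starting_annealing_temp('ATGAAAGCGTTAGCCGATCTG', 'TTACGCCAGCGCTTTCATCGG'):.1f} °C")
```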

4.3. Optimising the production of recombinant membrane transport proteins from plasmid pTTQ18

Production and characterization of the protein YwtG as an exemplar
One example that we can consider in detail is the gene ywtG from Bacillus subtilis. The translated amino acid sequence indicates that YwtG is a putative membrane transport protein, which BLAST similarity searches predict to be an MFS sugar transporter. It shares 46% sequence identity with a D-xylose:proton symporter from L. brevis (XylT), 39% with an arabinose:proton symporter from B. subtilis (AraE), 38% with a major myo-inositol:proton transporter from B. subtilis (IolT) and 38% with a D-galactose:proton transporter from E. coli (GalP). Wild-type YwtG consists of 457 amino acids with a calculated Mr of 49,192.49 and is predicted by TMHMM to have 12 transmembrane helices, with both the amino- and carboxyl-termini located in the cytoplasm. YwtG contains many of the characteristic elements of the sugar porter sub-family of the MFS, including a long, central cytoplasmic loop and the RGXRR sequence motif found between helices 2 and 3. As the topology of the protein allows for the addition of the carboxyl-terminal hexahistidine tag, the recombinant YwtG protein will contain 17 additional residues, increasing the Mr to 51,041.44. Analysis of the ywtG gene revealed no internal EcoRI or PstI sites, enabling the use of primers designed to introduce EcoRI and PstI restriction sites at the 5ˊ and 3ˊ ends of the gene, respectively. The ywtG gene was successfully amplified from B. subtilis genomic DNA using an annealing temperature of 60°C. This fragment was digested with EcoRI and PstI, yielding a DNA fragment with an apparent size of 1.5 kbp (actual size 1.371 kbp; Fig. 9A). This was ligated into pTTQ18 and used to transform E. coli XL1-Blue cells. PCR screening of the resulting carbenicillin-resistant E. coli XL1-Blue colonies revealed six positives (Fig. 9B). These were cultured and the plasmid DNA extracted. Double restriction digestion of pTTQ18/ywtG with EcoRI and PstI yielded two DNA fragments of 1.3 kbp and 4.6 kbp, similar in size to the ywtG gene (1.371 kbp) and pTTQ18/RGSHis6 (4.59 kbp; Fig. 9C). DNA sequencing of pTTQ18/ywtG revealed that the full-length ywtG gene had been cloned successfully but that one mutation was present: base 1000 was changed from guanine (G) to adenine (A), resulting in the YwtG(His)6 mutant D334N. However, it is not known whether this residue is important for structure or function.
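The numbering of the D334N substitution follows directly from the nucleotide position: base 1000 is the first base of codon 334 (bases 1000-1002), so a G-to-A change there converts an Asp codon (GAT or GAC) into an Asn codon (AAT or AAC). A minimal sketch of this bookkeeping follows; the toy sequence is hypothetical, not the ywtG sequence, and positions are assumed to be counted from the first base of the start codon.

```python
# Standard genetic code, indexed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def describe_substitution(cds: str, position: int, new_base: str) -> str:
    """Name a single-base substitution and its amino-acid consequence.

    `cds` starts at the first base of the start codon; `position` is 1-based.
    """
    cds, new_base = cds.upper(), new_base.upper()
    codon_number = (position - 1) // 3 + 1
    codon_start = (codon_number - 1) * 3
    offset = (position - 1) % 3  # 0, 1 or 2: which base of the codon changes
    old_codon = cds[codon_start:codon_start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    return f"{cds[position - 1]}{position}{new_base} -> {old_aa}{codon_number}{new_aa}"


# Hypothetical toy sequence with an Asp codon (GAT) as codon 334 (bases 1000-1002).
toy_cds = "ATG" + "GCT" * 332 + "GAT" + "TAA"
print(describe_substitution(toy_cds, 1000, "A"))  # G1000A -> D334N
```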
E. coli BL21(DE3) host cells harbouring the plasmid pTTQ18/ywtG were grown in LB medium and induced with 0.5 mM IPTG. Total membranes were prepared and separated by SDS-PAGE. The Coomassie Brilliant Blue-stained gel revealed a protein band with an apparent mass of ~31 kDa (well below the predicted Mr) in the membranes from induced cells that was absent from uninduced cell membranes; this band constituted 16% of the total membrane protein (Fig. 9D). A positive signal on the Western blot confirmed the identity of the YwtG(His)6 protein (Fig. 9D). A minor signal was also observed in the uninduced cells, possibly due to 'leaky' derepression of the tac promoter on pTTQ18.
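To put the ~31 kDa band in context, the predicted Mr can be computed from a sequence as the sum of average residue masses plus one water molecule. The sketch below uses standard average residue masses; because the YwtG sequence is not reproduced here, it only shows the calculation and the apparent-to-predicted ratio for the values quoted in the text (about 0.61 for 31 kDa versus 51,041.44 Da).

```python
# Standard average residue masses (Da), i.e. free amino acid minus water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153


def average_mr(sequence: str) -> float:
    """Average Mr of an unmodified polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def apparent_to_predicted(apparent_kda: float, predicted_da: float) -> float:
    """Ratio of the SDS-PAGE apparent mass to the sequence-predicted mass."""
    return apparent_kda * 1000.0 / predicted_da


print(f"His6 stretch alone: {average_mr('HHHHHH'):.2f} Da")
print(f"YwtG(His)6 apparent/predicted: {apparent_to_predicted(31, 51_041.44):.2f}")
```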

Anomalous migration of recombinant YwtG and other membrane transport proteins in SDS-PAGE gels
The relative molecular masses of protein bands observed in Coomassie-stained SDS-PAGE gels and on Western blotting film were determined by comparing their migration distances with those of standard protein molecular weight markers. There is a linear relationship between the log10 Mr of YwtG, BC0935, BC5418 and YhjI and the distance they migrate on the SDS-PAGE gel.
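The log-linear standard curve can be fitted by ordinary least squares. This sketch uses relative mobility (Rf, band migration distance divided by dye-front distance) as the independent variable, a common convention rather than something specified here, and hypothetical marker values.

```python
import math
from typing import List, Tuple


def fit_standard_curve(rf: List[float], mr_kda: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(Mr) = intercept + slope * Rf for marker proteins."""
    log_mr = [math.log10(m) for m in mr_kda]
    n = len(rf)
    mean_x, mean_y = sum(rf) / n, sum(log_mr) / n
    sxx = sum((x - mean_x) ** 2 for x in rf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf, log_mr))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apparent_mr(rf_band: float, intercept: float, slope: float) -> float:
    """Interpolated apparent Mr (kDa) of a band with relative mobility rf_band."""
    return 10 ** (intercept + slope * rf_band)


# HYPOTHETICAL marker mobilities and masses (kDa), for illustration only.
markers_rf = [0.12, 0.25, 0.45, 0.62, 0.78, 0.92]
markers_kda = [116.0, 66.2, 45.0, 35.0, 25.0, 18.4]
a, b = fit_standard_curve(markers_rf, markers_kda)
print(f"band at Rf 0.55 -> ~{apparent_mr(0.55, a, b):.0f} kDa")
```

Interpolation is only meaningful within the marker range, and, as discussed below, an apparent Mr read from such a curve can differ substantially from the true Mr for membrane proteins.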
For many membrane proteins, and especially membrane transport proteins, boiling in SDS before running the gel leads to irreversible aggregation and insolubility. Instead, we routinely solubilize the protein in SDS at 30-60°C for 10-60 min. Such samples may contain partially unfolded protein and/or sub-optimal SDS:protein ratios that lead to anomalous migration in the gel. Generally, the observed molecular weight is lower than predicted (Table 13), though there are often higher molecular weight bands that may represent completely unfolded protein, oligomers or aggregates. In our experience, the anomalous lower molecular weight bands, as well as the higher ones, are an idiosyncrasy of SDS-protein behaviour (see e.g. [50,53,55-58]), and their presence does not imply that anything is wrong with the protein.

Dependence of recombinant protein yields on growth and induction conditions
The extent of growth before and after induction can vary, meaning that it is advisable to conduct trials that aim to maximize the amount of cells without compromising the level of the desired recombinant protein in the membrane. Usually, recombinant protein yields in amounts greater than 2-5% of the total inner membrane protein composition trigger cell toxicity, compromise growth and reduce the biomass yield of cells and membranes, whereas it is necessary to achieve levels of 10-50% in order to facilitate later purification and the minimization of contaminating proteins. Generally, 10% is regarded as satisfactory, 20-30% is desirable and often achieved, and 50% was achieved in only one case out of over 100 recombinant proteins produced. These higher levels seriously compromise cell growth and net production. Thus, a compromise needs to be arrived at where induction is left late to maximize the yield of cells, but not so late that the level of induction is reduced.
Fig. 6. Selection of bacterial strains for improved recombinant membrane protein production using GFP as a gene reporter. Isolation of bacterial mutant hosts was performed as described in Section 3.2 of this review and previously [30]. Briefly, the pMW7-GFP-Xa expression plasmid was transformed into BL21(DE3) cells and a single colony was inoculated into 50 ml 2×TY medium. At A600 = 0.4, cells were diluted in water and 100 μl of the 10−1 dilution were plated on an IPTG-containing plate. Plates were illuminated under (A) normal light (two small colonies that did not emit fluorescence are encircled) or (B) UV light (arrows indicate four large colonies that emitted a diffuse fluorescence). Panels (C) and (D) show two other independent experiments with petri dishes illuminated under UV light.

For each protein that is taken forward for characterization and purification, we first try to determine whether expression is best in rich or minimal medium. We then construct a dose-response curve measuring the level of expression achieved in membranes exposed to zero and increasing concentrations of IPTG within the range 0.01-2 mM. In some cases we have also explored 'autoinduction' using lactose/glucose mixtures [44]. This economises on expensive IPTG, but it can take some time to achieve good results. In an optimized process, larger-scale (30-100 L) cultures are grown in fermenters [52] to A680 = 0.6-1.0, when inducer is added for 1-3 h before the cells are harvested, cooled and stored as a concentrated cell suspension frozen at −80°C. In the great majority of expression studies using constructs in the pTTQ18 plasmid we have simply used a growth temperature of 37°C and maintained it during induction, but there are indications that lowering the temperature at the time of induction is beneficial. Provided the concentrated cells are kept frozen at −80°C, the recombinant proteins in the inner membrane that we have studied appear to be immortal. Aliquots of cells can be thawed, membranes prepared and the proteins purified at any time later (yes, years), though their stability during and after purification is, of course, not guaranteed.
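One way to lay out the 0.01-2 mM dose-response series is with logarithmic spacing plus a zero-IPTG control. The number of points (eight) is an assumption for illustration, not the series used in the authors' trials.

```python
from typing import List


def iptg_series(lo_mm: float = 0.01, hi_mm: float = 2.0, n: int = 8) -> List[float]:
    """A zero-inducer control plus n log-spaced IPTG concentrations (mM)."""
    ratio = (hi_mm / lo_mm) ** (1 / (n - 1))  # constant fold-step between points
    return [0.0] + [lo_mm * ratio ** i for i in range(n)]


print([round(c, 3) for c in iptg_series()])
```

Log spacing puts as many points at the low end, where expression often changes most steeply, as at the high end, which linear spacing over a 200-fold range would not.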

Yeast expression systems for membrane protein production
Yeast is both microbial and eukaryotic, meaning it is quick, cheap and easy to culture, whilst having the post-translational pathways present in higher eukaryotic host cells that are absent in bacteria [62]. The two yeast species most widely used for recombinant membrane protein production are S. cerevisiae and P. pastoris [63,64] (Table 5). Both grow quickly in a range of complex and defined media (doubling times are typically 2.5 h when glucose is the carbon source) in vessels ranging from multi-well plates to shake flasks and bioreactors [64]. P. pastoris is notable for being able to grow to very high cell densities under controlled conditions where oxygenation rates are high (>100 g/L dry cell weight; >500 A600 units/mL [19]) and therefore has the potential to produce large amounts of recombinant membrane protein for structural analysis. High-resolution crystal structures of the adenosine A2A [65] and the histamine H1 [66] GPCRs have been solved using recombinant protein derived from P. pastoris. More recently, a 2.9 Å resolution crystal structure was published of the first plant multidrug and toxic compound extrusion (MATE) transporter to be structurally characterized; the crystals were formed using recombinant protein synthesized in P. pastoris [67].
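With a constant doubling time, the time needed to reach a target density follows t = td × log2(target/initial). The sketch assumes uninterrupted exponential growth at the 2.5 h doubling time quoted above, which will not hold near the high densities reached in bioreactors; the starting and target A600 values are hypothetical.

```python
import math


def hours_to_reach(initial_od: float, target_od: float, doubling_h: float = 2.5) -> float:
    """Time (h) for exponential growth from initial_od to target_od.

    t = doubling time * log2(target / initial); assumes no lag phase and a
    constant doubling time (2.5 h default, the typical value on glucose).
    """
    return doubling_h * math.log2(target_od / initial_od)


# Hypothetical example: A600 0.1 -> 1.6 is four doublings.
print(f"{hours_to_reach(0.1, 1.6):.1f} h")
```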
S. cerevisiae is notable for being supported by a more extensive literature than P. pastoris. Its genetics are also better understood (http:// www.yeastgenome.org/). This means that there is a much wider range of tools and strains for improved membrane protein production in this yeast. Recent examples of its use include the generation of the 4.4 Å cryo-EM structure of the rat TRPV2 channel [68] and the 3.0 Å crystal structure of the wild-type human GLUT1 glucose transporter in complex with cytochalasin [69].
The experimental strategy for obtaining the structure of the histamine H1 receptor provides an example of making best use of the two yeast species' strengths: crystals were obtained from protein produced in P. pastoris, while initial screening to define the best expression construct was performed in S. cerevisiae [70]. In principle, many of the tools established for S. cerevisiae could be transferred to P. pastoris (for which a genome sequence was published in 2009 [71]), combining the strengths of both yeast species, although such work would be time-consuming. In our laboratory, we often start with P. pastoris and, if production is not straightforward, use S. cerevisiae to troubleshoot [64]. In the following sections, we include the production of the human GPCR adenosine A2A receptor (hA2AR) in both species as an exemplar (Figs. 10 and 11).

Saccharomyces cerevisiae
In studies examining the host response to recombinant membrane protein production, the unfolded protein response [72] and altered ribosomal biogenesis [73] have been identified as major determinants of high yields in yeast, although the precise mechanistic reasons for this remain unclear. A comprehensive strain collection exists from which potential expression hosts can be selected, supported by information in the Saccharomyces Genome Database (http://www.yeastgenome.org/). The yeast deletion collections comprise over 21,000 mutant strains with precise start-to-stop deletions of approximately 6000 S. cerevisiae ORFs [74]. The collections include heterozygous and homozygous diploids as well as haploids of both MATa and MATα mating types. Individual strains or the complete collection can be obtained from Euroscarf (http://web.uni-frankfurt.de/fb15/mikro/euroscarf/) or the American Type Culture Collection (http://www.atcc.org/). Dharmacon sells the Yeast Tet-Promoters Hughes Collection (yTHC), in which 800 essential yeast genes are under the control of a tetracycline-regulated promoter that permits experimental regulation of essential genes. A number of specifically-engineered S. cerevisiae strains also exist, including those with 'humanized' sterol and glycosylation pathways [75]. Protease-deficient strains are a consistently popular choice in membrane protein structural biology projects (Tables 10 and 11). The standard BY4741 laboratory strain (MATa, ura3Δ0, leu2Δ0, met15Δ0, his3Δ1) is often a good start, but it is not always the most successful, as shown for the expression of hA2AR (Fig. 10A).
We previously selected four strains of S. cerevisiae for their ability to produce the aquaporin Fps1 in sufficient yield for further study [73]. Yields from the yeast strains spt3Δ, srb5Δ, gcn5Δ and yTHCBMS1 (supplemented with 0.5 μg/mL doxycycline) that had been transformed with an expression plasmid containing 249 base pairs of 5ˊ untranslated region (UTR) in addition to the primary FPS1 open reading frame (ORF) were 10-80 times higher than yields from wild-type cells expressing the same plasmid. One of the strains increased recombinant yields of hA2AR and soluble green fluorescent protein (GFP); all but gcn5Δ were found to exhibit a block in translation initiation. Expression of the eukaryotic transcriptional activator GCN4 was increased in these strains, and they also exhibited constitutive phosphorylation of the eukaryotic initiation factor eIF2α. Both responses are indicative of a constitutively-stressed phenotype.
Investigation of the 5ˊ UTR of FPS1 in the expression construct revealed two upstream ORFs (uORF1 and uORF2) preceding the primary ORF. Deletion of uORF1 alone or of both uORF1 and uORF2 further improved recombinant yields in our four strains; the highest yields of the uORF deletions were obtained from wild-type cells. Frame-shifting the stop codon of the native uORF (uORF2) so that it extended into the FPS1 ORF did not substantially alter Fps1 yields in spt3Δ or wild-type cells, suggesting that high-yielding strains are able to bypass 5ˊ uORFs in the FPS1 gene via leaky scanning, which is a known stress-response mechanism. Yields of recombinant hA2AR, GFP and horseradish peroxidase could be improved in one or more of the yeast strains, suggesting that a stressed phenotype may also be important in high-yielding cell factories [76].
From these studies we concluded that regulation of Fps1 levels in yeast by translational control might be functionally important, and that the native uORF (uORF2) may be required to maintain low levels of Fps1 under normal conditions but higher levels as part of a stress response. We also concluded that constitutively-stressed yeast strains may be useful high-yielding microbial cell factories for recombinant membrane protein production [76].

5.1.2. Using selective advantage to improve membrane protein yields in S. cerevisiae
Making the production of a target recombinant protein a condition for yeast cell survival should give producers a selective advantage over non-producers. This principle has been examined previously for the production of membrane proteins in prokaryotic hosts [77-79]. However, functional yields were not assessed; instead, total yields were quantified by immunoblot [77-79]. We therefore investigated whether yeast cells could be given a selective advantage to produce high yields of hA2AR by fusing it with orotidine-5ˊ-monophosphate decarboxylase (Ura3p). Ura3p catalyzes the sixth step in the de novo biosynthesis of uridine monophosphate in yeast and is required by ura3 deletion strains when they are cultured in uracil-deficient growth medium [80].
In the 1-step process, yeast cells were grown on solid uracil-deficient medium immediately after transformation; this generated a single colony, designated A2U1 (A2SU1 was similarly generated following transformation of the yeast deletion mutant strain spt3Δ). In the 2-step process, yeast cells were cultured on solid histidine-deficient medium following transformation, and colonies were then spotted onto solid uracil-deficient medium (generating A2H1; Table 14 and Fig. 10).
The total yield of hA2AR-Ura3p fusion protein was analysed by immunoblot following transformation and selection on nutrient-deficient medium. Table 14 shows that the yield of hA2AR-Ura3p from A2H1 was almost 7-fold higher than the yield of hA2AR from the A2 control. No hA2AR-Ura3p was detected from the A2U1 transformant. The yield of hA2AR-Ura3p from A2SU1 was just over half that of A2, whereas spt3Δ:hA2AR showed no signal. Changes in expression levels could also be detected by immunofluorescence staining (Fig. 10), in which increased levels of newly synthesized A2 receptors could be seen along with the Ura3p fusions in A2H1 and A2SU1. This suggests that making the production of a recombinant GPCR a condition for cell survival through nutrient selection is an effective method to increase total yield. To determine whether the hA2AR-Ura3p we had produced was correctly folded, and thereby to estimate the functional yield of the hA2AR moiety, a radio-ligand binding assay was performed [81] using the well-characterized antagonist [3H]ZM241385 [82] on 100 µg of total membrane extract from A2, A2H1 and A2U1. Table 14 shows that A2H1 produced only a minimal increase in correctly folded hA2AR-Ura3p (1.6 ± 0.1 pmol mg⁻¹) compared with the A2 control (1.1 ± 0.1 pmol mg⁻¹). The yield of hA2AR-Ura3p from A2U1 was negligible (0.2 ± 0.01 pmol mg⁻¹). These findings suggested that the protein produced using this strategy was a heterogeneous mixture of correctly folded (binding-competent) and misfolded (binding-incompetent) protein. In contrast, the functional yield of hA2AR-Ura3p from A2SU1 was increased almost 3-fold (3.0 ± 0.2 pmol mg⁻¹) over the A2 control and 6-fold over the mutant strain control spt3Δ:hA2AR (Table 14). Notably, ligand binding activity could be recovered from A2H1, but not from A2 or A2SU1, by solubilising the hA2AR-Ura3p in n-dodecyl β-D-maltopyranoside (DDM; Table 14).
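The fold changes in functional yield can be recomputed from the Table 14 values quoted above. This sketch uses first-order propagation of independent errors (relative SEMs added in quadrature), an approximation that is not a substitute for the study's own statistics; it gives about 2.7-fold for A2SU1 over A2, consistent with the "almost 3-fold" stated above.

```python
import math
from typing import Tuple


def fold_change(mean_a: float, sem_a: float,
                mean_b: float, sem_b: float) -> Tuple[float, float]:
    """Ratio a/b with an approximate SEM from first-order error propagation,
    assuming independent errors: (SEM_r/r)^2 = (SEM_a/a)^2 + (SEM_b/b)^2."""
    ratio = mean_a / mean_b
    relative = math.sqrt((sem_a / mean_a) ** 2 + (sem_b / mean_b) ** 2)
    return ratio, ratio * relative


# Functional yields (pmol/mg, mean ± SEM) quoted above; A2 (1.1 ± 0.1) is the reference.
for name, mean, sem in [("A2H1", 1.6, 0.1), ("A2SU1", 3.0, 0.2)]:
    r, e = fold_change(mean, sem, 1.1, 0.1)
    print(f"{name}/A2 = {r:.1f} ± {e:.1f}")
```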
We validated single-point binding with full saturation curves for A2, A2H1 and A2SU1 in membranes. The Bmax values from these experiments validate the single-point saturation values. Affinity was determined using competition binding experiments, with the pKd being 8.3-8.6 for A2, A2H1 and A2SU1.
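Bmax and Kd from a saturation experiment are usually obtained by fitting the one-site model B = Bmax·L/(Kd + L). The dependency-free sketch below uses a variable-projection approach: for any trial Kd the best Bmax has a closed form, and Kd is then found by golden-section search on log10(Kd). It assumes a single class of sites, no ligand depletion and specific binding already corrected for non-specific binding; it is not the software used in the study, and the data are hypothetical.

```python
import math
from typing import List, Tuple


def one_site(conc: float, bmax: float, kd: float) -> float:
    """Specific binding for a single class of sites: B = Bmax * L / (Kd + L)."""
    return bmax * conc / (kd + conc)


def fit_saturation(conc: List[float], bound: List[float],
                   kd_lo: float = 1e-3, kd_hi: float = 1e3,
                   iterations: int = 200) -> Tuple[float, float]:
    """Least-squares (Bmax, Kd); Kd is searched in [kd_lo, kd_hi] (units of conc)."""

    def best_bmax(kd: float) -> float:
        # For fixed Kd the model is linear in Bmax, so the optimum is closed-form.
        g = [c / (kd + c) for c in conc]
        return sum(y * gi for y, gi in zip(bound, g)) / sum(gi * gi for gi in g)

    def sse(log_kd: float) -> float:
        kd = 10 ** log_kd
        bmax = best_bmax(kd)
        return sum((y - one_site(c, bmax, kd)) ** 2 for c, y in zip(conc, bound))

    # Golden-section search on log10(Kd), assuming a single minimum in the bracket.
    a, b = math.log10(kd_lo), math.log10(kd_hi)
    ratio = (math.sqrt(5) - 1) / 2
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    for _ in range(iterations):
        if sse(x1) < sse(x2):
            b, x2 = x2, x1
            x1 = b - ratio * (b - a)
        else:
            a, x1 = x1, x2
            x2 = a + ratio * (b - a)
    kd = 10 ** ((a + b) / 2)
    return best_bmax(kd), kd


def pkd_from_nm(kd_nm: float) -> float:
    """pKd = -log10(Kd in M), for a Kd given in nM."""
    return -math.log10(kd_nm * 1e-9)


# HYPOTHETICAL saturation data: radioligand (nM) vs specific binding (pmol/mg).
ligand_nm = [0.5, 1, 2, 4, 8, 16, 32]
specific = [0.42, 0.75, 1.21, 1.75, 2.22, 2.55, 2.75]
bmax, kd = fit_saturation(ligand_nm, specific)
print(f"Bmax ~ {bmax:.2f} pmol/mg, Kd ~ {kd:.2f} nM, pKd ~ {pkd_from_nm(kd):.2f}")
```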
To explain why the functional yield was lower than the total yield, we examined the localization of hA2AR-Ura3p by confocal microscopy after staining yeast spheroplasts with a mouse anti-hexahistidine antibody followed by an Alexa488-conjugated goat anti-mouse antibody. Fig. 10A shows confocal images for BY4741 expressing no recombinant protein (panel i), the control plasmid pYX222-A2AR (panel ii) and A2H1 (panel iii); in the latter image, vacuolar localization of the recombinant protein is observed. Fig. 10A, panel iv shows that vacuolar accumulation of the hA2AR-Ura3p fusion could be reduced by using the BY4741 spt3Δ strain. Homologous competition radio-ligand binding with [3H]ZM241385 (Fig. 10B) demonstrated that hA2AR and hA2AR-Ura3p had comparable pKd values (8.3-8.6), as reported in the literature [83].
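For homologous competition (the same ligand in labelled and unlabelled form), the standard one-site analysis gives Kd = IC50 − [radioligand], assuming equal affinity of labelled and unlabelled ligand and no ligand depletion. A minimal sketch follows; the IC50 and radioligand concentration are hypothetical. For reference, the reported pKd range of 8.3-8.6 corresponds to Kd values of roughly 2.5-5 nM.

```python
import math


def kd_from_homologous_ic50(ic50_nm: float, radioligand_nm: float) -> float:
    """Kd (nM) from a homologous competition curve: Kd = IC50 - [radioligand].

    Assumes one site, identical affinity of hot and cold ligand and no depletion.
    """
    kd = ic50_nm - radioligand_nm
    if kd <= 0:
        raise ValueError("IC50 must exceed the radioligand concentration")
    return kd


def pkd(kd_nm: float) -> float:
    """pKd = -log10(Kd in M), for a Kd given in nM."""
    return -math.log10(kd_nm * 1e-9)


# HYPOTHETICAL values: IC50 of 4.0 nM measured with 1.0 nM radioligand.
kd = kd_from_homologous_ic50(4.0, 1.0)
print(f"Kd = {kd:.1f} nM, pKd = {pkd(kd):.2f}")
```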
When nutrient selection was used as a strategy to increase the yield of another recombinant GPCR, the human β2 adrenergic receptor (hβ2AR), high-yielding transformants were generated using the 2-step method. B2U1 and B2U5 gave 3.2- and 2.5-fold increases in total yield and 5- and 7-fold increases in functional yield (when solubilized with DDM) of hβ2AR-Ura3p, respectively, compared with the B2 control. The 1-step method (leading to transformant B2H3) did not result in an increased functional yield following DDM solubilization. However, solubilization using 2.5% styrene maleic acid copolymer (SMA; 2:1 styrene to maleic acid ratio) increased the functional yield of the control B2 and of B2H3 compared with DDM solubilization. The constructs used in these experiments were not truncated thermostable constructs optimized for recombinant expression; rather, they were the full wild-type sequences (codon-optimized for yeast). The data in Table 14 suggest that, prior to solubilization, either the receptor was not correctly folded in yeast membranes or it was expressed below the limit of detection when 100 µg of membranes was assayed. Extraction by surfactant and subsequent concentration resulted in detectable function, suggesting that this process had recovered correctly folded recombinant protein. This approach demonstrates the power of applying a selective advantage strategy to recombinant GPCR production and provides insight into the role of targeting and quality control.

Pichia pastoris
One notable and highly-beneficial feature of producing recombinant membrane proteins in P. pastoris (and one that has been reviewed extensively elsewhere) is that exceptionally high yields of correctly-folded protein can be obtained, especially under the tightly-controlled conditions achieved in bioreactor cultures. For example, yields of both human aquaporin 1 (hAQP1) and hA2AR in bioreactors were more than double those achieved in equivalent shake flask cultures [19]. Moreover, the bioreactors produced higher quality membrane protein as determined by functional assay (more than 150 pmol/mg is reported in several studies [84]). Isolation of hAQP1 was possible at 90 mg/L, while yields of 13 mg/100 g cells were reported for a codon-optimized P-glycoprotein construct [19]. P. pastoris is therefore a highly attractive system for the production of folded, eukaryotic membrane proteins, although yields remain protein dependent. P. pastoris expression plasmids are usually integrated into the yeast genome to produce a stable production strain. Since it is not possible to control precisely the number of copies that integrate, or indeed where they integrate within the genome, the optimal clone must be selected experimentally [85]. One approach is to screen on increasing concentrations of antibiotic (usually zeocin) to obtain so-called 'jackpot' clones. However, the correlation between the copy number of the integrated expression cassette (as determined by resistance to increasing zeocin concentrations) and the final yield of recombinant protein is not always positive [16]. Sometimes clones with lower copy numbers are more productive, suggesting that the cellular machinery is overwhelmed in jackpot clones (resulting in misfolded or degraded protein). Consistent with this idea, hA2AR yields were increased 1.8-fold when the corresponding gene was co-expressed in P. pastoris with the stress-response gene HAC1 [86]; Hac1 drives transcription of UPR genes.
Fig. 8. Strategy for cloning and expressing genes of bacterial membrane proteins using plasmid pTTQ18-His6. Each target gene was inserted into the multiple cloning site (MCS) downstream of the tac promoter in the plasmid pTTQ18-His6 for high-level gene expression. Two different restriction enzymes, EcoRI and PstI, were used to ensure correct orientation of the gene on ligation into plasmid pTTQ18-His6, as well as to prevent re-ligation of the plasmid. First, the membrane protein gene was amplified by PCR using bacterial genomic DNA as template, introducing EcoRI at the 5ˊ-end and PstI at the 3ˊ-end, followed by digestion with these two enzymes and ligation with EcoRI-PstI-digested pTTQ18-His6. The resulting plasmid construct was then transformed into E. coli XL10-Gold cells, followed by colony PCR to identify positive clones. The sequence encoding the hexahistidine tag (yellow) is incorporated into the pTTQ18 plasmid so that it is in frame with the ligated gene. This works well when the carboxyl terminus of the recombinant protein is ultimately located on the cytoplasmic side of the cell membrane. However, if the carboxyl terminus is destined to be outside the cell membrane, translocation of the fused positively-charged histidines appears to compromise expression. In this latter case, a Strep II tag (red) can be used instead.

In contrast to the situation in S. cerevisiae, far fewer P. pastoris strains are available in which to integrate the expression plasmid for the generation of a recombinant production strain (Tables 10 and 11). The wild-type strain X33, the histidine auxotroph GS115 and the slow methanol-utilization strain KM71H have all been used to produce membrane proteins for structural studies [19].
Protease-deficient strains such as SMD1163, which lacks proteinase A and proteinase B, are also available (Tables 10 and 11). In all these strains, P. pastoris (like S. cerevisiae) post-translationally glycosylates membrane proteins by adding core Man8GlcNAc2 structures, but not the complex glycan structures found in humans and other mammals; compared with S. cerevisiae, the mannose chains also tend to be shorter. However, the effects of these non-native modifications are not necessarily detrimental and need to be assessed on a case-by-case basis [84]. The high-resolution structure of a glycosylated form of Caenorhabditis elegans P-glycoprotein (produced recombinantly in P. pastoris) demonstrates that yeast glycosylation does not necessarily hinder crystal formation [87]. Nonetheless, to overcome potential bottlenecks in producing, purifying, characterizing and crystallizing human proteins in yeast, engineered strains have been developed, including strains with 'humanized' glycosylation [88,89] and sterol pathways.
Most proteins produced in P. pastoris for structural biology are made using variations of the standard methanol induction protocol. Fig. 11 shows an example of the recombinant production and purification of hA2AR following the 'Pichia Fermentation Process Guidelines' (Invitrogen). The hA2AR protein in this study carried an amino-terminal decahistidine tag and an N154Q mutation to prevent glycosylation; the corresponding gene was expressed from the pPICZαA plasmid. Cells were cultured in a bioreactor, and depletion of glycerol in the initial glycerol batch phase was indicated by a spike in the dissolved oxygen (DO) reading. This was followed by a fed-batch phase with a 50% (w/v) glycerol solution and a 3 h starvation phase to ensure complete glycerol consumption. During the final hour of starvation, the temperature was reduced from 30°C to 22°C and allowed to stabilize. Theophylline (10 mM), a non-selective adenosine receptor antagonist, was then added to the culture to stabilize the receptor during expression.
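Glycerol depletion is recognized here as a spike in the DO reading. As a minimal sketch of how such a spike could be flagged automatically from logged values, the function below reports the first reading that rises at least `rise` percentage points above the lowest value in the preceding `window` readings. The readings are assumed to be % air saturation sampled at a fixed interval; the threshold, window and example trace are hypothetical and not taken from the protocol described above.

```python
from typing import Optional


def first_do_spike(do_readings: list[float], rise: float = 20.0, window: int = 5) -> Optional[int]:
    """Index of the first reading at least `rise` points above the recent minimum, else None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for i in range(window, len(do_readings)):
        baseline = min(do_readings[i - window:i])  # lowest DO over the preceding window
        if do_readings[i] - baseline >= rise:
            return i
    return None


# Hypothetical trace: DO falls as cells grow on glycerol, then jumps when glycerol runs out.
trace = [35, 30, 28, 27, 27, 26, 26, 60, 80]
print(first_do_spike(trace))
```

Comparing against a rolling minimum rather than the previous single reading makes the check less sensitive to sensor noise, at the cost of a delay of up to `window` samples before a slow drift can register.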

Table 13
Predicted and calculated sizes of membrane proteins. Protein sizes were calculated from both Coomassie-stained and immunoblotted SDS-PAGE gels. YwtG from Bacillus subtilis is predicted to be a sugar transport protein, BC0935 from Bacillus cereus a dicarboxylate/α-ketoglutarate transporter, BC5418 from B. cereus a sugar/metabolite transporter and Yhj1 from B. subtilis a glucose transporter. The data for YwtG are from a different experiment from that shown in Fig. 9, explaining the minor discrepancies in apparent molecular masses.

(A) Confocal microscopy visualization of (i) control BY4741 cells, (ii) recombinant hA2AR produced from transformant A2, (iii) recombinant hA2AR-Ura3p expressed after a 1-step selection from transformant A2H2 and (iv) recombinant hA2AR-Ura3p expressed after a 2-step selection from transformant A2SU1. Cells were grown to an A600 of 4-5 and were visualized using rabbit anti-His6 (Clontech) as the primary antibody and an Alexa Fluor 488-conjugated anti-rabbit secondary antibody. (B) Homologous competition binding experiments were performed for hA2AR/hA2AR-Ura3p produced from A2, A2H2 and A2SU1 using labelled ([3H]) and unlabelled ZM241385. Experiments were done on 100 µg total membrane protein, with A2 acting as the control. Error bars represent the standard deviation (n = 3).

Table 14
Characterization of recombinant hA2AR/hA2AR-Ura3p- and β2AR/β2AR-Ura3p-producing transformants. Confocal microscopy visualization of recombinant hA2AR/hA2AR-Ura3p in transformed S. cerevisiae using Alexa Fluor 488-conjugated antibodies was done to assess whether hA2AR/hA2AR-Ura3p was localized in the membrane or had been internalized to the vacuole. Immunoblots (50 µg total membrane protein loaded per well, as determined by BCA assay, and probed with Clontech anti-His6 antibody) were quantified using ImageJ. This allowed comparison of recombinant protein yield from A2H1, A2U1 and A2SU1 (BY4741 or spt3Δ transformed with pYX222-hA2AR-URA3 and grown under conditions of nutrient selection) with the A2 control (BY4741 transformed with pYX222-hA2AR) or spt3Δ:hA2AR (spt3Δ transformed with pYX222-hA2AR). For hβ2AR, B2H3, B2U1 and B2U5 were compared with the B2 control. The functional yield for hA2AR/hA2AR-Ura3p or β2AR/β2AR-Ura3p in yeast cell membranes, or following solubilization with 2.5% DDM, 0.5% CHS, was determined by single-point saturation binding using the antagonists [3H]ZM241385 or [3H]CGP 12177, respectively, and 100 µg total membrane protein per experiment, as described in [73]. A one-way ANOVA with Holm-Sidak's multiple comparison test gave p = 0.001 (***) for A2H1 without DDM treatment versus A2H1 solubilized with DDM. Additionally, β2AR/β2AR-Ura3p was solubilized using 2.5% (w/v) styrene maleic acid (SMA) polymer with a 2:1 ratio of styrene to maleic acid. All data are derived from at least 3 independent biological replicates, with error ± SEM in parentheses, where applicable.

The cells were induced with 100% methanol at an initial feed rate of 1.92 ml/h for 17 h to allow adaptation to methanol. Once a steady DO reading and a fast DO spike time were obtained, the feed rate was increased to 3.96 ml/h for the remainder of the culture. The entire methanol fed-batch phase lasted approximately 40 h, with a total of ∼125 ml of methanol fed per litre of initial volume. The cells were then harvested by centrifugation.

In a study of the regulation of carbon substrate utilization, we cultured wild-type P. pastoris cells in methanol and found that a higher proportion of the total mRNA pool was associated with two or more ribosomes (and was therefore judged to be highly translated) than when the same cells were cultured under any other, non-inducing growth condition [90]. This observation suggests that high recombinant protein yields in methanol-grown cells are due not just to promoter strength, but also to the global response of P. pastoris to growth on methanol [90].
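The methanol feed figures in the hA2AR induction protocol (1.92 ml/h for 17 h, then 3.96 ml/h, ~40 h in total, ∼125 ml per litre) are internally consistent, as a quick calculation shows. This sketch assumes that the feed rates are per litre of initial culture volume and that the rate steps up exactly at 17 h and then holds; in the protocol the step is triggered by the DO response rather than a fixed time, so this is an approximation.

```python
def methanol_fed_ml_per_l(phases: list[tuple[float, float]]) -> float:
    """Total methanol fed (ml per litre of initial volume) for (rate in ml/h, duration in h) phases."""
    return sum(rate * hours for rate, hours in phases)


TOTAL_PHASE_H = 40.0              # approximate length of the methanol fed-batch phase
ADAPT_RATE, ADAPT_H = 1.92, 17.0  # initial adaptation feed
FULL_RATE = 3.96                  # feed rate after adaptation (assumed to start at 17 h)

phases = [(ADAPT_RATE, ADAPT_H), (FULL_RATE, TOTAL_PHASE_H - ADAPT_H)]
total = methanol_fed_ml_per_l(phases)
print(f"{total:.1f} ml/L")  # 1.92 x 17 + 3.96 x 23 = 32.64 + 91.08 = 123.72 ml/L
```

The result, about 124 ml/L, agrees with the ∼125 ml/L reported. Multiplying by the initial volume gives the absolute amount to prepare; for a hypothetical 2 L initial volume, that is roughly 250 ml of methanol.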
We have also demonstrated pre-induction expression under the control of PAOX1 [91], suggesting that growth and recombinant protein synthesis in P. pastoris have not yet been fully uncoupled; achieving this uncoupling may provide opportunities for future optimization studies.
6. An overview of detergent usage for microbially-produced recombinant membrane proteins

For membrane protein investigations, the choice of detergent is crucial: a suitable detergent is needed to prepare a pure, stable and monodisperse protein in solution, but also to grow well-ordered crystals without preventing crystal contacts. As a consequence, the best detergent for solubilization is often not the best for crystallization, and detergent exchange during purification is a common approach (Tables 1-4). Notably, more than 50% of the membrane proteins in the PDB have been crystallized in a detergent or detergent mixture different from the one used for solubilization (Fig. 12).
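The "more than 50%" figure is the share of entries whose crystallization detergent differs from the solubilization detergent. A minimal sketch of that tally, with hypothetical entry names and detergent assignments (not the data behind Fig. 12), representing a detergent mixture as a set with more than one member:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    solubilization: frozenset[str]
    crystallization: frozenset[str]


def exchange_fraction(entries: list[Entry]) -> float:
    """Fraction of entries crystallized in a detergent (or mixture) other than the solubilization one."""
    if not entries:
        raise ValueError("no entries")
    exchanged = sum(1 for e in entries if e.solubilization != e.crystallization)
    return exchanged / len(entries)


# Hypothetical records for illustration only.
entries = [
    Entry("example_1", frozenset({"DDM"}), frozenset({"DDM"})),
    Entry("example_2", frozenset({"DDM"}), frozenset({"OG"})),
    Entry("example_3", frozenset({"LDAO"}), frozenset({"C8E4"})),
    Entry("example_4", frozenset({"DDM"}), frozenset({"DDM", "OG"})),
]
print(exchange_fraction(entries))
```

Treating a switch from DDM alone to a DDM/OG mixture as an exchange matches the text's wording ("a detergent or a detergent mixture that is different"); a stricter definition that ignores added co-detergents would give a lower fraction.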
Folded membrane proteins in their native membranes can usually be solubilized with detergent. However, after production in heterologous membranes, recombinant membrane proteins are frequently found to be difficult to solubilize. Our simple solubilization screen compares the solubility of the target membrane protein at 1 mg/mL in three different detergents at ten times above their critical micellar concentration (cmc): DDM (1% final concentration), FC12 (1%) and SDS (2%). After 1 h incubation at 4°C on a stirring wheel, insoluble material is removed by ultracentrifugation at 100,000 × g for 30 min. The pellet is resuspended in TEP buffer (0.25 mM EDTA, 0.1 mM phenylmethylsulfonyl fluoride and 10 mM Tris-Cl pH 7.8; same volume as that of the supernatant), and both solubilized and non-solubilized fractions are loaded onto an SDS-PAGE gel. If the target membrane protein is solubilized only by SDS, it is likely to be in inclusion bodies (these structures are most frequently associated with bacterial expression systems). If it is solubilized by all three detergents, it is likely to be well-folded. If DDM cannot solubilize the target membrane protein, experience tells us that it is likely to be misfolded. However, many other detergents and detergent combinations can potentially be used to overcome this problem.
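The decision rules of this screen can be written down explicitly. In the sketch below, each detergent is scored from densitometry of the supernatant (S) and resuspended pellet (P) lanes; assuming equal volumes of S and P are loaded (the pellet is resuspended in the supernatant volume), S / (S + P) estimates the solubilized fraction. The 0.5 cut-off and function names are hypothetical, and combinations the rules do not cover (for example, solubilization by DDM and SDS but not FC12) are returned as "inconclusive".

```python
DETERGENTS = ("DDM", "FC12", "SDS")


def soluble_fraction(supernatant: float, pellet: float) -> float:
    """Fraction solubilized, assuming equal volumes of supernatant and pellet were loaded."""
    total = supernatant + pellet
    if total <= 0:
        raise ValueError("no signal in either lane")
    return supernatant / total


def interpret_screen(bands: dict[str, tuple[float, float]], cutoff: float = 0.5) -> str:
    """Apply the screen's rules to (supernatant, pellet) band intensities for each detergent."""
    missing = set(DETERGENTS) - set(bands)
    if missing:
        raise ValueError(f"missing detergents: {sorted(missing)}")
    solubilized = {d for d in DETERGENTS if soluble_fraction(*bands[d]) >= cutoff}
    if solubilized == {"SDS"}:
        return "likely inclusion bodies"
    if solubilized == set(DETERGENTS):
        return "likely well-folded"
    if "DDM" not in solubilized:
        return "likely misfolded"
    return "inconclusive"


# Hypothetical band intensities (arbitrary units): (supernatant, pellet).
print(interpret_screen({"DDM": (1, 9), "FC12": (1, 9), "SDS": (9, 1)}))
```

The SDS-only rule is checked first because it is a special case of "DDM fails"; testing in the other order would label inclusion-body material as merely misfolded.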
The data presented in Fig. 12 show marked differences between the detergents used in solubilization and crystallization of β-barrel (Fig. 12A, Supplementary Table 1) and α-helical (Fig. 12B-D, Supplementary Table 1) membrane proteins. For monotopic and α-helical membrane proteins, the most common detergents used for solubilization are by far the maltosides (78%, 72% and 59% for the E. coli, S. cerevisiae and P. pastoris expression systems, respectively), with DDM accounting for more than 80% of this detergent family. DDM is a mild, long-alkyl-chain detergent with a low cmc and has been found to be very stabilizing, which explains its success in maintaining dynamic membrane proteins in solution. However, its large micelles are not well suited to forming ordered, well-diffracting crystals because they limit essential crystal contacts. Therefore, while maltosides are the major detergents used for crystallization of α-helical membrane proteins produced in E. coli (48%), an important contribution is made by the smaller glucosides (15%), especially OG, followed by detergent mixtures (15%) and Cymal detergents (6%). For membrane proteins produced in S. cerevisiae and P. pastoris, the profile of detergent usage is similar, except that neopentylglycol detergents are used more often (7% and 14%, respectively).
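Percentages such as "78% maltosides, with DDM accounting for more than 80% of that family" come from two nested tallies: records per detergent family, then records per detergent within a family. A sketch, assuming a hand-built detergent-to-family map that follows this review's categories (the map covers only a few detergents and is illustrative) and hypothetical records:

```python
from collections import Counter

# Illustrative, incomplete mapping following the detergent categories used in this review.
FAMILY = {
    "DDM": "maltoside", "DM": "maltoside",
    "OG": "glucoside", "NG": "glucoside",
    "LDAO": "amine oxide",
    "C8E4": "polyoxyethylene glycol", "C10E5": "polyoxyethylene glycol",
    "Cymal-5": "Cymal",
    "LMNG": "neopentylglycol",
}


def family_of(detergents: frozenset[str]) -> str:
    """Family of a single detergent, or 'mixture' when several were combined."""
    if not detergents:
        raise ValueError("empty detergent set")
    if len(detergents) > 1:
        return "mixture"
    (name,) = detergents
    return FAMILY.get(name, "other")


def family_shares(records: list[frozenset[str]]) -> dict[str, float]:
    """Fraction of records falling in each detergent family."""
    counts = Counter(family_of(r) for r in records)
    return {family: n / len(records) for family, n in counts.items()}


def share_within_family(records: list[frozenset[str]], detergent: str) -> float:
    """Fraction of a family's records that used this detergent alone."""
    family = FAMILY[detergent]
    members = [r for r in records if family_of(r) == family]
    if not members:
        raise ValueError(f"no records in family {family!r}")
    return sum(1 for r in members if r == frozenset({detergent})) / len(members)


# Hypothetical solubilization records.
records = [frozenset({"DDM"}), frozenset({"DDM"}), frozenset({"DM"}),
           frozenset({"OG"}), frozenset({"DDM", "OG"})]
print(family_shares(records))
print(share_within_family(records, "DDM"))
```

Counting mixtures as their own category, rather than splitting them across their components, keeps the family shares summing to 1, which is how the percentages in the text can be read.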
Glucosides (OG, NG) have a high cmc and form small micelles, allowing better packing in crystal lattices and resulting in better-diffracting crystals [92,93]. OG in particular has been used successfully for channel proteins [92], such as the five OG-crystallized aquaporins produced in P. pastoris. The success of detergent mixtures also suggests that combinations of detergents (mostly including small-micelle detergents) have a useful contribution to make.
For β-barrels, the three most successful detergent classes for solubilization are detergent mixtures (24%), the zwitterionic amine oxide detergents (17%) and the maltosides DDM/DM (12%). Strikingly, the polyoxyethylene glycol detergents C8E4, C10E5 and C10E6 are the most used detergents for crystallization, accounting for 42% of the structures, followed by detergent mixtures (18%) and amine oxide detergents (16%), in agreement with an earlier study [94]. The high stability of the β-barrel fold supports the use of detergents with smaller micelles and of more destabilizing detergents.

Conclusions
Microbes have an important role to play in membrane protein structural biology projects. E. coli, P. pastoris and S. cerevisiae have together been used to produce 71% of all unique structures in the PDB that were derived from recombinant sources. In this review we have focused on an analysis of the host strains, tags and promoters that, in our experience, are most likely to yield protein suitable for structural and functional characterization. We have also illustrated some of our preferred protocols. There are, of course, many other factors that could be considered, including codon optimization, mutagenesis, the use of other microbes, engineering of the membrane lipid composition and an in-depth analysis of culture medium composition. We note, however, that in many cases the approaches we have catalogued provide the requisite quantity and quality of protein for further study.
One of the major challenges in the coming years will be to overcome the barriers to producing complex eukaryotic membrane proteins in microbial systems. There are numerous reports of misfolded recombinant proteins being produced in human cells, and tuning human promoters to favour efficient folding is still in its infancy. In contrast, there are several initiatives to 'humanize' microorganisms. For instance, S. cerevisiae has been engineered to synthesize cholesterol instead of ergosterol in order to favour the activity of human GPCRs in yeast membranes. Some T7RNAP-based E. coli strains are fully devoid of lipopolysaccharides and are now recognized as being as safe as Lactobacillus. Finally, the genetic diversity of microorganisms is now a source of inspiration: for instance, some groups are developing semisynthetic hosts based on magnetotactic bacteria, which contain sophisticated intracellular organelles [95]. We therefore anticipate that microbes will continue to make important contributions to the production of recombinant membrane proteins from a range of prokaryotic and eukaryotic organisms.