Regulation of multispanning membrane protein topology via post-translational annealing

The canonical mechanism for multispanning membrane protein topogenesis suggests that protein topology is established during cotranslational membrane integration. However, this mechanism is inconsistent with the behavior of EmrE, a dual-topology protein for which the mutation of positively charged loop residues, even close to the C-terminus, leads to dramatic shifts in its topology. We use coarse-grained simulations to investigate the Sec-facilitated membrane integration of EmrE and its mutants on realistic biological timescales. This work reveals a mechanism for regulating membrane-protein topogenesis, in which initially misintegrated configurations of the proteins undergo post-translational annealing to reach fully integrated multispanning topologies. The energetic barriers associated with this post-translational annealing process enforce kinetic pathways that dictate the topology of the fully integrated proteins. The proposed mechanism agrees well with the experimentally observed features of EmrE topogenesis and provides a range of experimentally testable predictions regarding the effect of translocon mutations on membrane protein topogenesis. DOI: http://dx.doi.org/10.7554/eLife.08697.001


Introduction
Integral membrane proteins (IMPs) are central to cellular functions that include signal transduction, transport across the cell membrane, and energy conversion. Performing these roles requires integration of the IMPs into the membrane with the correct topology (i.e., the correct orientation of the fully integrated IMP relative to the membrane). In most cases, membrane integration proceeds via the Sec translocon, a conserved protein-conducting channel located in the endoplasmic reticulum membrane in eukaryotes or in the plasma membrane in bacteria (White and von Heijne, 2004). During this process, the ribosome or another molecular motor docks to the cytoplasmic opening of the translocon, feeding the nascent protein into the translocon channel (Shao and Hegde, 2011); conformational changes in the lateral gate (LG) helices of the translocon then allow sufficiently hydrophobic segments of the nascent protein to integrate as transmembrane domains (TMDs) (Hessa et al., 2005; Egea and Stroud, 2010; Zhang and Miller, 2010; Gogala et al., 2014). The orientation of a single TMD relative to the membrane is determined by factors that include the hydrophobicity of the TMD and the charge and length of the soluble loops that flank the TMD (Goder and Spiess, 2001, 2003; Devaraneni et al., 2011). However, the extent to which these factors influence the topology of multispanning IMPs is less clear.
The conventional model of multispanning IMP topogenesis assumes that a single dominant topology is established via the successive integration of TMDs that thread back-and-forth across the membrane in alternating orientations (Blobel, 1980; Wessels and Spiess, 1988; Sadlish et al., 2005). In this cotranslational model, the dominant IMP topology is determined by the orientation of the N-terminal TMD and is primarily dictated by the features of that leading TMD (Hartmann et al., 1989; Borel and Simon, 1996; Dale et al., 2000). However, the cotranslational model is challenged by dual-topology proteins, which exhibit both possible orientations of the fully integrated IMP with respect to the membrane in approximately 1:1 stoichiometry (Rapp et al., 2006, 2007). The most thoroughly studied dual-topology protein is the bacterial multidrug transporter EmrE (Chen et al., 2007), which can be biased in favor of a single dominant topology by introducing positive charges to any of its soluble loops (Rapp et al., 2006, 2007; Seppälä et al., 2010). The dominant topology of each EmrE mutant retains the loop with the additional positive charges in the cytoplasm (Seppälä et al., 2010), apparently satisfying the empirical trend known as the 'positive-inside' rule, which notes that the combined charges of the cytoplasmic loops (i.e., K+R bias) of an IMP correlate with its dominant topology (von Heijne, 1986). Surprisingly, adding charges to even C-terminal loops can influence the dominant topology of EmrE, suggesting that such mutations have a long-range effect on the orientation of previously translated TMDs. This finding is inconsistent with the cotranslational model and raises interesting questions about IMP topogenesis. At what point is IMP topology established with respect to ribosomal translation? Are TMD orientations locked in during the period in which the nascent IMP is attached to the ribosome (i.e., cotranslationally), or do TMD orientations remain subject to change even upon completion of ribosomal translation (i.e., post-translationally)?
In this work, we simulate the topogenesis of EmrE and its mutants to address limitations in the cotranslational model of IMP topogenesis by understanding when IMP topology is established (co- or post-translationally) and how topology is regulated. We use a coarse-grained (CG) model that enables access to a timescale of minutes while retaining sufficient chemical accuracy to capture the forces that drive membrane integration (Zhang and Miller, 2012a). The distribution of topologies predicted by the simulations is in good agreement with previous experimental findings (Rapp et al., 2007; Seppälä et al., 2010). The simulation results show that TMDs in the dual-topology mutants do not completely integrate by the end of translation; instead, the slow post-translational flipping of loops across the membrane allows misintegrated TMDs to reorient and insert into the membrane. The fully integrated topology is determined by the position of the loop that undergoes flipping most slowly. This work elucidates the mechanism by which dual-topology protein topology is established, reconciles dominant protein topologies with the positive-inside rule, and predicts the role that the translocon plays in mediating multispanning IMP topogenesis. Other examples of post-translational

eLife digest
Proteins are long chains of smaller molecules called amino acids, and are built inside cells by a molecular machine called the ribosome. Many important proteins must be inserted into the membrane that surrounds each cell in order to carry out their role. As these proteins are being built by the ribosome, they thread their way into a membrane-spanning channel (called the translocon) from the inner side of the membrane. Short segments of these integral membrane proteins (called transmembrane domains) then become embedded in the membrane, while other parts of the protein remain on either side of the membrane.
For a membrane protein to work properly, the end of each of its transmembrane domains must be on the correct side of the membrane (i.e., the protein must obtain the correct 'topology'). The conventional model for this process suggests that topology is fixed when the first transmembrane domain of a protein is initially integrated into the membrane, while the ribosome is still building the protein. This model can explain most integral membrane proteins, which only have a single topology. However, it cannot explain the family of membrane proteins that have an almost equal chance of adopting one of two different topologies (so-called 'dual-topology proteins').
Van Lehn et al. have now used computer modeling to simulate how a bacterial protein called EmrE (which is a dual-topology protein) integrates into the membrane via the translocon. The results reveal that a few transmembrane domains in EmrE do not fully integrate into the membrane while the ribosome is building the protein. Instead, these transmembrane domains slowly integrate after the ribosome has finished its job.
These findings contradict the conventional model and suggest that some membrane proteins only become fully integrated after the protein-building process is complete. The next step in this work is to experimentally test predictions from the computer simulations.

Coarse-grained model
The cotranslational integration and topogenesis of EmrE and its mutants are simulated using a recently developed CG model (Zhang and Miller, 2012a), which we employ essentially unchanged from its original introduction. Figure 1 illustrates the CG representation of a nascent protein and the protocol for simulating membrane integration. The ribosome, translocon, and nascent protein are all composed of CG beads. Each bead has a diameter of σ = 0.8 nm to represent approximately three amino-acid residues. This bead diameter is similar to the Kuhn length of polypeptides (Staple et al., 2008) so that the nascent protein can be treated as a freely jointed chain. The surrounding solvent and lipid bilayer are included implicitly, a technique that is used in other CG models of the translocon (Rychkova and Warshel, 2013). The time-evolution of nascent protein configurations is calculated using Brownian dynamics with a 100 ns timestep. The kinetics of the LG are modeled as stochastic transitions between a closed conformation, which prevents the nascent protein from exiting from the channel interior to the membrane, and an open conformation, which removes the barrier to membrane entry. All bead positions are projected onto the plane that passes along the translocon channel axis between the helices forming the LG. This off-lattice 2D approximation reflects the cylindrical geometry of the channel and is inspired by previous models of biopolymer translocation through nanopores (Huopaniemi et al., 2006). Beads representing the ribosome enclosure and translocon are placed to approximate their structures (Van den Berg et al., 2004; Frauenfeld et al., 2011). Two negative charges are placed on a bead at the cytosolic end of the translocon LG, whereas two positive charges are placed on a bead at the periplasmic end of the translocon LG. This charge distribution reflects the position of conserved charged residues (White and von Heijne, 2004) near the translocon LG that have been previously shown to affect single-spanning protein topogenesis (Goder et al., 2004). Full details of the model are provided in Appendix 1.
Figure 1. Schematic of Sec-mediated cotranslational integration of EmrE and corresponding simulation representation. (A) At top, an illustration of the structural motifs in EmrE, including indication of the charged residues in the soluble loops with black circles and the transmembrane domain (TMD)/loop numbering scheme that is employed in the text; below, the corresponding sequence of coarse-grained (CG) beads that represent the EmrE amino-acid sequence. TMDs and loops are assigned based on the hydropathy plot and consensus topology prediction shown in Figure 1-figure supplement 1. (B) At top, a schematic illustration of the sequential integration of TMDs to obtain a multispanning N peri /C peri topology, in which both the N- and C-terminal loops are positioned in the periplasm, according to the cotranslational model; below, representative simulation snapshots of EmrE as the nascent protein grows during translation, integrates into the membrane, and exits the channel in the N peri /C peri multispanning topology. The nascent protein is colored according to the legend at top, the ribosome is brown, and the translocon is green with translocon charges labeled explicitly. DOI: 10.7554/eLife.08697.003. Figure supplements are available for Figure 1.
The CG model is well-suited to simulating the kinetics of cotranslational IMP integration, a process that is challenging for atomistic models (Zhang and Miller, 2010;Gumbart et al., 2011;Zhang and Miller, 2012b;Rychkova and Warshel, 2013) due to the large system size (>100,000 atoms) and the long timescale (minutes) of translation. We note that the model does not include nascent protein secondary/tertiary structure, charged lipids, protein chaperones, or an electrostatic potential across the membrane. However, the model does include explicit LG/translation dynamics, electrostatic interactions with the translocon, water/bilayer transfer free energies, and a direct mapping between the nascent protein sequence and the CG representation. The model thus captures the major physicochemical features of the translocon-membrane system (White and von Heijne, 2004). Moreover, the model has been shown to accurately predict features of single-spanning IMP integration and topogenesis (Zhang and Miller, 2012a), including the sigmoidal dependence of stop-transfer efficiency on TMD hydrophobicity (Hessa et al., 2005), the inversion of signal-anchor orientation during translation (Goder and Spiess, 2003), and the effect of translation rates and sequence features on signal-anchor orientation (Goder and Spiess, 2003). In particular, the model has been shown (Zhang and Miller, 2012a) to correctly describe integration processes that are governed either by thermodynamics (Hessa et al., 2005) or kinetics (Goder and Spiess, 2003), and it has provided a means of understanding the competition between such effects. The model has also been shown to correctly predict the dominant topology for a three-TMD multispanning IMP with a strong positive-inside bias (Zhang and Miller, 2012a). The strong agreement between simulation and experimental results presented in this work further indicates that IMP topological determinants are captured at this CG resolution.
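The two dynamical ingredients named above, Brownian dynamics for the beads and stochastic open/closed switching of the LG, can be illustrated with a minimal sketch. This is not the authors' implementation: the Euler-Maruyama update, the single diffusion coefficient `diff_coeff`, the externally supplied `forces`, and the rate constants `k_open` and `k_close` are all assumptions introduced for illustration, and all quantities are in reduced units.

```python
import math
import random

def brownian_step(positions, forces, diff_coeff, dt, rng, kT=1.0):
    """One Euler-Maruyama step of overdamped (Brownian) dynamics in 2D.

    positions, forces: lists of (x, y) tuples, one per bead.
    diff_coeff: assumed bead diffusion coefficient (hypothetical value).
    """
    amp = math.sqrt(2.0 * diff_coeff * dt)  # amplitude of the random kick
    updated = []
    for (x, y), (fx, fy) in zip(positions, forces):
        updated.append((
            x + diff_coeff * fx / kT * dt + amp * rng.gauss(0.0, 1.0),
            y + diff_coeff * fy / kT * dt + amp * rng.gauss(0.0, 1.0),
        ))
    return updated

def gate_step(is_open, k_open, k_close, dt, rng):
    """Two-state lateral gate: switch state with probability rate * dt.

    k_open and k_close are hypothetical rate constants; the first-order
    probability rate * dt is only meaningful when rate * dt << 1.
    """
    rate = k_close if is_open else k_open
    return (not is_open) if rng.random() < rate * dt else is_open
```

In a full simulation the forces would come from the model's bonded, excluded-volume, electrostatic, and transfer free energy terms, which are specified in Appendix 1 and are not reproduced here.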

EmrE protein
The EmrE amino-acid sequence includes four hydrophobic domains and five hydrophilic loops, according to both the hydropathy plot and consensus topology prediction shown in Figure 1-figure supplement 1. The hydropathy plot was calculated using the Wimley-White hydrophobicity scale (Wimley et al., 1996). The black line in the hydropathy plot indicates the water-octanol transfer free energy per residue and the overlaid red line shows a moving average using a 7-residue window. The consensus topology prediction was generated by the TOPCONS 1.0 server (Bernsel et al., 2009) and agrees with previous representations of EmrE structural elements (Seppälä et al., 2010). Shaded regions in the hydropathy plot indicate the predicted TMDs and loops.
In the CG model, each TMD is represented by four CG beads and each soluble loop is represented by five CG beads, as seen in Figure 1A. The CG beads assume one of four types as determined by the associated amino-acid residues in the nascent protein; these CG bead-types include V (moderately hydrophobic), L (very hydrophobic), Q (neutral-hydrophilic), and K (positively charged). Among these types, the CG beads vary with respect to their charge and their water/membrane transfer free energies (Appendix table 1). In the hydropathy profile, the N-terminal TMD (TMD1) is less hydrophobic than the other three TMDs, so its beads are assigned the V bead type. All other TMD beads are assigned the L bead type. Beads in each soluble loop are assigned to either the K or Q bead type, depending on the location of positive charges in the amino-acid sequence; positive charges are highlighted in red in the EmrE wild-type amino-acid sequence in Figure 1-figure supplement 1. Each K bead type is assigned a +2 charge, following previous work (Zhang and Miller, 2012a). Negative charges are excluded from the CG representation of EmrE, because EmrE exhibits a small number of such charges (Figure 1-figure supplement 1) and because the experimentally studied EmrE mutations focus only on the addition/removal of positively charged residues (Seppälä et al., 2010). Nonetheless, the effect of negatively charged residues in the CG simulation was explicitly tested.
Using the CG model, we consider a series of EmrE mutants from Rapp et al. (2007) and Seppälä et al. (2010). We include EmrE mutants with single charge mutations (K3, T28R, A52K, L85R, and R111) from Seppälä et al. (2010) and EmrE mutants with single dominant topologies (EmrE(N cyto ) and EmrE(N peri )) from Rapp et al. (2007). We also consider a series of mutants in which the protein has either zero positive charge or positive charges in only a single loop (nEmrE, nK3, nT28R 1 , nT28R 2 , nT28R, nA52K, nL85R, and nR111) from Seppälä et al. (2010). This list includes all 16 of the EmrE and nEmrE mutants with single added charges studied experimentally by Seppälä et al. (2010); mutants with added C-terminal His residues or an extra TMD are not considered. Finally, we include a 'cotranslationally-biased', or CB, mutant that has elongated, 10-bead hydrophilic loops and two positive charges in the first, third, and fifth loops to create a strong K+R bias that favors an N cyto /C cyto topology (i.e., with both the N-terminal and C-terminal loops in the cytoplasm) according to the positive-inside rule (von Heijne, 1986; Rapp et al., 2006); this protein is expected to be strongly biased towards membrane integration via the cotranslational mechanism, providing a useful comparison with the other EmrE mutants. The CG representation of each mutant is listed in Appendix table 2; for each mutant, charge mutations are reflected by changing between Q-type and K-type beads at the appropriate point in the sequence. Despite its simplicity, we emphasize that the CG representation captures the major features of EmrE and its mutants, including the number of TMDs/loops and the distribution of charges.
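The bead-level bookkeeping described in this section (four beads per TMD, five per loop, V beads for TMD1, L beads for the other TMDs, and Q or K beads in the loops) can be sketched as follows. The function names and the convention of placing K beads at the start of each loop are hypothetical choices made for illustration; the actual bead sequences are those listed in Appendix table 2, which this sketch does not reproduce.

```python
TMD_BEADS = 4    # beads per TMD, from the text
LOOP_BEADS = 5   # beads per soluble loop, from the text
K_CHARGE = 2     # charge assigned to each K bead, from the text

def build_cg_sequence(k_beads_per_loop, tmd_types=("V", "L", "L", "L")):
    """Return bead types from N- to C-terminus: loop1, TMD1, ..., TMD4, loop5.

    k_beads_per_loop: number of K beads in each of the five loops.
    Placing the K beads first within a loop is an arbitrary illustrative choice.
    """
    if len(k_beads_per_loop) != len(tmd_types) + 1:
        raise ValueError("expected one more loop than TMDs")
    beads = []
    for i, n_k in enumerate(k_beads_per_loop):
        if not 0 <= n_k <= LOOP_BEADS:
            raise ValueError("K-bead count must fit within the loop")
        beads += ["K"] * n_k + ["Q"] * (LOOP_BEADS - n_k)
        if i < len(tmd_types):
            beads += [tmd_types[i]] * TMD_BEADS
    return beads

def total_charge(beads):
    """Net charge of the chain; only K beads carry charge in this model."""
    return K_CHARGE * beads.count("K")
```

A charge mutation then corresponds to changing one entry of `k_beads_per_loop`, which mirrors the Q-to-K (or K-to-Q) bead swaps described above.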

Simulation protocol
As illustrated in Figure 1B, the dynamics of the ribosome/nascent protein/translocon complex is directly simulated using the CG model. Each CG trajectory is initiated with a short nascent protein attached to the ribosome exit channel; as a function of time, the nascent protein grows in length (while remaining attached to the ribosome) until it completes translation and is released from the ribosome. The dynamics of the nascent protein continue to be simulated until the protein reaches a fully integrated topology.
Simulations are initialized from equilibrated configurations of the nascent protein, initially comprised of 9 CG beads, with the C-terminus attached to the ribosome exit channel (Figure 1-figure supplement 2). Translation is performed by adding a new CG bead to the C-terminus of the nascent protein and attaching it to the ribosome exit channel; the previous C-terminus is released from the exit channel. The simulation is then continued for 125 ms before the next bead is added, a simulation time which corresponds to a translation rate of 24 residues/s (Bilgin et al., 1992). At the end of translation, the C-terminus is released from the ribosome exit channel and simulations are continued until all beads in the TMDs are at least 4.5σ from the origin and integrated with either an N cyto /C cyto or an N peri /C peri topology. The ribosome remains bound to the translocon for the duration of all simulations (Potter and Nicchitta, 2002; Schaletzky and Rapoport, 2006). The distance threshold ensures that the final configuration of the protein has exited from both the ribosome and translocon channel.
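The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes three residues per bead and a 100 ns timestep (both stated in the coarse-grained model section) and a 41-bead EmrE chain (four TMDs of four beads plus five loops of five beads); the assumption that every added bead is followed by exactly one 125 ms interval is an interpretation of the protocol, not a statement from the authors.

```python
RESIDUES_PER_BEAD = 3       # approximate mapping stated in the text
SECONDS_PER_BEAD = 0.125    # 125 ms of dynamics per added bead
TIMESTEP_S = 100e-9         # 100 ns Brownian dynamics timestep
INITIAL_BEADS = 9           # length of the equilibrated starting chain

def translation_rate_residues_per_s():
    """Residues translated per second implied by the bead-addition interval."""
    return RESIDUES_PER_BEAD / SECONDS_PER_BEAD

def steps_per_bead():
    """Number of timesteps simulated between consecutive bead additions."""
    return round(SECONDS_PER_BEAD / TIMESTEP_S)

def translation_time_s(total_beads):
    """Simulated time needed to grow the chain from INITIAL_BEADS to total_beads."""
    return (total_beads - INITIAL_BEADS) * SECONDS_PER_BEAD

print(translation_rate_residues_per_s())  # 24.0
print(steps_per_bead())                   # 1250000
print(translation_time_s(41))             # 4.0
```

Under these assumptions, translation of a 41-bead chain takes about 4 s of simulated time, which is short compared with the hundreds of seconds that some trajectories need to reach a fully integrated topology.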
The trajectory termination criteria are designed to examine the effects of the Sec-facilitated membrane integration process on EmrE topogenesis. Specifically, it is assumed that upon reaching configurations in which all of the TMDs are integrated into the membrane, the protein topology remains irreversibly fixed for all subsequent times; physical processes that may lead to this irreversibility include the dimerization of EmrE proteins to form functional channels in the membrane (Lloris-Garcerá et al., 2012) or the degradation of undimerized EmrE proteins prior to topological inversion (Woodall et al., 2015). Given the symmetry of the membrane-protein interactions in the absence of the translocon, if the CG trajectories were allowed to run for infinitely long times to reach full equilibration after diffusing away from the translocon, the relative probability of the N cyto /C cyto and N peri /C peri topologies would be equal, regardless of the protein sequence. The employed trajectory termination criteria thus isolate the role of the non-equilibrium integration process in determining IMP topology. A demonstration of the robustness of the reported results to the cutoff values employed in the trajectory termination criteria is provided in the Robustness checks for the trajectory termination criteria section of the 'Materials and methods'.
The integration and orientation of a TMD is interpreted from the positions of hydrophobic beads in each TMD and the third bead in each hydrophilic loop. The coordinate system is defined with the x-axis perpendicular to the bilayer (Figure 1-figure supplement 2). The origin is placed at the center of the channel such that negative x-values indicate cytoplasmic positions. A TMD is considered integrated if −2σ ≤ x ≤ 2σ for all four hydrophobic beads, corresponding to positions within the implicit bilayer, and if all y-positions are outside of the translocon interior. A loop is considered to be in the cytoplasm if the position of the reference bead satisfies x < −σ and in the periplasm if x > σ. The N cyto /C cyto topology is reached if the first, third, and fifth loops are positioned in the cytoplasm and the second and fourth loops are positioned in the periplasm. The N peri /C peri topology has the opposite loop positions as shown in Figure 1B.
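The geometric criteria in this paragraph translate directly into a small classifier. The sketch below is an illustration rather than the authors' analysis code: positions are given in units of σ, the function names are hypothetical, and the requirement that a TMD lies outside the translocon interior is passed in as a boolean because the text does not give a numerical y-threshold.

```python
def loop_side(x):
    """Side of the membrane for a loop's reference bead (x in units of sigma)."""
    if x < -1.0:
        return "cyto"
    if x > 1.0:
        return "peri"
    return None  # reference bead is within the membrane region

def tmd_integrated(bead_xs, outside_channel):
    """A TMD is integrated if all four beads satisfy -2 <= x <= 2 and lie
    outside the translocon interior (outside_channel is supplied by the caller)."""
    return outside_channel and all(-2.0 <= x <= 2.0 for x in bead_xs)

def classify_topology(loop_xs, tmds_integrated):
    """Return 'Ncyto/Ccyto', 'Nperi/Cperi', or None if not fully integrated."""
    sides = [loop_side(x) for x in loop_xs]
    if not all(tmds_integrated) or None in sides:
        return None
    if sides == ["cyto", "peri", "cyto", "peri", "cyto"]:
        return "Ncyto/Ccyto"
    if sides == ["peri", "cyto", "peri", "cyto", "peri"]:
        return "Nperi/Cperi"
    return None
```

The additional termination requirement that all TMD beads be at least 4.5σ from the origin is not included in this sketch.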
For each mutant, 250 independent trajectories are performed for a total of 4000 CG trajectories and nearly 6000 min of aggregate simulation time. Error bars measure the standard error between 2 blocks of 125 simulated trajectories. Complete system configurations are saved every 50 ms while loop positions and TMD orientations are saved every 1 ms.
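One way to read the error-bar definition above is as the standard error of the mean across block averages. The sketch below implements that reading for a list of per-trajectory outcomes (1 for N cyto /C cyto, 0 for N peri /C peri); the function name and the exact estimator are assumptions, since the text does not spell out the formula.

```python
import math

def block_standard_error(outcomes, n_blocks=2):
    """Mean and standard error of the mean estimated from equal-sized blocks.

    outcomes: list of 0/1 flags, one per trajectory.
    Returns (mean of block means, standard error across blocks).
    """
    size = len(outcomes) // n_blocks
    means = [
        sum(outcomes[i * size:(i + 1) * size]) / size
        for i in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    # Sample variance of the block means (n_blocks - 1 in the denominator).
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return grand, math.sqrt(var / n_blocks)
```

With 250 trajectories and `n_blocks=2`, each block contains 125 trajectories, matching the block size stated above.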

Simulations match experimental observations of topology
For all 16 of the EmrE and nEmrE mutants with single added charges studied by Seppälä et al. (2010), Figure 2 compares the experimentally observed IMP topologies with the prediction from the CG  model. Specifically, the figure compares the fraction of fully integrated proteins that adopt the N cyto /C cyto topology, with the remainder in the N peri /C peri topology. The top and bottom rows show variants of EmrE and nEmrE respectively. Each mutant differs only in the number and location of charges in the hydrophilic loops. A schematic of each mutant drawn with the dominant topology predicted from simulations is included; positive charges are indicated as filled-in circles with additional charges relative to EmrE (top row) or nEmrE (bottom row) highlighted in red. The topologies determined experimentally in Seppälä et al. (2010) are expressed as the fraction of N cyto /C cyto topologies by dividing the cell activity of each protein coexpressed with the EmrE(N peri ) mutant by the total growth of the protein coexpressed with either the EmrE(N peri ) or EmrE(N cyto ) mutant (Seppälä et al., 2010), as described in the Experimental interpretation of EmrE topology section of the 'Materials and methods'.
It is clear from Figure 2 that the simulations are in excellent qualitative agreement with the experiments by correctly predicting the near 1:1 stoichiometry of wild-type EmrE and identifying the dominant topology for nearly all of the proteins considered. Figure 2-figure supplement 2 illustrates that the distribution of topologies determined experimentally and the distribution of topologies measured from the simulations are linearly correlated (Pearson correlation coefficient, r = 0.92); points lying in the two shaded quadrants of the graph correspond to proteins for which the simulations and experiments predict consistent topologies. All mutants, with the exception of A52K, have the same dominant topology in the simulations as in the experiments within the statistical error. The agreement between simulations and experiments suggests that the CG model correctly reproduces the essential molecular features of topogenesis; in the following, we analyze the ensembles of CG trajectories that give rise to these computed IMP topologies.

Dual-topology proteins exhibit slow post-translational integration
To investigate the molecular processes that govern the establishment of EmrE topology, we first examine the kinetics by which fully integrated topologies are reached. As a function of time, Figure 3A shows the fraction of CG trajectories in which the studied protein reaches a fully integrated topology for several EmrE mutants and the CB mutant. Time 0 s corresponds to the end of translation, and negative values of time correspond to the period that precedes the end of ribosomal translation, in which the nascent protein is still attached to the ribosome. Over 90% of the CB mutant trajectories reach the N cyto /C cyto topology within 3 s following the completion of translation and thus rapidly integrate as expected for the cotranslational model (Blobel, 1980; Wessels and Spiess, 1988; Sadlish et al., 2005); mechanistic features of individual TMD integration steps are discussed in the Cotranslational integration pathways section of the 'Materials and methods'. In contrast, all variants of EmrE reach a fully integrated topology much more slowly, requiring hundreds of seconds for some CG trajectories to fully integrate (see also Figure 3-figure supplement 2).
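A curve of the kind described for Figure 3A is an empirical cumulative distribution of per-trajectory integration times. The sketch below shows one way to compute it, assuming each trajectory is summarized by the time (in seconds, measured from the end of translation) at which it first reaches a fully integrated topology; the function name and input format are hypothetical.

```python
from bisect import bisect_right

def fraction_integrated(integration_times, query_times):
    """Fraction of trajectories fully integrated at or before each query time.

    integration_times: one time per trajectory, relative to the end of
    translation (negative values mean integration before translation ended).
    """
    ordered = sorted(integration_times)
    n = len(ordered)
    return [bisect_right(ordered, t) / n for t in query_times]
```

Evaluating this function on a logarithmically spaced time grid would make both the fast, CB-like regime and the slow tail of hundreds of seconds visible on the same plot.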
The slow post-translational integration of the dual-topology EmrE mutants arises because a significant fraction of trajectories exhibit configurations in which some TMDs are not fully integrated at the end of translation. As a function of time, Figure 3B shows the fraction of CG trajectories in which each TMD is integrated for both the CB mutant (top) and EmrE (bottom). TMDs in the CB mutant integrate sequentially with near 100% efficiency during translation, which is consistent with the standard cotranslational model of topogenesis (cf. Figure 1) and explains the rapid timescale for fully integrating into a multispanning topology shown in Figure 3A. In contrast, the TMDs of EmrE exhibit only partial integration, even at long times following the completion of translation. Snapshots of a typical misintegrated TMD in EmrE are shown in Figure 3B. Various experiments have indicated that such configurations with misintegrated TMDs arise due to frustration from charges placed in consecutive loops (Gafvelin and von Heijne, 1994), the strong orientational preference of a neighboring TMD (Öjemalm et al., 2012), or the weak stop-transfer efficiency of marginally hydrophobic TMDs (Moss et al., 1998). Consistent with these experimental observations, the simulations in Figure 3B find that the weakly hydrophobic TMD1 of EmrE integrates the least efficiently, followed by TMD4, which is flanked by two charged loops.

The proposed mechanism
Kinetic annealing of the end-of-translation ensemble
Analysis of the simulated CG trajectories reveals a straightforward molecular mechanism by which the multispanning topology of EmrE and its mutants is established. This mechanism, which we refer to as kinetic annealing of the end-of-translation (EOT) ensemble, is illustrated in Figure 4 and involves two basic steps. In the first step, the cotranslational integration (or misintegration) of each TMD leads to an ensemble of IMP configurations associated with the time at which ribosomal translation completes; we call this set of configurations the EOT ensemble. In the second step of the proposed mechanism, configurations in the EOT ensemble anneal toward a fully integrated topology as a function of time as loops post-translationally flip across the membrane. The rate at which the soluble loops undergo post-translational flipping is a key determinant of the fully integrated topology. We will show that this mechanism explains the unexpected elements of EmrE topogenesis observed experimentally, including the topogenic effect of C-terminal mutations (Seppälä et al., 2010).
The first step of the proposed mechanism is presented in Figure 4A and Figure 4B in greater detail. As illustrated in Figure 4A, the EOT ensemble of each mutant is determined cotranslationally as TMDs exit the translocon. Differences in loop charges among the various mutants lead to variation in the corresponding EOT ensembles, because electrostatic interactions between highly charged loops and the translocon favor their cytoplasmic retention (Goder et al., 2004; Zhang and Miller, 2012a). Figure 4A shows representative members of the EOT ensemble for the EmrE, T28R, and nR111 mutants with the most-charged loop of each mutant highlighted in red. The EOT ensemble is defined as the set of configurations visited by a given nascent protein within 1 s of simulation time following the termination of ribosomal translation. The schematics indicate how various TMDs integrate or misintegrate to give rise to heterogeneity in the EOT ensemble of configurations, while the loops with added charges preferentially obtain cytoplasmic positions. Figure 4B further quantifies the cytoplasmic bias of charged loops by showing the EOT-ensemble-averaged loop positions with respect to the membrane for all five loops in each mutant, expressed as the fraction of EOT configurations with a given loop in the cytoplasm. The increased cytoplasmic localization exhibited by the L2 and L5 loops in T28R and nR111, respectively, highlights the effect of adding positive charges. Similarly, the reduced cytoplasmic retention of L2 and L4 in nR111 relative to EmrE is due to the removal of charges from these loops.
The second step of the proposed mechanism is presented in Figure 4C in greater detail. For each of the three mutants, the figure schematically illustrates the post-translational kinetics of two representative configurations from the EOT ensemble. Black horizontal arrows indicate how the flipping of soluble loops across the membrane leads to transitions between intermediate configurations. The most-charged loop is again highlighted in red. Each configuration post-translationally anneals toward a fully integrated topology as loops stochastically flip across the membrane to correct the misintegrated TMDs. The soluble loops undergo flipping at different rates, with charged loops flipping more slowly. The slowest-flipping loop thus determines the fully integrated topology that is most kinetically accessible from a given configuration in the EOT ensemble, because the other loops will more rapidly flip. The EmrE examples (Figure 4C, left) demonstrate how the equal distribution of L2 positions with respect to the membrane in the EOT ensemble leads to two different fully integrated topologies, giving rise to the dual-topology behavior. The T28R examples (Figure 4C, right) show that increasing the charge of L2, thereby biasing its cytoplasmic localization in the EOT ensemble (Figure 4B), leads to a dominant N peri /C peri topology. Finally, the nR111 examples illustrate how C-terminal charges can have a long-range topogenic effect by biasing the fully integrated proteins towards a dominant N cyto /C cyto topology.
The proposed mechanism predicts that the final topological distribution of each EmrE mutant is determined by both the distribution of configurations in the EOT ensemble and the available post-translational kinetic pathways that lead to fully integrated protein topologies. In the following, we provide detailed analysis of the simulated CG trajectories to support these elements of the proposed mechanism.
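The logic of the mechanism can be made concrete in an idealized limit that is an assumption of this sketch, not a result from the simulations: suppose the slowest-flipping loop never flips after translation ends, while every other loop relaxes around it. The final topology is then fixed by which side of the membrane that loop occupies in the EOT ensemble, because odd-numbered loops (L1, L3, L5) are cytoplasmic in N cyto /C cyto and even-numbered loops (L2, L4) are cytoplasmic in N peri /C peri. The function names are hypothetical.

```python
def predicted_topology(slowest_loop, side):
    """Topology implied by a frozen slowest-flipping loop.

    slowest_loop: loop index 1..5; side: 'cyto' or 'peri' at the end of translation.
    """
    if slowest_loop not in range(1, 6) or side not in ("cyto", "peri"):
        raise ValueError("loop must be 1..5 and side 'cyto' or 'peri'")
    is_odd_loop = slowest_loop % 2 == 1
    # Odd loops share the side of the N-terminus; even loops sit opposite it.
    n_terminus_cyto = (side == "cyto") == is_odd_loop
    return "Ncyto/Ccyto" if n_terminus_cyto else "Nperi/Cperi"

def fraction_ncyto(slowest_loop, p_cyto):
    """Fraction of Ncyto/Ccyto outcomes if the slowest loop is cytoplasmic
    with probability p_cyto in the EOT ensemble and never flips."""
    if not 0.0 <= p_cyto <= 1.0:
        raise ValueError("p_cyto must be a probability")
    return p_cyto if slowest_loop % 2 == 1 else 1.0 - p_cyto
```

In this limit, an L2 that is evenly split between the two sides gives a 1:1 topology ratio, an L2 biased toward the cytoplasm favors N peri /C peri, and an L5 biased toward the cytoplasm favors N cyto /C cyto, in line with the EmrE, T28R, and nR111 examples described above.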

Charge mutations bias loop positions in the EOT ensemble
To investigate the first step of the proposed mechanism (Figure 4A and Figure 4B), we examine the degree to which changing the number of charges in a given soluble loop shifts the position of that loop in the EOT ensemble. Figure 5 shows, for each loop whose charge is changed, the fraction of EOT configurations in which that loop is located in the cytoplasm. In each case, the addition of positive charge to a soluble loop leads to an increase in its degree of cytosolic localization, as is consistent with previous simulations and experiments of single-spanning TMDs (Goder et al., 2004; Zhang and Miller, 2012a). These results support the first step of the proposed mechanism and show that interactions of the nascent protein with its translocon/ribosome/membrane environment lead to significant shifts in the EOT ensemble of configurations.

Rate of loop-flipping depends on charge mutations
To investigate the second step of the proposed mechanism (Figure 4C), we examine the molecular processes by which configurations in the EOT ensemble reach a fully integrated topology. The energetic cost for flipping a hydrophilic loop across the hydrophobic membrane increases with the hydrophilicity of the loop; the loop-flipping frequency observed during simulations is thus expected to decrease for loops with larger numbers of charges. Figure 6 shows the computed loop-flipping frequencies for each loop in the EmrE mutants. In this analysis, loop-flipping events are determined by comparing loop positions with respect to the membrane in 1-ms time intervals, as described in the Calculation of loop-flipping frequency section of the 'Materials and methods'. The number of charges in each loop is marked with dots. As expected, highly charged loops exhibit a decreased loop-flipping frequency. The figure also reveals that the terminal L1 and L5 loops have a lower loop-flipping frequency than the intermediate L2-4 loops. Loop-flipping events are not found to be strongly concerted, as two or more loops were observed to flip concurrently in only 0.015% of all 1-ms time intervals in which at least one loop-flipping event was observed. However, the loop-flipping frequency of a given loop is impacted by the orientation of its neighboring TMDs; on average, a loop with a single misintegrated neighboring TMD flips 1.5 times more frequently than the same loop with zero misintegrated neighboring TMDs, while a loop with two misintegrated neighboring TMDs flips 3.7 times more frequently than the same loop with zero misintegrated neighboring TMDs. Additional details on these calculations are presented in the Calculation of loop-flipping frequency section of the 'Materials and methods'.
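The interval-based counting described above can be illustrated with a minimal sketch. It assumes that each loop position is recorded once per 1-ms interval and encoded as +1 (cytoplasm) or -1 (periplasm); this encoding and the function names are illustrative assumptions and are not taken from the CG model code.

```python
from typing import Sequence


def loop_flip_frequency(sides: Sequence[int], dt_ms: float = 1.0) -> float:
    """Return the number of flips per millisecond for a single loop.

    sides[i] is the side of the membrane occupied by the loop at frame i
    (+1 cytoplasm, -1 periplasm; an assumed encoding), with frames
    separated by dt_ms milliseconds.
    """
    if len(sides) < 2:
        return 0.0
    # A flip is counted whenever the side differs between consecutive frames.
    flips = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return flips / ((len(sides) - 1) * dt_ms)


def concerted_fraction(frames: Sequence[Sequence[int]]) -> float:
    """Fraction of intervals with at least one flip in which two or more
    loops flip together. frames[i] holds the side of every loop at frame i.
    """
    with_flip = 0
    concerted = 0
    for prev, curr in zip(frames, frames[1:]):
        n_flips = sum(1 for a, b in zip(prev, curr) if a != b)
        if n_flips >= 1:
            with_flip += 1
        if n_flips >= 2:
            concerted += 1
    return concerted / with_flip if with_flip else 0.0
```

Conditioning on the number of misintegrated neighboring TMDs would amount to applying the same count separately to the frames that belong to each class.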
The most important feature in Figure 6 is the identification of a slowest-flipping loop for each mutant (red boxes). The slowest-flipping loop typically exhibits a loop-flipping frequency that is orders of magnitude slower than the other loops, although in four cases (K3, L85R, nEmrE, and nT28R 1 ), two loops have slow loop-flipping frequencies that are within a factor of two. The variation in loop-flipping frequencies explains the difference in kinetics in Figure 3A, such that mutants with multiple slow-flipping loops (K3) reach a fully integrated topology more slowly than mutants with a single slowest-flipping loop (EmrE, T28R) or mutants largely devoid of charge (nK3). These results confirm that the loop-flipping frequency of a given loop depends strongly on the number of charges on that loop, indicating that charge mutations can impact the determination of the slowest-flipping loop.

Position of slowest-flipping loop in EOT ensemble determines fully integrated topology
We now investigate the degree to which the position of the slowest-flipping loop in the EOT ensemble correlates with its position in the fully integrated topology. For the simulated CG trajectories, Figure 7A demonstrates strong correlation (R² = 0.85) between the position of the slowest-flipping loop in the EOT ensemble and the corresponding position in the fully integrated topology. The results in Figure 7A indicate that the complexity of post-translational kinetics can be distilled to a much simpler picture in which the key parameter is the location of the slowest-flipping loop at the end of ribosomal translation. The fully integrated topology is almost completely determined at the time at which ribosomal translation ends, despite the fact that the kinetics of loop-flipping takes hundreds of seconds to complete.
In Figure 7A, the K3 and L85R mutants deviate most significantly from the plotted correlation between the EOT ensemble and the final topology; as seen in Figure 6, these two mutants exhibit a pair of slow loop-flipping frequencies rather than a single, well-separated slowest loop-flipping frequency. For a more detailed analysis of these special cases that involve a pair of slow loop-flipping frequencies, we direct the reader to the Alternative definition of the slowest-flipping loop position for mutants with two slow-flipping loops section of the 'Materials and methods' and the corresponding results in Figure 7-figure supplement 1. However, we emphasize that the close agreement between the results in Figure 7A and
The results in Figures 2, 3 neglect the possibility that misintegrated proteins may be degraded prior to reaching a fully integrated topology. Several bacterial proteases that degrade membrane proteins have been characterized, which provides insight into the approximate degradation timescale (Dalbey et al., 2011). For example, FtsH is a membrane-embedded protease that degrades misassembled IMPs over timescales ranging from 2 min (for SecY) to 15 min (for YccA) in Escherichia coli (Ito and Akiyama, 2005), and even longer timescales for degradation have been observed in eukaryotic systems (Buck and Skach, 2005;Feige and Hendershot, 2013); very recently, FtsH was also shown to degrade undimerized EmrE on a sub-30 min timescale (Woodall et al., 2015). In comparison to the simulated trajectories (Figure 3-figure supplement 2), these degradation timescales are relatively slow, supporting the assumption that IMP integration and post-translational annealing reach completion prior to significant degradation. Nonetheless, if degradation of EmrE occurs on faster timescales, it could potentially impact the reported topologies from the simulations.
To investigate this effect, Figure 7B shows the relative fraction of N cyto /C cyto and N peri /C peri protein topologies for the CG trajectories that have reached fully integrated topologies as a function of time, excluding all trajectories for which at least one TMD is misintegrated. If it is assumed that fully integrated proteins are resistant to degradation (or that rapid dimerization following the full integration of EmrE protects the proteins from degradation [Woodall et al., 2015]), then each point in Figure 7B represents the distribution of topologies that would be observed if all misfolded proteins were uniformly degraded at the corresponding time. Data are shown for degradation times ranging from 5 s to 100 s following the end of translation; the dashed lines indicate the overall fraction of N cyto /C cyto topologies for each mutant after all CG trajectories reach fully integrated topologies, corresponding to the results from Figure 2. Figure 7B shows that the distribution of topologies is nearly constant with respect to degradation time, preserving the correlation between the position of the slowest-flipping loop at the end of translation and in the fully integrated topology. These results suggest that the predicted distribution of protein topologies from simulation is relatively robust with respect to possible degradation processes that occur on the same timescale as post-translational annealing.
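The degradation analysis described above can be sketched as follows. The sketch assumes that each CG trajectory is summarized by the time after the end of translation at which it first reaches a fully integrated topology and by the label of that topology, and that a fully integrated protein does not change topology afterwards; the data layout and the names are illustrative assumptions.

```python
from typing import Iterable, Optional, Tuple

# (time in seconds after the end of translation at which the trajectory is
# first fully integrated, or None if it never is; topology label reached)
Trajectory = Tuple[Optional[float], str]


def ncyto_fraction_at(trajs: Iterable[Trajectory], t_deg: float) -> Optional[float]:
    """Fraction of N_cyto/C_cyto topologies among trajectories that are fully
    integrated by the degradation time t_deg.

    Trajectories that still have a misintegrated TMD at t_deg are treated as
    degraded and are excluded. Returns None if no trajectory survives.
    """
    survivors = [
        topology
        for t_full, topology in trajs
        if t_full is not None and t_full <= t_deg
    ]
    if not survivors:
        return None
    return sum(1 for topology in survivors if topology == "Ncyto") / len(survivors)
```

Evaluating this function over a grid of degradation times (for example, 5 s to 100 s) would yield one curve per mutant of the kind compared against the dashed lines in Figure 7B.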

Discussion
The results of our CG simulations support a mechanism for multispanning IMP topogenesis in which an ensemble of configurations with misintegrated TMDs undergoes kinetically controlled TMD reorientations to reach a fully integrated topology. Introducing charge mutations to the soluble loops of a multispanning IMP leads both to shifts in the distribution of loop positions in the EOT ensemble (Figure 5) and to changes in the kinetics of loop-flipping events that lead to the fully integrated topologies (Figure 6). The combination of these effects is found to govern the observed distribution of fully integrated topologies in the CG simulations (Figure 7). This proposed mechanism explains the experimental finding that adding charges to any of the soluble loops of EmrE, even a loop near the C-terminus, affects the observed topology (Seppälä et al., 2010). The proposed mechanism also agrees with recent experiments that find EmrE to undergo partial topological rearrangements that correspond to the loop-flipping events described here (Woodall et al., 2015). Furthermore, the mechanism can explain deviations from the positive-inside rule if the position of the slowest-flipping loop in the EOT ensemble enforces a topology in which the majority of the positive charges are in periplasmic loops, as seen for the K3 mutant (Figure 2).
In addition to explaining existing experimental data for the topogenesis of the EmrE mutants, the proposed mechanism yields a number of new and experimentally testable predictions. A simple overarching prediction of the mechanism is that changes to the ribosome or translocon that affect the EOT ensemble may lead to significant shifts in topology. Figure 8 shows the average position of the slowest-flipping loop in the EOT ensemble after slowing translation from 24 residues/s to 6 residues/s to model the addition of cycloheximide (Goder and Spiess, 2003), removing the periplasmic positive charge from the channel, or removing the cytoplasmic negative charge from the channel (Goder et al., 2004). For single-spanning IMPs, the rate of translation and the removal of translocon charges were previously found to significantly affect TMD orientation in both simulations and experiments (Goder and Spiess, 2003;Goder et al., 2004;Zhang and Miller, 2012a). We find that slowing translation has a minimal effect on the mutants studied here, and Figure 8-figure supplement 1 confirms this finding for other translation rates. Given that these EOT loop positions are unchanged, and given that the post-translational dynamics is unaffected by the ribosomal translation rate, these results suggest that changing translation rate will not affect the final distribution of fully integrated topologies. In contrast, Figure 8 shows that removing either the cytoplasmic or periplasmic charge on the translocon significantly decreases the cytoplasmic retention of the slowest-flipping loops by increasing the periplasmic accessibility of highly charged loops. 
Most notably, it is found that for two of the EmrE mutants (indicated in dashed boxes) the translocon charge mutations dramatically shift the slowest-flipping loop position in the EOT ensemble from being primarily cytosolic to being primarily periplasmic, suggesting that the dominant topology for these EmrE mutants will be similarly reversed by the translocon charge mutations. These changes in IMP topology due to channel mutations are experimentally testable predictions of the proposed mechanism.
A notable aspect of the CG model is the absence of asymmetric features in the membrane or environment that favor either the N cyto /C cyto or N peri /C peri topology under equilibrium thermodynamic conditions, such as the electrostatic potential across the inner membrane of E. coli or an asymmetric distribution of charged lipids (Bogdanov et al., 2008;Vitrac et al., 2013;Bogdanov et al., 2014). In the CG model, neglecting interactions with the Sec translocon, the N cyto /C cyto and N peri /C peri topologies are energetically equivalent and would be observed with equal probability if simulations were continued for an infinitely long time. The prediction of a dominant topology by the CG model arises from the initial distribution of configurations in the EOT ensemble (due to interactions of the nascent protein with the translocon complex) and from the available kinetic pathways that allow the configurations in the EOT ensemble to reach fully integrated topologies. We note that the changes in topology predicted in Figure 8 would be unexpected from a model in which the dominant topology of an IMP is determined by thermodynamic equilibration, since the equilibrium distribution of the protein topologies would be unaffected by transient interactions with the translocon or ribosome during initial membrane integration.
We further note that direct comparison of the experimental and simulation timescales for the kinetic annealing of misintegrated TMDs is limited by both the accuracy of the CG model and the neglect of external chaperone proteins, such as TRAP, TRAM, or other members of the Sec complex (Sommer et al., 2013;Zhu et al., 2013;Aviram and Schuldiner, 2014;Jung et al., 2014), that may catalyze loop-flipping. However, since the topological predictions of the proposed mechanism are primarily sensitive to which soluble loop flips most slowly (as opposed to the actual timescale of loop-flipping), we expect that the presented conclusions are relatively robust with respect to these effects. This robustness is directly illustrated in Figure 7B, which shows that the relative fraction of proteins that reach each fully integrated topology is nearly constant as a function of time.

Conclusions
In this work, we utilize a recently developed CG computational approach (Zhang and Miller, 2012a) that enables the direct simulation of Sec-facilitated membrane integration of proteins on biological timescales to investigate the topogenesis of the dual-topology EmrE protein and its mutants. In addition to demonstrating excellent agreement with the experimentally observed topologies of EmrE and its mutants (Seppälä et al., 2010), the simulations reveal a novel mechanism for the regulation of topogenesis in multispanning membrane proteins, in which initially misintegrated configurations of the proteins undergo post-translational annealing to reach final, fully integrated topologies. The energetic barriers associated with this post-translational annealing process enforce kinetic pathways that dictate the topology of the fully integrated proteins. The inclusion of charged residues on the soluble loops of the IMP can lead to significant changes in the distribution of fully integrated topologies, both by altering the ensemble of protein configurations at the end of ribosomal translation and by altering the available kinetic pathways that lead to fully integrated topologies.
This analysis leads to a number of experimentally testable predictions regarding IMP topogenesis. In particular, the results of Figure 8 predict that the mutation of charged residues near the cytoplasmic or periplasmic openings of the translocon channel will lead to significant shifts in the observed topology of several EmrE mutants. More generally, we note that any effect of channel mutations on the fully integrated IMP topology would indicate that kinetic effects during translation influence topogenesis, as suggested by the proposed mechanism. Additionally, we predict that the introduction of IMP mutations that significantly alter the EOT ensemble with respect to the cytosolic localization of the slowest-flipping soluble loop, either by introducing charge mutations or by changing TMD hydrophobicity, will influence the multispanning IMP topology.
Although the current manuscript primarily focuses on the mechanism of topogenesis in the dual-topology EmrE protein, the mechanism and simulation analysis presented here have broader implications for topogenesis in other multispanning IMPs. For EmrE and its mutants, we find that a significant fraction of the IMP configurations are misintegrated upon completion of ribosomal translation and undergo subsequent post-translational annealing to reach fully integrated topologies. In contrast, a CB mutant exhibits an essentially fully integrated ensemble of configurations at the time that ribosomal translation completes. For other IMPs, a combination of these behaviors may well be expected (Lu et al., 2000;Lambert and Prange, 2001;Kanki et al., 2002;Skach, 2009;Öjemalm et al., 2012;Bowie, 2013;Virkki et al., 2014), with some fraction of the nascent protein configurations reaching fully integrated topologies at the completion of ribosomal translation and some fraction reaching misintegrated configurations that subsequently undergo post-translational annealing. Indeed, the importance of chaperone proteins such as YidC or Sec62 that post-translationally rescue misintegrated TMDs (Sommer et al., 2013;Zhu et al., 2013;Aviram and Schuldiner, 2014;Jung et al., 2014) may be connected to this necessary process of annealing initially misintegrated IMP configurations towards fully integrated topologies. The emerging understanding of the role of the Sec translocon in regulating IMP topogenesis, as well as advances in the methodologies for probing and modifying interactions between the nascent protein and the translocon complex, hold intriguing possibilities for the prediction and control of protein folding in cellular environments.

Materials and methods
Calculation of loop-flipping frequency
To examine the effect of neighboring TMDs on the loop-flipping frequency, we separately calculate the loop-flipping frequency of each loop for configurations in which zero, one, or two of the neighboring TMDs are misintegrated (discussed in the Rate of loop-flipping depends on charge mutations section of the 'Results').

Experimental interpretation of EmrE topology
In Seppälä et al. (2010), the dominant topologies of EmrE mutants are determined by measuring the growth of E. coli cells in the presence of ethidium bromide (EtBr). EtBr is toxic to E. coli, but antiparallel EmrE dimers, in which the two monomers forming the dimer have opposite topologies, confer drug resistance. EmrE dimerization can also be suppressed by including an E14D mutation. The topology of an EmrE variant with the E14D mutation can thus be inferred by coexpressing the mutant with another EmrE variant of known topology, as any resulting drug resistance (and cell growth) can be attributed to the formation of antiparallel dimers. To enable a direct comparison between the topologies measured from simulations and the experimental results, we convert the experimentally measured cell activities from Seppälä et al. (2010) to the fraction of N cyto /C cyto topologies by assuming a linear relationship between cell growth and the population of antiparallel EmrE dimers. The fraction of N cyto /C cyto topologies is calculated as
$$\text{Fraction}(N_{\text{cyto}}/C_{\text{cyto}}) = \frac{A(N_{\text{peri}})}{A(N_{\text{cyto}}) + A(N_{\text{peri}})} \qquad (1)$$
where A(N cyto ) and A(N peri ) are the experimentally measured cell activities for cells coexpressing the EmrE(N cyto ) and EmrE(N peri ) mutants, respectively. Greater cell growth in the presence of the EmrE(N peri ) mutant, which exhibits a single dominant N peri /C peri topology, indicates that the mutant of interest exhibits a larger fraction of the opposite N cyto /C cyto topology, and vice versa for growth in the presence of the EmrE(N cyto ) mutant. Experimental values for the activities of the EmrE and nEmrE mutants are taken from Figure 2 and Figure S1 of Seppälä et al. (2010), respectively; these values are used to compute the fraction of N cyto /C cyto topologies reported in Figure 2.

Alternative definition of the slowest-flipping loop position for mutants with two slow-flipping loops
In Figure 7A, the average position of the slowest-flipping loop relative to the membrane in the EOT ensemble is compared with the average position of that same loop in the ensemble of fully integrated configurations at the end of the CG trajectories. Four mutants (K3, L85R, nEmrE, and nT28R 1 ), however, have two slow-flipping loops with similar loop-flipping frequencies (Figure 6), and two of these mutants (K3 and L85R) deviate most significantly in terms of the correlation in Figure 7A.
To better understand the effect of multiple slow-flipping loops on the correlation between the EOT ensemble and the final topology, the current section provides additional analysis in which a more sophisticated definition of the 'slowest-flipping loop' is employed for the four mutants that exhibit a pair of slow-flipping loops. Below, we present this alternative definition, which leads to a slightly better correlation between the EOT ensemble and the ensemble of fully integrated configurations, as plotted in Figure 7-figure supplement 1.
The alternative definition of the slowest-flipping loop for mutants with two slow-flipping loops is given by ϕ EOT and ϕ FI , which report on the average position of the two slow-flipping loops in the EOT ensemble and in the ensemble of fully integrated configurations, respectively.
The quantity ϕ FI reports on the average position of the two slow-flipping loops in the ensemble of fully integrated configurations at the end of the CG trajectories. For the L85R and nEmrE mutants, the two slow-flipping loops (L2/L4 and L1/L5, respectively) reach positions on the same side of the membrane in either fully integrated topology; for these mutants, ϕ FI is defined as the fraction of fully integrated configurations for which both slow-flipping loops are positioned in the cytoplasm. For the K3 and nT28R 1 mutants, the two slow-flipping loops (L1 and L2) reach positions on opposite sides of the membrane in either fully integrated topology; for these mutants, ϕ FI is defined as the fraction of fully integrated configurations for which L1 is positioned in the cytoplasm and L2 is positioned in the periplasm. For the nEmrE, K3, and nT28R 1 mutants, ϕ FI is equivalent to the fraction of CG trajectories that reach the fully integrated N cyto /C cyto topology, whereas for the L85R mutant, ϕ FI is equivalent to the fraction of CG trajectories that reach the fully integrated N peri /C peri topology.
The quantity ϕ EOT reports on the average position of the two slow-flipping loops in the EOT ensemble. For each mutant, ϕ EOT is defined as
$$
\begin{aligned}
\phi_{EOT}^{L85R} &= 0.5\left(f_{L2}^{(cyto)} + f_{L4}^{(cyto)}\right) \\
\phi_{EOT}^{nEmrE} &= 0.5\left(f_{L1}^{(cyto)} + f_{L5}^{(cyto)}\right) \\
\phi_{EOT}^{K3} &= 0.5\left(f_{L1}^{(cyto)} + 1 - f_{L2}^{(cyto)}\right) \\
\phi_{EOT}^{nT28R_1} &= 0.5\left(f_{L1}^{(cyto)} + 1 - f_{L2}^{(cyto)}\right)
\end{aligned} \qquad (2)
$$
where f (cyto) Li is the fraction of configurations in the EOT ensemble for which loop Li is in the cytoplasm. As for the previous definition of ϕ FI , this definition accounts for the fact that the two slow-flipping loops of the L85R and nEmrE mutants reach the same side of the membrane in the fully integrated topologies, while the two slow-flipping loops of the K3 and nT28R 1 mutants reach opposite sides of the membrane in the fully integrated topologies. The definition in Equation 2 additionally assumes that the post-translational annealing of misintegrated configurations in the EOT ensemble is equally rate-limited by the two slow-flipping loops. Using these alternative definitions for the position of the slowest-flipping loop (i.e., ϕ FI and ϕ EOT ), Figure 7-figure supplement 1 compares the average position of the slowest-flipping loop in the EOT ensemble to the average position of that same loop in the ensemble of fully integrated configurations. Having more carefully accounted for the effect of both slow-flipping loops in the K3, L85R, nEmrE, and nT28R 1 mutants, this figure reveals a slight improvement in the correlation (R² = 0.88 vs R² = 0.85) in comparison to the results in Figure 7A.
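As a minimal sketch of how ϕ EOT could be evaluated, the function below averages the contributions of the two slow-flipping loops with equal weight, as stated for Equation 2. It assumes that a loop that ends on the opposite side of the membrane from its partner contributes its periplasmic fraction, 1 − f (cyto); the function name and the `same_side` flag are illustrative and are not part of the original analysis code.

```python
def phi_eot(f_cyto_a: float, f_cyto_b: float, same_side: bool) -> float:
    """Average EOT position of two equally rate-limiting slow-flipping loops.

    f_cyto_a, f_cyto_b: fraction of EOT configurations in which each loop
        is in the cytoplasm.
    same_side: True if both loops occupy the cytoplasm in the reference
        fully integrated topology (e.g., L2/L4 for L85R, L1/L5 for nEmrE);
        False if the second loop is periplasmic in that topology
        (e.g., L1/L2 for K3 and nT28R 1).
    """
    # An oppositely placed loop contributes its periplasmic fraction.
    second = f_cyto_b if same_side else 1.0 - f_cyto_b
    return 0.5 * (f_cyto_a + second)
```

For example, hypothetical values f (cyto) = 0.8 and 0.6 give 0.7 for a same-side pair and 0.6 for an opposite-side pair.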

Robustness checks for the trajectory termination criteria
Alternative trajectory termination criteria are tested to ensure the robustness of the simulated distribution of multispanning topologies presented in Figure 2. As a first alternative, the original set of CG trajectories is extended by 50 s, and the distribution of topologies is determined from the position of the slowest-flipping loop at the end of the extended trajectories. As a second alternative, the distribution of topologies is calculated from the subset of original CG trajectories that reach fully integrated topologies in which all beads in the TMDs are at least 20σ, rather than 4.5σ, from the origin. These robustness checks are presented in Figure 2-figure supplement 1 and exhibit excellent correlation with the results obtained using the original protocol.
Additionally, Figure 7B shows results in which the CG trajectories are terminated at a range of fixed times following the end of ribosomal translation. Again, the results using this alternative trajectory termination criterion are in good agreement with the results obtained using the original protocol (indicated in dashed lines in Figure 7B).

Cotranslational integration pathways
From the ensemble of CG trajectories, it is possible to examine the pathways by which individual TMDs undergo Sec-facilitated cotranslational integration. In particular, following the definitions of Cymer et al. (2014), it is possible to characterize each cotranslational TMD integration event as corresponding to either the 'channel-sliding', 'interface-sliding', or 'in-out' pathways. Simulation snapshots illustrating the three pathways are shown in Figure 3-figure supplement 1.
Each pathway is defined in terms of the series of intermediate states that are visited by the TMD prior to membrane integration. To characterize these intermediate states, the following geometric regions are defined (see Figure 1-figure supplement 2). The channel region is defined as that for which −2σ ≤ x ≤ 2σ and −2σ ≤ y ≤ 2σ, the membrane region is defined as that for which −2σ ≤ x ≤ 2σ and y > 2σ, the ribosome region is defined as that for which −11σ ≤ x < −2σ and −8.5σ ≤ y ≤ 4.5σ, and the cytoplasm region is defined as the region outside of the ribosome for which x < −2σ. Finally, a bead is considered to overlap the LG if it is within a distance of σ to any lateral-gate bead.
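The geometric regions defined above can be expressed as a short classification routine. This is a sketch only: coordinates are taken in units of σ, the label "other" for positions outside the four named regions is an assumption, and the LG-overlap test (which requires the lateral-gate bead positions) is omitted.

```python
def region(x: float, y: float) -> str:
    """Classify a bead position (in units of sigma) into the regions used
    to define the intermediate states."""
    if -2.0 <= x <= 2.0:
        if -2.0 <= y <= 2.0:
            return "channel"
        if y > 2.0:
            return "membrane"
        return "other"  # not assigned to a named region in the text
    if x < -2.0:
        # The ribosome region is a box on the cytoplasmic side; everything
        # else with x < -2 sigma counts as cytoplasm.
        if x >= -11.0 and -8.5 <= y <= 4.5:
            return "ribosome"
        return "cytoplasm"
    return "other"  # x > 2 sigma is not assigned to a named region here
```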
We now define the four intermediate states.
Intermediate state 1 (IS1) is that for which the TMD partially enters the channel; it is defined as the set of configurations for which at least two TMD beads are in the channel region and zero TMD beads are in the membrane region. Intermediate state 2 (IS2) is that for which the TMD fully spans the membrane while in the channel; it is defined as the set of configurations for which all four TMD beads are in the channel region and the two hydrophilic beads that flank the TMD occupy opposite sides of the membrane. Intermediate state 3 (IS3) is that for which the TMD accesses the membrane interior via the LG; it is defined as the set of configurations for which at least one TMD bead occupies the membrane region, the remaining three TMD beads occupy either the channel or membrane regions, and at least one TMD bead overlaps with the LG. Intermediate state 4 (IS4) is that for which the TMD accesses the cytoplasm region without accessing the channel region; it is defined as the set of configurations for which each of the four TMD beads occupies either the ribosome, membrane, or cytoplasm regions and for which at least one of the hydrophilic beads that flank the TMD occupies the cytoplasm region.
In this analysis, cotranslational TMD integration events are defined as those for which the TMD reaches a membrane integrated configuration (for which all four beads of the TMD span the membrane region and the two hydrophilic flanking beads occupy opposite sides of the membrane) before reaching a misintegrated configuration (for which both hydrophilic flanking beads occupy the same side of the membrane and for which all TMD beads and both flanking beads lie outside of the channel and ribosome regions). Using the definitions of intermediate states, the cotranslational integration pathways are defined as follows. In the 'channel-sliding' pathway, the TMD partially enters the channel, then crosses the LG, then fully integrates into the membrane; a trajectory thus exhibits this pathway if a TMD visits IS1, IS3, and membrane integration in chronological order and without visiting any other intermediate states. In the 'interface-sliding' pathway, the TMD enters the cytoplasm through the gap between the translocon and ribosome, prior to undergoing membrane integration; a trajectory thus exhibits this pathway if a TMD visits IS4 on the way to membrane integration. In the 'in-out' pathway, the TMD fully spans the channel prior to membrane integration; a trajectory thus exhibits this pathway if a TMD visits IS2 on the way to membrane integration without visiting IS4.
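The pathway assignment can be sketched as a function of the ordered list of intermediate states that a TMD visits before membrane integration. The sketch assumes, following the intermediate-state definitions, that crossing the LG corresponds to IS3 and that fully spanning the channel corresponds to IS2; the label "unclassified" and the function name are illustrative assumptions.

```python
from typing import Sequence


def classify_pathway(states: Sequence[str]) -> str:
    """Assign a cotranslational integration pathway.

    states: intermediate states ("IS1".."IS4") visited by the TMD, in
    chronological order, before it reaches membrane integration.
    """
    visited = set(states)
    # Any visit to the cytoplasm-access state marks interface-sliding.
    if "IS4" in visited:
        return "interface-sliding"
    # Fully spanning the channel (without IS4) marks the in-out pathway.
    if "IS2" in visited:
        return "in-out"
    # Channel-sliding: only IS1 and IS3 are visited, with IS1 first.
    if visited == {"IS1", "IS3"} and list(states).index("IS1") < list(states).index("IS3"):
        return "channel-sliding"
    return "unclassified"
```

Counting the labels returned for every integration event of a given TMD would then give the relative pathway fractions.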
At right, Figure 3-figure supplement 1 shows the relative fraction of cotranslational TMD integration events that exhibit each of these three pathways. It is clear that the dominant cotranslational integration pathway for all four TMDs in both the EmrE and nEmrE mutants is the 'channel-sliding' pathway. This same pathway was also observed in the previous study of single-spanning proteins using the CG model (Zhang and Miller, 2012a) and similar configurations were observed in long-timescale atomistic molecular dynamics simulations of the early stages of cotranslational membrane integration (Zhang and Miller, 2012b). We find that only a small number of CG trajectories exhibit the 'interface-sliding' pathway. Finally, we note that the dominant cotranslational integration pathway is likely to depend on the IMP sequence, and the 'channel-sliding' behavior may be less dominant in other IMPs with less hydrophobic TMDs.
$$S(x; \phi, \psi) = \frac{1}{4}\left[1 + \tanh\left(\frac{x - \phi}{0.25\sigma}\right)\right]\left[1 - \tanh\left(\frac{x - \psi}{0.25\sigma}\right)\right]$$
This form of the solvation energy and switching function defines the implicit bilayer as the region where ϕ x < x < ψ x and y < ϕ y or y > ψ y , where ϕ x = −2.0σ, ψ x = 2.0σ, ϕ y = −1.5σ, and ψ y = 1.5σ. The transfer free energy, g, for each bead type is approximated from the Wimley-White hydrophobicity scale, which measures water-octanol transfer free energies (Wimley et al., 1996). Values of g for the different bead types are summarized in Appendix
τ LG = 500 ns is the timescale for attempting LG opening/closing and ΔG tot is the change in the free energy for opening the LG. ΔG tot depends on the presence of the nascent protein beads in the channel and is defined as where M is the number of beads occupying the translocon, ΔE is the difference between the total LG/protein LJ interactions in the closed state and total LG/protein LJ interactions in the open state, ΔG empty = 16ϵ is the free energy cost for opening the LG when there is no nascent protein in the channel, and χ empty is the fraction of the channel that is empty for a given timestep. The first term promotes LG opening when hydrophobic beads enter the channel, the second term prevents the LG from closing when occluded by a nascent protein, and the third term promotes LG closing once the nascent protein exits the channel. Additional details on the development and numerical testing of the CG model are provided in Zhang and Miller (2012a).
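For illustration, the switching function S(x; ϕ, ψ) can be evaluated with the short sketch below. It assumes the product-of-tanh form S = ¼[1 + tanh((x − ϕ)/0.25σ)][1 − tanh((x − ψ)/0.25σ)], which is the form implied by the prefactor of 1/4 and by the two bounds ϕ and ψ; lengths are in units of σ and the function name is an illustrative assumption.

```python
import math


def switching(x: float, phi: float, psi: float, width: float = 0.25) -> float:
    """Smooth indicator that is close to 1 for phi < x < psi and close to 0
    outside that interval (all lengths in units of sigma).

    The product form used here is an assumption based on the 1/4 prefactor
    and the two bounds phi and psi.
    """
    rise = 1.0 + math.tanh((x - phi) / width)  # switches on at x = phi
    fall = 1.0 - math.tanh((x - psi) / width)  # switches off at x = psi
    return 0.25 * rise * fall
```

With ϕ x = −2.0σ and ψ x = 2.0σ, the function is approximately 1 at the membrane center, 0.5 at either boundary, and approximately 0 well outside the bilayer.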

Wall potential
A modification that appears in the current implementation of the CG model is a wall potential that prevents the nascent protein from returning to the translocon once it completes translation and diffuses a given distance away from the channel. The potential has the form
$$
U_{\text{wall}}(y) =
\begin{cases}
\frac{1}{2} K_{\text{wall}} (y - 10\sigma)^2 & y < 10\sigma \\
0 & y \geq 10\sigma
\end{cases}
$$
where the spring constant, K wall , is set to 10 ϵ/σ². The potential is only added to the system when all beads of the nascent protein have y-positions greater than 12σ. Inclusion of the wall potential was found to avoid artifacts associated with the nascent protein interacting with the translocon long after exiting the channel. These artifacts were expected to be accentuated in the CG model due to its reduced dimensionality; nonetheless, the results are qualitatively unchanged if the wall potential is not included.
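The wall potential and its activation criterion can be written as a brief sketch, with lengths in units of σ and energies in units of ϵ; the function names are illustrative assumptions and are not taken from the CG model code.

```python
from typing import Iterable


def wall_potential(y: float, k_wall: float = 10.0, y_wall: float = 10.0) -> float:
    """Harmonic wall energy (units of epsilon) for a bead at lateral
    position y (units of sigma); zero for y >= y_wall."""
    if y < y_wall:
        return 0.5 * k_wall * (y - y_wall) ** 2
    return 0.0


def wall_active(bead_ys: Iterable[float], y_on: float = 12.0) -> bool:
    """The wall is switched on only once every bead of the nascent protein
    has a y-position greater than y_on."""
    return all(y > y_on for y in bead_ys)
```

Because the wall is only added once all beads lie beyond 12σ, the potential is zero for every bead at the moment it is introduced.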