Proton‐Detected Solid‐State NMR of the Cell‐Free Synthesized α‐Helical Transmembrane Protein NS4B from Hepatitis C Virus

Abstract Proton‐detected 100 kHz magic‐angle‐spinning (MAS) solid‐state NMR is an emerging analysis method for proteins with only hundreds of microgram quantities, and thus allows structural investigation of eukaryotic membrane proteins. This is the case for the cell‐free synthesized hepatitis C virus (HCV) nonstructural membrane protein 4B (NS4B). We demonstrate NS4B sample optimization using fast reconstitution schemes that enable lipid‐environment screening directly by NMR. 2D spectra and relaxation properties guide the choice of the best sample preparation to record 2D 1H‐detected 1H,15N and 3D 1H,13C,15N correlation experiments with linewidths and sensitivity suitable to initiate sequential assignments. Amino‐acid‐selectively labeled NS4B can be readily obtained using cell‐free synthesis, opening the door to combinatorial labeling approaches which should enable structural studies.


Introduction
Membrane proteins in lipids can be studied by magic-angle-spinning (MAS) solid-state NMR [1] which, unlike solution NMR, does not face overall tumbling limitations. Thus the molecular weight of biomolecular complexes that can be addressed is limited solely by spectral resolution and sensitivity. Still, analysis has faced a major obstacle: complex membrane proteins often need dedicated expression systems to fold functionally, but these may deliver only low yields and are thus often incompatible with the high sample quantities (> 10 mg) needed for classical 13C-detected NMR approaches. This restricted NMR studies mostly to well-expressed model membrane proteins, such as rhodopsin, [1a, c, j, 2] a truncated variant of the influenza A M2 channel, [1e, f] KcsA [1d, k] or BmrA. [3] Applications of the more sensitive 1H-detected MAS NMR approach to membrane proteins emerged using extensively deuterated proteins and 10-20 kHz MAS frequency. [4] Only recently have MAS frequencies around 100 kHz made it possible to study fully proton back-exchanged or even fully protonated proteins. [5] The increase in MAS frequency resulted in a concomitant decrease in sample amount, to below one milligram, [6] which represents a reduction in protein of about a factor of 100 compared to 13C detection, and which is roughly compensated by the sensitivity gain inherent to 1H-detection techniques. [7] The first 60 to 100 kHz MAS schemes have been applied to well-established membrane protein systems, [8] such as proteorhodopsin, [9] outer membrane beta-barrels, [1m, 10,11] VDAC, [11,12] a truncated variant of the influenza A M2 channel, [13] KcsA [14] or BamA. [15] Sample preparation of these proteins mainly followed protocols previously established for 13C detection, and included formation of 2D crystals [1m, 12, 13] or reconstitution into liposomes at low lipid-to-protein ratio (LPR, w/w) by dialysis. [9,15] Such conditions are however not generally applicable to membrane proteins, and achieving optimal membrane reconstitution remains a critical step.
With sample amounts decreasing below one milligram, the use of eukaryotic protein expression systems becomes feasible for NMR sample preparation. In this context, wheat-germ (WG) cell-free protein synthesis (CFPS) is a promising approach, since it provides one of the highest yields amongst eukaryotic CFPS systems, routinely reaching milligram amounts. [16] WG-CFPS thus presents an efficient alternative to cellular eukaryotic expression systems, and has the advantage that various NMR isotope labeling schemes, including amino-acid-selective labeling, can be easily implemented. [16d, 17] Importantly, deuteration in combination with complete amide protonation can also be achieved directly during synthesis, [18] avoiding a denaturation and refolding step, which can compromise the native fold of a membrane protein.
We here investigated the nonstructural protein 4B (NS4B) of the hepatitis C virus (HCV). Around 70 million people are chronically infected with HCV and have a high risk of developing severe liver disease, including hepatocellular carcinoma. NS4B is a transmembrane protein that is essential for HCV genome replication and virion assembly. [19,20] It has a sequence length of 261 amino acid (aa) residues (apparent molecular weight 27 kDa) and is an oligomeric α-helical transmembrane protein constituted of three subdomains. [21] The central subdomain contains four predicted transmembrane segments. [22] The N- and C-terminal subdomains each comprise two putative α-helices, presumably lying on the membrane surface. Current structural information is limited to the two isolated amphipathic helices located at the N-terminus, AH1 (aa 4-32, PDB ID: 2LVG) and AH2 (aa 42-66, 2JXF), as well as to the C-terminal helix H2 (aa 229-253, 2KDR), which were all investigated by solution-state NMR on synthetic peptides representing the described helices. [22,23] Overall, however, structural and topological information on full-length NS4B remains sparse. [24] As NS4B is difficult to express in large quantities using conventional systems such as Escherichia coli, we established WG-CFPS for NS4B in a detergent-solubilized form. [25] The protein can then be reconstituted into liposomes using Bio-Bead-enhanced dialysis. [26] This is however a time-consuming step, and simplifying this process is thus key to enabling further sample optimization.
Here, we show how lipid reconstitution of NS4B can be optimized thanks to fast lipid-insertion schemes combined with direct screening using 2D 1H-detected MAS-NMR spectra. We show that the achieved line narrowing results from a decrease in inhomogeneous rather than homogeneous linewidth. Under the optimized conditions, 2D and 3D 1H-detected correlation spectra can be recorded, both on fully and selectively labeled NS4B, initiating the crucial step of NMR sequential backbone assignments.

Results and Discussion
Cyclodextrin-mediated reconstitution yields NS4B proteoliposomes
Previously, we have shown that NS4B can be expressed in a soluble form using WG-CFPS [25a] and that further reconstitution into liposomes allowed quite well-resolved 13C-detected solid-state NMR spectra to be recorded. [25b] Resolution in spectra using the more sensitive 1H detection remained however limited. [25b] Efficient sample optimization was hampered by the lengthy dialysis step for lipid reconstitution and complete detergent removal. [3a] In previous work, we adapted a faster method consisting of a combined detergent removal using cyclodextrin, as initially proposed by DeGrip et al., [27a] and proteoliposome separation on a sucrose gradient, [27b, c] which sped up the procedure by an order of magnitude. As parameters have to be fine-tuned in this approach to avoid protein loss, [27b] we here use a simplified approach (Figure 1B) consisting of gradual addition of cyclodextrin to the detergent-solubilized protein and lipids, followed by gradient centrifugation. This separation into two steps is indeed easier to handle and minimizes protein loss.
In order to assess the correct amount of cyclodextrin (CD) for protein reconstitution, we first determined the minimal amount of CD necessary to bind and remove detergent molecules from the n-dodecyl β-D-maltoside (DDM)-solubilized NS4B micelle complexes in the absence of lipids (Figure S1 a,c in the Supporting Information), [27b, c] as outlined in Figure 1A. Full precipitation of the protein was used as a readout for successful detergent removal. We tested two different cyclodextrins, α-cyclodextrin (α-CD) and methyl-β-cyclodextrin (mβ-CD), and found that a total of 110 nmol of α-CD fully precipitated 0.25 nmol of NS4B in 0.1 % DDM (Figure S1 [...]).

Figure 1. [...] 1 % DDM buffer, A) in absence, and B) in presence of detergent-solubilized lipids. "A", fraction of unbound DDM in buffer; "B", fraction of DDM bound to protein; "C" and "D", DDM and Triton X-100, respectively, used for lipid solubilization. Quantities are given in black and red for detergent and cyclodextrin, respectively (for raw data see Figure S1 and for calculations see Table S1). The detergent-to-lipid ratio in fractions "C" and "D" is 10, and the lipid-to-protein ratio is 2.
In a second step, we estimated the amount of CD necessary to remove additional DDM or Triton X-100 detergents used for lipid solubilization. Experimental details of the precipitation assay in the presence of solubilized lipids or additional Triton X-100 are summarized in Figure S1 b and c, as well as in the Supporting results and Table S1. As a result (Figure 1B), 1950 nmol of α-CD is needed to reconstitute 1 nmol of NS4B into DDM-solubilized lipids, while 1730 nmol of mβ-CD is required for reconstitution into Triton X-100-solubilized lipids. From these experiments, the number of NS4B-bound DDM molecules was estimated as 160 (Supporting results and Table S1), in agreement with previously published data, where micelles with model membrane proteins contain around 150-300 DDM monomers. [28] Further, we determined that α-CD binds at an approximate ratio of 1.6 to DDM, mβ-CD at 1.1 to DDM and mβ-CD at 1.5 to Triton X-100 (Supporting results and Table S1), which is consistent with previous experiments showing that cyclodextrin molecules do not necessarily bind detergents in a 1:1 ratio. [27a, 29] We then prepared proteoliposomes using the above-described protocol and cyclodextrin amounts. The reconstituted protein was analyzed on a sucrose gradient, and a visible opaque band in the gradient was detected, which correlated with the expected protein bands of NS4B on an SDS-PAGE gel (Figure S1 d, e). No further signals were detected by SDS-PAGE, indicating that virtually all protein formed proteoliposomes. The protocol established here saved a factor of 10 in time, which also makes the reconstitution step more robust with respect to protein degradation.
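The cyclodextrin bookkeeping described above can be illustrated with a short Python sketch. It is not the authors' calculation from Table S1: it simply assumes that the CD demand is additive over all detergent pools, and it uses the approximate binding ratios and the estimate of 160 bound DDM molecules per NS4B quoted in the text. The function names and the dictionary layout are hypothetical.

```python
# Approximate CD-to-detergent binding ratios quoted in the text
# (mol of cyclodextrin per mol of detergent).
CD_PER_DETERGENT = {
    ("alpha-CD", "DDM"): 1.6,
    ("mbeta-CD", "DDM"): 1.1,
    ("mbeta-CD", "Triton X-100"): 1.5,
}

# Estimated number of protein-bound DDM molecules per NS4B (from the text).
DDM_PER_NS4B = 160


def bound_ddm_nmol(ns4b_nmol):
    """Amount of protein-bound DDM (nmol) for a given amount of NS4B (nmol)."""
    return DDM_PER_NS4B * ns4b_nmol


def cd_required_nmol(cd, detergent_pools_nmol):
    """Cyclodextrin (nmol) needed to complex every detergent pool.

    Assumption: the demand is additive, i.e. each pool consumes
    ratio * amount, independent of the other pools.
    detergent_pools_nmol maps a detergent name to its amount in nmol.
    """
    total = 0.0
    for detergent, amount in detergent_pools_nmol.items():
        total += CD_PER_DETERGENT[(cd, detergent)] * amount
    return total


if __name__ == "__main__":
    bound = bound_ddm_nmol(0.25)  # protein-bound DDM for 0.25 nmol NS4B
    print(bound, cd_required_nmol("alpha-CD", {"DDM": bound}))
```

Under this additive assumption, the protein-bound DDM of 0.25 nmol NS4B (40 nmol) would take up about 64 nmol of α-CD; any further CD demand would come from the other detergent pools, such as unbound DDM in the buffer.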
After this biochemical assessment, we directly optimized the amount and type of cyclodextrin by NMR. To enhance readability of the spectra, we used a selectively Gly- and Tyr-labeled NS4B sample (dGY NS4B). We reconstituted NS4B into PC/Chol (phosphatidylcholine/cholesterol) at LPR 2 using different conditions, as detailed in Figure S2 and Table S2. As a result, the 2.5-fold excess of mβ-CD over the minimal amount determined, combined with PC/Chol solubilized in Triton X-100, yielded the spectra with the best resolution and signal-to-noise ratio (SNR), and this condition was thus set as the sample preparation standard. Finally, 2D 1H,15N correlation spectra (Figure S3) confirmed that NS4B lipid reconstitution using cyclodextrin indeed yielded NMR spectra very similar to those obtained with the more time-consuming reconstitution using Bio-Bead-enhanced dialysis, [25b] and could thus be used as a starting point for further optimizations.

Proton linewidths depend on the lipid environment
Next, we compared spectra recorded at different LPRs to identify optimal conditions. For maximal protein NMR signal, the LPR should, in principle, be as low as possible. On the other hand, a sufficient amount of lipids is important to ensure that the protein is well folded. [30] We therefore compared dGY NS4B reconstituted into PC/Chol lipids at various LPRs. The resulting 2D 1H,15N correlation spectra (Figure 2) reveal that the spectral linewidth improves by about a factor of two from LPR 0.25 to LPR 4 (Figure S4, Table S3). An increase in LPR requires an increase in measurement time, which becomes prohibitive at LPRs exceeding about 2. We therefore chose LPR 2 as the best compromise between a narrow spectral linewidth and high SNR (Figures 2 and S4, Table S3).
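The trade-off between LPR and measurement time can be made concrete with a rough Python sketch. It rests on simplifying assumptions that are not taken from the paper: the rotor holds a fixed total mass of protein plus lipid, the signal is proportional to the protein mass, and SNR grows with the square root of the measurement time. Changes in linewidth, water content and packing are ignored, and the function names are hypothetical.

```python
def protein_mass_fraction(lpr):
    """Protein fraction of the protein + lipid mass at a given LPR (w/w)."""
    if lpr < 0:
        raise ValueError("LPR must be non-negative")
    return 1.0 / (1.0 + lpr)


def relative_measurement_time(lpr, reference_lpr):
    """Time needed at `lpr` to reach the SNR obtained at `reference_lpr`.

    Assumptions: signal is proportional to the protein mass in a rotor of
    fixed total (protein + lipid) mass, and SNR scales with sqrt(time).
    The result is expressed relative to the time at `reference_lpr`.
    """
    signal_ratio = protein_mass_fraction(reference_lpr) / protein_mass_fraction(lpr)
    return signal_ratio ** 2


if __name__ == "__main__":
    for lpr in (0.25, 1.0, 2.0, 4.0):
        print(lpr, round(relative_measurement_time(lpr, reference_lpr=0.25), 2))
```

In this simplified picture, the time penalty grows quadratically with the dilution of the protein, which is in line with measurement time becoming limiting at higher LPRs; the narrower lines at higher LPR partly offset this in practice.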
Not only LPR but also lipid membrane thickness, hydrophobic mismatch, the presence of cholesterol and the bilayer fluidity, among others, can influence protein folding and the dynamic properties of the protein in the membrane. [31] We thus investigated the dependence of NMR spectral parameters on selected lipids. For this, [2H,15N,13C] (dUL) NS4B as well as [1H,15N,13C] (UL) NS4B was reconstituted at LPR 2 into different lipids as detailed in Figure 3.
To obtain a proxy for the spectral quality of NS4B in the different lipid environments, we determined the average 1H full width at half maximum (FWHM) and SNR for 9-10 isolated peaks (Table 1). A comparison (Figure 3 and Table 1) of linewidths showed that NS4B reconstituted into PC, both in the presence and absence of 30 % cholesterol, as well as NS4B reconstituted into DMPC lipids, gave spectra with an average spectral linewidth of 100-120 Hz for isolated peaks. In contrast, NS4B reconstituted into DPPC lipids showed only poorly resolved spectra.
In solid-state NMR, the total linewidth (Δ_total) has a homogeneous (Δ_homo) and an inhomogeneous (Δ_inhomo) contribution (Δ_total = Δ_homo + Δ_inhomo). The homogeneous part represents coherent effects (Δ_coherent) caused by incomplete averaging of the dipolar interaction, and incoherent effects (Δ_incoherent) driven by molecular dynamics, and thus Δ_homo = Δ_coherent + Δ_incoherent. The inhomogeneous part reflects sample and field inhomogeneity. [32] To determine the homogeneous contribution (Δ_homo = R2′/π) to the total linewidth, we measured bulk 1H R2′ relaxation-rate constants of amide groups using a Hahn-echo experiment. [33] As Δ_coherent depends mainly on the geometry of the proton spin system, [32] it should be similar in comparable secondary structure elements. In rigid, α-helical parts of deuterated (and HN back-exchanged) proteins, typical values of Δ_homo are around 30 Hz at 100 kHz MAS in the absence of dynamics (Δ_incoherent ≈ 0). [34] For dUL NS4B, we found values around 60-80 Hz (Table 1), suggesting that Δ_incoherent is about 30-50 Hz, which indicates that the linewidth may be influenced by dynamics. This value derived for Δ_incoherent is similar to the measured 1H R1ρ transverse relaxation rate constant (measured at a spin-lock field of 13 kHz and tabulated in Table 1 as the rate constant divided by π, thus representing a linewidth). Δ_total is broadest for the protonated systems, but the differences are small, which might be due to the significant experimental error bars. We also measured 15N R1ρ and R2′ rate constants (Table 1), which are, as expected, much lower than for protons. Interestingly, a comparison of Δ_homo of dUL NS4B showed only small differences for the different lipid environments, including in the badly resolved spectra of NS4B in DPPC lipids (Table 1).
Therefore, the difference in total linewidth between NS4B in DPPC (Figure 3F) and in the other lipids (Figure 3A, D, E) must be predominantly due to inhomogeneous line broadening and might be correlated to the different phase-transition temperatures of the various lipids (Table 1, see also discussion below).
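The linewidth decomposition used in this argument can be written out as a small Python sketch. The relations (total = homogeneous + inhomogeneous, homogeneous = coherent + incoherent, homogeneous = R2′/π) are those given in the text; the default coherent contribution of 30 Hz is the typical value quoted for rigid α-helical parts at 100 kHz MAS, and the numbers in the example call are illustrative values within the reported ranges, not entries from Table 1. The function names are hypothetical.

```python
import math


def homogeneous_from_r2_prime(r2_prime_per_s):
    """Homogeneous linewidth in Hz from the R2' rate constant in s^-1."""
    return r2_prime_per_s / math.pi


def decompose_linewidth(total_hz, r2_prime_per_s, coherent_hz=30.0):
    """Split a total linewidth into its contributions (all in Hz).

    total      = homo + inhomo
    homo       = coherent + incoherent = R2' / pi
    coherent_hz is an assumed typical value, not a measured one.
    """
    homo = homogeneous_from_r2_prime(r2_prime_per_s)
    return {
        "homo": homo,
        "inhomo": total_hz - homo,
        "incoherent": homo - coherent_hz,
    }


if __name__ == "__main__":
    # Illustrative input: 110 Hz total linewidth, R2' chosen so that homo = 70 Hz.
    parts = decompose_linewidth(total_hz=110.0, r2_prime_per_s=70.0 * math.pi)
    for name, value in parts.items():
        print(f"{name}: {value:.1f} Hz")
```

With these illustrative inputs, 40 Hz of the total linewidth is left for the inhomogeneous contribution and 40 Hz of the homogeneous part is attributed to dynamics, which matches the orders of magnitude discussed above.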
To assess whether the observed linewidth is related to the lipid phase-transition temperature (Tm), we recorded spectra of NS4B in different lipid environments and at different temperatures (Figure S5). While for NS4B in PC liposomes, which show a Tm of −6 °C, [35] the temperature dependence seems only weak (Figure S5 A, C), for DMPC, with a Tm of 24 °C, the spectral resolution clearly decreases from 21 to −6 °C (Figure S5 B, D). This suggests that spectral resolution is indeed influenced by the experimental temperature relative to Tm. This is similar to previous findings for the β-barrel membrane protein OmpX in lipid nanodiscs. [10b] A comparison of Δ_homo and also the 1H R1ρ/π rate constants indicates only very little difference between the two temperatures, independent of the lipid environment (Table S4). Thus, in conclusion, the difference in spectral resolution cannot be explained by dynamics-related homogeneous line broadening alone (Table S4); inhomogeneous line broadening below the lipid phase-transition temperature makes an important contribution as well.
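Rate constants such as R2′ and R1ρ are obtained by fitting signal-intensity decay curves. The following Python sketch shows one simple way to do this, assuming a mono-exponential decay I(t) = I0 · exp(−R·t) and using a linear least-squares fit of ln(I) against the delay. This is an illustration only; the fitting procedure actually used by the authors is not specified here, and the function name and the synthetic data are hypothetical.

```python
import math


def fit_decay_rate(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R * t) and return (R in s^-1, I0).

    Uses ordinary least squares on ln(I) versus t, so all intensities
    must be positive. Mono-exponential decay is an assumption.
    """
    if len(delays_s) != len(intensities) or len(delays_s) < 2:
        raise ValueError("need at least two (delay, intensity) pairs")
    logs = [math.log(value) for value in intensities]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free echo decay with R2' = 200 s^-1 (illustrative).
    delays = [0.0, 0.001, 0.002, 0.004, 0.008]
    signal = [5.0 * math.exp(-200.0 * t) for t in delays]
    rate, amplitude = fit_decay_rate(delays, signal)
    print(f"R = {rate:.1f} s^-1, linewidth R/pi = {rate / math.pi:.1f} Hz")
```

Dividing the fitted rate constant by π converts it into the linewidth contribution that is compared between temperatures and lipid environments. For noisy data, a weighted or non-linear fit would be more appropriate than the log-linear shortcut used here.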
We also recorded the spectra of NS4B prepared in the presence of ATP. NS4B is able to hydrolyze ATP, [36] which might be essential for NS4B function in the HCV life cycle. [37] We thus speculated that the addition of ATP during the reconstitution step could affect NS4B folding and improve sample quality. Chemical shifts and linewidths in the 1H,15N correlation spectrum of the NS4B-ATP sample were however comparable to those of the sample prepared in the absence of ATP (Figure 3B). At the same time, an approximately 20 % improvement of SNR as a direct or indirect effect of ATP was observed, which resulted in selection of this sample for 3D experiments (see below).

Table 1 (footnote fragment): [...] 15N "R1ρ/π", 15N bulk relaxation-rate constants calculated by taking the inverse of T2′ and T1ρ times that were fitted from intensity decay curves (Figure S7).
3D experiments on fully and selectively labeled NS4B for sequential backbone assignments
A set of 1H-detected 3D experiments was acquired to evaluate the suitability of the dUL NS4B sample for sequential backbone assignment, a crucial step for further structural NMR analysis of the protein. Three spectra, hCANH, hCONH and hCAcoNH, [6a, 34, 38] were recorded at 110 kHz MAS, and an hCBcaNH spectrum [34,39] was acquired at 60 kHz MAS frequency in a 1.3 mm rotor (Table S6). Out of 248 expected resonances, prolines and the flexible C-terminal tag not counted, approximately 190, 155 and 100 resonances were picked by the automatic peak-picking routine of the CCPN software package in the hCANH, hCONH and hCAcoNH spectra, respectively. Out of the 27 expected Gly residues, if nine residues in the flexible tag are neglected, 22 resonances in the Gly spectral region were visible in the hCANH spectrum. Although the 3D spectra still show signal overlap, more than 60 peaks in the hCANH spectrum could be connected to their counterparts in the hCONH spectrum, as shown for a number of examples in Figure S6.
To reduce the significant signal overlap in central regions of the spectra, the recording of 4D spectra [40] would be useful. For SNR reasons, we instead turned to selectively labeled samples to decongest the spectra and obtain anchor points for the sequential backbone assignments. We prepared a deuterated Gly, Val and Leu selectively labeled NS4B sample (dGVL NS4B). Three-dimensional hCANH and hCONH spectra were recorded, and their 13C projections are shown on the 1H,15N dUL NS4B planes in Figure 4B and C, respectively. Although, in principle, all labeled residues in the sequence will contribute resonances in the hCANH spectrum, for the hCONH spectrum only sequential pairs formed by Gly, Val and Leu residues will contribute to a resonance signal, as both the C'(i−1) and the N(i) have to be isotopically labeled.
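The counting rule for selectively labeled samples described above can be sketched in Python. The function below counts, for a one-letter amino-acid sequence, how many residues are expected in an hCANH spectrum (every labeled non-proline residue) and how many in an hCONH spectrum (every labeled non-proline residue preceded by a labeled residue). The function name is hypothetical, effects at the protein N-terminus are ignored, and the two test sequences are the short stretches named in this paper rather than the full NS4B sequence, so the output does not reproduce the expected peak numbers quoted for the full protein.

```python
def expected_peaks(sequence, labeled_residues):
    """Return (n_hCANH, n_hCONH) expected for a selectively labeled sample.

    hCANH: residue i must be labeled (and not proline, which has no amide H).
    hCONH: residue i-1 (carbonyl) and residue i (amide) must both be labeled.
    """
    labeled = set(labeled_residues)
    n_hcanh = sum(1 for aa in sequence if aa in labeled and aa != "P")
    n_hconh = sum(
        1
        for previous, current in zip(sequence, sequence[1:])
        if previous in labeled and current in labeled and current != "P"
    )
    return n_hcanh, n_hconh


if __name__ == "__main__":
    for stretch in ("VVSGLVG", "LGSLT"):
        print(stretch, expected_peaks(stretch, "GVL"))
```

For the stretch VVSGLVG with Gly, Val and Leu labeling, six residues qualify for hCANH but only four sequential pairs (VV, GL, LV, VG) qualify for hCONH, which illustrates why the hCONH spectrum of a selectively labeled sample is much sparser.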
The automated peak picking selected 44 peaks in the hCANH spectrum, out of 74 expected resonances. Eleven resonances were picked in the hCONH spectrum out of 24 expected inter-residue correlations. Eight out of the 11 correlations in the dGVL NS4B hCONH spectrum could be connected to residues in the hCANH dUL NS4B spectrum. Out of those, two Gly and one Leu could be assigned, namely L123, G125 and G238. One signal in the hCONH spectrum was assigned to a double Gly motif, which can be found once in the NS4B sequence and five times in the twin strep tag sequence. It is likely that this single GG pair belongs to NS4B rather than the tag, which is presumably flexible and therefore invisible in a CP-based experiment.
Combining information from the 3D hCANH, hCONH, hCAcoNH and hCBcaNH spectra [34] of uniformly and selectively labeled samples, we were able to identify two amino-acid stretches, comprising residues from Val119 to Gly125 (VV120SGLVG) and from Leu237 to Thr241 (LGSL240T). Corresponding strip plots are shown in Figure 5. ΔδCα − ΔδCβ secondary chemical-shift differences of the amino-acid stretch VV120SGLV gave positive values (Figure 5B), suggesting α-helical secondary structure, in agreement with previously proposed topological models [22,36] (Figure 5C). On the other hand, within the amino-acid stretch LGSL240T, a positive ΔδCα − ΔδCβ secondary chemical-shift difference is observed for the N-terminal leucine, whereas Ser, Leu and Thr show values close to 0 (Figure 5B), suggesting no defined secondary structure for that part. This is in conflict with a topological model based on a solution-state NMR analysis of an isolated peptide, which suggested full α-helical character for the entire region. [22, 23b] However, our data might support a more complex picture which was proposed for full-length NS4B, in which the C-terminus containing a putative Walker B motif has been described to interact with the loop between transmembrane helices 2 and 3, comprising a Walker A motif. [36]
Further spectral backbone assignment is currently challenged by a lack of spectral resolution and sensitivity. Indeed, the high spectral overlap of the mainly α-helical NS4B resonances, in combination with significant transverse relaxation and concomitant low signal-to-noise due to the membrane insertion at relatively high LPR, is problematic for less sensitive experiments, such as the hCBcaNH experiment, and also for experiments which allow for the identification of sequential N(i) and N(i−1) connectivities. [42,43] Although such experiments are feasible in principle, their overall low transfer efficiency prevents data collection with present-day equipment.
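The secondary chemical-shift difference used above, ΔδCα − ΔδCβ, is the deviation of the observed Cα shift from its random-coil value minus the corresponding deviation for Cβ. A minimal Python sketch follows. The random-coil values in the table are rounded placeholders for illustration and are not the Zhang et al. values used in the paper; the observed shifts in the example and the classification threshold of 1 ppm are likewise hypothetical. Glycine is omitted because it has no Cβ.

```python
# Placeholder random-coil shifts in ppm as (C-alpha, C-beta).
# Illustrative only; NOT the reference values used in the paper.
RANDOM_COIL_PPM = {
    "L": (55.1, 42.4),
    "S": (58.3, 63.8),
    "T": (61.8, 69.8),
    "V": (62.2, 32.9),
}


def secondary_shift_difference(residue, ca_obs_ppm, cb_obs_ppm,
                               random_coil=RANDOM_COIL_PPM):
    """Return (dCa_obs - dCa_rc) - (dCb_obs - dCb_rc) in ppm."""
    ca_rc, cb_rc = random_coil[residue]
    return (ca_obs_ppm - ca_rc) - (cb_obs_ppm - cb_rc)


def classify(difference_ppm, threshold_ppm=1.0):
    """Coarse interpretation; the threshold is an assumed cut-off."""
    if difference_ppm > threshold_ppm:
        return "helix-like"
    if difference_ppm < -threshold_ppm:
        return "strand-like"
    return "undefined"


if __name__ == "__main__":
    # Hypothetical observed shifts for a leucine in a helix.
    diff = secondary_shift_difference("L", ca_obs_ppm=58.1, cb_obs_ppm=41.4)
    print(f"{diff:.1f} ppm -> {classify(diff)}")
```

Positive differences point to α-helical conformation and values near zero to the absence of defined secondary structure, as in the interpretation of the two assigned stretches; strongly negative values are commonly associated with β-strand conformation.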
Currently, recording of 3D hCANH and hCONH spectra on different selectively labeled samples is therefore the most promising approach to obtain further sequential backbone assignments. Sophisticated combinatorial labeling schemes devised in the context of solution NMR studies [44] can inspire further solid-state NMR approaches for sequential backbone assignment employing a combination of selectively labeled samples. Still, SNR is an issue here as well; higher magnetic field strength should boost the SNR and reduce signal overlap in the near future. Even faster magic-angle spinning will increase transfer efficiencies not only in CP as used here, but also in J-coupling-based experiments. For instance, changing from 100 kHz to 200 kHz MAS frequency will lengthen the transverse T2′ times by roughly a factor of 2 [32,45] and possibly overcompensate the SNR loss caused by the smaller sample amount in faster and therefore smaller rotors.
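The expected benefit of faster spinning can be expressed as a back-of-the-envelope Python sketch. It assumes, following the rough factor of 2 stated above, that T2′ scales linearly with the MAS frequency; this is a simplification, since contributions from molecular dynamics need not scale in this way. The function names and the example T2′ value are hypothetical.

```python
import math


def scaled_t2_prime(t2_prime_s, mas_from_khz, mas_to_khz):
    """Project T2' to a new MAS frequency, assuming linear scaling."""
    if t2_prime_s <= 0 or mas_from_khz <= 0 or mas_to_khz <= 0:
        raise ValueError("all inputs must be positive")
    return t2_prime_s * (mas_to_khz / mas_from_khz)


def linewidth_from_t2_prime_hz(t2_prime_s):
    """Homogeneous linewidth in Hz, using linewidth = R2'/pi = 1/(pi * T2')."""
    return 1.0 / (math.pi * t2_prime_s)


if __name__ == "__main__":
    t2_100 = 0.0045  # hypothetical T2' in seconds at 100 kHz MAS
    t2_200 = scaled_t2_prime(t2_100, mas_from_khz=100.0, mas_to_khz=200.0)
    print(f"{linewidth_from_t2_prime_hz(t2_100):.1f} Hz at 100 kHz")
    print(f"{linewidth_from_t2_prime_hz(t2_200):.1f} Hz at 200 kHz")
```

Under this assumption the homogeneous linewidth halves when the MAS frequency doubles; the inhomogeneous contribution, which dominates the differences between lipids in this study, would not be affected.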

Conclusion
We show for the cell-free synthesized α-helical integral membrane protein NS4B of HCV in membranes that solid-state NMR spectra could be recorded in a reasonable amount of measurement time on a membrane protein sample reconstituted in lipids at LPR 2 in a 0.7 mm rotor. We screened for optimal sample preparation using rapid lipid reconstitution via cyclodextrin addition, and assessed the best lipid-to-protein ratio directly on 1H-detected solid-state NMR spectra. Relaxation measurements confirmed the expected narrower homogeneous linewidth of deuterated protein compared to protonated protein, and revealed that inhomogeneous line broadening was substantial and strongly dependent on the lipid chosen, which is likely related to the lipid phase-transition temperature. The evaluation of different lipids showed that reasonably resolved spectra can be reproducibly recorded, and that most conditions yield similar spectra, with the notable exception of DPPC. Three-dimensional experiments were recorded and, in principle, provide the basis for sequential backbone assignments. Still, spectral overlap is substantial, and we showed that selectively labeled samples, straightforward to obtain by CFPS, can be used to identify anchor points for sequential assignments. Further improvement in resolution is however necessary to progress to complete backbone assignment, and with current hardware, the use of combinatorial labeling is thus the most promising approach.

Figure 5. 3D correlation spectra of deuterated NS4B allow sequential connectivities to be established. A) Selected strip plots representing the assignment of amino-acid stretches VV120SGLVG and LGSL240T using 3D hCANH (green), hCONH (violet) and hCAcoNH (red) spectra recorded on dUL NS4B, and hCANH (cyan) and hCONH (magenta) of dGVL NS4B in PC/Chol lipids, LPR 2, at 110 kHz MAS. The dUL NS4B hCBcaNH spectrum (orange) was acquired at 60 kHz MAS. B) Secondary chemical shift differences, ΔδCα − ΔδCβ, of the two stretches from (A). C) Putative topology model of the NS4B protein adapted from Gouttenoire et al., [22] in which NS4B is proposed to contain eight α-helices. The N- and C-termini harbor four presumably amphipathic α-helices: AH1 (aa 4-32, PDB ID: 2LVG), AH2 (aa 42-66, 2JXF), H1 (predicted between aa 201-212) and H2 (aa 229-253, 2KDR). The central part (aa 70-190) contains four predicted transmembrane segments. The black boxes indicate the location of the two assigned amino acid stretches VV120SGLVG and LGSL240T, respectively. Random-coil chemical shifts used for the calculation of secondary chemical shifts were taken from Zhang et al. [41]