Systematic localization of Gram-negative bacterial membrane proteins

The molecular architecture and function of the Gram-negative bacterial cell envelope are dictated by protein composition and localization. Proteins that localize to the inner (IM) and outer (OM) membranes of Gram-negative bacteria play critical and distinct roles in cellular physiology; however, approaches to systematically interrogate their distribution across both membranes and the soluble cell fraction are lacking. We employed multiplexed quantitative mass spectrometry to assess membrane protein localization in a proteome-wide fashion by separating IM and OM vesicles from exponentially growing E. coli K-12 cells on a sucrose density gradient. The migration patterns for >1600 proteins were classified in an unbiased manner, accurately recapitulating decades of knowledge in membrane protein localization in E. coli. For 559 proteins that are currently annotated as peripherally associated with the IM (Orfanoudaki and Economou, 2014) and display potential for dual localization to either the IM or cytoplasm, we could allocate 110 to the IM and 206 as soluble based on their fractionation patterns. In addition, we uncovered 63 cases in which our data disagreed with current localization annotation in protein databases. For 42 of them, we were able to find supportive evidence for our localization findings in the literature. We anticipate that our systems-level analysis of the E. coli membrane proteome will serve as a useful reference dataset to query membrane protein localization, as well as provide a novel methodology to rapidly and systematically map membrane protein localization in more poorly characterized Gram-negative species.


Systematic assignment of membrane protein localization
Sucrose density gradients are conventionally analyzed by immunoblotting to compare the abundance of a given protein within a high or low sucrose density fraction (Figure 1A-B). To systematically analyze protein localization, we used the combined averages of the high and low sucrose fractions (log2 of f08, f09 and f10 for high, and log2 of f02, f03 and f04 for low) and calculated the difference between the two log2 averages, which we referred to as the "sucrose gradient ratio" (Table S3). High values indicate a greater abundance within higher sucrose density fractions, as expected for OM proteins. The reverse is true for IM proteins, which exhibit low values due to their enrichment within the low sucrose density fractions.
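As a minimal sketch of the calculation described above, the following Python snippet computes a sucrose gradient ratio for one protein from per-fraction signal values. The function name, the dictionary layout and the example intensities are illustrative assumptions, as is the choice to average the log2 values of each fraction group (rather than take the log2 of an averaged signal); it is not the analysis code used in this study.

```python
import math

# Fraction groups named in the text; everything else here is hypothetical.
HIGH_FRACTIONS = ("f08", "f09", "f10")  # high sucrose density
LOW_FRACTIONS = ("f02", "f03", "f04")   # low sucrose density


def sucrose_gradient_ratio(signal):
    """Mean log2 signal in the high fractions minus mean log2 signal in the low fractions."""
    def mean_log2(fractions):
        return sum(math.log2(signal[f]) for f in fractions) / len(fractions)

    return mean_log2(HIGH_FRACTIONS) - mean_log2(LOW_FRACTIONS)


# Made-up signal values for two toy proteins (not data from the study).
om_like = {"f02": 1.0, "f03": 1.0, "f04": 1.0, "f08": 8.0, "f09": 8.0, "f10": 8.0}
im_like = {"f02": 8.0, "f03": 8.0, "f04": 8.0, "f08": 1.0, "f09": 1.0, "f10": 1.0}

om_ratio = sucrose_gradient_ratio(om_like)  # positive: enriched at high density
im_ratio = sucrose_gradient_ratio(im_like)  # negative: enriched at low density
print(om_ratio, im_ratio)
```

In this toy input the OM-like protein gets a positive ratio and the IM-like protein a negative one, matching the direction described in the text.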

To assess whether our calculated sucrose gradient ratio reflected known protein localization, we grouped these values based on localization annotation (modified from STEPdb database, Table S2). As anticipated, most IM protein categories (i.e. IM-integral, IM-peri and IMLP) displayed a low sucrose gradient ratio, whereas the two OM protein categories, OMPs and OMLPs, showed high sucrose gradient ratios (Figure 2A). This striking concordance between curated annotations and our calculated localization confirms the accuracy of our methodology. We chose the 90th percentile of the IM protein distribution (solid blue line) and the 10th percentile of the OM protein distribution (solid red line) as cut-offs to define IM and OM protein localization, respectively, using the sucrose gradient ratio (Figure 2B). All proteins that fell between these two cut-offs were classified as soluble proteins.

Interestingly, although soluble proteins were expected contaminants in our experiments, they did not always behave as expected upon sucrose density gradient fractionation. In general, soluble proteins localized in the cytoplasm and periplasm displayed midrange sucrose gradient ratios, which is in agreement with the majority of them being contaminants and non-specifically associated with either IM or OM vesicles. We noted that the IM-cyto category displayed bimodal characteristics, with one peak being consistent with IM localization and another peak that aligned with soluble proteins (Figure 2A, Table S3). This suggests that the IM-cyto category of proteins, referred to as "peripheral IM proteins" in STEPdb and originally described in another study (Papanastasiou et al., 2013), consists of a mixture of proteins that have clear preferential localizations either to the cytoplasm or to the IM.
We therefore did not use this category for benchmarking our data, but rather kept it to later definitively allocate the primary localization of this large group of proteins. Taken together, these data show that quantitative proteomic-based analysis of sucrose gradient fractionated membrane vesicles can rapidly and systematically localize proteins to the IM or OM.

In order to increase the confidence of our calls for protein localization, we carried out two further steps. First, we ran k-means clustering for the dataset where the two replicates were treated separately. From this, we identified 140 (out of 1605) proteins whose fractionation patterns resulted in clustering to different clusters for the two replicates. We reasoned that this is due to irreproducibility between the replicates and removed these proteins from further analysis. Second, we assessed the similarity of our two methods (k-means vs. sucrose gradient ratio) in assessing protein localization. To do this, we used the thresholds of sucrose gradient ratio for IM and OM defined in Figure 2B. We found a large overlap between the two methods for all three localization categories: IM, OM and soluble (Figure 4A). In total, we identified 1368 proteins (out of 1465 possible) for which the two methods agreed, and we used these as our high-confidence protein localization dataset. When considering all 1605 proteins, the two methods agreed for 1456 proteins (Figure S4A). In general, both quantification methods for protein localization worked well, and in combination provided more confident identification calls (true positive rates are 95% and 48% for overlap and non-overlap sets, respectively; using STEPdb annotations as true positive).
The clustering method worked better than the ratio cut-offs for OM proteins, but on the other hand, cluster 4 seemed to have the most inconsistent calls for protein localization (Figure S3).

[...] including lower confidence ones). This separation is corroborated by the melting temperatures of these two groups of proteins. We have previously reported that IM proteins are more thermostable than their cytoplasmic counterparts (Mateus et al., 2018). In agreement with this, IM-cyto proteins categorized here as soluble had similarly low melting temperatures as cytoplasmic proteins (Figure S5). In contrast, IM-cyto proteins categorized as IM proteins had higher melting temperatures, albeit not as high as integral IM proteins, presumably due to their peripheral interaction rather than integral association with the IM (Figure S5). Thus, in the experimental conditions we tested, IM-cyto annotated proteins comprised a mixture of soluble and IM proteins, whose predominant localization we could allocate based on their fractionation patterns.
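The percentile cut-offs and the method-agreement step described in this section can be sketched as follows. This is an illustrative example only: the helper names, the linear-interpolation percentile, the inclusive treatment of values lying exactly on a cut-off, and all numbers and protein identifiers are assumptions made for the sketch, and the cluster-based calls are supplied by hand rather than produced by an actual k-means run.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def classify(ratio, im_cut, om_cut):
    """IM at or below the IM cut-off, OM at or above the OM cut-off, soluble in between."""
    if ratio <= im_cut:
        return "IM"
    if ratio >= om_cut:
        return "OM"
    return "soluble"


# Toy sucrose gradient ratios for benchmark proteins (hypothetical values).
im_reference = [-3.0, -2.5, -2.0, -1.5, -1.0]
om_reference = [1.0, 1.5, 2.0, 2.5, 3.0]

im_cut = percentile(im_reference, 90)  # 90th percentile of the IM distribution
om_cut = percentile(om_reference, 10)  # 10th percentile of the OM distribution

# Toy proteins: ratio-based calls versus hand-written cluster-based calls.
ratios = {"protA": -2.2, "protB": 2.4, "protC": 0.1, "protD": -1.8}
cluster_calls = {"protA": "IM", "protB": "OM", "protC": "soluble", "protD": "soluble"}

ratio_calls = {name: classify(r, im_cut, om_cut) for name, r in ratios.items()}

# Keep only proteins for which both methods give the same localization.
high_confidence = sorted(n for n in ratios if ratio_calls[n] == cluster_calls[n])
print(ratio_calls, high_confidence)
```

Here the hypothetical protD is called IM by the ratio cut-off but soluble by its cluster label, so it is excluded from the agreeing set, which mirrors how the high-confidence dataset is restricted to proteins on which both methods agree.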

Identification of potentially mis-annotated proteins
STEPdb combines robust computational predictions with a wealth of experimental information to allocate protein localization in E. coli, and hence we used it here as a gold-standard dataset to benchmark our data and decide on thresholds for making localization calls. In doing so, we noted that a small fraction of our allocations of IM, soluble and OM proteins conflicted with their STEPdb annotations. Namely, 63 (out of 1368) high-confidence proteins, including both membrane and soluble proteins, were found in clusters that at least partially conflicted with their corresponding STEPdb localization annotation (Table S4). We manually curated these proteins based on [...] (Table S4). We found corroborating evidence for 12 more such cases in the literature or in other prediction databases. Importantly, in these cases, rather than relying on the combined result from multiple in silico prediction algorithms, our data are able to provide the high-confidence experimental evidence needed to verify the localization of these proteins.

We also noted unexpected fractionation patterns for certain proteins upon sucrose density fractionation. Firstly, we found that several proteins annotated as solely periplasmic in STEPdb fractionated as membrane proteins in our experiments. Many of them have known interacting membrane partners, which is presumably the reason they co-fractionate with either the IM (EnvC, FdoG, NapG, and RseB) or the OM (LptA) (Figure 4C, Table S4). The situation was similar for periplasmic components of IM ABC transporter complexes (FhuD, PstS, and SapA), which co-fractionated with IM proteins, possibly as a consequence of a direct conditional association with IM proteins upon active transport (Moussatova et al., 2008), but were only annotated as periplasmic in STEPdb (Table S4). In total, there were 19 cases for which STEPdb had incomplete annotation.

Conversely, FecB, a known periplasmic component of an ABC transporter complex, is annotated both as IM-peri (peripherally associated with the IM) and periplasmic in STEPdb, but was only identified as soluble in our experimental conditions. In this case, we fail to detect the IM association, likely because the transporter is inactive in the conditions we probe, and the STEPdb annotation is more accurate (Table S4). Moreover, IM proteins known to form trans-envelope complexes (e.g. TamB and TonB) failed to cluster as either IM or OM proteins in the fractionation experiments. Overall, we could reasonably explain 51 out of the 63 cases where STEPdb and our results disagreed, for which we could find additional information that supports either our localization call (42 proteins) or the original STEPdb annotation (9 proteins) (summarized in Table S4). These findings demonstrate that our quantitative assessment of protein localization accurately captures the in vivo biological state.

We quantified the membrane proteome using TMT-labelling MS, which allowed us to experimentally identify localization in a systematic and unbiased manner for the majority of membrane proteins in E. coli. We verified current knowledge of membrane protein localization that had been determined experimentally and/or predicted bioinformatically. The advantage of this method is that, instead of assessing membrane protein localization via conventional immunoblotting of sucrose density gradient fractions, quantitative proteomic approaches can be used to rapidly and quantitatively assess protein localization in an antibody-independent manner.

Comparison of our data with the curated STEPdb annotation revealed high concordance. In addition, our data provided a predominant location for a large part of the E. coli membrane proteome referred to as peripherally associated membrane proteins (Papanastasiou et al., 2013). STEPdb categorizes proteins that peripherally interact with the cytoplasmic face of the IM as a "peripheral IM protein", which we referred to here for simplicity as IM-cyto (Table S2). [...] This absence of co-fractionation with the IM proteome suggests that many of these proteins are mainly cytoplasmic in exponentially growing cells in LB, and their previous identification in membrane protein fractions in this study and others is likely because they are recurrent contaminants. We cannot exclude that some of these proteins have conditional, low-affinity or transient association with the IM and proteins therein, or that a small fraction of the total protein amount is at any given point associated with the IM. In contrast, about one third of the IM-cyto proteins exhibited clear IM fractionation patterns and thus can be confidently assigned as IM-associated proteins.

We found 63 proteins out of 1368 whose localization was inconsistent with the annotation reported in STEPdb. We were able to explain 51 of them using additional literature data. Those proteins either have a wrong or missing annotation in STEPdb (42), or their function/activity makes their sucrose gradient fractionation patterns misleading (9). In most cases, sucrose gradient fractionation failed to make the right call when the protein was spanning the envelope or had presumably dual membrane localization. It is likely that the new localization is also correct for most of the 12 remaining proteins (Table S4). Thus, our data are helpful for improving protein localization annotation, even for an organism as intensively studied as E. coli, which has been subjected to a plethora of targeted and systematic studies and for which researchers can benefit from carefully curated databases, such as STEPdb.
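The counts quoted above can be cross-checked with a few lines of arithmetic. The snippet below is only a bookkeeping sketch: the variable names are made up for illustration, and the numbers are copied from the text rather than recomputed from Table S4.

```python
# Counts as stated in the text (not derived from the underlying data).
total_high_confidence = 1368  # proteins on which both methods agreed
conflicting = 63              # calls that conflict with STEPdb annotation
supports_our_call = 42        # literature supports the localization found here
supports_stepdb = 9           # literature supports the STEPdb annotation

explained = supports_our_call + supports_stepdb
unexplained = conflicting - explained
conflict_rate = conflicting / total_high_confidence

print(f"explained: {explained}, unexplained: {unexplained}")
print(f"fraction of high-confidence calls conflicting with STEPdb: {conflict_rate:.1%}")
```

The explained and unexplained counts obtained this way match the 51 and 12 proteins quoted in the paragraph above.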

CODE AVAILABILITY
The code and pipelines used for data analysis are available upon request.

DECLARATION OF INTEREST
The authors declare no competing interests.

[...] Table S2. Localization annotation as in Figure 1C. [...] can be found in Table S4.

Table S1. TMT-labelling MS results and normalization (signal sum values)
Table S2. Localization annotation used in this study based on STEPdb
Table S3. Membrane ratio and K-means clustering data

Membrane vesicle isolation and sucrose density fractionation
Membrane vesicles were isolated and fractionated essentially as previously described (Anwari et al., 2010) with the following deviations. Phosphate-buffered saline (PBS) was used as the base buffer instead of Tris. After sucrose gradient separation, 1 mL fractions were collected step-wise from the top of the gradient, yielding 11 fractionated samples that were analyzed by Coomassie staining and Western blotting using SDS-PAGE gels as described below. Fractions 2 to 11 (f02-f11), as well as an aliquot of the total input membrane sample (diluted 10 times in H2O), were labeled [...] to the manufacturer's instructions as described below. In brief, 0.8 mg of the TMT reagents was dissolved in 42 µL of 100% acetonitrile and 4 µL of this stock was added to the peptide sample and incubated for 1 hour at room temperature. The reaction was quenched with 5% hydroxylamine for 15 minutes at room temperature. Then the 10 samples labelled with unique TMT10plex labels were combined into one sample. The combined sample was then cleaned up using an OASIS® HLB µElution Plate (Waters). The samples were separated through an offline high-pH reverse-phase fractionation on an Agilent 1200 Infinity high-performance liquid chromatography system equipped with a Gemini C18 column (3 µm, 110 Å, 100 x 1.0 mm, Phenomenex). The fractionation was performed as previously described (Reichel et al., 2016). Samples were pooled into a total of 12 fractions.

Mass spectrometry data acquisition
Chromatography was performed using an UltiMate 3000 RSLC nano LC system [...] formic acid in acetonitrile) from 2% to 4% in 6 min, from 4% to 8% in 1 min, then 8% to 25% for a further 71 min, and finally from 25% to 40% in another 5 min. The outlet of the analytical column was coupled directly to a Fusion Lumos (Thermo) mass spectrometer using the Proxeon nanoflow source in positive ion mode. [...]

A mass error tolerance of 10 ppm was set for the full scan (MS1) and of 0.02 Da for MS/MS (MS2) spectra. Further parameters were: trypsin as protease with an allowance of maximum two missed cleavages; a minimum peptide length of seven amino acids; and at least two unique peptides required for a protein identification. The false discovery rate on peptide and protein level was set to 0.01.

[...] Triethanolamine, 2% SDS). Bio-Rad systems were used, applying 100 V per chamber. For Coomassie staining, gels were incubated in staining solution (50% methanol, 40% H2O, 10% acetic acid, 1 g Brilliant Blue R250 per 1 L) for 1 hour, and destained with destaining solution (40% ethanol, 10% acetic acid, 50% H2O) until the desired signal was achieved. Incubations were performed at room temperature with constant moderate mixing by rocking.
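To make the MS1 search tolerance quoted above concrete, the sketch below converts a relative tolerance in ppm into an absolute m/z window and checks whether an observed value falls inside it. The function names and the example m/z values are hypothetical; this only illustrates the standard ppm definition and is not the search engine's implementation.

```python
def ppm_window(mz, ppm):
    """Return the (lower, upper) m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, theoretical_mz, ppm):
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    lower, upper = ppm_window(theoretical_mz, ppm)
    return lower <= observed_mz <= upper


# Example with a made-up precursor at m/z 1000 and the 10 ppm MS1 tolerance.
lower, upper = ppm_window(1000.0, 10)
print(lower, upper)
print(within_ppm(1000.005, 1000.0, 10), within_ppm(1000.02, 1000.0, 10))
```

Because a ppm tolerance scales with m/z, the absolute window widens for heavier precursors, whereas the 0.02 Da MS2 tolerance is a fixed absolute window.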