Cross-linking Measurements of In Vivo Protein Complex Topologies

Identification and measurement of in vivo protein interactions pose critical challenges to understanding biological systems. Measuring the structures and topologies of proteins and protein complexes as they exist in cells is particularly challenging, yet critically important for understanding biological function, because proteins exert their intended functions only through the structures and interactions they adopt in vivo. In the present study, protein interactions in E. coli cells were identified with our unbiased cross-linking approach, yielding the first in vivo topological data on many interactions and the largest set of in vivo cross-linked peptides identified to date. These data show excellent agreement with protein and protein complex crystal structures where available. Furthermore, our unbiased data provide novel in vivo topological information that can improve understanding of biological function, even where high-resolution structures are not yet available.

Protein interactions and topologies are key features that enable specificity, function, and the evolution of highly integrated, regulated networks in biological systems. Primary challenges in the study of biological systems include identifying protein interactions and measuring the topological features of proteins and their interactions in vivo. Advances such as the yeast two-hybrid system (1), coimmunoprecipitation (2), and tandem affinity purification tags (3) have greatly increased the ability to identify hundreds or even thousands of interactions in complex biological samples (2, 4–6). However, despite the many thousands of protein interactions now known (7), in vivo topological information exists for only a tiny fraction of them. If interaction topologies were more widely known, this information could improve understanding of the fundamental factors that drive interactions, aid the development of highly specific modulators of protein interactions, improve interaction prediction, and deepen comprehension of biological systems. Unfortunately, very few methods allow unbiased measurement of protein-protein interaction topologies in cells.
Chemical cross-linking has great potential for in vivo studies of interaction topology (8–10). Cross-linked peptides carry information about the identities of interacting proteins and can uniquely define regions of protein sequences that are near one another when the proteins are in their native cellular environment. Challenges that have so far precluded such in vivo cross-linking analyses include the difficulty of identifying cross-linked peptides and the severe dynamic range constraints imposed by the overwhelming excess of non-cross-linked peptides. Our efforts to overcome these challenges led to the development of Protein Interaction Reporter (PIR) technology (11), which uses a novel type of cross-linker together with mass spectrometry to identify peptides that are close to one another within protein complexes in cells. These efforts produced the first reported identification of cross-linked peptides from live cells (9), including the first in vivo identification of an interaction between two outer membrane cytochrome c proteins, an interaction that appears critical to the electron transport properties of Shewanella oneidensis (12).
Here we present the first application of PIR technology to the study of interactions in E. coli cells, in which 65 cross-linked peptide pairs were unambiguously identified. To date, this is the largest in vivo cross-linked peptide data set produced. In this system, we were also able to compare many of our results with known protein and protein complex crystal structures, which show excellent agreement with our in vivo data. Importantly, this comparative analysis was also used to define distance constraints that enable a refinement of structural predictions for in vivo protein complexes that was not previously possible. Furthermore, within our large-scale cross-linking data set, we identified cross-linked peptides from the periplasmic C-terminal domain of Outer membrane protein A (OmpA). These results show for the first time that OmpA can exist in cells as a dimer or multimer, opening new possibilities for understanding the function of OmpA and its possible role in ion transport. Finally, structural predictions of this domain, together with our cross-linking data, provide the first in vivo topological picture of the periplasmic domain of the OmpA dimer and illustrate the unique capabilities of PIR technology for in vivo interaction identification and topology studies.

MATERIALS AND METHODS
Materials-PIR cross-linker was synthesized in-house following protocols described elsewhere (9, 11, 13). Tergitol solution (70% Nonidet P-40 in water), urea, iodoacetamide, and ammonium acetate were purchased from Sigma-Aldrich (St. Louis, MO) and used without further purification. Tris(2-carboxyethyl)phosphine hydrochloride, monomeric avidin UltraLink resin, and mass spectrometry-grade trypsin were purchased from Pierce (Rockford, IL). Amicon Ultra-0.5 ml 10K centrifugal filters were from Millipore (Billerica, MA), C18 Sep-Pak cartridges were from Waters (Milford, MA), Macro Spin Columns were from The Nest Group, Inc. (Southborough, MA), and phenylmethylsulfonyl fluoride (PMSF) was from G-Biosciences (Maryland Heights, MO). Anti-OmpA antibody was a generous gift from Professor Prasadarao Nemani at Children's Hospital Los Angeles and the University of Southern California. Amino-terminal FLAG-BAP fusion protein and mouse monoclonal anti-FLAG M2 antibody were purchased from Sigma-Aldrich (St. Louis, MO).
In Vivo Cross-linking and Cell Lysis-A 50 ml culture of E. coli K-12 cells was harvested at an OD of 0.6–0.8 for the cross-linking reaction. The cells were pelleted and washed five times with 1 ml phosphate-buffered saline (PBS) before cross-linking. The cells were then suspended in 1 ml PBS, and PIR cross-linker was added to a final concentration of 1 mM. The mixture was incubated at 4°C for 1 h. The cells were then washed five times with 1 ml PBS and suspended in 1 ml PBS containing 0.1% Nonidet P-40. The cells were lysed by sonication, and the lysate was centrifuged at 15,000 × g for 45 min at 4°C. The soluble fraction was used directly in the next step. The insoluble fraction was dissolved in 200 μl of 8 M urea in 100 mM Tris-HCl buffer, and the proteins were precipitated with cold acetone. The protein pellet was then redissolved in 200 μl urea buffer and diluted to 1 ml with 0.1% Nonidet P-40 in PBS.
Enrichment of the Cross-linked Products-Two complementary sample preparation methods were used. The first involved avidin affinity purification of the PIR-labeled proteins followed by on-bead digestion of the purified proteins; the cross-linked peptides were then further enriched by adding a fresh aliquot of avidin slurry, and the eluent from this second enrichment step was used for mass spectrometry analyses. The second method involved strong cation exchange fractionation of the cross-linked peptides after digestion of the E. coli proteins, followed by further enrichment with avidin-biotin affinity purification.
Strong cation exchange chromatography: The cell lysate was reduced with 5 mM tris(2-carboxyethyl)phosphine hydrochloride and alkylated with 10 mM iodoacetamide. The sample was then digested overnight at 37°C with trypsin (trypsin:sample ratio 1:300), which cleaves specifically at the C terminus of arginine and lysine residues. After digestion, the sample was desalted with a C18 Sep-Pak column. Desalted peptides were resuspended in 0.5% formic acid and 5% acetonitrile and loaded on a HIL-SCX macro spin column. The columns were washed with 5% acetonitrile and 0.5% formic acid to remove low-charge peptides and eluted stepwise with increasing concentrations of ammonium acetate. Ammonium acetate was removed by vacuum centrifugation (SpeedVac) before further analysis.
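The trypsin specificity stated above (cleavage at the C terminus of Lys and Arg) can be illustrated with a small in silico digest. The sketch below is a hypothetical illustration, not software used in this study: it enumerates peptides with up to a chosen number of missed cleavages (three were allowed in the database search described under Data Analysis). The optional proline rule, which suppresses cleavage before Pro as many search engines' trypsin definitions do, is an assumption that the text does not state.

```python
import re


def tryptic_peptides(sequence: str, max_missed: int = 3, proline_rule: bool = False) -> list[str]:
    """Hypothetical in silico trypsin digest.

    Cleaves after every K or R (optionally not before P) and returns all
    peptides spanning up to `max_missed` internal cleavage sites.
    """
    # Zero-width split point after K/R; the (?!P) lookahead implements the optional proline rule.
    pattern = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    # Drop the empty string produced when the sequence ends in K or R.
    fragments = [f for f in re.split(pattern, sequence) if f]

    peptides = []
    for start in range(len(fragments)):
        # Join fragment `start` with up to `max_missed` following fragments.
        for end in range(start, min(start + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("GGKLLRAA", max_missed=1))
    # → ['GGK', 'GGKLLR', 'LLR', 'LLRAA', 'AA']
```

Enumerating missed cleavages matters here because a lysine carrying a PIR modification may not be cleaved, so cross-linked sites often sit inside longer peptides.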
Avidin affinity purification: As a complement to the strong cation exchange method, avidin capture affinity purification can also enrich cross-linked products. The purification generally followed previously published procedures (9). In brief, a 100 μl aliquot of avidin bead slurry was added to 1 ml of E. coli cell lysate and incubated at room temperature for 2 h. The beads were washed five times with 1 ml of 100 mM NH4HCO3. The beads were then suspended in 100 μl NH4HCO3 solution, reduced with 1 μl of 0.5 M tris(2-carboxyethyl)phosphine hydrochloride, and alkylated with 1 μl of 1 M iodoacetamide. The proteins were digested by incubating the sample in 1 ml of 10 ng/μl trypsin solution at 37°C for 2 h. Then 10 μl of 100 mM PMSF was added to quench digestion, and the mixture was incubated at room temperature for 30 min. A second avidin capture was performed by adding 100 μl avidin slurry to the mixture and incubating for 2 h. The beads were then washed three times with 0.1% Nonidet P-40 in NH4HCO3 solution and three times with NH4HCO3 alone. Finally, the beads were eluted four times with 400 μl each of 75% acetonitrile and 0.5% formic acid.
Acid Cleavage-Cross-linked products were cleaved in 95% trifluoroacetic acid and 5% water for 2 h at room temperature. The reaction mixture was then added to a five-fold volume of cold triethylether and incubated at −80°C overnight. The mixture was centrifuged at 14,000 × g for 30 min to precipitate the peptides, which were then resuspended in 0.1% formic acid for mass spectrometry analyses. Acid cleavage of PIR-labeled molecules releases all labeled peptides, which can then be analyzed by conventional liquid chromatography-tandem mass spectrometry (LC-MS/MS) to identify peptide sequences and sites of reactivity. Although this analysis does not identify cross-linked sites, it efficiently identifies lysine residues that react with PIR molecules in cells. The resulting database of sequences can be used to determine the subcellular locations of PIR-reactive proteins, as shown in supplemental Fig. S3.

Mass Spectrometry Analyses
LC-MS Identification of Cross-linked Relationships-The enriched cross-linked products were separated by UPLC (Waters nanoAcquity, Milford, MA) and detected with a linear trap quadrupole Fourier transform mass spectrometer (LTQ-FT MS; Thermo Fisher Scientific Inc., Waltham, MA). A 30-cm C18 analytical column was made in-house by packing fused silica capillary (360 μm OD × 75 μm ID) with MAGIC C18AQ beads (100 Å, 5 μm; Michrom Bioresources, Inc., Auburn, CA). A 2-cm trap column was prepared similarly by packing fused silica capillary (360 μm OD × 100 μm ID) with MAGIC C18AQ beads (200 Å, 5 μm). The following LC gradient was used in both LC runs: 0–60 min, 5–60% buffer B; 60–85 min, flushing with 80% buffer B; and 85–120 min, equilibration with 5% buffer B (buffer A: 0.1% formic acid in deionized water; buffer B: 95% acetonitrile and 0.1% formic acid in deionized water). The MS resolution was set to 25,000 to obtain high mass accuracy data. Two scan types, one with in-source collision-induced dissociation (ISCID) off and one with ISCID on (80 V), were alternated. PIR mass relationships were identified with the software tools X-links (14) or BLinks (15), with a relationship mass tolerance of 10 ppm. The m/z values and LC elution times of the identified released peptides were then used to generate a mass-and-time targeted inclusion list for the second, data-dependent LC-MS/MS run.
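The last step above, turning released-peptide m/z values and elution times into a mass-and-time targeted inclusion list, can be sketched as follows. This is a hypothetical illustration, not the instrument method format used here: the input as (m/z, retention time) pairs and the ±2 min retention window are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InclusionEntry:
    """One hypothetical mass-and-time target for the data-dependent MS/MS run."""
    mz: float
    rt_start_min: float
    rt_end_min: float


def build_inclusion_list(released_peptides, rt_window_min=2.0):
    """Build a mass-and-time targeted inclusion list.

    `released_peptides` is an iterable of (mz, retention_time_min) pairs for
    released peptides found in PIR relationships. The +/- retention window
    is an assumed value, not one reported in the text.
    """
    entries = [
        # Clamp the window start at 0 min for peptides eluting very early.
        InclusionEntry(mz, max(0.0, rt - rt_window_min), rt + rt_window_min)
        for mz, rt in released_peptides
    ]
    # Sort by window start time so the list reads in elution order.
    return sorted(entries, key=lambda e: e.rt_start_min)
```

Because the second run uses the same columns and gradient as the first, retention times measured in the first run remain usable as targets.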
LC-MS/MS Identification of Released Peptides-In the second LC-MS/MS run, the same columns and gradient as in the first LC-MS experiment were used to preserve retention times. Each MS scan at 25,000 resolution was followed by five data-dependent MS/MS scans, with precursors selected from the masses and retention times in the mass-and-time targeted list. The dynamic exclusion repeat duration and exclusion duration were both 15 s. The acid cleavage sample was analyzed with the same LC-MS/MS method, except that no inclusion list was used; instead, MS/MS precursors were selected according to their intensities.
Manual Verification of Cross-linked Relationships-A third LC-MS/MS run was performed on the same column with the same gradient as the first and second runs. A mass-and-time targeted list of the identified cross-linked parent ions was used in this run to trigger three MS/MS events per MS scan. The dynamic exclusion repeat duration and exclusion duration were both set to 5 s, and the CID energy was set to 27.
Data Analysis-Peak lists were generated with Hardklor (version 1.34) (16). The LC-MS/MS data were searched with Mascot (version 2.3.01) against an E. coli K-12 database (release date 02-19-2009) containing 4178 sequences. Up to three missed cleavages were allowed, the precursor mass tolerance was 25 ppm, and the fragment mass tolerance was 0.6 Da. The residual tag from the PIR cross-linker (mass = 99.0320 Da) was treated as a variable modification on lysine residues or the protein N terminus. Carbamidomethylation of cysteine was set as a fixed modification; oxidation of methionine and loss of water from N-terminal glutamine and glutamate were set as variable modifications. All peptide identifications were filtered with an expectation value threshold of 0.1, and a minimum of two unique peptides was required for each protein identification. The software tools X-links and BLinks were then used to identify cross-link relationships. X-links parameters were: m/z filter range 300–2000, isotopic fit filter 0.0–0.2, and mass tolerance 10 ppm. BLinks parameters were: mass tolerance 10 ppm, p value threshold 0.1, and a minimum of 5 data points. Finally, all identified cross-links were confirmed by manual inspection.
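The two identification filters above (an expectation value threshold of 0.1 and at least two unique peptides per protein) can be sketched as a post-processing step on search results. The tuple input is a hypothetical stand-in for exported Mascot results, and treating an expectation value exactly equal to 0.1 as passing is an assumption.

```python
from collections import defaultdict


def filter_identifications(psms, expect_threshold=0.1, min_unique_peptides=2):
    """Apply peptide- and protein-level filters to search results.

    `psms` is an iterable of (protein_accession, peptide_sequence, expect)
    tuples (a hypothetical export format). Returns a dict mapping each
    accepted protein to its set of unique passing peptide sequences.
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide, expect in psms:
        if expect <= expect_threshold:  # peptide-level filter
            peptides_by_protein[protein].add(peptide)  # a set keeps unique sequences only
    # Protein-level filter: require enough distinct passing peptides.
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_unique_peptides
    }
```

Counting unique sequences rather than spectra prevents a single peptide observed many times from satisfying the two-peptide requirement.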
Protein Docking-The OmpA dimeric structural model was computed with the SymmDock web server (17). The starting structure was the SWISS-MODEL Repository model for P0A910, covering residues 208 to 337. A symmetry order of 2 was used. The top 100 docking models were manually filtered with the cross-linking data, using a distance threshold of 35 Å chosen on the basis of the distance distribution of our cross-linking results (supplemental Fig. S7). Only two very similar models remained after evaluation against the distance constraints and the accessibility of the labeled sites; these are shown in Fig. 4 and supplemental Fig. S8.
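The distance-based filtering of docking models described above can be sketched as follows. This is a hypothetical illustration: it assumes each model has been reduced to a mapping from (chain, residue number) to the coordinates of one reference atom per cross-linked lysine (for example Cα or NZ; the text does not say which atom was used), and it applies only the 35 Å distance criterion, not the accessibility evaluation that was also part of the filtering.

```python
import math


def satisfies_crosslinks(model, crosslinks, max_distance=35.0):
    """Return True if every cross-linked residue pair lies within max_distance (Å).

    `model` maps (chain_id, residue_number) -> (x, y, z) coordinates in Å;
    `crosslinks` is a list of residue-key pairs observed as cross-linked.
    """
    return all(math.dist(model[a], model[b]) <= max_distance for a, b in crosslinks)


def filter_docking_models(models, crosslinks, max_distance=35.0):
    """Keep the names of docking models consistent with all cross-link constraints."""
    return [
        name for name, model in models.items()
        if satisfies_crosslinks(model, crosslinks, max_distance)
    ]
```

For a symmetric homodimer, a cross-link between two identical peptides maps to the same residue number on both chains, i.e. a pair such as (("A", k), ("B", k)), where k is a hypothetical residue number.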
The heterodimeric complex model of FKBP-type peptidyl-prolyl cis-trans isomerase FkpA (FkpA) and 30S ribosomal protein S6 (RpsF) was computed with Hex 6.1 (18). Structures 1Q6H and 2QAL from the RCSB Protein Data Bank (PDB) were used as starting structures for FkpA and RpsF, respectively. RpsF, together with part of the ribosome, was used as the receptor, and the FkpA dimer was used as the ligand. Starting points for the docking experiments were chosen on the basis of the cross-linking results. Shape and electrostatics were used as the correlation terms, the FFT mode was set to 3D, and docking results were refined by molecular mechanics (MM) energy minimization. The receptor and ligand ranges were both set to 180 with a step size of 7.5, the twist range to 360 with a step size of 5.5, and the distance range to 40 with a scan step of 0.8. We also computed the dimeric model of the OmpA periplasmic domain with Hex 6.1, which produced results very similar to those obtained with SymmDock, as described above.

RESULTS AND DISCUSSION
PIR Technology-The PIR approach has been described in detail previously (9, 11, 13, 19) and was successfully applied to Shewanella oneidensis (9). Here, the principles of PIR technology are discussed only briefly. The structure and cleavage characteristics of the PIR cross-linker used in these studies are presented in Fig. 1A. The PIR molecule contains two mass spectrometry-labile bonds, two reactive groups, and a biotin moiety that serves as an affinity purification tag. The two labile bonds are specifically cleaved during mass spectrometry analysis without fragmenting the peptide bonds. All cross-linked product types, including dead-end, intra-cross-linked, and inter-cross-linked peptides, are cleaved at these labile bonds, yielding product type-specific mass relationships. For example, an inter-cross-linked peptide pair is cleaved into three components, as illustrated in Fig. 1B (cleavage reaction), and the fragment masses sum to the measured mass of the intact cross-linked product. Fig. 1C illustrates the three PIR mass relationships that define each product type. The parent ion of the inter-cross-linked peptide pair (parent) is cleaved to yield a reporter ion and two released peptides (Fig. 1C, inter-cross-link), and the sum of these three fragment masses matches the parent mass (parent ion mass = reporter ion mass + released peptide A mass + released peptide B mass). Analogous mass relationships define dead-end and intra-cross-linked PIR products. These cleavable features allow differentiation of cross-linked from non-cross-linked peptides and identification of all cross-link product types.
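The inter-cross-link relationship above, combined with the 10 ppm relationship tolerance given under Materials and Methods, can be written as a single acceptance test. The notation is ours, masses are taken as neutral monoisotopic masses, and the numbers in the worked example are hypothetical, chosen only to show the arithmetic; they are not measured values from this study.

$$M_{\mathrm{parent}}^{\mathrm{calc}} = M_{\mathrm{reporter}} + M_{A} + M_{B}, \qquad \left|\frac{M_{\mathrm{parent}}^{\mathrm{obs}} - M_{\mathrm{parent}}^{\mathrm{calc}}}{M_{\mathrm{parent}}^{\mathrm{calc}}}\right| \times 10^{6} \le 10\ \mathrm{ppm}$$

For example, with hypothetical masses M_reporter = 750.00 Da, M_A = 1200.00 Da, and M_B = 1500.00 Da, the calculated parent mass is 3450.00 Da. An observed parent mass of 3450.02 Da deviates by 0.02/3450.00 × 10^6 ≈ 5.8 ppm and would satisfy the relationship, whereas 3450.10 Da (≈ 29 ppm) would not.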
A two-step mass spectrometric identification strategy (Fig. 2) was developed to identify both PIR relationships and cross-linked peptide sequences. Precursor and ISCID scans are alternated throughout the LC-MS run to separately detect cross-linked peptide pairs (parents) and their cleavage products (reporter and released peptides). PIR mass relationships are identified with high mass measurement accuracy in neighboring scans (14). After relationship identification, a second LC-MS/MS experiment, in which ISCID activation was applied continuously to release all peptides, was used to identify peptide sequences; each released peptide was selected for MS/MS analysis on the basis of accurate mass and retention time. During data analysis, the residual PIR-induced mass modification was included in the Mascot database search (20). Accurate and reliable identification of cross-linked sequences is possible even with whole-genome database searches, because each query is based on the fragmentation pattern of a single peptide. In contrast, peptides cross-linked with conventional non-cleavable reagents can be difficult to identify in whole-genome database searches of proteome-wide samples, because individual peptide mass information is lacking and fragmentation patterns are complex (9, 14, 21). Although smaller cross-linkers may yield tighter distance constraints that would benefit structural modeling, they may not cross-link as many sites in vivo as larger, more flexible molecules that can span a range of distances. The competing goals of measuring in vivo protein topological features and measuring precise distance constraints therefore underscore the need to develop an array of PIR molecules with a variety of physical properties. The PIR strategy can be applied directly to whole cells, without purification of targeted proteins or complexes or sample fractionation before cross-linking. Finally, the affinity tag on PIR molecules allows enrichment of labeled peptides from complex sample digests.
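Relationship detection in neighboring scans can be pictured as a search: for each parent mass in a precursor scan, look for a reporter plus two released-peptide masses in the adjacent ISCID scan that sum to it. The brute-force sketch below is a hypothetical illustration of that idea, not the algorithm implemented in X-links or BLinks. It assumes peaks have already been converted to neutral monoisotopic masses and treats the reporter mass as a known input, and it covers only the inter-cross-link relationship.

```python
from itertools import combinations_with_replacement


def ppm_error(observed, expected):
    """Mass error of `observed` relative to `expected`, in parts per million."""
    return (observed - expected) / expected * 1e6


def find_intercrosslink_relationships(parent_masses, iscid_masses, reporter_mass, tol_ppm=10.0):
    """Search one precursor scan and its neighboring ISCID scan for PIR relationships.

    parent_masses: neutral masses from the precursor (ISCID off) scan.
    iscid_masses: neutral masses from the neighboring ISCID-on scan.
    reporter_mass: expected reporter mass (a hypothetical input here).
    Returns (parent, peptide_a, peptide_b) tuples satisfying
    parent = reporter + peptide_a + peptide_b within tol_ppm.
    """
    # Require the reporter itself to be observed in the ISCID scan.
    if not any(abs(ppm_error(m, reporter_mass)) <= tol_ppm for m in iscid_masses):
        return []
    candidates = sorted(m for m in iscid_masses if abs(ppm_error(m, reporter_mass)) > tol_ppm)

    hits = []
    for parent in parent_masses:
        # With replacement: two identical released peptides (e.g., from a
        # homodimer cross-link) give one peak but count twice in the sum.
        for a, b in combinations_with_replacement(candidates, 2):
            if abs(ppm_error(parent, reporter_mass + a + b)) <= tol_ppm:
                hits.append((parent, a, b))
    return hits
```

The pairwise loop is quadratic in the number of ISCID peaks per parent, which is acceptable for a single pair of scans but is a design choice a production tool would likely optimize.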
Cross-linking on E. coli Cells-PIR technology was applied to E. coli cells through the steps outlined in Fig. 2. Cross-linking in cells was verified by anti-biotin Western blot analysis, extensive cell washing after labeling and before lysis, avidin-biotin affinity purification of PIR-labeled proteins and of peptides after digestion, and mass spectrometry identification of the cross-linked products. Anti-biotin Western blot analysis is a highly sensitive method to detect biotinylated (that is, PIR-labeled) proteins present within cells or in the cell wash solutions before lysis. Supplemental Fig. S1 shows an anti-biotin Western blot comparison of lysates from E. coli cells with and without PIR treatment. Both the soluble and insoluble fractions of lysates from labeled cells contain significantly more biotin-containing protein bands than those from control cells. Because cells can undergo unintended lysis, as well as normal protein secretion, during the cross-linking reaction, a negative anti-biotin Western blot of the PIR-labeled cell wash solutions (normally reached within five PBS washes; see supplemental Fig. S2) was required before cell lysis and sample preparation for mass spectrometry. This requirement maximized the chance of observing topological features of protein interactions that were present in cells during the PIR reaction. Shotgun proteomics analysis of acid-cleaved PIR-labeled species from E. coli cells yielded 1503 peptide sequences from 416 PIR-reacted proteins. Subcellular location analysis of these proteins with PSORT 2.0 (22) indicated that approximately half of the observed PIR-labeled E. coli peptides were derived from cytoplasmic proteins (supplemental Fig. S3). Previous studies of other Gram-negative bacteria found a similar fraction of cytoplasmic proteins reactive with PIR molecules in cells, and immunogold electron microscopy visualized PIR-reactive proteins on cell membranes and in the cytoplasm (23).
A summary of all inter-cross-linked peptide pairs from E. coli cells is listed in Table I. PIR experiments on E. coli cells identified 65 cross-linked pairs in total, the largest in vivo cross-linking data set produced to date. E. coli was chosen as the biological system for applying and demonstrating PIR technology because many x-ray crystal, NMR, and EM structures of E. coli proteins and protein complexes exist (24). PIR technology complements those efforts by enabling in vivo studies of protein complex topologies, supporting structural assignments in cells where crystal structures are known, and providing novel information for proteins and interactions without prior structural data. The PIR results in Table I include inter-cross-linked peptides from homo- and heterodimeric interactions, as well as many that likely result from cross-linking within a single molecule. Identified PIR cross-link relationships can be manually verified by selecting the cross-linked products for MS/MS. In 62 of 65 cases, peptide relationships identified from precursor and ISCID scan comparisons were verified and confirmed as correct assignments. The remaining three inter-cross-linked peptide pairs (indicated with * in Table I) were not observed in the replicate samples prepared for the verification experiments, most likely because of biological variation among cell populations. The identified reactive sites are underlined in each sequence in Table I. Distances between reactive sites were measured from the corresponding x-ray crystal or NMR structures where available. Several of the identified protein complexes, with and without previously determined crystal structures, are discussed in detail in the following paragraphs. Comparison of the known structures with the in vivo cross-linking data showed excellent agreement. Furthermore, this data set provides new information on protein complexes without known structures. These results demonstrate the unique utility of the PIR cross-linking strategy for topological studies of protein complexes in cells.
PIR Results of Protein Complexes With Known Structures-As Kuriyan and Eisenberg discussed in their insightful review (25), regulated protein interactions are the inevitable consequence of colocalization and adaptive mutation. Thus, the most prevalent interactions are likely to be homomultimeric, because the probability of colocalization is maximal for proteins of identical sequence. However, homomultimeric interactions can be challenging to identify with cross-linking approaches, because it can be difficult to determine unambiguously whether the two cross-linked peptides come from the same subunit or from two separate subunits. In many cases in this study, however, cross-linked peptides with identical amino acid sequences were observed, which allow unambiguous identification of homomultimeric interactions: when a peptide sequence occurs only once in a protein, a cross-link between two copies of that peptide cannot form within a single subunit and must therefore link two subunits. An exciting example is the periplasmic chaperone Skp, which is believed to form a homotrimeric complex (26, 27). Skp is involved in the folding and insertion of outer membrane proteins, such as Outer membrane protein A (OmpA), in many Gram-negative bacteria (27–29). As such, Skp plays a critical role in preventing misfolding and aggregation of many outer membrane proteins, and improved knowledge of the in vivo structure and topology of Skp can increase understanding of this important process. The crystal structure of Skp (PDB entry 1SG2) shows one hairpin-shaped α-helical extension from each of the three subunits that form the trimer (Fig. 3A). These extensions form the fingers or arms of a "three-pronged grasping forceps" model and define the cavity that serves in its chaperone function (26, 30). However, the tips of the α-helical extensions are highly charged and flexible; as a result, only two of the three tips are resolved in the crystal structure, with the third disordered (26).
Interestingly, the in vivo PIR results show that at least two tips of the "prongs" were cross-linked in cells by the PIR cross-linker (Fig. 3A, red-red cross-link). The distance between the two cross-linked sites is ~17.2 Å based on measurements from the crystal structure. These data provide the first visualization of the Skp chaperone complex in cells and support the forceps model in vivo. In addition, the same site (Lys85) was found cross-linked to another peptide from the Skp complex (Fig. 3A, red-yellow cross-links). In this case, it is unclear whether the cross-link is within or between subunits, because the second cross-linked site (Lys97) may reside on the same subunit as the first or on another subunit. According to distance measurements from the crystal structure, the intra-subunit cross-link (23.5 Å) is more likely than the inter-subunit cross-link (31.9 Å). Further development and application of PIR technology may help resolve these sites. Nonetheless, these data demonstrate that in vivo cross-linking can yield useful topological information for protein regions that, because of charge, flexibility, or other factors, are difficult to characterize by crystallography. Several other homomultimeric protein complexes were identified in these PIR studies of E. coli cells. For example, 5-keto-4-deoxyuronate isomerase (KduI), an important metabolic enzyme that supports the use of pectin as a carbon source in some bacteria, has been reported to exist as a homohexamer (31) (PDB entry 1XRU). Two identical peptides from KduI were identified as cross-linked species in the present in vivo PIR studies. The crystal structure of two subunits and the observed cross-linked sites are shown in Fig. 3B. The distance between these two sites is 21.8 Å based on the crystal structure model, which agrees well with the arm length range of the PIR cross-linker (15).
Another example is murein lipoprotein (Lpp), an abundant homomultimeric complex in E. coli (32) (PDB entry 2GUV) that is important for maintaining cell envelope integrity. Two cross-linked sites were observed at the C termini of Lpp chains in the in vivo PIR experiments. The cross-linking distance is 18.3 Å between adjacent subunits and 28.4 Å between nonadjacent subunits (Fig. 3C). The mass spectrometry data for all these protein complexes are presented in supplemental

PIR Results of Protein Complexes Without Known Structure-The unbiased in vivo PIR cross-linking strategy is especially advantageous for discovering novel protein interactions together with topological information about the interacting interfaces. The PIR approach can also reveal topological information for protein regions that are difficult to characterize with other methods such as x-ray crystallography or NMR. Two examples are presented here. The first is a novel interaction between FKBP-type peptidyl-prolyl cis-trans isomerase FkpA and 30S ribosomal protein RpsF. The second is the discovery of a homodimer of the OmpA periplasmic region. The OmpA C-terminal domain has remained resistant to crystallization, despite more than 30 years of effort to study this protein (33). In vivo measurements present new opportunities to identify structural and topological features that can help improve understanding of its function.
The first example is the study of FkpA topology and interactions. Mass spectrometry data for each identified cross-link pair in this complex are presented in supplemental Fig. 4E-4G. Proper folding of proteins is essential to cell structure and function. In this process, chaperones catalyze correct polypeptide folding, prevent and remove incorrectly or incompletely folded products, and assist protein translocation. Generally, there are two types of chaperones: those that catalyze proper disulfide exchange and those that catalyze correct cis/trans isomerization of peptidyl-prolyl bonds (34). Chaperones can also be grouped by subcellular localization. The major cytoplasmic chaperones in E. coli include the well-studied DnaK and GroEL, whereas the periplasmic chaperones are less well understood. FkpA is a periplasmic bifunctional chaperone and cis/trans peptidyl-prolyl isomerase. Its structure and function have been studied because of its interesting combination of two functional roles and its localization in the periplasm (35, 37). The present study provides new in vivo information on the structure and interactions of FkpA. Lys110 of FkpA was observed in several different cross-linked relationships, including a heteroprotein interaction, a homodimer interaction, and an intra-FkpA cross-link. Such a multiply cross-linked site appears hyper-reactive, possibly because of accessibility, local protein disorder, or charge. Lys110 is shown in red in Fig. 3D-3E (PDB entry 1Q6H). FkpA is known to form a homodimer from ultracentrifugation and crystallography measurements (35, 36). The data presented here are the first to visualize the dimeric FkpA complex in vivo.
The three helices in the N-terminal region of each monomer were reported to be essential for maintaining the dimeric structure (36) and were the most reactive regions for cross-linking in this study (Fig. 3D, red spot). A pair of FkpA peptides with identical sequences was cross-linked, indicating that the cross-link formed between the two subunits of this dimeric complex (Fig. 3D, red-red cross-link). The distance between these two sites is 28.1 Å based on the crystal structure. The same site (Lys110) was also cross-linked to 30S ribosomal protein RpsF, a novel finding of these efforts. The PPIase FkpA is known to interact with ribosomal proteins (37); however, the interacting sites have remained elusive, and the data presented here provide novel structural information on this interaction. Using the identified cross-linked sites and the known structures of each protein, we modeled the in vivo complex of the FkpA dimer with RpsF, as it resides within the 30S ribosome, with the docking software Hex 6.1 (18) (Fig. 3E). This is the first prediction of this complex as it exists in vivo. For clarity, only RNA proximate to RpsF is displayed in the figure. Previous in vitro studies showed that FkpA exhibits isomerase activity when associated with 50S or 70S ribosomal protein complexes (37). In addition, yeast two-hybrid and other high-throughput experiments (E. coli interaction database, http://ecid.bioinfo.cnio.es/) showed that FkpA can interact with 30S ribosomal protein RpsD.
The results presented here support the interaction between FkpA and the 30S ribosomal subunit and demonstrate that an additional 30S protein, RpsF, can interact with FkpA in vivo. Another cross-link between this hyper-reactive site and a nearby lysine in the FkpA sequence (Lys110-Lys85) was also identified; it is shown as the red-yellow cross-link in Fig. 3D. This pair is most likely an intra-protein cross-link (18.7 Å), but it could also be an inter-protein cross-link (29.9 Å). Further cross-linking studies with alternative PIR cross-linker structures may resolve these two possibilities. Nevertheless, the results presented here are the first to reveal regions within the sequences of the complex components that are close to one another in vivo. On FkpA, this region is located in the center of the protein, consistent with previous functional studies (36) reporting that the chaperone and isomerase functions of FkpA localize separately to the N- and C-terminal domains. Deletion experiments on each terminal domain showed that the two domains can function independently, leaving the center of FkpA available for maintenance of the structure (37). Interestingly, many other lysine residues lie near this hyper-reactive site, yet few were observed to be labeled or cross-linked in these experiments, and all of these unlabeled sites are located on the opposite side of the protein complex. Such site-specific cross-linking indicates that, beyond solvent accessibility and the presence of lysine residues, other factors affect the likelihood that a reactive site will be observed in cross-linked or labeled peptides.
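In a homodimer, a lysine pair such as Lys110-Lys85 can be bridged either within one subunit or across subunits, and each possibility has its own distance in the structure. The sketch below shows one simple way to frame that assignment: each candidate distance is compared with an assumed maximum cross-linker span (35 Å, borrowed from the threshold used for the OmpA docking models in this section; whether it suits FkpA is an assumption), and when both geometries fit, the shorter one is reported as preferred. The function name is illustrative, not part of any published PIR software.

```python
from typing import Optional


def assign_crosslink(
    intra_distance: float, inter_distance: float, max_span: float = 35.0
) -> tuple[str, Optional[str]]:
    """Classify a homodimer cross-link from its two candidate distances (angstroms).

    Returns (assignment, preferred): assignment is "intra", "inter",
    "ambiguous" or "violated"; preferred names the shorter geometry
    when both candidates satisfy the assumed span limit, otherwise None.
    """
    intra_ok = intra_distance <= max_span
    inter_ok = inter_distance <= max_span
    if intra_ok and inter_ok:
        preferred = "intra" if intra_distance <= inter_distance else "inter"
        return "ambiguous", preferred
    if intra_ok:
        return "intra", None
    if inter_ok:
        return "inter", None
    return "violated", None


# Distances reported in the text for the FkpA Lys110-Lys85 pair.
print(assign_crosslink(18.7, 29.9))  # ('ambiguous', 'intra')
```

A distance cutoff alone cannot settle such cases, which is why the text points to cross-linkers of different span as a way to resolve them.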
A novel homomultimeric interaction of outer membrane protein A (OmpA) was also identified in the in vivo PIR studies presented here. OmpA in E. coli has been studied for more than 30 years. Highly abundant and conserved among Gram-negative bacteria, OmpA is thought to play a key role in outer membrane structural stability, to serve as a receptor for bacteriocins and phages, and to promote adhesion in pathogenic strains that cause food-borne infections (33, 38, 39). As suggested previously (40), OmpA is perhaps the most important structural protein in E. coli. The first crystal structure of the N-terminal 171 residues of OmpA showed that OmpA can form an eight-stranded antiparallel β-barrel (41). The C-terminal domain of OmpA has resisted crystallization, and its role in OmpA function has remained elusive. A structural model of this domain, based on the highly homologous protein RmpM from Neisseria meningitidis, has been developed and widely accepted (42). Despite these efforts, the oligomeric state of OmpA in vivo has remained unclear (32, 40, 43-45). In general, membrane proteins are the most difficult proteins to study structurally because they can be strongly influenced by the lipid bilayer, are difficult to overexpress and fold properly, and frequently contain disordered regions (46).

FIG. 4. Prediction of disordered regions and binding sites in E. coli. A) Prediction of disorder in GapA; B) Prediction of disorder in OmpA; and C) Prediction of binding regions in OmpA. Regions with predicted disorder scores of 0.5 or greater are generally considered disordered. Identified cross-linked sites are indicated and appear in regions of higher predicted disorder. Blue and red traces are from the prediction algorithms VLXT and CAN_XT, respectively, which use different sequence lengths and training datasets as described at www.pondr.com. Potential binding sites were predicted with the tools provided at http://anchor.enzim.hu/. The shaded areas are predicted binding regions.
10.1074/mcp.M110.006841-10
In vivo PIR experiments with E. coli have yielded several cross-linked peptide pairs from OmpA. Interestingly, all the observed cross-linked sites, lysine residues 213, 294, and 338, are in the C-terminal domain of OmpA. These data support the RmpM-based structural model and can help answer key questions about in vivo OmpA topology. The model structure of the C-terminal domain is shown in Fig. 5D, with the cross-linked sites superimposed in different colors.
As with the examples above, a complete set of mass spectra for each cross-linked pair is shown in supplemental Fig. S5 to support high-confidence identification of each cross-linking relationship. Based on predictions of sequence disorder (47, 48) and protein binding sites (49), the C-terminal domain of OmpA is highly disordered (Fig. 4B) and contains several potential binding sites (Fig. 4C). This suggests that the local flexibility of disordered regions and the proximity of binding sites affect which in vivo cross-linked sites are likely to be observed. Such regions are the most difficult to study with conventional structural techniques, and PIR technology may provide unique insight into protein interactions that involve disorder. Other proteins examined for disorder content, including GAPDH (Fig. 4A), Skp and GroEL (46), also show good correlation between predicted disorder and PIR-reactive sites.
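The Fig. 4 convention, in which predicted disorder scores of 0.5 or greater mark disordered regions, can be applied programmatically to check whether observed cross-linked sites fall inside such regions. This is a minimal sketch; the per-residue scores below are a hypothetical toy profile, not PONDR output for OmpA, and the function names are illustrative.

```python
from typing import Optional


def disordered_segments(scores: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) residue ranges whose score >= threshold."""
    segments: list[tuple[int, int]] = []
    start: Optional[int] = None
    for residue, score in enumerate(scores, start=1):
        if score >= threshold and start is None:
            start = residue  # entering a disordered stretch
        elif score < threshold and start is not None:
            segments.append((start, residue - 1))  # leaving the stretch
            start = None
    if start is not None:  # stretch runs to the C-terminus
        segments.append((start, len(scores)))
    return segments


def site_is_disordered(site: int, segments: list[tuple[int, int]]) -> bool:
    """True if a residue number falls inside any disordered segment."""
    return any(first <= site <= last for first, last in segments)


# Hypothetical 12-residue disorder profile and cross-linked lysine positions.
profile = [0.1, 0.2, 0.6, 0.7, 0.4, 0.5, 0.9, 0.3, 0.2, 0.8, 0.8, 0.8]
segments = disordered_segments(profile)
print(segments)  # [(3, 4), (6, 7), (10, 12)]
for site in (4, 9, 11):
    print(site, site_is_disordered(site, segments))
```

With a real predictor output, the same check would report whether residues such as Lys213, Lys294, and Lys338 lie in predicted disordered stretches.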
Another important advantage of PIR technology is the potential to learn about native protein structures and interactions at the time of cross-linker application. This is especially advantageous for membrane proteins, because in vivo cross-linking with PIR technology takes place before cell lysis, while proteins still reside in their native environment and cellular location. The potential to obtain useful topological information on membrane proteins is therefore a compelling aspect of this approach. Accordingly, PIR technology enabled the first observation of the OmpA dimer in vivo and visualization of its interacting interface. The most interesting result among these data is the observation (Fig. 5) and validation (supplemental Fig. S5A online) of a homodimeric cross-linked species containing two identical peptides with lysine 338. Previous efforts to purify outer membrane protein complexes from E. coli identified OmpA dimer bands on non-denaturing gels (32), and folding studies of OmpA have also indicated that higher order structures may exist (43). We also observed OmpA dimer bands by anti-OmpA Western blot analysis of samples from cross-linked E. coli cells (supplemental Fig. S6 online). The present PIR data are the first to unambiguously demonstrate that OmpA can form homomultimeric complexes in vivo. To further evaluate this result, SymmDock (17) was used to compute putative orientations of the dimeric OmpA periplasmic domain, with the model structure of the C-terminal domain as input. The top 100 docking models were evaluated against the cross-linking constraints from the PIR studies. Using 35 Å as the distance threshold for cross-linked sites (supplemental Fig. S7), 92 of the models were eliminated. After the remaining models were tested for accessibility of all reactive lysine residues observed in cross-linked pairs (discussed below), only two very similar model structures remained.
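The two-stage screening of the SymmDock models described above (a 35 Å cutoff on every observed cross-link, followed by a check that every cross-linked lysine is accessible) can be expressed as a simple filter. This is a sketch under stated assumptions: the `DockingModel` record, model names, distances, and buried-residue sets are hypothetical, and in practice these values would be measured on each docked model rather than typed in.

```python
from dataclasses import dataclass, field


@dataclass
class DockingModel:
    """Hypothetical summary of one docked dimer model."""
    name: str
    # Distance (angstroms) spanned by each observed cross-link in this model.
    crosslink_distances: dict[str, float]
    # Cross-linked lysines judged inaccessible (buried) in this model.
    buried_sites: set[str] = field(default_factory=set)


def passes_distance(model: DockingModel, max_distance: float = 35.0) -> bool:
    """Every observed cross-link must fit within the distance threshold."""
    return all(d <= max_distance for d in model.crosslink_distances.values())


def passes_accessibility(model: DockingModel, observed_sites: set[str]) -> bool:
    """No lysine observed in a cross-link may be buried in the model."""
    return not (model.buried_sites & observed_sites)


def screen_models(models, observed_sites, max_distance=35.0):
    """Apply the distance filter first, then the accessibility filter."""
    by_distance = [m for m in models if passes_distance(m, max_distance)]
    return [m for m in by_distance if passes_accessibility(m, observed_sites)]


observed = {"K213", "K294", "K338"}
models = [
    DockingModel("model_01", {"K338-K338'": 22.4, "K213-K294": 31.0}),
    DockingModel("model_02", {"K338-K338'": 41.7, "K213-K294": 18.2}),
    DockingModel("model_03", {"K338-K338'": 27.9, "K213-K294": 33.5}, {"K294"}),
]
kept = screen_models(models, observed)
print([m.name for m in kept])  # ['model_01']
```

Ordering the cheap distance test before the accessibility test mirrors the sequence described in the text, where most models are discarded at the first stage.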
PIR data, together with computational analysis, allow prediction of in vivo dimeric model structures of the OmpA periplasmic domain (Fig. 5 and supplemental Fig. S8). In addition to the homodimeric cross-link, three other OmpA cross-links were identified and validated (supplemental Fig. S5B-D). Of these, two were considered intra-protein cross-links, and the third was ambiguous and could be either an intra- or inter-protein cross-link (Fig. 5D and supplemental Fig. S6).
It is important to consider these new topological data in the context of previous results on OmpA. First, OmpA has been shown to function as a porin with a temperature-dependent pore size (50, 51). However, the smaller eight-stranded beta barrel observed at lower temperature has no free pathway for ions to cross the outer membrane, and the ion pair-gated model derived from this structure cannot explain the temperature effect (52). Second, single-channel conductance measurements (50) suggest that full-length OmpA can form a 16-stranded beta barrel at physiological temperatures, whereas truncated OmpA containing only the N-terminal domain yields only smaller channels. Incorporation of eight additional beta strands from the C-terminal domain has been proposed as the origin of the larger pore structure (44), although this was not supported by secondary structure measurements (45). The cross-linking data presented here show that OmpA can exist as a dimer in cells. A dimeric beta barrel structure may therefore be present, in which eight strands from each OmpA monomer combine to yield a larger channel. This has also been observed for other multimeric proteins containing beta barrels, such as Skp. Interactions within the C-terminal domain may be important for dimer stabilization; if so, the crystal structures obtained to date would not show the dimeric structure, since crystals of full-length OmpA have not yet been obtained. Furthermore, this result is consistent with in vitro observations showing that larger OmpA channels form only when the C-terminal domain is present (51).