Cell-free Determination of Binary Complexes That Comprise Extended Protein-Protein Interaction Networks of Yersinia pestis

Binary protein interactions form the basic building blocks of molecular networks and dynamic assemblies that control all cellular functions of bacteria. Although these protein interactions are a potential source of targets for the development of new antibiotics, few high-confidence data sets are available for the large proteomes of most pathogenic bacteria. We used a library of recombinant proteins from the plague bacterium Yersinia pestis to probe planar microarrays of immobilized proteins that represented ∼85% (3552 proteins) of the bacterial proteome, resulting in >77,000 experimentally determined binary interactions. Moderate (KD ∼μM) to high-affinity (KD ∼nM) interactions were characterized for >1600 binary complexes by surface plasmon resonance imaging of microarrayed proteins. Core binary interactions that were in common with other gram-negative bacteria were identified from the results of both microarray methods. Clustering of proteins within the interaction network by function revealed statistically enriched complexes and pathways involved in replication, biosynthesis, virulence, metabolism, and other diverse biological processes. The interaction pathways included many proteins with no previously known function. Further, a large assembly of proteins linked to transcription and translation was contained within highly interconnected subregions of the network. The two-tiered microarray approach used here is an innovative method for detecting binary interactions, and the resulting data will serve as a critical resource for the analysis of protein interaction networks that function within an important human pathogen.

Plague is an infectious disease responsible for catastrophic historical epidemics, and it remains a contemporary public health concern because of the global distribution of its etiological agent, Yersinia pestis. Recurring outbreaks of plague are frequent in Madagascar (1), and episodic infections are reported annually in the United States (2). Spread by rodent fleas, bubonic plague can be successfully treated with antibiotics if diagnosed early, whereas pneumonic plague can progress rapidly to death within 24 h of infection. Although most isolates are considered sensitive to standard treatments, an antibiotic-resistant strain of Y. pestis was reported (3). The highly virulent plague bacillus evolved from the closely related species Yersinia pseudotuberculosis, which usually causes a mild gastroenteritis in humans or a tuberculosis-like lung infection in animals (4). The chromosome of Y. pestis strain CO92 encodes ~3885 open-reading frames (ORFs), whereas an additional 181 are expressed by three plasmids. Although the theoretical proteome of Y. pestis can be predicted by analysis of genomic sequences, less than 40% of the ~4500 ORFs have been confirmed by direct detection of protein products (5). Further, the composition of the bacterial proteome varies in response to the fluctuating environmental signals that occur during the process of infection. Although the proteome describes the sum of all proteins within the organism, interaction networks are useful for representing the various intertwined roles of individual proteins that contribute to metabolism, signaling, structure, movement, and other cellular pathways. The basic building blocks of proteomic interaction networks are binary protein-protein interactions (PPi). Proteome-scale PPi studies were described for E. coli (6-11) and a few gram-negative pathogens (12-15), although only limited data for Y. pestis and other highly virulent bacteria currently exist.
The loss of any interaction node that is a critical component of virulence, housekeeping, and other cellular functions will result in greatly attenuated or avirulent bacteria (16). Consequently, PPi networks are potential targets for the development of new antimicrobials. Yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (AP-MS) are the most widely used experimental methods for high-throughput detection of PPi (6,7,17). Results from AP-MS studies favor the identification of stable multiprotein complexes with slow off-rates, whereas reporter gene activation in the Y2H assay also detects transient interactions that may have fast kinetics. Yet, results from Y2H and AP-MS studies cannot differentiate binary interactions from protein aggregates, and provide no information concerning kinetics of complex formation or stability. Further, there is often little agreement between independent studies (6,7,17), and false discovery or false negative rates are difficult to determine. As an alternative approach, we used a recombinant library of proteins representing 3552 ORFs that are encoded by the chromosome and plasmids of Y. pestis as a cell-free experimental platform to examine binary PPi and model interaction networks. The recombinant library included proteins that have no known function, and products from predicted ORFs that have no experimental proof of expression by Y. pestis. Based on the sequenced genomes of KIM and CO92 strains of Y. pestis, proteins were produced by in vitro translation, purified, and printed on microarrays. Solution-phase proteins were used to serially detect binding complexes with the protein microarrays and to measure interaction kinetics. We demonstrate that the experimental data can be used to build models of interaction networks for queries of specific functions.

MATERIALS AND METHODS
Y. pestis Proteome Microarrays-The Y. pestis protein microarray was constructed as described previously (18). Briefly, 3968 Y. pestis ORFs cloned into pENTR221 were sequenced, fully characterized, and recombined into the expression vector pEXP7-DEST GST using Gateway cloning methods (Life Technologies, Carlsbad, CA). The Y. pestis proteins were expressed in vitro using the Expressway™ (Life Technologies) cell-free E. coli extract system and isolated by GST-based affinity purification. Proteins that passed quality control criteria of quantity, solubility, and stability (3552 proteins out of 4146 total ORFs), representing ~85% total proteome coverage, were printed in microarrays on glass slides coated with a thin film of nitrocellulose (Gentel Biosciences, Madison, WI). The microarrays passing additional quality control measures for printing, as previously described (18), were cryo-preserved (−20°C) until use. Larger quantities of recombinant Y. pestis proteins were produced in E. coli BL21-AI cells (Life Technologies) for probes to determine PPi. The probe proteins were affinity purified on GSTrap HP columns by FPLC (AKTA, GE Healthcare Life Sciences, Pittsburgh, PA), and stored (−20°C) in a final glycerol concentration of 25%. Insoluble proteins were recovered using Inclusion Body Solubilization Reagent (Thermo Scientific, Waltham, MA) followed by protein refolding using dialysis with a gradient of decreasing urea concentration, carried out at 4°C. Protein purity and concentration were analyzed with an Agilent Bioanalyzer 2100 Protein 230 kit (Agilent Technologies, Santa Clara, CA).
The PPi probe proteins were biotinylated with a maleimide-activated, sulfhydryl-reactive biotinylation reagent containing an extended spacer arm (EZ Link Biotin BMCC; Thermo Scientific). The conditions used favored biotinylation of all four reduced Cys groups presented by the GST tag, facilitating optimal orientation of the analyte for measuring protein interactions. Biotinylation was quantified by comparison to a standard curve, using a dot blot method employing streptavidin-horseradish peroxidase conjugate and substrate. The biotinylated proteins were stored (−20°C) in a final glycerol concentration of 25% by volume. The biotinylated GST Y. pestis recombinant proteins were conjugated to either Streptavidin Alexa Fluor 532 or 647 (Life Technologies) in a 1:1 molar ratio, using 2.5 μM of protein per 2.5 μM fluorescent dye. A Tecan HS Pro400 (Tecan, San Jose, CA) hybridization station was used to automate the interaction studies with the protein microarrays, and all procedures were performed at 22°C. Dilutions and wash steps used PBS (pH 7.4) containing 5 mM MgCl2, 0.05% Triton X-100, 5% glycerol, 1% BSA, and 0.5 mM DTT. Briefly, the microarrays were blocked for 1 h in PBS (pH 7.4), 0.1% Tween-20, and 1% BSA. The interaction studies were performed in duplex by combining one probe protein that was conjugated to Alexa Fluor 532 and another probe protein conjugated to Alexa Fluor 635, and immediately transferring the mixture for equilibration (1 h, 22°C) on the surface of the Y. pestis protein microarray. The microarrays were washed and dried before further processing.
Proteome Microarray Data Analysis-The microarrays were scanned with a confocal laser scanner (GenePix 4000B; Molecular Devices, Sunnyvale, CA) and analyzed using GenePix Pro 6.0 software. Raw pixel counts were generated by scanning simultaneously at 635 nm and 532 nm, using the highest photomultiplier tube gain setting that did not produce saturated signals, and a power setting of 100%. The scanner settings minimized background signals and were optimal for detecting fluorescence from specific PPi events. Acquired data were analyzed separately for 635 nm and 532 nm wavelengths with the ProtoArray Prospector v5.1 program (Life Technologies) and quantile normalized. Outliers among the replicate protein features on individual arrays were identified using a modified Z-score (median absolute deviation >3.5) and removed from further analysis. The averaged and condensed normalized data were then log10-transformed. Based on the Gaussian distribution in histogram plots obtained for data from each wavelength scan, the significant binding partners were determined by standard deviations from the mean signal (~280 relative fluorescence units (RFU)) for each microarray, using Origin Pro (OriginLab, Boston, MA) v.8.5 software. Interactions were ranked according to the following: high intensity interaction, ≥3 S.D.; medium intensity interaction, 2-3 S.D.; low intensity interaction, 1-2 S.D.; insignificant signal, <1 S.D.
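The outlier filter and the S.D.-based ranking described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the ProtoArray Prospector or Origin Pro implementation: the function names are hypothetical, the 0.6745 scaling constant is the conventional Iglewicz-Hoaglin form of the modified Z-score (assumed here, not stated in the text), and the sample standard deviation is assumed.

```python
import math
import statistics


def modified_z_scores(values):
    """Modified Z-score based on the median absolute deviation (MAD).

    Assumption: conventional form 0.6745 * (x - median) / MAD.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def drop_outliers(values, cutoff=3.5):
    """Remove replicate features whose |modified Z-score| exceeds the cutoff."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) <= cutoff]


def rank_interactions(signals):
    """Label each feature by how many S.D. its log10 signal lies above the array mean.

    `signals` maps a (hypothetical) feature name to its averaged, normalized RFU (> 0).
    """
    logs = {name: math.log10(rfu) for name, rfu in signals.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.stdev(logs.values())  # sample S.D. (assumed)
    labels = {}
    for name, x in logs.items():
        z = (x - mean) / sd
        if z >= 3:
            labels[name] = "high"
        elif z >= 2:
            labels[name] = "medium"
        elif z >= 1:
            labels[name] = "low"
        else:
            labels[name] = "insignificant"
    return labels


# Toy array: 29 background features and one strong binder (made-up values).
toy = {f"bg{i}": 100.0 for i in range(29)}
toy["hit"] = 100000.0
labels = rank_interactions(toy)
```

On real arrays the thresholds would be applied per array and per wavelength, as stated in the text.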
Interaction Kinetics-Binding interaction kinetics of binary complexes were determined by surface plasmon resonance imaging (SPRi) of protein microarrays, using a Flexchip instrument (GE Healthcare Life Sciences). The chip surface was cleaned by UV-Ozone treatment (UVO-Cleaner, Jelight, Irvine, CA) for 5 min. A self-assembled monolayer was formed by incubating a gold-coated slide surface with 11-mercaptoundecanoic acid (11-MUA) in ethanol for 2 h. The 11-MUA grafted slides were washed with ethanol, and activated with 200 nM 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) and 50 nM N-hydroxy succinimide (NHS). NeutrAvidin (100 μg/ml) was coupled to the surface by incubation for 20 min, and surfaces were blocked with ethanolamine (1 M). The Y. pestis proteins were biotin labeled as described above, adjusted to concentrations of 150, 125, 100, 75, 50, and 25 μg/ml in 1× PBS containing 20% glycerol, and printed in duplicate on the NeutrAvidin-immobilized gold surface by using a BioOdyssey™ Calligrapher™ printer (Bio-Rad, Hercules, CA). The 24 × 24 microarray was printed in contact mode using a Stealth™ pin SMP-5B (Arrayit, Sunnyvale, CA) with 70% humidity, resulting in 160 μm spots. A gasket cover was placed on the slide over the microarray to create a microfluidic chamber, and SPRi data were collected. Surface areas of the printed proteins were assigned for analysis, whereas reference spots were used for correction of bulk refractive index changes and instrumental drift. The microarray surface was blocked in Biacore Flexchip blocking buffer (GE Healthcare Life Sciences). Nonbiotinylated proteins (1 μM) were flowed across the surface in running buffer (0.01 M HEPES buffered saline, pH 7.4, 0.005% Tween-20), injected at a flow rate of 500 μl/min for 15 min, followed by a 15 min dissociation phase in running buffer. All binding events were performed at 25°C.
The microarray surfaces were regenerated after each experiment with 10 mM Gly-HCl pH 2.2 buffer for reuse up to seven times. To determine ka and kd values, a global fit analysis was carried out for a 1:1 interaction model with mass transport corrections, using Flexchip evaluation software Version 2.1 (GE Healthcare Life Sciences) for data analysis.
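For orientation, the relationship between the fitted rate constants and the equilibrium dissociation constant in a simple 1:1 interaction model can be written out as a short Python sketch. It is not the Flexchip evaluation software and omits the mass transport correction; the function names, the rate constants, the analyte concentration, and the maximal response used in the example are all hypothetical values chosen for illustration.

```python
import math


def dissociation_constant(ka, kd):
    """KD (M) from the association rate ka (M^-1 s^-1) and dissociation rate kd (s^-1)."""
    return kd / ka


def association_response(t, ka, kd, conc, rmax):
    """Response at time t (s) during analyte injection for a 1:1 model.

    Solves dR/dt = ka*conc*(rmax - R) - kd*R with R(0) = 0.
    """
    kobs = ka * conc + kd
    req = rmax * ka * conc / kobs  # plateau (equilibrium) response
    return req * (1.0 - math.exp(-kobs * t))


def dissociation_response(t, kd, r0):
    """Response at time t (s) after the switch to running buffer, starting from r0."""
    return r0 * math.exp(-kd * t)


# Hypothetical example values (not taken from the data set).
ka, kd = 1.0e5, 1.0e-3   # M^-1 s^-1, s^-1
conc, rmax = 1.0e-6, 100.0  # analyte concentration (M), maximal response (arbitrary units)

kd_eq = dissociation_constant(ka, kd)                        # 1e-8 M, i.e. 10 nM
r_end = association_response(900.0, ka, kd, conc, rmax)      # after a 15 min injection
r_after = dissociation_response(900.0, kd, r_end)            # after a 15 min dissociation
```

A global fit, as used in the text, estimates one ka and one kd per interaction across all printed ligand densities at once rather than fitting each curve separately.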
Interactomes-The sequence composition of protein binding partners was determined by mapping protein interactions as vectors of amino acid triplets. A set of 3485 experimentally determined PPi from E. coli (8) was used as a positive interaction training set for the machine learning algorithm used to predict Y. pestis PPi in common with E. coli. Using gene ontology annotations with strong evidence codes (19), we collected proteins from E. coli that were found in the cytoplasm, outer membrane, or periplasm. From this set of proteins, we generated 3536 pairs of E. coli proteins that were considered non-interacting because the paired proteins were located in different compartments, and used them as a negative interaction training set. Grouping amino acids in seven classes (20), the frequencies of all 7 × 7 × 7 = 343 combinations of classes in a protein were calculated. Specifically, the frequency of a combination k over all 343 class combinations in a protein i of a given interaction was defined as $f_{ik} = n_{ik} / \sum_{l=1}^{343} n_{il}$, where $n_{ik}$ was the occurrence of combination k in protein i, scanning over all consecutive amino acid triplets. An interaction between proteins i and j was represented by a 343-dimensional vector where each vector unit held the frequency difference of a combination k, $\Delta_{ijk} = |f_{ik} - f_{jk}|$. A random forest method was used to evaluate the likelihood of the predicted interactions (21). For each of 1000 decision trees, $M = \sqrt{N}$ variables out of all N = 343 triplet combinations and two-thirds of all protein pairs were sampled. Pairs of proteins were classified as interacting if more than half of all trees considered them interacting.
Network Analysis-Topological analysis of the network of Y. pestis PPi and random graphs was performed using the Network Analyzer application for Cytoscape (22). Random graphs that were composed of the same number of observed nodes and edges were generated using the Network Analysis Tool (NeAT) (23). Clusters of highly intraconnected nodes within the network were identified using the Cytoscape application MCODE (Molecular Complex Detection) (24). Node proteins within the network were characterized according to KEGG pathways using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (25), and pathway enrichment was determined using the entire Y. pestis proteome as background. Node proteins were also categorized by family according to the ontology-based classification system utilized in PANTHER (26).
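The triplet-frequency encoding and majority-vote rule described under Interactomes can be illustrated with a minimal Python sketch. The seven-class grouping below is one commonly used conjoint-triad grouping and is an assumption here, since the text only cites reference (20) for the classes; the function names and the example sequences are hypothetical, and the random forest itself is not reproduced.

```python
from itertools import product

# Assumed seven amino acid classes (a commonly used conjoint-triad grouping).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 x 7 x 7 = 343 class combinations, each given a fixed vector position.
TRIAD_INDEX = {triad: k for k, triad in enumerate(product(range(7), repeat=3))}


def triad_frequencies(seq):
    """f_ik: count of each class triplet over all consecutive triplets, divided by the total."""
    counts = [0] * len(TRIAD_INDEX)
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        try:
            key = (AA_TO_CLASS[a], AA_TO_CLASS[b], AA_TO_CLASS[c])
        except KeyError:
            continue  # skip triplets containing non-standard residues
        counts[TRIAD_INDEX[key]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [n / total for n in counts]


def pair_vector(seq_i, seq_j):
    """Delta_ijk = |f_ik - f_jk| for each of the 343 combinations."""
    f_i, f_j = triad_frequencies(seq_i), triad_frequencies(seq_j)
    return [abs(x - y) for x, y in zip(f_i, f_j)]


def majority_vote(tree_votes):
    """Classify a pair as interacting if more than half of the trees vote 'interacting'."""
    return sum(tree_votes) > len(tree_votes) / 2


# Hypothetical short peptides, for illustration only.
vec = pair_vector("MKVLAAGEDRK", "MSTNHQWCCIL")
```

Because the vector holds absolute differences, it is symmetric in the two proteins, so the order in which a pair is presented does not affect the features.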
Protein Abundance Estimations-E. coli BL21-AI cells were transformed with pEXP7-DEST harboring the GST-tagged recombinant proteins, and cultured (37°C) to mid-log phase in a Bioscreen C MBR (Growth Curves USA, Piscataway, NJ) with constant shaking. Protein expression was induced for 4 h with 0.2% arabinose, and bacteria were harvested, pelleted, and frozen (−80°C). The bacterial cell pellets were lysed in 50 μl of Bacterial Protein Extraction Reagent (B-PER, Thermo Scientific), and the soluble fraction was retained. Glutathione-conjugated magnetic beads (Genscript USA, Inc., Piscataway, NJ) were used to capture the plasmid-expressed proteins for quantification. The magnetic beads (25 μl) were washed twice in 50 μl 1× PBS and added to 10 μl of serially diluted lysate supernatant (neat, 1:5, 1:25 in 1× PBS). Samples were incubated for 1 h at room temperature with constant agitation. A magnetic plate was used to separate the magnetic beads bound to GST-tagged protein from unbound supernatant material. The magnetic beads were washed twice in wash buffer (100 μl 1× PBS, 1% BSA) and incubated with rabbit anti-GST antibody (1 μg/ml) diluted in wash buffer. Beads were washed twice, and incubated for 1 h with an Alexa 488 conjugate of goat anti-rabbit IgG (1:1000 in wash buffer). Finally, beads were washed three times, fluorescence was read on a Safire II plate reader (MTX Lab Systems, Inc., Vienna, VA), and protein abundance was estimated in comparison to a standard curve established with a calibrated amount of GST.

RESULTS
Binary Protein-Protein Interactions-We utilized a protein microarray (18) that comprised >85% of the predicted Y. pestis proteome to identify protein interaction partners (Fig. 1A). The microarrayed proteins were expressed in vitro as recombinant GST-fusions from DNA sequence-confirmed plasmid clones and printed on glass slides that were coated with a thin layer of nitrocellulose, as previously described (18). Because the proteomes of Y. pestis and E. coli share ~50% of proteins with >60% amino acid identity (28,29), we examined two reported AP-MS data sets of E. coli PPi (6,7) to select protein probes that were common to both species. There were 212 E. coli proteins that were involved in 239 interactions in common between the two E. coli AP-MS studies, and 179 of these proteins were also conserved within the predicted Y. pestis proteome (e-value <10⁻⁶ by BLASTp) (28). Genetic clones for each of the 179 Y. pestis orthologs were used to express proteins in E. coli for use as interaction probes, resulting in 118 purified proteins that were stable in solution. The probe proteins were expressed as GST-fusions, and biotinylated for conjugation to streptavidin fluorescent molecules (Fig. 1B). The microarray surface was incubated with a fixed concentration (300 nM) of biotin-labeled analytes, and interactions were detected by laser-scanning fluorescence (Fig. 1B). We used experimental conditions that favored detection of interactions with slow dissociation rates, similar to the stable protein complexes detected by AP-MS or ELISA.
The fluorescent signals from each microarray exhibited a Gaussian distribution (Fig. 1C). A total of 60,604 provisional interactions (supplemental Tables S1-S3) as low-intensity interactions (n = 49,477). Based on these criteria, the average signals for high, medium, and low-intensity interactions were 6612, 2047, and 939 RFU, respectively. To increase the diversity of recorded PPi, we randomly selected an additional 26 proteins from the high-signal intensity data set to probe the microarray, allowing us to detect 16,762 new binary interactions (77,366 total). Protein pairs were then randomly selected from each of the three binding affinity groups (low to high signal intensity) for examination of interaction kinetics. The GST-tagged Y. pestis proteins were biotinylated and printed onto a NeutrAvidin-coated gold surface (Fig. 2A). The unlabeled protein analytes were flowed across the microarray surface (Fig. 2B), and the resulting interaction kinetics were assessed by SPRi (Fig. 2C). A total of 1637 interactions were identified, and 444 were successfully modeled (supplemental Fig. S1) to obtain KD values ranging from 0.26 to 186 nM (ka = 2.78 × 10² to 4.95 × 10⁵ M⁻¹ s⁻¹; kd = 1.4 × 10⁻⁶ to 3.23 × 10⁻³ s⁻¹) (supplemental Table S4). A 1:1 Langmuir binding model was used to evaluate kinetic constants, and interactions with more complex binding kinetics were not further considered. In the representative SPRi binding curves shown in Fig. 2C, binding interactions between the hypothetical protein y1097, a putative thioredoxin-like protein, and the hypothetical protein y3300, a putative ligase, display a relatively fast association rate and moderate dissociation rate, resulting in a KD of 13.1 nM (Fig. 2C, top). In contrast, the complex of methyltransferase protein MenG and the transcription termination factor Rho (Fig. 2C, bottom) is formed by an intermediate-level association rate that is countered by a very slow dissociation rate, resulting in a more stable high affinity interaction (KD = 2.66 nM). Approximately 32% of the SPRi interactions were also observed in PPi detected with the higher-content microarray and labeled protein probes (Fig. 1). Furthermore, we detected 46% of the orthologous Y. pestis interactions that were predicted from the previously reported E. coli AP-MS studies (6,7).
Operon Organization of Interacting Y. pestis Proteins-Because bacterial proteins that form functional units are often clustered into operons that are regulated by a single promoter, we examined the possibility that Y. pestis operons were enriched for binary interactions. Approximately 52% (2064 ORFs) of Y. pestis genes are located in 743 operons (supplemental Table S5), varying from 2-11 genes (average ~3 genes) per transcriptional unit, consistent with reported estimates for other bacteria (30). Examining 13,633 interactions between 1175 proteins (high and medium-intensity interactions), we found that only 0.3% of experimentally detected interacting pairs of proteins fell within the same transcriptional unit in Y. pestis (supplemental Table S6). We also examined the gene organization for interacting proteins of E. coli that were previously reported (8). Specifically, we observed that only 1.6% of the E. coli interactions fell within the same operon for ~5100 interactions with both partners mapping to an annotated transcription unit.

FIG. 1. GST-tagged proteins were affinity purified, and quality control (QC) of each protein was evaluated based on protein stain and Western blotting using anti-GST antibody. Approximately 85% of the 4164 potential ORFs from Y. pestis were arrayed onto nitrocellulose-coated glass slides and visualized using a rabbit anti-GST antibody bound to Cy5-labeled anti-rabbit antibody (bottom). B, Specific protein-protein interaction (PPi) binding events were detected as fluorescence signal, originating from analyte proteins conjugated to fluorescent tags (red circles). Example control proteins, as well as a specific PPi, are shown (yellow and green boxes). C, Normalized fluorescent signals from each whole-proteome Y. pestis array exhibited a Gaussian distribution when plotted as a histogram and significant binding partners were determined based on standard deviation (S.D.) away from the mean signal of each individual array: ≥3 S.D. = high-intensity interaction (dark blue shading on Gaussian and pie graphs, n = 1980), between 2 and 3 S.D. = medium-intensity interaction (medium blue shading, n = 11,653), between 1 and 2 S.D. = low-intensity interaction (light blue shading, n = 66,733), <1 S.D. = insignificant signal (NS, unshaded).
Quality Assessment of the Interaction Network-Because the proteomes of Y. pestis and E. coli share ~50% of proteins with >60% amino acid identity (28,29), we reasoned that the composition of protein interactions of E. coli and Y. pestis is highly similar as well. As a consequence, we applied a machine learning approach to assess the quality of protein interactions in Y. pestis by training a random forest algorithm (21) (Fig. 3) where 3485 previously reported PPi from E. coli (8) were used as a positive interaction training set. Based on corresponding gene ontology (GO) terms with strong evidence codes (19), we compiled a negative interaction training set of 3536 noninteracting E. coli pairs of proteins that were experimentally located in different cellular compartments (Fig. 3; cytoplasm, outer membrane, or periplasm). Each pair of proteins was mapped as a vector of amino acid triplet frequencies (20) by grouping amino acids into one of seven groups based on physicochemical properties, and by considering any three contiguous amino acids as a unit. In this approach, amino acids within the same group likely involve synonymous mutations because of their similar characteristics. In particular, the classification method considers the dipoles and volumes of amino acid side chains, assuming that electrostatic and hydrophobic interactions direct PPi. Each amino acid triad can be differentiated based on the classes of amino acids it contains; therefore, important information concerning the PPi is retained. To determine the classifier's ability to distinguish interacting from non-interacting protein pairs, we trained our random forest with these training sets. Using classification thresholds as previously described (31), a receiver operating characteristic (ROC) curve was constructed based on a "held out" set of E. coli training data (Fig. 4A).
Requiring >50% of all trees to classify a given interaction as positive, we obtained a true positive rate of 88.5% and a false positive rate of 1.6%, indicating high model performance. We next determined the frequency distributions of the fraction of trees that confirmed the measured SPRi and fluorescent microarray interactions. SPRi results were confirmed for 78.4% of interactions (Fig. 4B). Furthermore, we measured confirmation rates of 68% for high, 51.0% for medium, and 46.7% for low-intensity interactions that we obtained from the fluorescent microarray results. Using data sets with the highest level of confirmed interactions (>50% for SPRi and fluorescent microarray interactions), we assembled a network of interactions of Y. pestis, consisting of 314 proteins and a total of 2344 interactions (Fig. 4C). This web of interactions formed a single connected network component with no isolated nodes. Furthermore, the network features a minority of highly connected nodes, while the majority of nodes have few interaction partners, as exhibited by a node degree distribution that fits a power-law (Fig. 4D) (24,32). The network architecture suggests that the underlying network is more robust to perturbations and random loss of nonhub proteins (32). The network's average clustering coefficient Ci, defined as the ratio between the number of edges linking nodes adjacent to a given node i and the total possible number of edges (33), is high compared with a random network that consists of the same number of nodes and edges (Ci, Y. pestis = 0.232; Ci, random = 0.043).
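The clustering coefficient defined above can be computed directly from an edge list, as in the following Python sketch. It is an illustration of the definition rather than the Network Analyzer implementation; the function names and the toy edge list are hypothetical, and assigning a coefficient of 0 to nodes with fewer than two neighbors is an assumed convention.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Edges among the neighbors of `node`, divided by the number of possible such edges."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy network (hypothetical protein names): a triangle with one pendant node.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_adj = build_adjacency(toy_edges)
toy_c = average_clustering(toy_adj)  # (1 + 1 + 1/3 + 0) / 4
```

Applying the same function to a random graph with matching node and edge counts gives the baseline against which the observed value is compared.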
We investigated the substructure of the interaction network by functional annotation clustering of the 314 proteins comprising the set of protein interactions. We identified 11 KEGG pathways (25) that were statistically enriched (Fisher exact p value ≤0.04) in the network by using the whole Y. pestis proteome as background (Fig. 5 and supplemental Table S6). The largest number of binary interaction proteins (n = 26) belonged to the ribosomal complex, confirming the direct physical contacts predicted within this functional module (

FIG. 4. Application of a machine learning algorithm reveals a network of bacterial protein interactions that exhibits topological features of biological networks. A, A receiver operating characteristic (ROC) curve was constructed based on the performance of the trained random forest algorithm (RFA) on a held out test set of E. coli protein pairs. Provided that more than half of all trees classified a given interaction as positive, we obtained a TPR = 88.5% and a FPR = 1.6% (square). B, We determined the frequency distributions of the fraction of trees that confirmed measured SPRi and fluorescent microarray (FMA) interactions. The dashed line represents our classification threshold, at which more than half of all trees indicated a positive interaction. In particular, we observed that SPRi and FMA high-intensity interactions largely were confirmed by our computational approach. C, The Y. pestis network of protein-protein interactions exhibits a high average clustering coefficient (Ci = 0.232) compared with a random network comprised of the same number of nodes and edges (Ci = 0.043), and has a small radius and characteristic path length. D, The node degree distribution of the network (filled circles) approximates a power-law distribution typical of scale-free networks, but not random networks (open squares).

module connections were found for a subset of functionally associated groups of nodes (Fig. 6B-6F), whereas several previously reported interactions were recapitulated in the Y. pestis PPi network (Fig. 6B, 6D, 6E) (35-37). Of the annotated proteins within the network (n = 218), one-quarter could be mapped to a protein class (Fig. 7) based on biological function ontology classification (26). The majority of proteins belong to a single protein class, but there are several multiclass proteins as well.

Orphan and Hypothetical Proteins Within the Y. pestis PPi Network-Approximately 30% of ORFs represented in the network encode orphan proteins with no known biochemical function, or are hypothetical proteins predicted solely by gene annotation algorithms that identified similarities with another hypothetical or expressed protein (Fig. 5 and 7). The abundance of orphans in the PPi network (supplemental Table S8), as well as the large number of direct physical contacts, suggests that many of these proteins are essential to the integrity of the network, although direct interactions between unique orphan proteins were not common. We observed that 16% (n = 679) of the total predicted Y. pestis proteome consists of unique orphan ORFs found only in this pathogen, based on comparisons of genomes of Y. pestis KIM10+, Y. pseudotuberculosis YPIII, and E. coli K-12 substr. W3110. We note that many of the ORFs used to synthesize products for inclusion in the microarrays encoded proteins that have never been shown to exist in nature. However, 22 orphan proteins that were previously identified as targets of antibody responses to Y. pestis (supplemental Table S9) (18) were associated with fluorescently detected interactions. We further note that orphan proteins recognized by antibodies from acutely infected rhesus macaques were involved in more connections within the Y. pestis PPi network in comparison with all other antibody-target orphan proteins, though the significance of this observation is not clear.
Modular Topology of the Y. pestis PPi Network and Multiassembly Connections-We identified clusters of highly intraconnected nodes by clustering coefficients using the MCODE algorithm (24). Heavily weighted nodes comprised clusters in the network that may represent macromolecular protein assemblies or functional modules (Fig. 8). Although biological functions of orphan proteins are not clear (Fig. 8A), their prevalence within the clusters provides further evidence that these proteins are essential to the network integrity. For example, clusters comprising 264 interactions between 24 proteins (Fig. 8A), and 155 interactions between 28 proteins (Fig. 8B), contain annotated node proteins with functions involved in either transcription or translation. Network connections between transcription and translation clusters as well as downstream biochemical pathways were also evident, suggesting that there are direct interactions between the macromolecular assemblies involved in these essential cellular processes.

FIG. 7. A total of 314 proteins comprise the Y. pestis interaction network, of which ~70% are annotated, and 25% were successfully mapped to a protein family, based on biological functional ontology classifications (26). The functionally classified proteins belong to diverse biological categories and highlight the importance of protein-protein interactions in various metabolic and cellular pathways.
Relationships Between Cellular Proteins, Interactions, and Network Connections-It has been suggested that expanded surface areas and conformational flexibilities of intrinsically disordered segments of proteins permit interactions with multiple binding partners (38). To explore the potential contribution of intrinsically disordered proteins to Y. pestis PPi, we assessed the secondary structure and network connections of a statistical sampling of proteins (44 randomly selected) within the larger network of 13,633 interactions between 1175 proteins, which includes core interactions and those that may be unique to Y. pestis. We identified intrinsically disordered regions within the proteins (39), ranging from completely structured to 33% disordered. The number of interaction partners within the network (supplemental Fig. S2) from this limited data sampling did not correlate with disorder of proteins. However, these results are only a preliminary snapshot, and further detailed studies will be required to draw a definitive conclusion.
The primary determinants of binary complex formation within the bacterium are interaction affinities, protein concentrations, compartmentalization within the cell, and protein crowding (40). Total protein concentrations are controlled by rates of transcription, translation, or degradation, and are regulated by the cell in response to environmental signals. To assess perturbations of the Y. pestis PPi network, the steady-state levels of select Y. pestis proteins were measured and the potential effects of changes in transcription were simulated. The 171 Y. pestis ORFs examined were individually expressed in E. coli from plasmids under the uniform control of a T7 promoter, allowing us to measure protein concentrations that were independent of most bacterial regulatory controls except intrinsic rates of translation. The plasmid-transformed host cells were cultured at 37°C to mid-log phase, and concentrations of Y. pestis protein were measured by a quantitative immunoassay. The protein abundance levels produced under these conditions were normally distributed (Fig. 9A, horizontal histogram), whereas apparent protein mass (range of 32-195 kDa) did not appear to impact protein abundance in cells. The Y. pestis ORFs we examined were involved in 316 binary interactions, and 90% were found to have protein levels that exceeded the experimentally determined KD. To approximate the altered protein concentrations that occur during infection, we used mRNA transcription data from a previously reported study of Y. pestis KIM5 replicating in macrophages (41). We compared protein abundance levels with transcription levels of Y. pestis mRNA (Fig. 9A, vertical histogram) during cell infection, assuming that the average signal intensity of mRNA from control bacteria cultured without cells correlated with the abundance of transcripts within the population of bacteria. Based on the comparison that is illustrated in Fig. 9A, overall levels of gene-specific mRNA appeared to be independent of protein levels. A comparison of steady-state protein abundance levels and changes in mRNA transcript levels 1.5 h and 8 h after infection of cells (Fig. 9B) showed that most transcripts exhibited <2-fold change in transcription levels regardless of the time point post-infection, in comparison to bacteria cultured without host cells. These slight variations are unlikely to significantly perturb overall protein abundance levels, and therefore may have little effect on potential PPi. In contrast, several proteins exhibited >2-fold changes in transcription levels. For example, the mRNA encoding the cold shock-like protein CspI (y0224) exhibited a ∼7-fold increase between 1.5 and 8 h post-infection (Fig. 9C, red). Similarly, the ribosomal protein L11 methyltransferase (PrmA) shown in Fig. 9C (blue) exhibited a 6.8-fold increase in transcript levels by 8 h but only a 1.2-fold increase at 1.5 h post-infection, suggesting that binary interactions for both CspI and PrmA may increase later in infection. Conversely, a subset of proteins exhibited significant decreases in mRNA levels in comparing the 1.5 h and 8 h post-infection time points to transcript levels of Y. pestis cultured without host cells. For example, the y0862 locus encodes a transcriptional regulator of sugar metabolism and exhibited an 8-fold increase in transcript levels at 1.5 h after infection, but only a 3-fold increase at 8 h (Fig. 9C, green), suggesting that y0862 protein abundance, and also PPi, may decrease at later time points post-infection. Overall, time-dependent changes after infection were observed for ∼10% of proteins, although the impact of these changes in mRNA transcription and protein abundance on PPi will require additional study. We also noted that transcriptional changes were not biased toward lower abundance proteins (Fig. 9B), and no correlations were observed between protein or transcript abundance and interaction affinities (supplemental Fig. S1B and S1C).
FIG. 8. Highly interconnected clusters represent potential connections between macromolecular assemblies. Two clusters of highly intraconnected node proteins (A and B) were identified within the Y. pestis network (24). Biologically relevant enrichment could not be identified within the clusters (25) because of a large number of poorly annotated orphan proteins (gray nodes). Clusters A and B contain a large number of proteins involved in either transcription or translation, which may represent binary connections within these molecular assemblies.
DISCUSSION
We report the first large-scale analysis of binary protein interactions for the highly pathogenic bacterium Y. pestis.
A microarray encompassing 85% (3,552 proteins) of the bacterial proteome facilitated the cell-free detection of >77,000 binary complexes, and kinetic interaction data for >1600. Networks formed by Y. pestis binary interactions were enriched for regulatory and metabolic pathways that are essential for cell maintenance and survival. We also noted direct connections between ribosomes and RNA polymerase, functional protein clusters, and other multisubunit assemblies. Approximately 30% of proteins within the Y. pestis network were orphan gene products (8), suggesting prospective functions for many of these previously uncharacterized and diverse proteins. The networks inferred from these newly described binary protein interactions are roadmaps of dynamic cellular processes that will be useful for systems biology studies, and provide an extensive number of targets for the development of antimicrobials. New molecular targets for therapeutic interventions are needed to counter the growing threats caused by antibiotic resistance among gram-negative pathogens and the potential dissemination of highly pathogenic bacteria by disease outbreaks or through potential acts of bioterrorism.
In contrast to results from cell-based studies, binary interactions that are measured with isolated proteins are more likely to register formation of true complexes (42). Measuring PPi in vitro is the simplest way to detect binary complexes and eliminate potential multimeric complexes, trapping of proteins by macromolecular aggregates, and the protein crowding effects that occur within cells, as well as overcoming the limitations introduced by fusion of bacterial proteins to heterologous anchor proteins, and localization of proteins to nonnatural cellular compartments (43). Stable interactions (i.e. interactions with slow dissociation rates) are readily detected by the microarray method used here, whereas interactions with fast dissociation rates are more difficult to capture. Although estimating false discovery rates for our data is complicated by the lack of a gold standard PPi data set for Y. pestis, as well as the novelty of the experimental methods, there was 32% agreement between the two microarray methods utilized in our experiments, exceeding the ∼5% of interactions in common between previously reported AP-MS and Y2H studies (6,7,17). We suggest that the majority of the interactions detected in our study are physiologically relevant. Although high-affinity interactions were also reported for proteins that did not coexist within the same Yersinia species (42), these nonphysiological complexes are likely to occur only as rare events. Formation and stability of the reported binary complexes may be controlled by accessibility or fluctuations in the abundance of either protein component within the bacterium. However, the cellular mechanisms that regulate protein levels specifically for control of binary complexes are not clear. Our analysis suggests that transcription levels play a minor role in regulating protein abundance. Taniguchi et al. also reported no correlation between protein and mRNA copy numbers for any given gene in a study of both mRNA and proteins at the single-bacterium level with single-molecule sensitivity (44). We observed that only 0.3% of Y. pestis binary interactions fell within the same operon, compared with a similarly low frequency of 1.6% for E. coli interactions. We further note a recent report indicating that operon organization facilitates cotranslational association of oligomeric interactions between bacterial luciferase subunits LuxA and LuxB that are encoded by the luxCDABE operon (45). A focused study of operons by employing the methods described here may be useful to identify additional proteins that assemble into translation-controlled complexes.
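The dependence of binary complex formation on the abundance of each protein and on the interaction affinity, discussed above, can be illustrated with the standard equilibrium expression for a reversible 1:1 complex. The Python sketch below is an idealized calculation that ignores crowding, compartmentalization, and competing partners; the function name and the example concentrations are hypothetical and are not measurements from this study.

```python
import math


def complex_concentration(a_total, b_total, kd):
    """Equilibrium concentration of AB for A + B <-> AB with 1:1 binding.

    All three arguments must share the same concentration unit. The value
    returned is the physically meaningful root of
    [AB]^2 - (a_total + b_total + kd) * [AB] + a_total * b_total = 0.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


# Hypothetical example: both proteins at 2 uM total with a KD of 1 uM.
bound = complex_concentration(2.0, 2.0, 1.0)
fraction_bound = bound / 2.0

# The same proteins with a 100-fold tighter interaction (KD of 0.01 uM).
bound_tight = complex_concentration(2.0, 2.0, 0.01)
```

Under these simplifying assumptions, half of each protein is in complex when both are present at twice the KD, and the bound fraction rises as either the concentrations increase or the KD decreases.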
Reconstructing the networks of protein interactions within cells illuminates the inner workings of cellular machinery and allows us to gain an understanding of complex diseases. New techniques are also needed to detect the dynamic changes in PPi resulting from the spatial and temporal changes that occur inside cells. The network of Y. pestis PPis that we present provides a framework for understanding dynamic biological processes, contains subnetworks that may represent macromolecular assemblies or functional modules, and presents opportunities for selection of protein targets for further evaluation in a hypothesis-driven manner. For example, the experimentally determined protein interactome from Y. pestis may be a rich source of targets for development of new antimicrobials. Most antibiotics that are directed toward commonly exploited biochemical targets, including chloramphenicol and beta-lactams (46), are becoming ineffective because of the spread of resistance. There is a constant need for synthesis of new antibiotic variants that can circumvent common microbial resistance pathways, while future efforts must consider alternative targets that may be less prone to drive selection of resistance. Beta-lactam antibiotics (ampicillin, meropenem, imipenem), fluoroquinolones (ciprofloxacin, moxifloxacin), or aminoglycosides (streptomycin, gentamicin) are commonly used to treat Y. pestis infections (47). Aminoglycosides disrupt protein synthesis by binding to the 30S ribosomal subunit, whereas fluoroquinolones inhibit DNA gyrase and topoisomerase IV; both classes thereby inhibit cell growth (48). In particular, we conclude that network hubs and individual bacterial protein interactions provide opportunities for targeted disruption of essential cell functions or virulence mechanisms.
In the assembled network, we observed that functionally important proteins, including potential drug targets, exhibit topological importance. For example, the RNA-binding protein Hfq is known to facilitate small RNA-mRNA interactions in gram-negative bacteria, and is a post-transcriptional regulator of the stress response (49). Further, Hfq is required for virulence in Y. pseudotuberculosis (50), and Y. pestis strains are strongly dependent on Hfq for growth at 37°C (51). Within the PPi network, Hfq is a hub protein that interacts with 14 other proteins, including five orphans, offering several potential targets for antimicrobial intervention. Thus, the results presented here provide opportunities for study of virulence pathways and the development of new antimicrobial strategies.