Introduction

Infection is a leading cause of death worldwide and a growing threat to developing countries in particular. More than 50% of emerging infectious diseases are caused by bacteria or rickettsia, including a large number of drug-resistant microbes [1]. Clinical microbial identification includes the confirmation of bacterial, viral, fungal, and parasitic agents that cause human disease. Precise identification of microbial pathogens provides diagnostic and therapeutic support for the clinical management of patients, informs surveys of local and global epidemiology, and helps to prevent the transmission of infectious diseases [2]. With the emergence of resistant strains and the release of large amounts of antimicrobials, anti-infection drugs are being severely tested [3]. Microbial resistance to antibiotics is on the rise, yet few new antibiotics active against multiresistant bacteria are being explored [4, 5]. New antibiotic agents against microbial infections need to be developed to overcome this crisis [6]. Studies of microbial physiology usually focus on biofilms and cell-wall biosynthesis, protein biosynthesis, DNA and RNA replication, folate metabolism, cell-surface decoration, and isoprenoid biosynthesis, from which researchers discern microbial molecular behaviors to explore drug targets for antimicrobial therapy [7, 8].

Proteomic approaches are now widely used in microbiology [8, 9]. Proteomics can yield not only qualitative information on proteins, including their identification, distribution, posttranslational modifications, interactions, structure, and function, but also quantitative information, such as abundance, distribution among different localizations, and temporal changes in abundance due to synthesis, degradation, or both [10, 11]. Microbial proteomic research aims to identify proteins associated with microbial activity. By combining gel-free and gel-based methods with liquid chromatography (LC) and mass spectrometry (MS)-based techniques, it has become a formidable tool for deciphering microbial proteins [12]. By identifying antibiotic resistance genes through comparative proteome analysis of model strains and resistant mutants, microbial proteomic investigation can help not only to guide clinical application, but also to screen potential bioactive compounds and new antimicrobial drugs [7, 13]. Proteomic analysis of biofilms has suggested an antibiotic cocktail therapy strategy for infection [14]. Current MS-based proteomics technologies have advanced to the point where they are amenable to any biological system [15]. For example, protein isolation approaches, including affinity purification and tandem affinity purification, combined with MS are powerful tools for deciphering new protein–protein interactions [16]. Renewed interest in microbial proteome profiling aims to reveal the dynamics of the microbiome [17]. Here, we present an overview of proteomic progress on host–microbial pathogen interactions at different levels, and on MS-based microbial identification for clinical diagnosis and antimicrobial therapy.

New insights into host–microbial pathogen interactions by proteomic tools

Interactions between the host and the microbial pathogen are central to infections caused by microorganisms. Knowledge of these interactions, such as how microbial pathogens exert their virulence on the host and develop resistance, is therefore essential for understanding infections and developing strategies to fight them. The new insights into host–microbial pathogen interactions gained with proteomic tools are discussed below at the molecular, single-cell, organism, and population levels (Fig. 1).

Fig. 1

Host–microbial pathogen interactions from proteomics dissection

Identifying microbial virulence proteins and protein modifications

Complex and dynamic interactions between pathogens and host immune defense mechanisms take place during the course of invasive infection and can determine the fate of the host at the outset of the infection process [18]. Microbial pathogens subvert various molecules for adhesion to and invasion of host cells, infection of neighboring cells, dissemination into the host systemic circulation, and evasion of host defense mechanisms. Proteomic profiling of the outer and inner membrane proteins and secreted proteins, such as siderophores, has provided new insights into host–pathogen interactions [19]. Virulence proteins, like proteoglycans [20], mediate protein–pathogen interactions and thereby affect the onset, progression, and outcome of infection [21, 22]. Accumulating evidence indicates that microbial virulence contributes to the host response and the outcome of severe infections [23]. For example, Staphylococcus aureus, a Gram-positive commensal bacterium with an extensive arsenal of virulence factors, is a major threat to modern healthcare systems. Moreover, some pathogens acquire the capacity to communicate with each other and sense the host’s vulnerabilities [24].

Moreover, protein modifications, including glycosylation, phosphorylation, and acetylation, seem to confer virulence and can be rapidly identified by MS [25–27]. One well-studied human pathogen, Mycobacterium tuberculosis, has been investigated as a model microorganism using proteomic methods for over a decade; in particular, hundreds of putative virulence determinants and posttranslational modifications have been identified [28]. Recently, PhoP, a highly conserved virulence-regulating protein in bacteria, has been confirmed to be acetylated at lysine residue 201 in Salmonella typhimurium and to be enzymatically deacetylated by the deacetylase CobB. Its acetylation significantly attenuates intestinal inflammation and systemic infection in the mouse model [29]. Such findings on bacterial protein modifications may ultimately lead to better management of the related diseases.

A new branch of host–pathogen interaction research now attempts to explore changes at the atomic level. As a pioneer, Salgado set out to determine the assembly and structure of the mature S-layer in Clostridium difficile in order to reveal host–pathogen interactions at this level [30]. Atomic-level insight into microbial physiology will greatly enlarge our understanding of infectious disease.

Exploring microbial resistance genes at the single-cell level

A single cell represents the basic unit of a living organism. Because cell populations are heterogeneous in function and fate, it is vital to measure quantities and dynamic processes in single cells [31, 32]. Ultimately, cellular plasticity depends on changes in protein expression levels, and proteomic methods allow many proteins to be measured in parallel [33]. The emergence of resistant strains and the release of large amounts of antimicrobials are serious problems. To fill the multiple gaps that remain in understanding microbial resistance, proteomic tools have also been used to study microbial physiology in response to antibiotic stress [34]. Identification of antibiotic resistance genes by comparative proteome analysis of model strains and resistant mutants would help not only to guide clinical application, but also to evaluate new drugs [7]. Moreover, proteomics technologies have successfully unraveled drug resistance mechanisms of microbial biofilms and may contribute new knowledge for future development in the field [13]. Our recent study identified the bacterial proteins of the host strain S. aureus that change in response to treatment with the antibiotic daptomycin, which disrupts bacterial physiology at multiple levels [35]. These findings help in developing novel daptomycin derivatives against emerging antibiotic-resistant bacterial infections.

Microbial proteomics also offers new approaches to developing potential bioactive compounds. The specific enzymes and proteins involved in the synthesis of natural products, non-ribosomal peptide synthetases and polyketide synthases, are rapidly identified by proteomic analysis [31]. Proteomic methodologies help to determine antimicrobial resistance genes, and novel antibiotics designed to target these genes could bring an important breakthrough in antibiotic development [36]. In particular, single-cell proteomics can identify proteins and measure protein concentrations directly in a single cell [37], which makes it a more powerful tool for following microbial resistance dynamics in a complex sample.

Profiling microbial proteome at the organism level

Urine is an important non-invasive body fluid source of diagnostic and prognostic biomarkers of human diseases, and it may contain whole human cells shed from anatomically proximal tissues and organs (e.g., kidney, prostate, bladder, urothelium, and genitals) [38, 39]. The cells derived from such tissues can carry the viruses and microbial organisms that caused a urogenital tract infection. Identification of the function, abundance, and tissue of origin of such proteins could help to understand the host–pathogen interaction process, including the cause of urinary tract infection and the human immune response to the infection-associated pathogen(s) [40]. For example, one study has reviewed the proteomic results for Shigella dysenteriae, Shigella flexneri, enterohemorrhagic Escherichia coli, and uropathogenic E. coli [41]. It showed that the bacteria adapted dynamically to changes in nutrient availability and oxygen in vivo, including increased anaerobic respiration and mixed acid fermentation. The host model investigated largely determined how the bacteria utilized carbon and nitrogen resources.

Investigating the community physiology and pathogenicity at the population level

Recent studies have shown that the local contact or social population structure of the host may cause large shifts in virulence in pathogen populations as a result of a bistability in evolutionary dynamics [42]. Mixtures of thousands of different phylotypes interact with each other and with their environment [40, 43].

Advances in the study of host–pathogen interactions at the population level are well illustrated by the development of metaproteomics. This emerging field aims to analyze the proteome profiles of mixed microbial communities, from which community physiology and pathogenicity can be learnt [44, 45]. Metaproteomics links the abundance and activity of enzymes during nutrient cycling to their phylogenetic origin at the protein level [46, 47]. It opens a door to capturing natural products from uncultivated microorganisms in model production host strains, and to recognizing the metabolic spectrum of microbes that is not fully expressed in laboratory culture and could not be sampled by classical means [40, 46]. Multispecies bacterial biofilms on catheters have been dissected by a metaproteomics approach [48], which unraveled the bacterial community structure and function of the biofilm and elucidated the interplay between bacterial virulence and the human immune system within the urine.

Metaproteomics technology has had a direct impact on our understanding of microbial diversity, ecology, and secondary metabolism, and it provides an efficient guide for accessing numerous non-culturable microorganisms and exploiting them for potential applications in clinical biomarker screening and natural product antibiotic discovery. One group has adopted shotgun metaproteomic approaches combined with metagenomics to identify potential functional signatures of Crohn’s disease in stool samples [49]. Their study revealed the genes, proteins, and pathways that primarily differentiated subjects with Crohn’s disease in the ileum from healthy subjects and underscored the link between the gut microbiota and functional alterations in the pathophysiology of Crohn’s disease, aiding the identification of novel diagnostic targets and disease-specific biomarkers. Similarly, metabolomics has also been applied to discover biomarkers of hepatocellular carcinoma [50] and catheter-associated urinary tract infections. For example, proteins related to pathogenicity and resistance/survival, such as beta-lactamase and TetR, are detected by metaproteome analysis and may assume special relevance in terms of pathogenesis and resistance to host defenses and treatment [45].

More and more investigations into pathogens focus on gut microbiome and human health [51, 52]. Indeed, the microbiome is intrinsically complex, with many important functions. Mammalian gut microbiota is considered to be a novel type of “organ” [53, 54]. A more recent article has shown how fundamentally important the intestinal bacteria are to the rest of our mental and physical health, affecting almost everything from our appetite to our state of mind [55]. Study in microbial populations opens up a new research area in which researchers can get more relevant details. With this trend, the White House Office of Science and Technology Policy, in collaboration with federal agencies and private-sector stakeholders, announced the National Microbiome Initiative (NMI) on May 13, 2016. The NMI will launch with a combined federal agency investment of more than $121 million in fiscal year 2016 and 2017 funding for cross-ecosystem microbiome studies, aiming to foster the integrated study of microbiomes across different areas, such as healthcare, food production, and environmental restoration.

However, some critical obstacles still need to be addressed. Proteomics identifies peptides by matching MS/MS spectra against the theoretical spectra of all candidate peptides represented in a reference protein sequence database [56]. The subsequent inference of protein identity and protein quantification from the sequences and abundances of the identified peptides also relies on a reference protein sequence database, such as Ensembl, RefSeq, or UniProtKB [57]. Nevertheless, these databases are incomplete, and many peptides may not be present in any reference database. In addition, peptides may contain mutations and may represent novel protein-coding loci or alternative splice forms [58]. To address this, the proteogenomics approach was introduced in 2004 [59]; it uses proteomic data derived from MS to improve and refine genome annotation. A number of automated software tools for proteogenomic analysis have been developed to integrate MS-based proteomic data into genome databases [60–63]. For example, a one-stop open-source software package termed GAPP, which applies the target-decoy search strategy to calculate the false discovery rate (FDR) for the results of all employed algorithms, provides large-scale, proteome-wide analysis of posttranslational modifications in prokaryotes [64].
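The target-decoy strategy mentioned above can be illustrated with a minimal sketch. It assumes the simple and widely used estimator FDR ≈ (decoy hits) / (target hits) above a score threshold; the function name, the score scale, and the example values are hypothetical, and this is not a description of how GAPP itself computes the FDR.

```python
def target_decoy_fdr(psms, score_threshold):
    """Estimate the FDR among peptide-spectrum matches (PSMs) above a threshold.

    psms: list of (score, is_decoy) tuples, where is_decoy marks a match
    to a decoy (e.g., reversed) sequence. Assumed estimator: decoys / targets.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= score_threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= score_threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no estimate can be made
    return decoys / targets


# Hypothetical search results: (score, matched a decoy sequence?)
example_psms = [(95.0, False), (90.0, False), (88.0, True), (80.0, False),
                (75.0, False), (60.0, True), (55.0, False)]

print(target_decoy_fdr(example_psms, 70.0))  # 1 decoy vs. 4 targets
```

In practice, the threshold is lowered until the estimated FDR reaches an accepted limit (often 1%), and only matches above that threshold are reported.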

MS-based proteomics progresses on microbial identification and antimicrobial therapy discovery

MS-based methods for rapid identification of clinical microbials

Traditional identification of bacterial isolates has long relied on a combination of biochemical properties, such as oxygen requirement, Gram staining, carbohydrate metabolism, and the presence of specific enzymes [3]. Nowadays, MS-based proteomic approaches are used regularly in routine clinical diagnostic procedures, including the comprehensive characterization, classification, and identification of microorganisms [65–67]. Matrix-assisted laser desorption/ionization time-of-flight MS (MALDI-TOF MS) has been broadly adopted by clinical microbiology laboratories over the past decade [68, 69]. The general schematic for the analysis of microbiological isolates and clinical material is illustrated in Fig. 2. MALDI-TOF MS is a highly reproducible and reliable tool for microbial identification: it can identify bacterial isolates in a few minutes, at low cost, and with high efficiency from both a diagnostic and a cost-per-analysis point of view [70, 71]. Currently, a commercial MALDI-TOF MS system (VITEK® MS) has been approved by the U.S. Food and Drug Administration after extensive and successful clinical trials [72].

Fig. 2

A simplified illustration for the general analysis of clinical microbials by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). MALDI-TOF MS allows the identification of microbial pathogens cultured on agar, in blood culture bottles, or directly from urine samples. After being spotted on the plate, the sample is covered with a matrix and then desorbed and ionized by a laser to generate a specific fingerprint. To improve the spectral generation, formic acid and ethanol-based methods are optional. A fingerprint pattern is searched against a microbial standard library for the most matching spectra
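The library search step described in this caption can be sketched as follows. This is a minimal illustration only: it assumes peaks are binned by m/z and compared by cosine similarity, whereas commercial systems use their own scoring schemes. The library entries, peak values, and function names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=10.0):
    """Sum peak intensities into fixed-width m/z bins: {bin index: intensity}."""
    binned = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (0 = unrelated, 1 = identical)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_peaks, library, bin_width=10.0):
    """Return (name, score) of the library fingerprint most similar to the query."""
    query = bin_spectrum(query_peaks, bin_width)
    scores = {name: cosine_similarity(query, bin_spectrum(peaks, bin_width))
              for name, peaks in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical reference fingerprints as (m/z, intensity) peak lists
reference_library = {
    "species_A": [(4365.0, 100.0), (5096.0, 80.0), (6255.0, 60.0)],
    "species_B": [(3500.0, 90.0), (7200.0, 100.0), (9100.0, 50.0)],
}
query_spectrum = [(4366.0, 95.0), (5097.0, 85.0), (6254.0, 55.0)]

print(best_match(query_spectrum, reference_library))
```

Binning makes the comparison tolerant of small mass shifts between runs, at the cost of resolution; the bin width used here is an arbitrary choice for the example.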

Two-dimensional gel electrophoresis (2-DE), which is no longer the exclusive separation tool in proteomics but still offers the highest resolution in protein separation, has typically been combined with MALDI-TOF MS for microbial protein identification by peptide mass fingerprinting [73–75]. Moreover, in combination with isolation techniques, MALDI-TOF MS can be used to identify bacteria directly from blood culture bottles and urine samples [72, 76–78]. The MS identification of Candida species directly from blood culture bottles within 30 min was concordant with the conventional culture-based method for 95.9% of C. albicans and 86.5% of non-albicans Candida [72]. In addition, mass cytometry, namely flow cytometry coupled with mass spectrometry, has been applied to rapidly process urine samples [77, 78].

Stable isotope labeling with amino acids in cell culture (SILAC), a widely used metabolic labeling method, incorporates a stable isotope into proteins in vivo by adding an isotope such as 13C, 15N, or 18O to the growth media as salts or amino acids [79]. New SILAC-based approaches have been introduced to improve identification efficiency. With the E. coli cell-free protein expression system named PURE (protein synthesis using recombinant elements) [80], stable isotope-labeled reference peptides can be prepared in a 96-well plate within a short period. This labeling system, based on the reconstituted E. coli translation machinery, offers a general and rapid cell-free SILAC approach that is also applicable to microbial MS identification. Through extensive modification of the SILAC method, pulsed SILAC (pSILAC) has been developed to monitor modest changes in proteins during de novo protein synthesis by metabolic pulse labeling of cells with two different heavy isotopic forms of arginine and lysine [81]. Meanwhile, the triple SILAC method, in which SILAC is performed in a triple labeling format (Fig. 3), allows the study of proteins derived from three samples or of the time dimension of the proteome [82, 83]. These methods considerably broaden the scope of SILAC-based proteomics.

Fig. 3

Stable isotope labeling with amino acids in cell culture (SILAC)-derived metabolic techniques. a pSILAC (pulsed SILAC) incorporates the stable isotope into proteins by adding “heavy” amino acids to the growth media. Cells are experimentally manipulated while growing in “light” (Lys0, Arg0) SILAC medium. Subsequently, the treated and control samples are transferred to distinctly labeled SILAC media, “heavy” (Lys8, Arg10) and “medium” (Lys4, Arg6). After one or a few doublings, samples are harvested and combined at a 1:1 ratio. Proteins present before treatment show up as a “light” peak (L) in the mass spectrum and can be ignored. The effect of the treatment on protein production rates can be calculated as the ratio of the signals at the “medium-heavy” (M) and “heavy” (H) peaks. b In triple SILAC, three samples can be analyzed at the same time, labeled with “light” (Lys0, Arg0), “medium” (Lys4, Arg6), and “heavy” (Lys8, Arg10) SILAC medium. Proteins are then combined and analyzed together by liquid chromatography tandem mass spectrometry (LC-MS/MS). In the MS spectra, each peptide appears as a triplet with distinct mass differences. The ratios between the samples are calculated directly by comparing the intensities of the peaks. “Lys0, Arg0”: unlabeled lysine and arginine; “Lys4, Arg6”: 2H4-lysine and 13C6-arginine; “Lys8, Arg10”: 13C6 15N2-L-lysine and 13C6 15N4-L-arginine
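The pSILAC readout in panel a can be sketched as a ratio calculation. The sketch assumes that the treated sample carries the “heavy” label and the control the “medium” label, and that per-protein peak intensities have already been extracted; the protein names, intensities, and function name are hypothetical.

```python
import math


def psilac_log2_ratios(intensities):
    """Compute log2(H/M) per protein from (light, medium, heavy) intensities.

    The light signal (pre-existing protein) is ignored. Proteins without
    signal in either labeled channel are skipped because no ratio is defined.
    """
    ratios = {}
    for protein, (light, medium, heavy) in intensities.items():
        if medium <= 0 or heavy <= 0:
            continue
        ratios[protein] = math.log2(heavy / medium)
    return ratios


# Hypothetical peak intensities: (L, M, H)
example = {
    "protein_1": (1000.0, 200.0, 400.0),  # synthesis doubled under treatment
    "protein_2": (800.0, 300.0, 300.0),   # unchanged
    "protein_3": (500.0, 300.0, 0.0),     # no heavy signal, skipped
}

print(psilac_log2_ratios(example))
```

A log2 scale is used so that a doubling and a halving of newly synthesized protein are symmetric around zero.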

Quantitative proteomics techniques applied for monitoring antimicrobial therapy

Advanced quantitative proteomics methods can quantify proteins and peptides from microbial strains with high resolution, which makes them suitable for dynamically monitoring microbial changes and drug efficacy. Selected reaction monitoring (SRM) is a highly effective method in which an ion of a particular mass is selected in the first stage of a tandem MS, and a product ion from a fragmentation reaction of that precursor ion is selected in the second MS stage for detection [84]. Multiple reaction monitoring (MRM) is the application of SRM to multiple product ions from one or more precursor ions. MRM/SRM is a key operating mode for target compound quantitation with a triple quadrupole MS, providing sensitive and precise quantitative results by monitoring one or several primary ion transitions per targeted compound [85–87].
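The two-stage selection in SRM/MRM can be mimicked in software as a filter on precursor/product ion pairs (transitions). The following is a minimal sketch on simulated data, not instrument control code; the transition names, m/z values, and tolerance are hypothetical.

```python
def monitor_transitions(fragment_events, transitions, tol=0.5):
    """Sum the signal for each monitored transition.

    fragment_events: list of (precursor_mz, product_mz, intensity).
    transitions: {name: (q1_mz, q3_mz)}; q1 mimics the precursor selection
    in the first stage, q3 the product ion selection in the second stage.
    Only events matching both m/z values within tol contribute.
    """
    signal = {name: 0.0 for name in transitions}
    for precursor_mz, product_mz, intensity in fragment_events:
        for name, (q1_mz, q3_mz) in transitions.items():
            if abs(precursor_mz - q1_mz) <= tol and abs(product_mz - q3_mz) <= tol:
                signal[name] += intensity
    return signal


# Two hypothetical transitions of the same precursor (an MRM experiment)
mrm_transitions = {
    "peptide_A_fragment_1": (500.3, 600.4),
    "peptide_A_fragment_2": (500.3, 800.5),
}
events = [
    (500.2, 600.5, 100.0),  # matches fragment_1
    (500.4, 800.4, 50.0),   # matches fragment_2
    (500.3, 700.0, 999.0),  # right precursor, unmonitored product: rejected
    (650.0, 600.4, 10.0),   # wrong precursor: rejected
    (500.3, 600.4, 20.0),   # matches fragment_1
]

print(monitor_transitions(events, mrm_transitions))
```

Because both the precursor and the product ion must match, co-eluting interferences such as the third and fourth events do not contribute to the signal, which is the basis of the selectivity of the method.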

In addition, isobaric chemical labeling approaches employ multiplex isobaric mass tags and thus benefit from increased throughput potential (Fig. 4). Isobaric tags for relative and absolute quantification (iTRAQ) [88] has become a popular method for quantitative proteomic labeling, in which trypsin-digested peptides are labeled separately with different isotopic variants of iTRAQ tags. The tags contain three functional parts: a reporter ion group, a mass normalization group, and an amine-reactive group [89]. Using an iTRAQ-based proteomic analysis, our group characterized the differential bacterial proteome of S. aureus upon daptomycin exposure [37], which pointed to the bacterial NDK and NT5 genes as indicators of the response to the antibiotic treatment. The iTRAQ method can also be combined with SRM/MRM to perform large-scale phosphoproteome analysis [90, 91].

Fig. 4

Isobaric chemical labeling method. Isobaric chemical labeling, including isobaric tags for relative and absolute quantification (iTRAQ) and tandem mass tag (TMT), labels the N-termini and the lysine side chains of the digested peptides with different isobaric compounds, which have the same mass and chemical structure but contain different numbers and combinations of 13C and 15N isotopes in the mass reporter. The different tags are then identified and the relative peptide abundances estimated. Because the masses of all of the tags are the same, identical peptides from different samples are co-eluted and selected together by MS. After tag cleavage and another round of MS, the tags are used to quantitate relative peptide intensities, while the peptide fragment ions are sequenced for protein identification. Isobaric chemical labeling-based multiplexing is used to compare up to four, six, eight, or ten samples, depending on the isobaric tags used (i.e., 4-plex iTRAQ, 6-plex TMT, 8-plex iTRAQ, or 10-plex TMT)
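The quantification step in this scheme reduces to comparing reporter ion intensities across channels. The sketch below assumes a 4-plex experiment with one channel used as the reference sample; the peptide sequences, intensities, and function name are hypothetical, and the corrections applied by real workflows (e.g., for isotope impurities) are omitted.

```python
def reporter_ratios(reporter_intensities, reference_channel):
    """Express each reporter channel relative to a reference channel.

    reporter_intensities: {peptide: {channel: intensity}}.
    Returns {peptide: {channel: intensity / reference intensity}}.
    Peptides with no signal in the reference channel are skipped.
    """
    ratios = {}
    for peptide, channels in reporter_intensities.items():
        reference = channels.get(reference_channel, 0.0)
        if reference <= 0:
            continue
        ratios[peptide] = {channel: intensity / reference
                           for channel, intensity in channels.items()}
    return ratios


# Hypothetical reporter ion intensities from a 4-plex experiment
example = {
    "PEPTIDEK": {"114": 100.0, "115": 200.0, "116": 50.0, "117": 100.0},
    "SAMPLER": {"114": 0.0, "115": 80.0, "116": 70.0, "117": 60.0},
}

print(reporter_ratios(example, "114"))
```

Protein-level ratios are usually obtained afterwards by summarizing the peptide ratios of each protein, for example with the median.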

The capability of multiplexing is the distinctive advantage of isobaric chemical labeling over metabolic labeling techniques. Currently, iTRAQ 4-plex and 8-plex labeling reagents are commercially available, allowing 2–8 samples to be compared in a single LC-MS/MS analysis. Similarly, another commercially available reagent, termed tandem mass tag (TMT), is also well suited to multiplexed protein quantitation [92]. TMT has since been extended to a 10-plex set of tags, so that up to ten samples can be compared simultaneously when more than eight samples need to be analyzed [93, 94]. One recent publication applied SPS-MS3 TMT10-plex analysis to investigate the proteomic alterations in S. cerevisiae resulting from the adaptation of yeast from glucose to nine different carbon sources [95]; over 5000 yeast proteins across ten growth conditions were quantified in a single experiment.

Compared with labeling methods, label-free quantification is simpler, more economical, and more widely applicable, since it requires no extra preparation steps for labeling and is not limited to materials that can be directly labeled [96]. Label-free quantification seeks differences in protein abundance by integrating the aligned peak intensity profiles from LC-MS/MS analyses. One previous report compared the membrane proteomes of virulent M. tuberculosis H37Rv and the Mycobacterium bovis BCG vaccine strain by label-free quantitative proteomics [97]. In that study, 2203 membrane-associated proteins were identified with high confidence, and 294 of them showed statistically significant differences of at least two-fold in relative abundance, which helps in investigating the mechanisms underlying M. tuberculosis H37Rv virulence and in identifying new targets for therapeutic intervention. Subsequently, the role of the M. tuberculosis SecA2 pathway in exporting solute binding proteins and Mce transporters to the cell wall was revealed [98].
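The two-fold criterion used in such label-free comparisons can be sketched as follows. The sketch assumes that replicate peak intensities per protein are already aligned and normalized, and it applies only the fold-change filter; the statistical significance test used in the cited study is deliberately omitted. All names and values are hypothetical.

```python
import math
from statistics import mean


def differential_proteins(intensities_a, intensities_b, min_fold=2.0):
    """Flag proteins whose mean intensity differs at least min_fold between strains.

    intensities_a, intensities_b: {protein: [replicate intensities]}.
    Returns {protein: log2(mean_b / mean_a)} for proteins passing the filter.
    """
    threshold = math.log2(min_fold)
    flagged = {}
    for protein in intensities_a.keys() & intensities_b.keys():
        mean_a = mean(intensities_a[protein])
        mean_b = mean(intensities_b[protein])
        if mean_a <= 0 or mean_b <= 0:
            continue  # a ratio is undefined without signal in both strains
        log2_fold_change = math.log2(mean_b / mean_a)
        if abs(log2_fold_change) >= threshold:
            flagged[protein] = log2_fold_change
    return flagged


# Hypothetical replicate intensities for two strains
strain_a = {"protein_1": [100.0, 110.0, 90.0], "protein_2": [100.0, 100.0, 100.0]}
strain_b = {"protein_1": [400.0, 410.0, 390.0], "protein_2": [120.0, 120.0, 120.0]}

print(differential_proteins(strain_a, strain_b))
```

In a real analysis, the fold-change filter would be combined with a replicate-based test and multiple-testing correction before a protein is reported as differentially abundant.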

However, some major bottlenecks remain for this approach: samples must be measured under strictly standardized procedures, some quantification strategies are too specific to serve as generic tools at the proteome scale, and the modest accuracy of the quantitative readouts does not allow small changes to be detected [99, 100]. Nevertheless, new algorithms such as MaxLFQ [99] and aLFQ [101] have been developed to address these problems and to achieve the highest possible accuracy of quantification. Promisingly, sequential window acquisition of all theoretical mass spectra (SWATH) MS, a data-independent workflow that uses a first quadrupole isolation window to step across a mass range, collecting high-resolution full-scan composite MS/MS spectra at each step and generating an ion map of fragments from all detectable precursor masses, is well suited to acquiring proteome-wide quantitative data over many samples with a high degree of reproducibility, large dynamic range, and low limit of detection [102, 103]. A recent study applied SWATH MS to examine the proteomic reorganization of M. tuberculosis during exponential growth, hypoxia-induced dormancy, and resuscitation. The resulting dataset covered >2000 proteins and revealed how protein biomass is distributed among cellular functions, allowing the investigators to provide a quantitative description of microbial states [104]. In addition, robust, highly parallel procedures for generating peptide mixtures are critical for increasing effectiveness. For example, a method termed filter-aided sample preparation (FASP), combined with one-dimensional nano-LC followed by online MS/MS analysis on a Q Exactive MS, can distinguish more than 1000 distinct microbial proteins and 1000 distinct human proteins from urine in a single experiment [105, 106].
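The stepping of the isolation window across the mass range in SWATH MS can be sketched as follows. The mass range and window width below are assumed values chosen for illustration, and real acquisition schemes may use overlapping or variable-width windows; the function names are hypothetical.

```python
import math


def swath_windows(start_mz, end_mz, width):
    """Return contiguous (lower, upper) isolation windows covering a mass range."""
    count = math.ceil((end_mz - start_mz) / width)
    windows = []
    for i in range(count):
        lower = start_mz + i * width
        upper = min(lower + width, end_mz)  # clip the last window to the range
        windows.append((lower, upper))
    return windows


def window_index(windows, precursor_mz):
    """Find the window in which a precursor is co-isolated, or None if outside."""
    for i, (lower, upper) in enumerate(windows):
        if lower <= precursor_mz < upper:
            return i
    return None


# Assumed settings for illustration: m/z 400-1200 in 25-unit steps
windows = swath_windows(400.0, 1200.0, 25.0)
print(len(windows), windows[0], windows[-1])
print(window_index(windows, 512.3))
```

Every precursor falling inside a window is fragmented together, so the composite MS/MS spectrum of that window contains fragments of all of them; this is why the approach records all detectable precursors rather than a preselected subset.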

Perspective

With the continued development of proteomics and MS technology, more efficient and higher-throughput methods will become available for investigating microorganisms. Microbial proteomics provides a powerful tool for basic microbial research and translational applications, not only for profiling mechanisms and microbial physiology, but also for guiding clinical diagnosis and antimicrobial therapy. Moreover, it will contribute to a better understanding of microbial community functions and microbial physiology, and thereby provide tools for discovering novel bioactive compounds and new antibiotics for clinical antimicrobial therapy.